
Implementing the Apriori Algorithm in C

Published: 2022-10-06 08:24:27

A. What are the main machine learning algorithms?

1. Linear regression
In statistics and machine learning, linear regression is probably one of the best-known and most easily understood algorithms.
2. Logistic regression
Logistic regression is another technique machine learning borrowed from statistics. It is the method of choice for binary classification problems.
3. Linear discriminant analysis
Logistic regression is a traditional classification algorithm whose use is limited to two-class problems. If you have more than two classes, linear discriminant analysis (LDA) is the preferred linear classification technique.
4. Classification and regression trees
Decision trees are an important class of predictive modeling algorithms in machine learning.
5. Naive Bayes
Naive Bayes is a simple yet powerful predictive modeling algorithm.
6. K-nearest neighbors
The K-nearest neighbors (KNN) algorithm is simple and effective. The model representation of KNN is the entire training dataset.
7. Learning vector quantization
One drawback of KNN is that you have to keep the entire training dataset around.
8. Support vector machines
Support vector machines (SVM) are perhaps among the most popular and most discussed machine learning algorithms today.
9. Bagging and random forests
Random forests are among the most popular and most powerful machine learning algorithms; they are an ensemble method.


B. A Matlab program for Apriori association-rule mining, complete and with detailed comments

Below is the part of an Apriori program that goes from the frequent 2-itemsets to the frequent k-itemsets. There are two problems with it:
1. The k of the while loop seems to stay fixed forever, i.e. it is always the number of frequent 2-itemsets. After the frequent 3-itemsets are obtained, shouldn't k change? Where would that be reflected?
2. The program has two big for loops, but it turns out that as soon as one frequent 3-itemset is found, the second for loop ends, even though there should be more frequent 3-itemsets. Shouldn't a for loop run unconditionally up to its bound k? At the time k was 15, yet when the program stopped i=2 and j=3, and j never went on to 4 and the rest of the way up to k. What is the reason? Could an expert please take a look? It's urgent...
while( k>0)
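%note: nothing in this excerpt updates k or candidate for the next level;
%to go from frequent k-itemsets to (k+1)-itemsets, candidate must be replaced
%by the newly found frequent sets and k reset to their count at the end of
%each pass -- the likely answer to question 1 above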
le=length(candidate{1});
num=2;
nl=0;
for i=1:k-1
for j=i+1:k
x1=candidate{i}; %candidate starts as the frequent 2-itemsets; this is the i-th frequent itemset
x2=candidate{j};
c = intersect(x1, x2);
M=0;
r=1;
nn=0;
l1=0;
if (length(c)==le-1) & (sum(c==x1(1:le-1))==le-1)
houxuan=union(x1(1:le),x2(le));
%tree pruning: if any (K-1)-subset of a candidate is infrequent, prune the candidate
sub_set=subset(houxuan);
%generate all (K-1)-subsets of this candidate
NN=length(sub_set);
%check whether all these (K-1)-subsets are frequent
while(r & M<NN)
M=M+1;
r=in(sub_set{M},candidate);
end
if M==NN
nl=nl+1;
%candidate k-itemsets
cand{nl}=houxuan;
%count how many times each candidate k-itemset occurs
le=length(cand{1});
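%note: the inner loop below reuses the outer loop variable i; Matlab still
%advances the outer loop correctly (for-indices are reset each pass), but the
%reuse makes the reported values of i hard to interpret -- a distinct name is safer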
for i=1:m
s=cand{nl};
x=X(i,:);
if sum(x(s))==le
nn=nn+1;
end
end
end
end
%select the frequent itemsets from the candidates
if nn>=th
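%note: ll is never initialized in this excerpt, so unless it is defined
%earlier, ll=ll+1 below raises an undefined-variable error and aborts the run
%the first time a candidate passes the threshold -- which would explain the
%program stopping right after the first frequent 3-itemset (question 2 above)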
ll=ll+1;
candmid{nl}=cand{nl};
pfxj(nl).element=cand{nl};
pfxj(nl).time=nn;
disp('The frequent itemsets found are:')
result=(candmid{nl});
disp(result);
end

end
end
end

C. How do you do association rule analysis?

Association rules capture the interdependence and correlation between one item and others, and are widely used in the recommendation systems of physical stores and online retailers. By mining association rules from a database of customer purchase records, the goal is to discover common patterns in the buying habits of the customer population, for example the probability that a customer who buys product A also buys product B in the same transaction. Based on the mining results one can rearrange shelf layouts and design promotional bundles to lift sales. The classic application case is "beer and diapers".

The key concepts in association rule analysis are support, confidence and lift. Let us first briefly review these three key metrics:

Support
Support is the probability that two products (A∩B) appear together among the total number of transactions N, i.e. the probability that A and B are bought together.

Formula: Support(A→B) = Freq(A∩B) / N

Example:
Suppose a supermarket had 1,000,000 transactions in 2016, 200,000 of which contained both cola and chips and 100,000 of which contained both cola and bread. Then the support of the cola-and-chips rule is 20% and the support of cola-and-bread is 10%.

Confidence
Confidence is the conditional probability of buying B after buying A. Put simply, it is the share of the intersection A∩B within A: if that share is large, customers who buy A can be expected to buy B as well.

Formula: Confidence(A→B) = Freq(A∩B) / Freq(A)

Example:
In 2016 the supermarket had 400,000 transactions containing cola, 300,000 of which also contained chips and 100,000 of which also contained bread. The confidence of cola→chips is therefore 75% and that of cola→bread is 25%, which says the association with chips is stronger than with bread; marketing can build combined promotions on this.

Lift
Lift expresses how much buying A raises the probability of buying B, and is used to judge whether a rule has practical value: does applying the rule make the product show up in baskets more often than it shows up on its own? A lift greater than 1 means the rule is effective; less than 1 means it is not.

Formula: Lift(A→B) = Confidence(A→B) / Support(B)

Example:
If the support of the cola-and-chips rule is 20%, the support of cola is 30% and the support of chips is 50%, then the confidence of cola→chips is 0.20/0.30 ≈ 66.7% and the lift is 0.667/0.50 ≈ 1.33 > 1, so the A→B rule does lift sales of product B.
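The three metrics are easy to verify in a few lines of code. Here is a minimal Python sketch that recomputes the lift example above (the 30% and 50% supports are the ones assumed in that example):

# Check the cola -> chips example: supports are fractions of all baskets.
support_ab = 0.20  # support of {cola, chips}
support_a = 0.30   # support of cola (assumed in the example above)
support_b = 0.50   # support of chips (assumed in the example above)

confidence = support_ab / support_a  # P(chips | cola), about 0.667
lift = confidence / support_b        # about 1.33 > 1: the rule adds value

print(round(confidence, 3), round(lift, 2))  # prints: 0.667 1.33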

The theory is simple; putting it into practice runs into all sorts of difficulties, confirming the saying that "data analysts spend 50%-80% of their time preparing data". For example, raw POS detail records usually come in the tabular form shown below:

To compute support, confidence and lift we first need Freq(A∩B), Freq(A), Freq(B) and the total transaction count, which means enumerating the product combinations.

So we want to transform the data into the following form; for example, sales ID 000001 with 4 products yields C(4,2) = 6 pairwise combinations:

If a single receipt (sales ID) carries 30 products, the number of pairs reaches C(30,2) = 435.

On top of that, the visualization layer has to present the association rules for every subsidiary, city and store in the group, by month, quarter or year. Doing this analysis with traditional tools would be like fishing a needle out of the sea.

Now let us see how to implement the Apriori approach in BDP and carry out the association rule analysis:

A first idea for producing the two-product combinations is to quantify the products as codes, say in increasing order starting from 1, compute the maximum code on each sales order, take the difference between the two values to obtain an array, and then expand that array from rows to columns to realize every two-product combination.
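Outside BDP the same expansion takes only a few lines of code. Here is a minimal Python sketch with made-up order data that turns each receipt into its product pairs; changing the 2 to a 3 in combinations() gives the three-product case mentioned at the end of this answer:

from itertools import combinations

# Hypothetical receipts: sales ID -> products bought together.
orders = {
    '000001': ['cola', 'chips', 'bread', 'beer'],
    '000002': ['cola', 'chips'],
}

# One row per (sales ID, product A, product B); sorting makes each pair canonical.
pairs = [(sid, a, b)
         for sid, items in orders.items()
         for a, b in combinations(sorted(items), 2)]

for row in pairs:
    print(row)  # order 000001 yields C(4,2) = 6 pairs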

Figure: pairing logic for 2-product combinations on a sales order

Step 1:

[Worksheet] - [Create merged table] - [Create via SQL]

Figure: product encoding

The conversion to date form above is preparation for the next step of expanding the array into rows, to be used together with the explode() function. Note that the [date] field above is a self-defined date; it can be changed to any date and has no actual date meaning.

Figure: product combination results

The key functions used above are FILL_DATES([date1],[date2]) and explode(). The combinations are taking shape; only the other product name is missing, so the [next date] field is LEFT JOINed to bring in product B's name.

Step 2:

[Worksheet] - [Create merged table] - [Multi-table join], used to join tables (LEFT / INNER / FULL JOIN)

Figure: product pair combination implemented

The figure above shows that the pairing logic for products A and B is complete; on this table alone we can already do the cross-sell analysis.

We will not elaborate here on obtaining Freq(A), Freq(B) and the total transaction count; the idea is roughly to compute the frequencies of products A and B over all sales and join everything into one large table via merged tables.

Step 3:

[Worksheet] - [Create merged table] - [Append merge]: merge the total order count, product A's order count, product B's order count, and the A∩B co-purchase count

Figure: append-merge logic

The append merge brings the matching product fields together, which makes the three metrics (support, confidence and lift) convenient to compute and to visualize.

Step 4:

Visualization: [BDP] - [Dashboard]

Figure: dashboard overview

Note: for a better visual effect, the visualizations in this part are not built from the test data above or from any particular company's data.

Three charts make up the market basket analysis:

Figure: top 20 products by co-purchase count

The chart above shows the products co-purchased most often in the quarter. Highly co-purchased products are more attractive to customers and stickier; the top 20 can also be broken down by subsidiary. Based on the results we can design suitable promotions, e.g. buy two get one free.

Figure: product pair metrics

High confidence means two products are tightly bound and customers have a strong will to buy them together. Watch the support at the same time: high support means large demand, whereas a rule with low support but high confidence has only a limited effect on the market.

Figure: market basket analysis details

Looking at product pairs through unit price, support, confidence and lift together helps discover high-value associated products and raise the average transaction value. Lift needs attention too: below 1 the uplift is limited, so effort is better spent on combinations whose lift exceeds 1.

Likewise, can three-product combinations be implemented? Clearly yes: once the process above is well understood, three-product association can also be built in BDP. Author: Xiong Hui. Reproduced from the BDP Business Data Platform; reproduction requires authorization.

D. How do you implement the Apriori algorithm in Python?

# Note: this snippet is written for Python 2 (print statements).
class Apriori(object):
    def __init__(self, filename, min_support, item_start, item_end):
        self.filename = filename
        self.min_support = min_support  # minimum support, in percent
        self.min_confidence = 50
        self.line_num = 0  # number of item rows
        self.item_start = item_start  # which item columns to use
        self.item_end = item_end
        self.location = [[i] for i in range(self.item_end - self.item_start + 1)]
        self.support = self.sut(self.location)
        self.num = list(sorted(set([j for i in self.location for j in i])))  # record the items
        self.pre_support = []  # previous level's support, location, num
        self.pre_location = []
        self.pre_num = []
        self.item_name = []  # item names
        self.find_item_name()
        self.loop()
        self.confidence_sup()

    def deal_line(self, line):
        "Extract the needed fields from one line."
        # assumes a space-separated data file
        return [i.strip() for i in line.split(' ') if i][self.item_start - 1:self.item_end]

    def find_item_name(self):
        "Read item_name from the first line."
        with open(self.filename, 'r') as F:
            for index, line in enumerate(F.readlines()):
                if index == 0:
                    self.item_name = self.deal_line(line)
                    break

    def sut(self, location):
        """
        Input: [[1,2,3],[2,3,4],[1,3,5]...]
        Output: the support of each position set, e.g. [123,435,234...]
        """
        with open(self.filename, 'r') as F:
            support = [0] * len(location)
            for index, line in enumerate(F.readlines()):
                if index == 0: continue
                # extract the record
                item_line = self.deal_line(line)
                for index_num, i in enumerate(location):
                    flag = 0
                    for j in i:
                        if item_line[j] != 'T':
                            flag = 1
                            break
                    if not flag:
                        support[index_num] += 1
            self.line_num = index  # total number of rows, excluding the item_name header
        return support

    def select(self, c):
        "Return the candidate positions."
        stack = []
        for i in self.location:
            for j in self.num:
                if j in i:
                    if len(i) == c:
                        stack.append(i)
                else:
                    stack.append([j] + i)
        # deduplicate the nested list
        import itertools
        s = sorted([sorted(i) for i in stack])
        location = list(s for s, _ in itertools.groupby(s))
        return location

    def del_location(self, support, location):
        "Remove candidates that fail the conditions."
        # drop candidates below minimum support
        for index, i in enumerate(support):
            if i < self.line_num * self.min_support / 100:
                support[index] = 0
        # Apriori's second rule: prune candidates that have an infrequent subset
        for index, j in enumerate(location):
            sub_location = [j[:index_loc] + j[index_loc + 1:] for index_loc in range(len(j))]
            flag = 0
            for k in sub_location:
                if k not in self.location:
                    flag = 1
                    break
            if flag:
                support[index] = 0
        # delete the dropped positions
        location = [i for i, j in zip(location, support) if j != 0]
        support = [i for i in support if i != 0]
        return support, location

    def loop(self):
        "Iterate over the level-s frequent itemsets."
        s = 2
        while True:
            print '-' * 80
            print 'The', s - 1, 'loop'
            print 'location', self.location
            print 'support', self.support
            print 'num', self.num
            print '-' * 80
            # generate the next level's candidate sets
            location = self.select(s)
            support = self.sut(location)
            support, location = self.del_location(support, location)
            num = list(sorted(set([j for i in location for j in i])))
            s += 1
            if location and support and num:
                self.pre_num = self.num
                self.pre_location = self.location
                self.pre_support = self.support
                self.num = num
                self.location = location
                self.support = support
            else:
                break

    def confidence_sup(self):
        "Compute the confidence."
        if sum(self.pre_support) == 0:
            print 'min_support error'  # even the first iteration failed
        else:
            for index_location, each_location in enumerate(self.location):
                del_num = [each_location[:index] + each_location[index + 1:] for index in range(len(each_location))]  # generate the previous level's itemsets
                del_num = [i for i in del_num if i in self.pre_location]  # drop subsets that are not frequent at the previous level
                del_support = [self.pre_support[self.pre_location.index(i)] for i in del_num if i in self.pre_location]  # look up their supports at the previous level
                # print del_num
                # print self.support[index_location]
                # print del_support
                for index, i in enumerate(del_num):  # compute each rule's support and confidence
                    index_support = 0
                    if len(self.support) != 1:
                        index_support = index
                    support = float(self.support[index_location]) / self.line_num * 100  # support
                    s = [j for index_item, j in enumerate(self.item_name) if index_item in i]
                    if del_support[index]:
                        confidence = float(self.support[index_location]) / del_support[index] * 100
                        if confidence > self.min_confidence:
                            print ','.join(s), '->>', self.item_name[each_location[index]], 'min_support:', str(support) + '%', 'min_confidence:', str(confidence) + '%'


def main():
    c = Apriori('basket.txt', 14, 3, 13)
    d = Apriori('simple.txt', 50, 2, 6)

if __name__ == '__main__':
    main()

Apriori(filename, min_support, item_start, item_end)

Parameters

filename: file name (with path)
min_support: minimum support (in percent)
item_start: first item column to use

item_end: last item column to use

import apriori
c = apriori.Apriori('basket.txt', 11, 3, 13)


Output:

E. C code implementing the basic Apriori algorithm

The key to implementing the Apriori algorithm is building its data model. When I wrote this as an assignment, I designed the data structures as follows:
#include<stdio.h>
#include<string.h>
#include<malloc.h>
#define ITEM_NAME_LENGTH 20
#define MIN_SUPPORT 2
//itemset structure
struct ITEMSET
{
char itemName[ITEM_NAME_LENGTH];
struct ITEMSET *next;
};
//database (transaction) structure
struct TRANSACTION
{
unsigned int tranID;
struct ITEMSET *itemPoint;
struct TRANSACTION *next;
};
//large itemset structure
struct BIGITEMSET
{
struct ITEMSET *itemPoint;
unsigned int count;
struct BIGITEMSET *next;
};
//the sample database follows
char *tran1[3]={"1","3","4"};
char *tran2[3]={"2","3","5"};
char *tran3[4]={"1","2","3","5"};
char *tran4[2]={"2","5"};
//variable declarations follow
struct TRANSACTION *tranHead;
struct BIGITEMSET *bigHead;
struct BIGITEMSET *test;
struct BIGITEMSET *subSetHeadC1,*subSetHeadC2;
Once you truly understand the algorithm, writing the program is not hard.

F. Urgently need C code implementing the Apriori algorithm

http://www.csc.liv.ac.uk/~frans/Notes/KDD/AssocRuleMine/apriori.html

.h
====================================
/*----------------------------------------------------------------------
File : apriori.h
Contents: apriori algorithm for finding frequent item sets
(specialized version for FIMI 2003 workshop)
Author : Christian Borgelt
History : 15.08.2003 file created from normal apriori.c
16.08.2003 parameter for transaction filtering added
18.08.2003 dynamic filtering decision based on times added
21.08.2003 transaction sort changed to heapsort
20.09.2003 output file made optional
----------------------------------------------------------------------*/
/*
Modified by : Frédéric Flouvat
Modifications : store the positive and negative border into an
an input trie for ABS
process statistical information on the dataset to stop
the apriori classical iterations
Author : Frédéric Flouvat
----------------------------------------------------------------------*/
#ifndef APRIRORI_H
#define APRIRORI_H

#include <iostream>
using namespace std;
#define MAXIMAL

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <time.h>
#include <assert.h>

#include "tract.h"
#include "istree.h"
#include "Application.h"

/*----------------------------------------------------------------------
Preprocessor Definitions
----------------------------------------------------------------------*/
#define PRGNAME "fim/apriori"
#define DESCRIPTION "frequent item sets miner for FIMI 2003"
#define VERSION "version 1.7 (2003.12.02) " \
"(c) 2003 Christian Borgelt"

/* --- error codes --- */
#define E_OPTION (-5) /* unknown option */
#define E_OPTARG (-6) /* missing option argument */
#define E_ARGCNT (-7) /* too few/many arguments */
#define E_SUPP (-8) /* invalid minimum support */
#define E_NOTAS (-9) /* no items or transactions */
#define E_UNKNOWN (-18) /* unknown error */

#ifndef QUIET /* if not quiet version */
#define MSG(x) x /* print messages */
#else /* if quiet version */
#define MSG(x) /* suppress messages */
#endif

#define SEC_SINCE(t) ((clock()-(t)) /(double)CLOCKS_PER_SEC)
#define RECCNT(s) (tfs_reccnt(is_tfscan(s)) \
+ ((tfs_delim(is_tfscan(s)) == TFS_REC) ? 0 : 1))
#define BUFFER(s) tfs_buf(is_tfscan(s))

/*----------------------------------------------------------------------
Constants
----------------------------------------------------------------------*/
#ifndef QUIET /* if not quiet version */

/* --- error messages --- */
static const char *errmsgs[] = {
/* E_NONE 0 */ "no error\n",
/* E_NOMEM -1 */ "not enough memory\n",
/* E_FOPEN -2 */ "cannot open file %s\n",
/* E_FREAD -3 */ "read error on file %s\n",
/* E_FWRITE -4 */ "write error on file %s\n",
/* E_OPTION -5 */ "unknown option -%c\n",
/* E_OPTARG -6 */ "missing option argument\n",
/* E_ARGCNT -7 */ "wrong number of arguments\n",
/* E_SUPP -8 */ "invalid minimal support %d\n",
/* E_NOTAS -9 */ "no items or transactions to work on\n",
/* -10 to -15 */ NULL, NULL, NULL, NULL, NULL, NULL,
/* E_ITEMEXP -16 */ "file %s, record %d: item expected\n",
/* E_DUPITEM -17 */ "file %s, record %d: duplicate item %s\n",
/* E_UNKNOWN -18 */ "unknown error\n"
};
#endif

/*----------------------------------------------------------------------
Global Variables
----------------------------------------------------------------------*/
#ifndef QUIET
static char *prgname; /* program name for error messages */
#endif
static ITEMSET *itemset = NULL; /* item set */
static TASET *taset = NULL; /* transaction set */
static TATREE *tatree = NULL; /* transaction tree */
static ISTREE *istree = NULL; /* item set tree */
static FILE *in = NULL; /* input file */
static FILE *out = NULL; /* output file */

extern "C" TATREE * apriori( char*fn_in, char*fn_out, int supp, int & level,
Trie * bdPapriori, Trie * bdn, set<Element> * relist, double ratioNfC, double & eps, int ismax,
vector< unsigned int > * stat, int & maxBdP, bool & generatedFk, bool verbose ) ;

#endif

.c
============================================
/*----------------------------------------------------------------------
File : apriori.c
Contents: apriori algorithm for finding frequent item sets
(specialized version for FIMI 2003 workshop)
Author : Christian Borgelt
History : 15.08.2003 file created from normal apriori.c
16.08.2003 parameter for transaction filtering added
18.08.2003 dynamic filtering decision based on times added
21.08.2003 transaction sort changed to heapsort
20.09.2003 output file made optional
----------------------------------------------------------------------*/
/*
Modified by : Frédéric Flouvat
Modifications : store the positive and negative border into an
an input trie for ABS
process statistical information on the dataset to stop
the apriori classical iterations
Author : Frédéric Flouvat
----------------------------------------------------------------------*/

#include "apriori.h"

/*----------------------------------------------------------------------
Main Functions
----------------------------------------------------------------------*/

static void error (int code, ...)
{ /* --- print an error message */
#ifndef QUIET /* if not quiet version */
va_list args; /* list of variable arguments */
const char *msg; /* error message */

assert(prgname); /* check the program name */
if (code < E_UNKNOWN) code = E_UNKNOWN;
if (code < 0) { /* if to report an error, */
msg = errmsgs[-code]; /* get the error message */
if (!msg) msg = errmsgs[-E_UNKNOWN];
fprintf(stderr, "\n%s: ", prgname);
va_start(args, code); /* get variable arguments */
vfprintf(stderr, msg, args);/* print error message */
va_end(args); /* end argument evaluation */
}
#endif
#ifndef NDEBUG /* if debug version */
if (istree) ist_delete(istree);
if (tatree) tat_delete(tatree);
if (taset) tas_delete(taset, 0);
if (itemset) is_delete(itemset);
if (in) fclose(in); /* clean up memory */
if (out) fclose(out); /* and close files */
#endif
exit(code); /* abort the program */
} /* error() */

/*--------------------------------------------------------------------*/

TATREE * apriori( char*fn_in, char*fn_out, int supp, int & level, Trie * bdPapriori,
Trie * bdn , set<Element> * relist , double ratioNfC, double & eps,int ismax,
vector< unsigned int > * stat, int & maxBdP, bool & generatedFk, bool verbose )
{
int i, k, n; /* loop variables, counters */
int tacnt = 0; /* number of transactions */
int max = 0; /* maximum transaction size */
int empty = 1; /* number of empty item sets */
int *map, *set; /* identifier map, item set */
char *usage; /* flag vector for item usage */
clock_t t, tt, tc, x; /* timer for measurements */

double actNfC = 1 ;
double avgNfC = 0 ;
int nbgen = 0 ;
int nbfreq = 0 ;
level = 1 ;
bool endApriori = false ; // boolean to stop the initial classical apriori approach
int bdnsize = 0 ; // number of itemsets found infrequent

/* --- create item set and transaction set --- */
itemset = is_create(); /* create an item set and */
if (!itemset) error(E_NOMEM); /* set the special characters */
taset = tas_create(itemset); /* create a transaction set */
if (!taset) error(E_NOMEM); /* to store the transactions */
if( verbose ) MSG(fprintf(stderr, "\n")); /* terminate the startup message */

/* --- read transactions --- */
if( verbose )MSG(fprintf(stderr, "reading %s ... ", fn_in));
t = clock(); /* start the timer and */
in = fopen(fn_in, "r"); /* open the input file */
if (!in) error(E_FOPEN, fn_in);
for (tacnt = 0; 1; tacnt++) { /* transaction read loop */
k = is_read(itemset, in); /* read the next transaction */
if (k < 0) error(k, fn_in, RECCNT(itemset), BUFFER(itemset));
if (k > 0) break; /* check for error and end of file */
k = is_tsize(itemset); /* update the maximal */
if (k > max) max = k; /* transaction size */
if (taset && (tas_add(taset, NULL, 0) != 0))
error(E_NOMEM); /* add the loaded transaction */
} /* to the transaction set */
fclose(in); in = NULL; /* close the input file */
n = is_cnt(itemset); /* get the number of items */
if( verbose ) MSG(fprintf(stderr, "[%d item(s),", n));
if( verbose ) MSG(fprintf(stderr, " %d transaction(s)] done ", tacnt));
if( verbose ) MSG(fprintf(stderr, "[%.2fs].\n", SEC_SINCE(t)));

/* --- sort and recode items --- */
if( verbose ) MSG(fprintf(stderr, "sorting and recoding items ... "));
t = clock(); /* start the timer */
map = (int*)malloc(is_cnt(itemset) *sizeof(int));
if (!map) error(E_NOMEM); /* create an item identifier map */
n = is_recode(itemset, supp, 2, map); /* 2: sorting mode */
tas_recode(taset, map, n); /* recode the loaded transactions */
max = tas_max(taset); /* get the new maximal t.a. size */

// use in the other part of the implementation to have the corresponding
// identifiant to an internal id
stat->reserve( n+2 ) ;
stat->push_back( 0 ) ;
for(int j= 0; j< n ; j++ )
{
stat->push_back( 0 ) ;
relist->insert( Element( atoi( is_name( itemset, j ) ) ,j) );
}

if( verbose ) MSG(fprintf(stderr, "[%d item(s)] ", n));
if( verbose ) MSG(fprintf(stderr, "done [%.2fs].\n", SEC_SINCE(t)));

/* --- create a transaction tree --- */
if( verbose ) MSG(fprintf(stderr, "creating transaction tree ... "));
t = clock(); /* start the timer */
tatree = tat_create(taset,1); /* create a transaction tree */
if (!tatree) error(E_NOMEM); /* (compactify transactions) */
tt = clock() -t; /* note the construction time */
if( verbose ) MSG(fprintf(stderr, "done [%.2fs].\n", SEC_SINCE(t)));

/* --- create an item set tree --- */
if( verbose ) MSG(fprintf(stderr, "checking subsets of size 1"));
t = clock(); tc = 0; /* start the timer and */
istree = ist_create(n, supp); /* create an item set tree */
if (!istree) error(E_NOMEM);
for (k = n; --k >= 0; ) /* set single item frequencies */
ist_setcnt(istree, k, is_getfrq(itemset, k));
ist_settac(istree, tacnt); /* set the number of transactions */
usage = (char*)malloc(n *sizeof(char));
if (!usage) error(E_NOMEM); /* create a item usage vector */

/* --- check item subsets --- */
while (ist_height(istree) < max && ( ( ismax == -1 && endApriori == false )
|| ist_height(istree) < ismax )
)
{
nbgen = 0 ;
nbfreq = 0 ;

level ++ ;

i = ist_check(istree,usage);/* check current item usage */

if (i < max) max = i; /* update the maximum set size */
if (ist_height(istree) >= i) break;

k = ist_addlvl(istree, nbgen); /* while max. height is not reached, */

if (k < 0) error(E_NOMEM); /* add a level to the item set tree */
if (k != 0) break; /* if no level was added, abort */
if( verbose ) MSG(fprintf(stderr, " %d", ist_height(istree)));
if ((i < n) /* check item usage on current level */
&& (i *(double)tt < 0.1 *n *tc)) {
n = i; x = clock(); /* if items were removed and */
tas_filter(taset, usage); /* the counting time is long enough, */
tat_delete(tatree); /* remove unnecessary items */
tatree = tat_create(taset, 1);
if (!tatree) error(E_NOMEM);
tt = clock() -x; /* rebuild the transaction tree and */
} /* note the new construction time */
x = clock(); /* start the timer */

ist_countx(istree, tatree, nbfreq, istree->supp ); /* count the transaction tree */

tc = clock() -x; /* in the item set tree */

actNfC = 1-double(nbfreq)/double(nbgen) ;
avgNfC = avgNfC + actNfC ;

if( verbose )
{
cout<<" \t Fk : "<<nbfreq<<" Ck : "<<nbgen<<" NFk/Ck "<<actNfC<<" avg NFk/Ck "<<avgNfC/(level-1)<<endl;
}

bdnsize += nbgen - nbfreq ;

if( level >=4 && ( bdnsize / nbgen < 1.5 ) && ( bdnsize > 100 ) )
{
if( actNfC < ratioNfC )
{
eps = 0 ;
endApriori = true ;
}
else if( actNfC > 0.25 )
endApriori = true ;

}

} /* and note the new counting time */
if( verbose ) MSG(fprintf(stderr, " done [%.2fs].\n", SEC_SINCE(t)));

/* --- filter item sets --- */
t = clock(); /* start the timer */
#ifdef MAXIMAL /* filter maximal item sets */
if( verbose ) MSG(fprintf(stderr, "filtering maximal item sets ... "));

if( ratioNfC == 0 || nbgen < k+1 || ist_height(istree)>= max )
ist_filter2(istree, IST_MAXFRQ, 0);
else
ist_filter2(istree, IST_MAXFRQ, bdn);

if( verbose ) MSG(fprintf(stderr, " done [%.2fs].\n", SEC_SINCE(t)));
empty = (n <= 0) ? 1 : 0; /* check whether the empty item set */
#endif /* is maximal */
#ifdef CLOSED /* filter closed item sets */
if( verbose ) MSG(fprintf(stderr, "filtering closed item sets ... "));
ist_filter(istree, IST_CLOSED);
if( verbose ) MSG(fprintf(stderr, " done [%.2fs].\n", SEC_SINCE(t)));
for (k = n; --k >= 0; ) /* check for an item in all t.a. */
if (is_getfrq(itemset, k) == tacnt) break;
empty = (k <= 0) ? 1 : 0; /* check whether the empty item set */
#endif /* is closed */

/* --- print item sets --- */
for (i = ist_height(istree); --i >= 0; )
map[i] = 0; /* clear the item set counters */
if( verbose ) MSG(fprintf(stderr, "writing %s ... ", (fn_out) ? fn_out : "<none>"));
t = clock(); /* start the timer and */
if (fn_out) { /* if an output file is given, */
out = fopen(fn_out, "w"); /* open the output file */
if (!out) error(E_FOPEN, fn_out);
if (empty) fprintf(out, " (%d)\n", tacnt);
} /* report empty item set */
ist_init(istree); /* init. the item set extraction */
set = is_tract(itemset); /* get the transaction buffer */
for (n = empty; 1; n++) { /* extract item sets from the tree */

k = ist_set(istree, set, &supp);

if (k <= 0) break; /* get the next frequent item set */
map[k-1]++; /* count the item set */
if (fn_out) { /* if an output file is given */
for (i = 0; i < k; i++) { /* traverse the items */
fputs(is_name(itemset, set[i]), out);
fputc(' ', out); /* print the name of the next item */
} /* followed by a separator */
fprintf(out, "(%d)\n", supp);
} /* print the item set's support */
else
{
short unsigned * is = new short unsigned[k] ;

for (i = 0; i < k; i++) /* traverse the items */
{
is[i] = set[i] ;
}
if( k < level || nbgen < k+1 || ist_height(istree)>= max )
{
bdPapriori->insert(is, k ,supp ) ;

(*stat)[ 0 ] ++;
(*stat)[ k+1 ]++;

if( maxBdP < k )
maxBdP = k ;

}
else
{
generatedFk = true ;

}

delete[] is;

}
}
if (fn_out) { /* if an output file is given */
if (fflush(out) != 0) error(E_FWRITE, fn_out);
if (out != stdout) fclose(out);
out = NULL; /* close the output file */
}
if( verbose ) MSG(fprintf(stderr, "[%d set(s)] done ", n));
if( verbose ) MSG(fprintf(stderr, "[%.2fs].\n", SEC_SINCE(t)));

/* --- print item set statistics --- */
k = ist_height(istree); /* find last nonzero counter */
if ((k > 0) && (map[k-1] <= 0)) k--;
if( verbose ){
printf("%d\n", empty); /* print the numbers of item sets */
for (i = 0; i < k; i++) printf("%d\n", map[i]);
}

/* --- clean up --- */
#ifndef NDEBUG /* if this is a debug version */
free(usage); /* delete the item usage vector */
free(map); /* and the identifier map */
ist_delete(istree); /* delete the item set tree, */

if (taset) tas_delete(taset, 0); /* the transaction set, */
is_delete(itemset); /* and the item set */
#endif

return tatree ;

}

G. Let c be a candidate itemset in the Ck produced by the Apriori algorithm. In the pruning step, how many subsets of length k-1 have to be checked?

2 The idea of the FAA algorithm
2.1 Definition and construction of the linked-list array. Definition: the array is a one-dimensional array P[n] of n pointers, corresponding to the frequent items I1, I2, ..., In in the database; the array length n is the number of frequent items in the database. Each node is a transaction node with a transaction field, a count field and a pointer field. The transaction field holds the encoding of a transaction whose suffix is the given frequent item, the count field holds how many transactions share that encoding, and the pointer field points to the next node.
Encoding method: suppose the database has n frequent items I1, I2, ..., In. The encoding of a transaction t is a 0/1 bit string of length n: the positions of items that appear in t are set to 1, all others to 0. For example, with four frequent items a, b, c, d, a transaction containing a and c is mapped to 1010.
The linked-list array is built as follows: (1) scan the transaction database, produce all frequent 1-itemsets with their support counts, sort them by support count in descending order, and generate the FI-List. (2) Scan the database again, delete from each record the items that fail the minimum support count, and reorder the remaining items according to the FI-List. Let the resulting sequence be {m1, m2, ..., mn}; for each k (1 ≤ k ≤ n) take the prefix {m1, m2, ..., mk}, encode it, create a transaction node for it, and append the node to the chain of P[n] indexed by the last item of the prefix.
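As a quick illustration of the encoding just described, here is a minimal Python sketch; the four items and the sample transaction are the ones from the example above:

# Encode transactions as 0/1 bit strings over the frequent items:
# position i is 1 iff frequent item i occurs in the transaction.
frequent_items = ['a', 'b', 'c', 'd']

def encode(transaction):
    return ''.join('1' if item in transaction else '0' for item in frequent_items)

print(encode({'a', 'c'}))       # "1010", matching the example in the text
print(encode({'b', 'c', 'd'}))  # "0111"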

H. Which Python package implements Apriori?

You do not need any particular package to use Apriori; you just need a .py file that implements the Apriori functionality. Put that file in the same directory as the file that calls it, then use `from apriori import *` to use it.

apriori.py download link: https://pan.baidu.com/s/1XpKHfUXGDIXj7CcYJL0anA

Password: aug8

I. How do you implement the Apriori algorithm in Java?

from operator import and_
from itertools import combinations  # needed by calculateAssociationRules below

class AprioriAlgorithm:
    def __init__(self, inputfile):
        self.transactions = []
        self.itemSet = set([])
        inf = open(inputfile, 'rb')
        for line in inf.readlines():
            elements = set(filter(lambda entry: len(entry) > 0, line.strip().split(',')))
            if len(elements) > 0:
                self.transactions.append(elements)
                for element in elements:
                    self.itemSet.add(element)
        inf.close()
        self.toRetItems = {}
        self.associationRules = []

    def getSupport(self, itemcomb):
        if type(itemcomb) != frozenset:
            itemcomb = frozenset([itemcomb])
        within_transaction = lambda transaction: reduce(and_, [(item in transaction) for item in itemcomb])
        count = len(filter(within_transaction, self.transactions))
        return float(count) / float(len(self.transactions))

    def runApriori(self, minSupport=0.15, minConfidence=0.6):
        itemCombSupports = filter(lambda freqpair: freqpair[1] >= minSupport,
                                  map(lambda item: (frozenset([item]), self.getSupport(item)), self.itemSet))
        currentLset = set(map(lambda freqpair: freqpair[0], itemCombSupports))
        k = 2
        while len(currentLset) > 0:
            # join step: unite pairs of frequent (k-1)-itemsets that give a k-itemset
            currentCset = set([i.union(j) for i in currentLset for j in currentLset
                               if len(i.union(j)) == k])
            currentItemCombSupports = filter(lambda freqpair: freqpair[1] >= minSupport,
                                             map(lambda item: (item, self.getSupport(item)), currentCset))
            currentLset = set(map(lambda freqpair: freqpair[0], currentItemCombSupports))
            itemCombSupports.extend(currentItemCombSupports)
            k += 1
        for key, supportVal in itemCombSupports:
            self.toRetItems[key] = supportVal
        self.calculateAssociationRules(minConfidence=minConfidence)

    def calculateAssociationRules(self, minConfidence=0.6):
        for key in self.toRetItems:
            subsets = [frozenset(item) for k in range(1, len(key)) for item in combinations(key, k)]
            for subset in subsets:
                confidence = self.toRetItems[key] / self.toRetItems[subset]
                if confidence > minConfidence:
                    self.associationRules.append([subset, key - subset, confidence])
The Scala version also comes to only sixty-odd lines:
import scala.io.Source
import scala.collection.immutable.List
import scala.collection.immutable.Set
import java.io.File
import scala.collection.mutable.Map

class AprioriAlgorithm(inputFile: File) {
  var transactions: List[Set[String]] = List()
  var itemSet: Set[String] = Set()
  for (line <- Source.fromFile(inputFile).getLines()) {
    val elementSet = line.trim.split(',').toSet
    if (elementSet.size > 0) {
      transactions = transactions :+ elementSet
      itemSet = itemSet ++ elementSet
    }
  }
  var toRetItems: Map[Set[String], Double] = Map()
  var associationRules: List[(Set[String], Set[String], Double)] = List()

  def getSupport(itemComb: Set[String]): Double = {
    def withinTransaction(transaction: Set[String]): Boolean = itemComb
      .map(x => transaction.contains(x))
      .reduceRight((x1, x2) => x1 && x2)
    val count = transactions.filter(withinTransaction).size
    count.toDouble / transactions.size.toDouble
  }

  def runApriori(minSupport: Double = 0.15, minConfidence: Double = 0.6) = {
    var itemCombs = itemSet.map(word => (Set(word), getSupport(Set(word))))
      .filter(wordSupportPair => (wordSupportPair._2 > minSupport))
    var currentLSet: Set[Set[String]] = itemCombs.map(wordSupportPair => wordSupportPair._1).toSet
    var k: Int = 2
    while (currentLSet.size > 0) {
      val currentCSet: Set[Set[String]] = currentLSet.map(wordSet => currentLSet.map(wordSet1 => wordSet | wordSet1))
        .reduceRight((set1, set2) => set1 | set2)
        .filter(wordSet => (wordSet.size == k))
      val currentItemCombs = currentCSet.map(wordSet => (wordSet, getSupport(wordSet)))
        .filter(wordSupportPair => (wordSupportPair._2 > minSupport))
      currentLSet = currentItemCombs.map(wordSupportPair => wordSupportPair._1).toSet
      itemCombs = itemCombs | currentItemCombs
      k += 1
    }
    for (itemComb <- itemCombs) {
      toRetItems += (itemComb._1 -> itemComb._2)
    }
    calculateAssociationRule(minConfidence)
  }

  def calculateAssociationRule(minConfidence: Double = 0.6) = {
    toRetItems.keys.foreach(item =>
      item.subsets.filter(wordSet => (wordSet.size < item.size & wordSet.size > 0))
        .foreach(subset => {
          associationRules = associationRules :+ (subset, item diff subset,
            toRetItems(item).toDouble / toRetItems(subset).toDouble)
        })
    )
    associationRules = associationRules.filter(rule => rule._3 > minConfidence)
  }
}

I would not recommend Java; switch to a language like Python or Scala. In Python the code comes to about 50 lines, as the code above shows; you can imagine how much more complicated the same thing would look in Java.

J. Which UCI datasets are suitable for experimenting with the Apriori algorithm?

You can pick the datasets whose Default Task is Recommender-Systems.

For example: the Anonymous Microsoft Web Data Data Set.

This dataset was sampled from the access logs of the official Microsoft website during one week of February 1998.

It contains the visit records of 38,000 anonymous users to certain pages of the www.microsoft.com site over one week.

People have used this dataset to build collaborative-filtering recommender systems, and the Apriori algorithm can of course be applied to it as well.
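For reference, a minimal Python sketch for loading the file into one transaction per user. It relies on my reading of the dataset's .info file: each case ('C') line starts a user, and the vote ('V') lines that follow list the Vroot ids that user visited; treat these format details as an assumption to verify against the .info file linked below.

# Group the MS Web log into one transaction (set of visited Vroot ids) per user.
def load_msweb(path):
    transactions, current = [], None
    with open(path) as f:
        for line in f:
            fields = line.strip().split(',')
            if fields[0] == 'C':  # a new user (case) record begins
                current = set()
                transactions.append(current)
            elif fields[0] == 'V' and current is not None:
                current.add(int(fields[1]))  # a Vroot this user visited
    return transactions

transactions = load_msweb('anonymous-msweb.data')
print(len(transactions))  # should be around 38000 users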


I have also uploaded the data file for you.



Source:

Creators:

Jack S. Breese, David Heckerman, Carl M. Kadie
Microsoft Research, Redmond WA, 98052-6399, USA
breese'@'microsoft.com,heckerma'@'microsoft.com,carlk'@'microsoft.com

Donors:

Breese, Heckerman, & Kadie


Data Set Information:

We created the data by sampling and processing the www.microsoft.com logs. The data records the use of www.microsoft.com by 38000 anonymous, randomly-selected users. For each user, the data lists all the areas of the web site (Vroots) that user visited in a one week timeframe.

Users are identified only by a sequential number, for example, User #14988, User #14989, etc. The file contains no personally identifiable information. The 294 Vroots are identified by their title (e.g. "NetShow for PowerPoint") and URL (e.g. "/stream"). The data comes from one week in February, 1998.


Attribute Information:

Each attribute is an area ("vroot") of the www.microsoft.com web site.

The datasets record which Vroots each user visited in a one-week timeframe in February 1998.


Relevant Papers:

J. Breese, D. Heckerman., C. Kadie _Empirical Analysis of Predictive Algorithms for Collaborative Filtering_ Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI, July, 1998.
[Web Link]

Also expanded as Microsoft Research Technical Report MSR-TR-98-12. The papers are available on-line at: [Web Link]


References:

http://archive.ics.uci.edu/ml/datasets/Anonymous+Microsoft+Web+Data

http://archive.ics.uci.edu/ml/machine-learning-databases/anonymous/

Download:

http://archive.ics.uci.edu/ml/machine-learning-databases/anonymous/anonymous-msweb.data

Detailed description:

http://archive.ics.uci.edu/ml/machine-learning-databases/anonymous/anonymous-msweb.info

