

Frequent itemset
Support(X) = number of transactions that contain itemset X / total number of transactions in the database
Frequent itemset = an itemset whose support satisfies min_sup
Example: Min_sup = 60%
Sup(a) = 3/5 = 60%, so a is a frequent itemset
TID  Items
10   a, c, d, e, f
20   a, b, e
30   c, e, f
40   a, c, d, f
50   c, e, f

Frequent closed itemset
A frequent itemset that is not contained in any other frequent itemset with the same support.
Example: Min_sup = 40%
TID  Items
10   a, c, d, e
20   a, b, e
30   c, e
40   a, c, d
50   c, e
Frequent 1-itemsets (itemset: sup): a: 3, c: 4, d: 2, e: 4
Frequent 2-itemsets (itemset: sup): ac: 2, ad: 2, ae: 2, cd: 2, ce: 3
Frequent 3-itemsets (itemset: sup): acd: 2
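
To make the closedness test concrete, the sketch below enumerates the frequent itemsets of this example by brute force and keeps those that have no proper superset with the same support. It is a direct reading of the definition, not an efficient miner, and the function names are illustrative assumptions.

```python
from itertools import combinations

# Transactions from the slide (TIDs 10 to 50); Min_sup = 40% of 5 = 2.
DB = [
    {"a", "c", "d", "e"},
    {"a", "b", "e"},
    {"c", "e"},
    {"a", "c", "d"},
    {"c", "e"},
]


def frequent_itemsets(db, min_count):
    """Map every itemset with count >= min_count to its count (brute force)."""
    items = sorted(set().union(*db))
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = sum(1 for t in db if t.issuperset(combo))
            if count >= min_count:
                found[frozenset(combo)] = count
    return found


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        x: sup
        for x, sup in frequent.items()
        if not any(x < y and sup == sup_y for y, sup_y in frequent.items())
    }


frequent = frequent_itemsets(DB, min_count=2)
closed = closed_itemsets(frequent)
for itemset, sup in sorted(closed.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), sup)
```

Only frequent supersets need to be checked, because a superset with the same support as a frequent itemset is itself frequent.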

Related work
1. J. Wang, J. Han, J. Pei. CLOSET+: Searching for the best strategies for mining frequent closed itemsets. In Proc. of the Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining '03, 2003: 236-245.
2. J. Pei, J. Han, R. Mao. CLOSET: An efficient algorithm for mining frequent closed itemsets. Data Mining and Knowledge Discovery '00, 2000: 11-20.
3. Gang Fang, Yue Wu, Ming Li, and Jia Chen. An Efficient Algorithm for Mining Frequent Closed Itemsets. An International Journal of Computing and Informatics 39 (2015): 87-98.
4. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases, pages 487-499, Santiago, Chile, September 1994.
5. J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. of the ACM-SIGMOD Intl. Conf. on Management of Data, pp. 1-12, 2000.
6. 劉信義. An efficient association rule mining method using a cluster-compressed tree (使用群聚壓縮樹之高效率關聯法則挖掘法). Master's thesis, 2004.

Closet+: building the FP-tree
TID  Items              Converted items
1    a, c, f, m, p      f, c, a, m, p
2    a, c, d, f, m, p   f, c, a, m, p
3    a, b, c, f, g, m   f, c, a, b, m
4    b, f, i            f, b
5    b, c, n, p         c, b, p
F_list: <f: 4, c: 4, a: 3, b: 3, m: 3, p: 3>
Header table (item, count, link): f 4, c 4, a 3, b 3, m 3, p 3
FP-tree:
root
  f: 4
    c: 3
      a: 3
        m: 2
          p: 2
        b: 1
          m: 1
    b: 1
  c: 1
    b: 1
      p: 1
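
The construction on this slide can be sketched as follows. The sketch makes several assumptions: the F_list order is taken from the slide rather than recomputed (so ties are not an issue), TID 5 is taken as b, c, n, p so that it agrees with its converted row c, b, p and with the count p: 3, and the header-table links are left out. `Node`, `convert`, `build_fp_tree` and `dump` are illustrative names.

```python
class Node:
    """One FP-tree node: an item, its count, and its children keyed by item."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}


# F_list order as printed on the slide.
F_LIST = ["f", "c", "a", "b", "m", "p"]

DB = [
    ["a", "c", "f", "m", "p"],
    ["a", "c", "d", "f", "m", "p"],
    ["a", "b", "c", "f", "g", "m"],
    ["b", "f", "i"],
    ["b", "c", "n", "p"],  # assumption: see the note above about TID 5
]


def convert(transaction, f_list=F_LIST):
    """Drop infrequent items and reorder the rest in F_list order."""
    present = set(transaction)
    return [item for item in f_list if item in present]


def build_fp_tree(db, f_list=F_LIST):
    """Insert every converted transaction as a path from the root."""
    root = Node(None)
    for transaction in db:
        node = root
        for item in convert(transaction, f_list):
            child = node.children.get(item)
            if child is None:
                child = Node(item)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def dump(node, depth=0):
    """Return the tree as indented 'item: count' lines, depth first."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}: {child.count}")
        lines.extend(dump(child, depth + 1))
    return lines


root = build_fp_tree(DB)
print("\n".join(dump(root)))
```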

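Projected tree of p: 3
F_list: <f: 4, c: 4, a: 3, b: 3, m: 3, p: 3>
Mining starts from the last item in the F_list, so the projected tree of p is built first.
Min_sup: 2
Global header table: f 4, c 4, a 3, b 3, m 3, p 3
Global FP-tree: root -> f: 4 -> c: 3 -> a: 3 -> m: 2 -> p: 2, with side branches a: 3 -> b: 1 -> m: 1 and f: 4 -> b: 1, plus root -> c: 1 -> b: 1 -> p: 1
Header table of p's projection: c 3, f 2, a 2, m 2
Projected tree of p: root -> c: 3 -> f: 2 -> a: 2 -> m: 2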
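
As an illustration of the projection step, the sketch below derives the header table of p's projection directly from the converted transactions instead of walking the FP-tree; the result is the same local counts. The helper names `project` and `local_header` are assumptions made for this example.

```python
from collections import Counter

# Converted transactions (F_list order) from the FP-tree slide.
ORDERED_DB = [
    list("fcamp"),
    list("fcamp"),
    list("fcabm"),
    list("fb"),
    list("cbp"),
]


def project(db, item):
    """Prefixes that precede item in every transaction containing it."""
    return [t[: t.index(item)] for t in db if item in t]


def local_header(projected, min_count):
    """Locally frequent items with counts, sorted by descending count."""
    counts = Counter(i for t in projected for i in t)
    frequent = [(i, c) for i, c in counts.items() if c >= min_count]
    # sorted() is stable, so ties keep their first-seen order.
    return sorted(frequent, key=lambda pair: -pair[1])


p_projection = project(ORDERED_DB, "p")
print(p_projection)
print(local_header(p_projection, min_count=2))
```

With Min_sup = 2, item b (local count 1) drops out of the projection, which is why it is absent from p's header table.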

Projected tree of pm: 2
Min_sup: 2
Frequent closed itemsets: fcamp: 2
Header table of p's projection: c 3, f 2, a 2, m 2
Projected tree of p: root -> c: 3 -> f: 2 -> a: 2 -> m: 2
Header table of pm's projection: c 2, f 2, a 2
Projected tree of pm: root -> c: 2 -> f: 2 -> a: 2
[Figure: tree of the closed itemsets found so far, with its item/sup index]

Projected tree of pc: 3
Min_sup: 2
Frequent closed itemsets: fcamp: 2, cp: 3
Header table of p's projection: c 3, f 2, a 2, m 2
Projected tree of p: root -> c: 3 -> f: 2 -> a: 2 -> m: 2
Header table shown for pc: c 3
Tree shown for pc: root -> c: 3
[Figure: tree of the closed itemsets found so far, with its item/sup index]

Projected tree of m: 3
Min_sup: 2
Frequent closed itemsets: fcamp: 2, cp: 3, fcam: 3
Header table of m's projection: f 3, c 3, a 3
Projected tree of m: root -> f: 3 -> c: 3 -> a: 3
[Figure: tree of the closed itemsets found so far, with its item/sup index]

Projected tree of b: 3
Min_sup: 2
Frequent closed itemsets: fcamp: 2, cp: 3, fcam: 3, fb: 2, cb: 2, b: 3
Header table of b's projection: f 2, c 2
Projected tree of b: root -> f: 2 -> c: 1, plus root -> c: 1
Projected tree of bc: 2: header f 1; tree root -> f: 1
Projected tree of bf: 2: header f 2; tree root -> f: 2
[Figure: tree of the closed itemsets found so far, with its item/sup index]

Projected tree of a: 3
Min_sup: 2
Frequent closed itemsets: fcamp: 2, cp: 3, fcam: 3, cb: 2, fb: 2, b: 3
Header table of a's projection: f 3, c 3
Projected tree of a: root -> f: 3 -> c: 3
[Figure: tree of the closed itemsets found so far, with its item/sup index]

Projected trees of c: 4 and f: 4
Min_sup: 2
Frequent closed itemsets: fcamp: 2, cp: 3, fcam: 3, cb: 2, fb: 2, b: 3, c: 4, f: 4
Projected tree of c: 4: header f 3; tree root -> f: 3
Projected tree of f: 4: header f 4; tree root -> f: 4
[Figure: tree of the closed itemsets found so far, with its item/sup index]
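
The final list of closed itemsets can be cross-checked without any tree. The sketch below is not Closet+; it is a brute-force check that relies on a standard fact: the intersection of any group of transactions is a closed itemset, and every closed itemset arises this way. It again assumes TID 5 is b, c, n, p, and `closed_by_intersection` is an illustrative name.

```python
from itertools import combinations

# Original transactions from the FP-tree slide (TID 5 assumed to be b, c, n, p).
DB = [
    set("acfmp"),
    set("acdfmp"),
    set("abcfgm"),
    set("bfi"),
    set("bcnp"),
]


def closed_by_intersection(db, min_count):
    """Closed itemsets with support >= min_count, via transaction intersections."""
    closed = {}
    for size in range(1, len(db) + 1):
        for group in combinations(db, size):
            common = frozenset(set.intersection(*group))
            if not common:
                continue
            support = sum(1 for t in db if common <= t)
            if support >= min_count:
                closed[common] = support
    return closed


result = closed_by_intersection(DB, min_count=2)
order = "fcabmp"  # F_list order, used only for printing
for itemset, sup in sorted(result.items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    print("".join(sorted(itemset, key=order.index)), sup)
```

This enumerates all 2^n - 1 groups of transactions, so it is only usable on toy data such as this five-row example.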

Hf

fcamp: 2 is found to be a frequent closed itemset.
The counts from node c through node m are all 3, so fcam: 3 is also a frequent closed itemset.
Finally f: 4 is reached, which is also a frequent closed itemset.
Frequent closed itemsets: fcamp: 2, fcam: 3, f: 4
Hfc: 3 = Hfcam: 3
Table entries as extracted: a 3, m 3, p 2 (marked x)
FP-tree: root -> f: 4 -> c: 3 -> a: 3 -> m: 2 -> p: 2, with side branches a: 3 -> b: 1 -> m: 1 and f: 4 -> b: 1, plus root -> c: 1 -> b: 1 -> p: 1

Hc
Table items: f, c, a, b, m, p; entries as extracted: 4, 4, 3, 3, x, x, x
Frequent closed itemsets: fcamp: 2, fcam: 3, f: 4, cp: 3, cb: 2, c: 4
[Figure: the global FP-tree and the tree of the closed itemsets found so far, with its item/sup index]

Ha
Table: f 4, c 4, a 3, b 3, m 3 (marked x), p 3 (marked x)
Frequent closed itemsets: fcamp: 2, fcam: 3, f: 4, cp: 3, c: 4, cb: 2
[Figure: the global FP-tree and the tree of the closed itemsets found so far, with its item/sup index]

Hb
Table: f 4, c 4, a 3, b 3, m 3, p 3 (marked x)
Frequent closed itemsets: fcamp: 2, fcam: 3, f: 4, cp: 3, cb: 2, c: 4, b: 3, fb: 2
[Figure: the global FP-tree and the tree of the closed itemsets found so far, with its item/sup index]

Building a left-skewed tree
Legend: S = start level, C = count
Level  Node  S  C
0      1     0  1
1      2     1  2
2      3     2  3
3      4     3  4
4      5     4  5

Database
TID  Items
10   1, 2, 4, 5
20   1, 2, 3, 4
30   2, 3, 4, 5
40   1, 2, 3
50   1, 2, 3
60   1, 5
[Figure: snapshots of the tree after adding TID 10, TID 20, TID 30, TID 40 and TID 50, and TID 60]

Construction complete
Header table (each entry also carries a link field):
Index  Item  Count
1      B     5
2      C     5
3      A     4
4      D     3
5      E     3
[Figure: the completed tree]
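
The counts in this header table can be reproduced from the six-transaction database. The sketch below only recomputes the counts; it does not build the tree. The index-to-item labels are copied from the slide, and `header_table` is an illustrative name.

```python
from collections import Counter

# Database from the slides, keyed by TID; items are header-table indexes.
DB = {
    10: [1, 2, 4, 5],
    20: [1, 2, 3, 4],
    30: [2, 3, 4, 5],
    40: [1, 2, 3],
    50: [1, 2, 3],
    60: [1, 5],
}

# Index-to-item labels as printed on the slide.
LABELS = {1: "B", 2: "C", 3: "A", 4: "D", 5: "E"}


def header_table(db, labels):
    """Rows of (index, item label, count), ordered by index."""
    counts = Counter(i for items in db.values() for i in items)
    return [(index, labels[index], counts[index]) for index in sorted(counts)]


for row in header_table(DB, LABELS):
    print(*row)
```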

GC-tree: mining frequent itemsets
Before searching for frequent itemsets, the record tables at the nodes are first accumulated.
Header table (index, item, count, link): 1 B 5, 2 C 5, 3 A 4, 4 D 3, 5 E 3
[Figure: the GC-tree with its accumulated record tables]

The search starts from the last node, node 5.
Min_sup: 3
Header table: 1 B 5, 2 C 5, 3 A 4, 4 D 3, 5 E 3
Frequent itemsets: 5: 3
[Figure: the GC-tree with node 5 being processed]

Node 4
Min_sup: 3
Header table: 1 B 5, 2 C 5, 3 A 4, 4 D 3, 5 E 3
Frequent itemsets: 5: 3, 4: 3
[Figure: the GC-tree with node 4 being processed]

Node 3
Min_sup: 3
Frequent itemsets: 5: 3, 4: 3, 3: 4

Node 2
Min_sup: 3
Frequent itemsets: 5: 3, 4: 3, 3: 4, 2: 5, 24: 3, 23: 4

Node 1
Min_sup: 3
Frequent itemsets: 5: 3, 4: 3, 3: 4, 2: 5, 24: 3, 23: 4, 13: 3, 12: 4, 123: 3, 1: 5
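
The ten itemsets listed above can be confirmed by exhaustive counting over the six-transaction database. The sketch below is a brute-force cross-check, not the GC-tree procedure, and `frequent_itemsets` is an illustrative name.

```python
from itertools import combinations

# The six transactions (TIDs 10 to 60) from the database slide.
DB = [
    {1, 2, 4, 5},
    {1, 2, 3, 4},
    {2, 3, 4, 5},
    {1, 2, 3},
    {1, 2, 3},
    {1, 5},
]


def frequent_itemsets(db, min_count):
    """Map each frequent itemset (written as a digit string) to its count."""
    items = sorted(set().union(*db))
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = sum(1 for t in db if t.issuperset(combo))
            if count >= min_count:
                found["".join(map(str, combo))] = count
    return found


result = frequent_itemsets(DB, min_count=3)
for name, count in result.items():
    print(f"{name}: {count}")
```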

GC-tree: mining frequent closed itemsets
Min_sup: 2
Header table (index, item, count): 1 B 9, 2 C 9, 3 A 8, 4 D 6, 5 E 5, 6 F 5, 7 G 2
[Figure: the GC-tree for this database]

Projected tree of item 1
Projected trees are built starting from item 1.
Min_sup: 2
Header table (index, item, count): 1 B 9, 2 C 8, 3 A 3, 4 D 5, 5 E 5, 6 F 5, 7 G 2
Frequent closed itemsets: 123: 3, 124: 3, 1246: 4, 12456: 2, 12467: 2, 12: 8, 14: 7, 15: 5, 16: 5
Tree node labels as extracted: (1: 9, 1), (1: 3, 1), (2: 3, 2), (2: 8, 2), (3: 3, 3), (5: 5, 2), (4: 7, 2), (4: 2, 3), (4: 4, 3), (6: 5, 2), (6: 4, 4), (5: 2, 4), (6: 2, 5), (7: 2, 5)
[Figure: the projected tree of item 1 and the tree of closed itemsets found so far]

Projected tree of item 2
Min_sup: 2
Header table (index, item, count): 2 C 9, 3 A 4, 4 D 6, 5 E 3, 6 F 4, 7 G 2
Frequent closed itemsets: 123: 3, 124: 3, 126: 4, 12456: 2, 12467: 2, 12: 8, 14: 7, 15: 3, 16: 5, 234: 2, 245: 3, 23: 4, 24: 6
Tree node labels as extracted: (1: 9, 1), (2: 8, 2), (3: 3, 3), (6: 4, 4), (4: 4, 3), (4: 7, 2), (5: 5, 2), (6: 5, 2), (5: 2, 4), (6: 2, 5), (7: 2, 5), (2: 9, 2), (2: 2, 1), (2: 4, 2), (2: 6, 2), (3: 2, 2), (3: 4, 2), (4: 2, 3), (4: 3, 2), (4: 6, 2), (5: 3, 2)
[Figure: the projected tree of item 2 and the tree of closed itemsets found so far]

Projected tree of item 3
Min_sup: 2
Header table (index, item, count): 3 A 8, 4 D 8, 5 E 4
Frequent closed itemsets: 123: 3, 124: 3, 126: 4, 12456: 2, 12467: 2, 12: 8, 14: 7, 15: 3, 16: 5, 234: 2, 245: 3, 23: 4, 24: 6, 34: 8, 35: 4
Tree node labels as extracted: (6: 4, 4), (1: 9, 1), (2: 8, 2), (4: 4, 3), (3: 8, 1), (4: 7, 2), (5: 5, 2), (6: 5, 2), (5: 2, 4), (6: 2, 5), (7: 2, 5), (4: 8, 2), (5: 4, 2), (2: 9, 2), (2: 2, 1), (2: 4, 2), (2: 6, 2), (3: 2, 2), (3: 4, 2), (4: 2, 3), (4: 3, 2), (4: 6, 2), (5: 3, 3), (5: 3, 2)
[Figure: the projected tree of item 3 and the tree of closed itemsets found so far]

Future work
Improve the structure of the search tree.
Define termination conditions.
Run experiments against Closet+ to compare space usage and execution time.
Study Gang Fang, Yue Wu, Ming Li, and Jia Chen, "An Efficient Algorithm for Mining Frequent Closed Itemsets", An International Journal of Computing and Informatics 39 (2015): 87-98.