Fast Vertical Mining Using Diffsets
Mohammed J. Zaki and Karam Gouda
The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003
2021/2/21, Presenter: 吳建良
Abstract
- Vertical data format
- Diffset: a new vertical data representation
- Incorporate diffsets into previous vertical mining methods
- Reduce the memory required to store intermediate results
- Increase performance
Notation
- $I$: a set of items
- $T$: a database of transactions
- tid: the identifier of a transaction
- itemset: a set of items, $X \subseteq I$
- tidset: a set of tids
- $k$-itemset: an itemset with $k$ items
- $\sigma(X)$: the support of itemset $X$
- frequent itemset: an itemset whose support is $\geq$ min_sup
- $F_k$: the set of frequent $k$-itemsets
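Notation (cont.)
- Powerset $P(I)$: the search space, i.e., the enumeration of all possible itemsets
- Maximal frequent itemset: a frequent itemset is maximal if it is not a subset of any other frequent itemset
- Closed frequent itemset: a frequent itemset $X$ is closed if there exists no proper superset $Y \supset X$ with $\sigma(Y) = \sigma(X)$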
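The maximal and closed definitions can be checked mechanically once every frequent itemset and its support is known. A minimal Python sketch; the helper `maximal_and_closed` and the three-itemset input are hypothetical, chosen only for illustration:

```python
def maximal_and_closed(supports):
    """Return (maximal, closed) from a dict of frequent itemsets (hypothetical helper).

    `supports` maps frozenset itemsets to their support and is assumed to hold
    every frequent itemset for one min_sup. Checking only frequent supersets is
    enough for closedness: a superset with the same support as a frequent X is
    itself frequent.
    """
    maximal, closed = set(), set()
    for x, sup_x in supports.items():
        supersets = [y for y in supports if x < y]  # proper frequent supersets of x
        if not supersets:
            maximal.add(x)
        if all(supports[y] != sup_x for y in supersets):
            closed.add(x)
    return maximal, closed


# Hypothetical frequent itemsets with their supports (assumed data).
supports = {frozenset("A"): 3, frozenset("B"): 2, frozenset("AB"): 2}
maximal, closed = maximal_and_closed(supports)
print(sorted("".join(sorted(s)) for s in maximal))  # ['AB']
print(sorted("".join(sorted(s)) for s in closed))   # ['A', 'AB']
```

Every maximal itemset is also closed, since it has no frequent proper superset at all; the reverse does not hold, as `A` shows.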
Example
Data Format
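In the horizontal format each transaction lists its items; in the vertical format each item is stored with its tidset, the tids of the transactions that contain it. A minimal Python sketch of the conversion, with a hypothetical helper `to_vertical` and a made-up three-transaction database:

```python
from collections import defaultdict


def to_vertical(horizontal):
    """Convert tid -> items into item -> tidset (hypothetical helper)."""
    vertical = defaultdict(set)
    for tid, items in horizontal.items():
        for item in items:
            vertical[item].add(tid)  # record that this transaction contains the item
    return dict(vertical)


# Hypothetical horizontal database (assumed data).
horizontal = {1: {"A", "C"}, 2: {"C", "D"}, 3: {"A", "C", "D"}}
vertical = to_vertical(horizontal)
print({item: sorted(tids) for item, tids in sorted(vertical.items())})
# {'A': [1, 3], 'C': [1, 2, 3], 'D': [2, 3]}
```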
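Lattice Decomposition: Prefix-Based Classes
- Let $p(X, k) = X[1:k]$ be the $k$-length prefix of $X$
- Define an equivalence relation $\theta_k$ on the lattice $P(I)$: $X \equiv_{\theta_k} Y \iff p(X, k) = p(Y, k)$
- $\theta_k$ is the prefix-based equivalence relation; each class groups the itemsets that share the same $k$-length prefix
- The classes break the original search space into independent subproblems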
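Grouping itemsets by their $k$-length prefix is all that $\theta_k$ needs. A minimal Python sketch, assuming itemsets are written as strings of items in one fixed (here alphabetical) order so that `x[:k]` stands in for $p(X, k)$; `prefix_classes` is a hypothetical helper:

```python
from collections import defaultdict


def prefix_classes(itemsets, k):
    """Group itemsets into theta_k classes (hypothetical helper).

    Itemsets are strings whose items follow one fixed order, so x[:k] stands in
    for p(X, k) = X[1:k]; an itemset shorter than k is keyed by the whole itemset.
    """
    classes = defaultdict(list)
    for x in itemsets:
        classes[x[:k]].append(x)  # X and Y land together iff p(X, k) == p(Y, k)
    return dict(classes)


# All 2-itemsets over A, C, D, T, W (illustrative input).
two_itemsets = ["AC", "AD", "AT", "AW", "CD", "CT", "CW", "DT", "DW", "TW"]
classes = prefix_classes(two_itemsets, 1)
print(classes)
# {'A': ['AC', 'AD', 'AT', 'AW'], 'C': ['CD', 'CT', 'CW'], 'D': ['DT', 'DW'], 'T': ['TW']}
```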
Subset Search Tree
Figure: prefix-based subset search tree over the items {A, C, D, T, W}, rooted at {}. Each node is labeled with the items that may still extend it, e.g., A {C, D, T, W}, AC {D, T, W}, ACD {T, W}, ACDT {W}, AD {T, W}, C {D, T, W}, CD {T, W}, and CDT {W}; the deepest node is ACDTW.
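The tree can be generated depth-first: each node extends its prefix with one item, and its braces list the items that come after that item in the fixed order. A minimal Python sketch with a hypothetical generator `search_tree`; printing it for A, C, D, T, W yields nodes such as `AC {D, T, W}` and `CDT {W}`:

```python
def search_tree(items, prefix=""):
    """Yield (node, tail) pairs of the prefix-based subset search tree (hypothetical helper).

    `tail` lists the items that may still extend `node`: those after the
    node's last item in the fixed item order.
    """
    for i, item in enumerate(items):
        node = prefix + item
        tail = items[i + 1:]
        yield node, tail
        yield from search_tree(tail, node)  # descend depth-first into this node's subtree


for node, tail in search_tree(["A", "C", "D", "T", "W"]):
    label = f"{node} {{{', '.join(tail)}}}" if tail else node
    print("  " * (len(node) - 1) + label)  # indent by depth in the tree
```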
Tidsets for Pattern Counting
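With tidsets, counting a candidate's support reduces to one set intersection: $t(PXY) = t(PX) \cap t(PY)$ and $\sigma(PXY) = |t(PXY)|$. A minimal Python sketch; the helper name and the tidsets are hypothetical:

```python
def support_by_intersection(t_px, t_py):
    """Return (t(PXY), support of PXY) from two class members' tidsets (hypothetical helper)."""
    t_pxy = t_px & t_py       # t(PXY) = t(PX) ∩ t(PY)
    return t_pxy, len(t_pxy)  # σ(PXY) = |t(PXY)|


# Hypothetical tidsets for two members of one prefix class (assumed data).
t_pxy, sup = support_by_intersection({1, 3, 4, 5}, {1, 3, 5, 6})
print(sorted(t_pxy), sup)  # [1, 3, 5] 3
```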
Diffset
- The difference between the prefix tidset and a class member's tidset
- Consider a class with prefix $P$:
  - Let $t(X)$ denote the tidset of element $X$
  - Let $d(X)$ denote the diffset of element $X$, with respect to the prefix tidset
  - Let $PX$ and $PY$ be class members of $P$
- Support: $t(PXY) = t(PX) \cap t(PY)$, so $\sigma(PXY) = |t(PXY)|$
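Diffset (cont.)
- Define the diffset: $d(PX) = t(P) - t(X)$
- Then $\sigma(PX) = \sigma(P) - |d(PX)|$
- How can $d(PXY)$ and $\sigma(PXY)$ be calculated using only $d(PX)$ and $d(PY)$?
  - $d(PXY) = d(PY) - d(PX)$
  - $\sigma(PXY) = \sigma(PX) - |d(PXY)|$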
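The diffset rules follow from the definitions in a few set-algebra steps. A short derivation, using only $t(PXY) = t(PX) \cap t(PY)$, the diffset definition $d(PX) = t(P) - t(X)$ (so $t(PX) = t(P) - d(PX)$), the same definition applied with prefix $PX$, and the fact that $d(PX), d(PY) \subseteq t(P)$:

```latex
\begin{aligned}
d(PXY) &= t(PX) - t(PXY) = t(PX) - \bigl(t(PX) \cap t(PY)\bigr) = t(PX) - t(PY) \\
       &= \bigl(t(P) - d(PX)\bigr) - \bigl(t(P) - d(PY)\bigr)
        = \bigl(t(P) \cap d(PY)\bigr) - d(PX) = d(PY) - d(PX) \\
\sigma(PXY) &= |t(PX) \cap t(PY)| = |t(PX)| - |t(PX) - t(PY)| = \sigma(PX) - |d(PXY)|
\end{aligned}
```

Neither line needs $t(PX)$ or $t(PY)$ once the diffsets and $\sigma(PX)$ are stored, which is why only diffsets have to be kept for class members.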
Diffset (cont.)
Figure: Venn diagram of $t(P)$, $t(X)$, and $t(Y)$, with the regions $d(PY)$, $d(PXY)$, and $t(PXY)$ marked.
Diffset Example
- Diffset calculation
- Support calculation
Diffset Intersection Example
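The diffset rules can be exercised on a single prefix class and cross-checked against the tidset route. A minimal Python sketch; the tidsets for $P$, $PX$, and $PY$ are hypothetical values picked for illustration:

```python
# Hypothetical tidsets for a prefix P and two of its class members (assumed data).
t_P = {1, 2, 3, 4, 5, 6}
t_PX = {2, 4, 5, 6}
t_PY = {1, 2, 3, 4, 5}

# Diffsets relative to the prefix tidset t(P).
d_PX = t_P - t_PX                 # {1, 3}
d_PY = t_P - t_PY                 # {6}
sup_PX = len(t_P) - len(d_PX)     # σ(PX) = σ(P) - |d(PX)| = 4

# Combine the two diffsets without touching the member tidsets.
d_PXY = d_PY - d_PX               # d(PXY) = d(PY) - d(PX) = {6}
sup_PXY = sup_PX - len(d_PXY)     # σ(PXY) = σ(PX) - |d(PXY)| = 3

# Cross-check against the tidset route.
assert d_PXY == t_PX - (t_PX & t_PY)
assert sup_PXY == len(t_PX & t_PY)
print(sorted(d_PXY), sup_PXY)     # [6] 3
```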
Diffset Example
- Total size
  - Tidsets database size = 76 tids
  - Diffsets database size = 22 tids
- Size by length:

| k-itemset length (k) | 2 | 3 | 4 |
|---|---|---|---|
| Avg. tidset length | 3.8 | 3.2 | 3 |
| Avg. diffset length | 1 | 0.6 | 0 |
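Size-by-length figures like these can be recomputed by brute force: enumerate every frequent $k$-itemset $X$, record $|t(X)|$, and record the size of its diffset against its $(k-1)$-prefix, $|t(X[:-1]) - t(X)|$. A minimal Python sketch over a small hypothetical six-transaction database with items A, C, D, T, W and an assumed absolute min_sup of 3; the helper names are hypothetical:

```python
from itertools import combinations

# Hypothetical six-transaction database (tid -> items) and absolute min_sup; assumed data.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
MIN_SUP = 3
ITEMS = sorted(set("".join(DB.values())))  # ['A', 'C', 'D', 'T', 'W']


def tidset(itemset):
    """t(X): tids of the transactions containing every item of X (hypothetical helper)."""
    return {tid for tid, items in DB.items() if set(itemset) <= set(items)}


def size_by_length(min_sup=MIN_SUP):
    """Map k -> (avg tidset length, avg diffset length) over frequent k-itemsets, k >= 2.

    The diffset of X is taken against its (k-1)-prefix P = X[:-1]: d(X) = t(P) - t(X).
    """
    stats = {}
    for k in range(2, len(ITEMS) + 1):
        tid_lens, diff_lens = [], []
        for x in combinations(ITEMS, k):  # k-itemsets in lexicographic item order
            t_x = tidset(x)
            if len(t_x) < min_sup:
                continue  # skip infrequent itemsets
            tid_lens.append(len(t_x))
            diff_lens.append(len(tidset(x[:-1]) - t_x))
        if tid_lens:
            stats[k] = (sum(tid_lens) / len(tid_lens), sum(diff_lens) / len(diff_lens))
    return stats


stats = size_by_length()
for k, (avg_t, avg_d) in stats.items():
    print(f"k={k}: avg tidset {avg_t:.2f}, avg diffset {avg_d:.2f}")
```

On this assumed database it reports average tidset lengths of 3.75, 3.2, and 3 and average diffset lengths of 1, 0.6, and 0 for k = 2, 3, 4, in line with the size-by-length table above (3.75 rounds to 3.8).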
dEclat: Diffset-Based Mining
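A minimal recursive sketch of diffset-based mining in the spirit of dEclat, not the authors' implementation. As assumptions, it keeps tidsets for single items, switches to diffsets at the 2-itemset level with $d(XY) = t(X) - t(Y)$, and below that applies $d(PXY) = d(PY) - d(PX)$ and $\sigma(PXY) = \sigma(PX) - |d(PXY)|$ within each prefix class; it skips optimizations such as reordering class members by support. The database, `MIN_SUP`, and the helper names are hypothetical:

```python
# Hypothetical six-transaction database (tid -> items) and absolute min_sup; assumed data.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
MIN_SUP = 3


def declat_class(members, min_sup, out):
    """Depth-first search over one prefix class (hypothetical helper).

    members: list of (itemset, diffset, support) sharing a prefix P, with each
    diffset taken relative to t(P).
    """
    for i, (px, d_px, sup_px) in enumerate(members):
        out[px] = sup_px
        new_class = []
        for py, d_py, _ in members[i + 1:]:
            d_pxy = d_py - d_px            # d(PXY) = d(PY) - d(PX)
            sup_pxy = sup_px - len(d_pxy)  # sigma(PXY) = sigma(PX) - |d(PXY)|
            if sup_pxy >= min_sup:
                new_class.append((px + py[-1], d_pxy, sup_pxy))
        declat_class(new_class, min_sup, out)  # no-op when new_class is empty


def mine(db, min_sup):
    """Return {itemset: support} for all frequent itemsets (hypothetical helper)."""
    tids = {}
    for tid, items in db.items():  # build vertical tidsets for single items
        for item in items:
            tids.setdefault(item, set()).add(tid)
    frequent = sorted(i for i in tids if len(tids[i]) >= min_sup)
    out = {}
    for i, x in enumerate(frequent):
        out[x] = len(tids[x])
        members = []
        for y in frequent[i + 1:]:
            d_xy = tids[x] - tids[y]  # top level: d(XY) = t(X) - t(Y)
            sup_xy = len(tids[x]) - len(d_xy)
            if sup_xy >= min_sup:
                members.append((x + y, d_xy, sup_xy))
        declat_class(members, min_sup, out)
    return out


result = mine(DB, MIN_SUP)
for itemset in sorted(result, key=lambda s: (len(s), s)):
    print(itemset, result[itemset])
```

On this assumed database the sketch finds 19 frequent itemsets, the largest being ACTW with support 3.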
Experimental Results
Figure: average diffset size vs. average tidset size by itemset length
Experimental Results (cont.)

| Database | Min_sup (%) | # Items | # Records | Max Length | Avg. Diffset Size | Avg. Tidset Size | Reduction Ratio |
|---|---|---|---|---|---|---|---|
| chess | 0.5 | 76 | 3196 | 16 | 26 | 1820 | 70 |
| connect | 90 | 130 | 67557 | 12 | 143 | 62204 | 435 |
| mushroom | 5 | 120 | 8124 | 17 | 60 | 622 | 10 |
| Pumsb* | 35 | 7117 | 49046 | 15 | 301 | 18977 | 63 |
| pumsb | 90 | 7117 | 49046 | 8 | 330 | 45036 | 136 |
| T10I4D100K | 0.025 | | 100000 | 11 | 14 | 86 | 6 |
| T20I6D100K | 0.1 | | 100000 | 14 | 31 | 230 | 11 |
| T40I10D100K | 0.5 | | 100000 | 18 | 96 | 755 | 8 |