Association Rule Mining Max Miner Mining Association Rules

  • Slides: 17
Association Rule Mining - MaxMiner

Mining Association Rules in Large Databases
  • Association rule mining
  • Algorithms Apriori and FP-Growth
  • Max and closed patterns
  • Mining various kinds of association/correlation rules

Max-patterns & Closed-patterns
  • If there are frequent patterns with many items, enumerating all of them is costly.
  • We may be interested in finding the 'boundary' frequent patterns.
  • Two types…

Max-patterns
  • A frequent pattern $\{a_1, \dots, a_{100}\}$ has $\binom{100}{1} + \binom{100}{2} + \dots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30}$ frequent sub-patterns!
  • Max-pattern: a frequent pattern with no proper frequent super-pattern.
      • BCDE and ACD are max-patterns.
      • BCD is not a max-pattern.

Min_sup = 2

Tid   Items
10    A, B, C, D, E
20    B, C, D, E
30    A, C, D, F
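To make the count and the definition concrete, here is a small brute-force sketch in Python. It assumes the three-transaction database used in the example slides (Tid 10, 20, 30) with Min_sup = 2; the names `support`, `frequent` and `max_patterns` are illustrative and do not come from the slides. Enumerating every itemset is only feasible here because there are six items, which is exactly the cost that max-pattern mining tries to avoid.

```python
from itertools import combinations
from math import comb

# Number of non-empty sub-patterns of a 100-item frequent pattern.
n_sub = sum(comb(100, k) for k in range(1, 101))
print(n_sub == 2**100 - 1, f"{n_sub:.2e}")

# Assumed data: the transactions from the example slides.
transactions = [
    {"A", "B", "C", "D", "E"},  # Tid 10
    {"B", "C", "D", "E"},       # Tid 20
    {"A", "C", "D", "F"},       # Tid 30
]
MIN_SUP = 2


def support(itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


items = sorted(set().union(*transactions))

# Brute force: test every non-empty itemset for frequency.
frequent = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
    if support(frozenset(c)) >= MIN_SUP
]

# A max-pattern is a frequent itemset with no proper frequent superset.
max_patterns = [p for p in frequent if not any(p < q for q in frequent)]
print(sorted("".join(sorted(p)) for p in max_patterns))
```

On this data the sketch reports BCDE and ACD as the only max-patterns, in agreement with the slide, while BCD is dropped because BCDE is frequent.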

Maximal Frequent Itemset
An itemset is maximal frequent if it is frequent and none of its immediate supersets is frequent.
(Figure: itemset lattice showing the maximal itemsets, the infrequent itemsets, and the border between them.)
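The definition can be checked directly by looking only at immediate supersets, since any larger superset of an infrequent itemset is also infrequent. The sketch below is a minimal illustration, assuming the three-transaction example database from the later slides and Min_sup = 2; `is_maximal_frequent` is a hypothetical helper name.

```python
# Assumed data: the transactions from the example slides.
TRANSACTIONS = [
    {"A", "B", "C", "D", "E"},  # Tid 10
    {"B", "C", "D", "E"},       # Tid 20
    {"A", "C", "D", "F"},       # Tid 30
]
MIN_SUP = 2
ITEMS = frozenset().union(*TRANSACTIONS)


def support(itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def is_maximal_frequent(itemset):
    """True if `itemset` is frequent and no one-item extension is frequent."""
    itemset = frozenset(itemset)
    if support(itemset) < MIN_SUP:
        return False
    return all(support(itemset | {x}) < MIN_SUP for x in ITEMS - itemset)


for name in ("BCDE", "ACD", "BCD"):
    print(name, is_maximal_frequent(name))
```

Here BCDE and ACD pass the test, while BCD fails because its immediate superset BCDE is still frequent.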

Closed Itemset
  • An itemset is closed if none of its immediate supersets has the same support as the itemset.
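As a rough illustration of the closedness test, the following sketch lists the closed frequent itemsets of a small database. It assumes the three-transaction example database from the later slides and Min_sup = 2; the helper names are hypothetical.

```python
from itertools import combinations

# Assumed data: the transactions from the example slides.
TRANSACTIONS = [
    {"A", "B", "C", "D", "E"},  # Tid 10
    {"B", "C", "D", "E"},       # Tid 20
    {"A", "C", "D", "F"},       # Tid 30
]
MIN_SUP = 2
ITEMS = frozenset().union(*TRANSACTIONS)


def support(itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def is_closed(itemset):
    """True if every one-item extension has strictly lower support."""
    itemset = frozenset(itemset)
    s = support(itemset)
    return all(support(itemset | {x}) < s for x in ITEMS - itemset)


# Enumerate all non-empty itemsets; keep those that are frequent and closed.
closed_frequent = sorted(
    "".join(c)
    for k in range(1, len(ITEMS) + 1)
    for c in combinations(sorted(ITEMS), k)
    if support(frozenset(c)) >= MIN_SUP and is_closed(c)
)
print(closed_frequent)  # ['ACD', 'BCDE', 'CD']
```

CD is closed (support 3) but not maximal, because its supersets ACD and BCDE are frequent with lower support; ACD and BCDE are both closed and maximal.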

Maximal vs Closed Itemsets
(Figure: itemset lattice annotated with the transaction IDs that support each itemset; some itemsets are not supported by any transaction.)

(Figure: itemset lattice with minimum support = 2, marking the itemsets that are closed but not maximal and those that are both closed and maximal. # Closed = 9, # Maximal = 4.)


MaxMiner: Mining Max-patterns
  • Idea: generate the complete set-enumeration tree one level at a time, pruning where applicable.

Set-enumeration tree for {A, B, C, D}, each node written as head (tail):
Level 0: (ABCD)
Level 1: A (BCD), B (CD), C (D), D ()
Level 2: AB (CD), AC (D), AD (), BC (D), BD (), CD ()
Level 3: ABC (D), ABD (), ACD (), BCD ()
Level 4: ABCD ()
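The level-by-level generation of the set-enumeration tree can be sketched as follows. This is an illustrative Python fragment without any pruning; nodes are represented as (head, tail) string pairs, and `enumeration_levels` is a name chosen here, not one used in the slides.

```python
def enumeration_levels(items):
    """Return the set-enumeration tree of `items` as a list of levels.

    Each node is a (head, tail) pair of strings. A child extends the head
    with one tail item and keeps only the items that come after it.
    """
    level = [("", items)]
    levels = []
    while level:
        levels.append(level)
        level = [
            (head + tail[i], tail[i + 1:])
            for head, tail in level
            for i in range(len(tail))
        ]
    return levels


levels = enumeration_levels("ABCD")
for depth, nodes in enumerate(levels):
    print(depth, ", ".join(f"{h} ({t})" for h, t in nodes))
```

Without pruning the tree for four items has 16 nodes, one per subset of {A, B, C, D}, which is why MaxMiner prunes while it expands.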

Local Pruning Techniques (e.g. at node A)
Check the frequency of ABCD and of AB, AC, AD.
  • If ABCD is frequent, prune the whole sub-tree (every itemset below A is a subset of ABCD).
  • If AC is NOT frequent, remove C from the parentheses before expanding.
(Figure: the set-enumeration tree for {A, B, C, D}, from the root (ABCD) down to ABCD ().)
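The two local checks can be sketched as a single function applied to one node. The fragment below is a minimal illustration, assuming the three-transaction example database from the later slides with Min_sup = 2; `local_prune` and its return convention are hypothetical choices made for this sketch.

```python
# Assumed data: the transactions from the example slides.
TRANSACTIONS = [
    {"A", "B", "C", "D", "E"},  # Tid 10
    {"B", "C", "D", "E"},       # Tid 20
    {"A", "C", "D", "F"},       # Tid 30
]
MIN_SUP = 2


def support(itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if set(itemset) <= t)


def local_prune(head, tail):
    """Apply both local checks at the node `head (tail)`.

    Returns (True, tail) when head + tail is frequent, so the sub-tree
    need not be expanded; otherwise (False, reduced_tail), where every
    item i with an infrequent head + i has been removed from the tail.
    """
    if support(head + tail) >= MIN_SUP:
        return True, tail
    kept = "".join(i for i in tail if support(head + i) >= MIN_SUP)
    return False, kept


print(local_prune("A", "BCDE"))  # (False, 'CD')
print(local_prune("B", "CDE"))   # (True, 'CDE')
```

At node A the items B and E are dropped because AB and AE each occur only once; at node B the whole sub-tree is skipped because BCDE is frequent.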

Algorithm MaxMiner
  • Initially, generate one node N = (ABCD), where $h(N) = \emptyset$ and $t(N) = \{A, B, C, D\}$.
  • Consider expanding N:
      • If $h(N) \cup t(N)$ is frequent, do not expand N.
      • If for some $i \in t(N)$, $h(N) \cup \{i\}$ is NOT frequent, remove $i$ from $t(N)$ before expanding N.
  • Apply global pruning techniques…
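Putting the steps together, a simplified breadth-first version might look like the sketch below. It is not the original MaxMiner implementation: it recounts support by scanning the transactions, keeps items in alphabetical order, and adds a final filter that removes any recorded candidate that turns out to be a subset of another one. The function name `max_miner` and the (head, tail) representation are assumptions made for this illustration.

```python
def max_miner(transactions, min_sup):
    """Simplified MaxMiner-style search for maximal frequent itemsets."""

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted(set().union(*transactions))
    level = [(frozenset(), items)]  # root node: empty head, all items in tail
    candidates = []                 # frequent itemsets recorded so far

    while level:
        next_level = []
        for head, tail in level:
            full = head | frozenset(tail)
            # Global pruning: h(N) ∪ t(N) lies inside a known candidate.
            if any(full <= m for m in candidates):
                continue
            # Local pruning 1: h(N) ∪ t(N) is frequent, do not expand N.
            if full and support(full) >= min_sup:
                candidates.append(full)
                continue
            # Local pruning 2: drop tail items i with infrequent h(N) ∪ {i}.
            tail = [i for i in tail if support(head | {i}) >= min_sup]
            if not tail and head:
                candidates.append(head)  # head cannot be extended at all
            for pos, i in enumerate(tail):
                next_level.append((head | {i}, tail[pos + 1:]))
        level = next_level

    # Keep only candidates that are not contained in another candidate.
    return [m for m in candidates if not any(m < o for o in candidates)]


db = [
    {"A", "B", "C", "D", "E"},  # Tid 10
    {"B", "C", "D", "E"},       # Tid 20
    {"A", "C", "D", "F"},       # Tid 30
]
result = max_miner(db, min_sup=2)
print(["".join(sorted(m)) for m in result])  # ['BCDE', 'ACD']
```

On the example database from the slides (Min_sup = 2) the sketch finds BCDE at node B and ACD at node AC, and never expands C, D, E or AD.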

Global Pruning Technique (across sub-trees)
  • When a max pattern is identified (e.g. ABCD), prune all nodes N (e.g. B, C and D) where $h(N) \cup t(N)$ is a subset of it (e.g. of ABCD).
(Figure: the set-enumeration tree for {A, B, C, D}, from the root (ABCD) down to ABCD ().)
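Global pruning amounts to a subset test against the max pattern just found. The sketch below shows only that filter; nodes are (head, tail) string pairs and `global_prune` is a hypothetical helper name.

```python
def global_prune(nodes, max_pattern):
    """Drop every node whose head ∪ tail is a subset of `max_pattern`."""
    found = set(max_pattern)
    return [(h, t) for h, t in nodes if not set(h + t) <= found]


# Slide example: once ABCD is known to be a max pattern,
# the sibling nodes B (CD), C (D) and D () are all pruned.
print(global_prune([("B", "CD"), ("C", "D"), ("D", "")], "ABCD"))  # []

# Nodes that reach outside the max pattern survive.
pending = [("C", "DE"), ("D", "E"), ("E", ""), ("AC", "D"), ("AD", "")]
print(global_prune(pending, "BCDE"))  # [('AC', 'D'), ('AD', '')]
```

Every itemset in a pruned sub-tree is contained in the known max pattern, so none of them can be a new max pattern.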

Example
Min_sup = 2

Tid   Items
10    A, B, C, D, E
20    B, C, D, E
30    A, C, D, F

Root node (ABCDEF):
Items     Frequency
ABCDEF    0
A         2
B         2
C         3
D         3
E         2
F         1

Tree:
(ABCDEF)
  A (BCDE)
  B (CDE)
  C (DE)
  D (E)
  E ()

Max patterns:
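The frequency table at the root and the resulting first level of the tree can be reproduced with a few lines of Python. This is an illustrative sketch using the transactions from this slide; the variable names are assumptions.

```python
from collections import Counter

transactions = [
    {"A", "B", "C", "D", "E"},  # Tid 10
    {"B", "C", "D", "E"},       # Tid 20
    {"A", "C", "D", "F"},       # Tid 30
]
MIN_SUP = 2

# Single-item frequencies, as in the root-node table.
counts = Counter(item for t in transactions for item in t)
print(sorted(counts.items()))

# F is infrequent, so it is removed from the root's tail before expanding.
tail = "".join(i for i in sorted(counts) if counts[i] >= MIN_SUP)

# First level of the tree: each frequent item with the items after it.
children = [(tail[i], tail[i + 1:]) for i in range(len(tail))]
print(", ".join(f"{h} ({t})" for h, t in children))
```

Dropping F at the root is why the first-level nodes are A (BCDE), B (CDE), C (DE), D (E) and E (), with no node for F.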

Example
Min_sup = 2

Tid   Items
10    A, B, C, D, E
20    B, C, D, E
30    A, C, D, F

Node A:
Items    Frequency
ABCDE    1
AB       1
AC       2
AD       2
AE       1

Tree:
(ABCDEF)
  A (BCDE)
    AC (D)
    AD ()
  B (CDE)
  C (DE)
  D (E)
  E ()

Max patterns:

Example
Min_sup = 2

Tid   Items
10    A, B, C, D, E
20    B, C, D, E
30    A, C, D, F

Node B:
Items    Frequency
BCDE     2
BC
BD
BE

Tree:
(ABCDEF)
  A (BCDE)
    AC (D)
    AD ()
  B (CDE)
  C (DE)
  D (E)
  E ()

Max patterns: BCDE

Example
Min_sup = 2

Tid   Items
10    A, B, C, D, E
20    B, C, D, E
30    A, C, D, F

Node AC:
Items    Frequency
ACD      2

Tree:
(ABCDEF)
  A (BCDE)
    AC (D)
    AD ()
  B (CDE)
  C (DE)
  D (E)
  E ()

Max patterns: BCDE, ACD