Mining Association Rules with Constraints
Wei Ning Joon Wong
COSC 6412 Presentation

Outline
• Introduction
• Summary of Approach
• Algorithm CAP
• Performance Analysis
• Conclusion
• References

Introduction
• Recall: mining association rules
• Association rule mining finds interesting associations or correlation relationships among a large set of data items.

Some problems we meet when mining association rules
• Overwhelming?
• Not what you want?
• Waiting too long?
• Lack of focus

Introduction (cont.)
• Example at Walmart: suppose a manager wants to find which shoes are the most popular in winter.

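Mining frequent itemsets vs. mining association rules
• Mining frequent itemsets is almost the same as mining association rules: once the frequent itemsets and their supports are known, the rules can be generated directly from them, so the expensive step is finding the frequent itemsets.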
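
To illustrate why the rules come almost for free, here is a minimal Python sketch that derives rules X -> Y from frequent itemsets and their support counts, using conf(X -> Y) = sup(X ∪ Y) / sup(X). The function name `rules_from_itemsets` is hypothetical, and the support counts are the frequent itemsets of the Apriori† example later in this deck (minimum support count 2).

```python
from itertools import combinations

def rules_from_itemsets(support, min_conf):
    """Derive rules X -> Y from frequent itemsets and their support counts.

    `support` maps frozensets to support counts; it must contain every
    subset of each frequent itemset (true for Apriori output).
    """
    rules = []
    for itemset, sup in support.items():
        if len(itemset) < 2:
            continue
        # Every non-empty proper subset X of the itemset is a candidate antecedent.
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                conf = sup / support[x]   # conf(X -> Y) = sup(X u Y) / sup(X)
                if conf >= min_conf:
                    rules.append((x, itemset - x, conf))
    return rules

# Frequent itemsets of the Apriori-dagger example (min support count = 2).
support = {frozenset(s): n for s, n in [
    ("A", 2), ("B", 3), ("C", 3), ("E", 3),
    ("AC", 2), ("BC", 2), ("BE", 3), ("CE", 2), ("BCE", 2)]}

for x, y, conf in rules_from_itemsets(support, min_conf=1.0):
    print(sorted(x), "->", sorted(y), conf)
```

No further database scan is needed: every confidence is computed from support counts already collected while mining the frequent itemsets.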

Constrained Mining
• A naive solution: first find all frequent sets, and then test them for constraint satisfaction.
• Our approach:
  • Analyze the properties of constraints comprehensively.
  • Push them as deeply as possible inside the frequent pattern computation.

Frequent Itemsets & Constraints
• Given a transaction database TDB (min_sup = 2):

  TID | Transaction
  10  | a, b, c
  20  | b, c, d, f
  30  | a, c

  Item | Value
  a    | 40
  b    | 10
  c    | -20
  d    | 10
  e    | -30

• Frequent itemset: a set of items that frequently appear together in transactions, e.g. {a, c}.
• Constraint: a predicate over itemsets, e.g. C(I): sum(I) > 50. Then C(abd) = true, since 40 + 10 + 10 = 60 > 50.
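
A small Python sketch can make the two notions concrete on this slide's data: support counting against TDB and the predicate C(I): sum(I) > 50. The helper names (`support`, `is_frequent`, `c_sum_over_50`) are hypothetical; item f has no value in the slide's table, so the constraint is only defined here for a to e.

```python
# The transaction database and item values from the slide.
TDB = {10: {"a", "b", "c"}, 20: {"b", "c", "d", "f"}, 30: {"a", "c"}}
VALUE = {"a": 40, "b": 10, "c": -20, "d": 10, "e": -30}   # f has no value here
MIN_SUP = 2

def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for items in TDB.values() if set(itemset) <= items)

def is_frequent(itemset):
    return support(itemset) >= MIN_SUP

def c_sum_over_50(itemset):
    """Constraint C(I): sum(I) > 50, using the item values above."""
    return sum(VALUE[i] for i in itemset) > 50

print(support({"a", "c"}), is_frequent({"a", "c"}))    # 2 True
print(c_sum_over_50({"a", "b", "d"}))                  # True (40 + 10 + 10 = 60)
print(support({"a", "b", "d"}))                        # 0
```

Note that {a, b, d} satisfies C but appears in no transaction: frequency and the constraint are independent tests, and the mining task asks for itemsets that pass both.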

Mining Frequent Itemsets with Constraints
• Given
  • a transaction database TDB,
  • a support threshold min_sup,
  • a constraint C,
  find the complete set of frequent itemsets satisfying the constraint.
• Use constraints to
  • express the user's focus;
  • improve both effectiveness and efficiency.

Classification of Constraints
• We use the following classification of constraints:
  • Anti-monotone
  • Monotone
  • Succinct
  • Convertible
    • Convertible anti-monotone
    • Convertible monotone
    • Strongly convertible
  • Inconvertible

Anti-Monotone
• Definition 1 (Anti-monotone): A 1-var constraint C is anti-monotone if for all sets $S, S'$: $S \supseteq S'$ and $S$ satisfies $C$ $\Rightarrow$ $S'$ satisfies $C$.
• Put simply: when an itemset S violates the constraint, so does every superset of S.

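Is $\min(S) \geq v$ anti-monotone?
• Items: {5, 10, 14}; v = 7, i.e. the constraint is $\min(S) \geq 7$.
• {5} violates it, since min({5}) = 5 < 7.
• The supersets of {5} are {5, 10}, {5, 14}, and {5, 10, 14}; each has minimum 5, so each violates it too.
• Hence $\min(S) \geq v$ is anti-monotone.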

Succinct
• Definition 2 (Succinct):
  • $I \subseteq \mathit{Item}$ is a succinct set if it can be expressed as $\sigma_p(\mathit{Item})$ for some selection predicate $p$.
  • $SP \subseteq 2^{\mathit{Item}}$ is a succinct powerset if there is a fixed number of succinct sets $\mathit{Item}_1, \ldots, \mathit{Item}_k \subseteq \mathit{Item}$ such that $SP$ can be expressed in terms of the strict powersets of $\mathit{Item}_1, \ldots, \mathit{Item}_k$ using union and minus.
  • Finally, a 1-var constraint $C$ is succinct provided $SAT_C(\mathit{Item})$, the set of itemsets satisfying $C$, is a succinct powerset.

Succinct
• General idea: we can enumerate all and only those sets that are guaranteed to satisfy the constraint.
• If a constraint is succinct, we can directly generate precisely the sets that satisfy it.

Succinct examples
• Itemsets containing a or b
• Itemsets containing some item with value more than 30

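Succinct example
• $C_1 \equiv \mathit{Item.Price} \geq 100$
• $\mathit{Item}_1 = \sigma_{\mathit{Item.Price} \geq 100}(\mathit{Item}) = \{a, b\}$
• $2^{\mathit{Item}_1} = \{\{a\}, \{b\}, \{a, b\}\}$ (the strict powerset, without the empty set)
• $SAT_{C_1} = 2^{\mathit{Item}_1}$, so $C_1$ is succinct.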
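
The point of succinctness is that the satisfying sets can be generated directly instead of being tested one by one. A minimal Python sketch, assuming hypothetical prices (the slide only tells us that a and b are the items priced at least 100) and reading $C_1$ as "every item in the set has price at least 100":

```python
from itertools import combinations

# Hypothetical prices: only "a and b cost >= 100" comes from the slide.
PRICE = {"a": 150, "b": 120, "c": 80, "d": 40}

def select(pred, items):
    """sigma_p(Item): the succinct set of items satisfying predicate p."""
    return {i for i in items if pred(i)}

def strict_powerset(items):
    """All non-empty subsets of `items` (2^Item without the empty set)."""
    items = sorted(items)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

item1 = select(lambda i: PRICE[i] >= 100, PRICE)
sat_c1 = strict_powerset(item1)        # generated directly, no testing needed
print(sorted(item1))                   # ['a', 'b']
print([sorted(s) for s in sat_c1])     # [['a'], ['b'], ['a', 'b']]
```

Enumerating all 15 non-empty subsets of {a, b, c, d} and filtering them by $C_1$ gives the same three sets; the succinct form simply skips the candidates that would fail.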

Convertible
• Convert tough constraints into anti-monotone or monotone ones by properly ordering items.

Convertible
• Definition: R is an order of items.
• Convertible anti-monotone: if an itemset X satisfies the constraint, so does every prefix of X w.r.t. R.

Convertible example
• Constraint C: $\mathrm{avg}(X) \geq 25$
• Order items in value-descending order: <a, f, g, d, b, h, c, e>

  Item | Value
  a    | 40
  f    | 30
  g    | 20
  d    | 10
  b    | 0
  h    | -10
  c    | -20
  e    | -30

• Itemset afd satisfies C (avg = 80/3 ≈ 26.7); so do its prefixes a and af.
• Thus, with respect to this order, C becomes anti-monotone!
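
A brute-force Python check can confirm the example's claim on this small item set: under the value-descending order R, every itemset satisfying avg(X) >= 25 has only satisfying prefixes, while the reverse order breaks this (e.g. {a, d} satisfies C, but its prefix d does not). The helper names are hypothetical, and exhaustive checking only works because there are just 8 items.

```python
from itertools import combinations

VALUE = {"a": 40, "b": 0, "c": -20, "d": 10,
         "e": -30, "f": 30, "g": 20, "h": -10}

def avg_at_least_25(itemset):
    """Constraint C: avg(X) >= 25."""
    return sum(VALUE[i] for i in itemset) / len(itemset) >= 25

def prefixes(itemset, order):
    """Non-empty prefixes of `itemset` when its items are listed in `order`."""
    ordered = [i for i in order if i in itemset]
    return [ordered[:k] for k in range(1, len(ordered) + 1)]

def prefix_closed(order):
    """Brute force: does every itemset satisfying C have only satisfying prefixes?"""
    for r in range(1, len(VALUE) + 1):
        for combo in combinations(VALUE, r):
            if avg_at_least_25(combo) and not all(
                    avg_at_least_25(p) for p in prefixes(set(combo), order)):
                return False
    return True

R = sorted(VALUE, key=lambda i: -VALUE[i])      # value-descending order
print(R)                                        # ['a', 'f', 'g', 'd', 'b', 'h', 'c', 'e']
print(["".join(p) for p in prefixes({"a", "f", "d"}, R)])   # ['a', 'af', 'afd']
print(prefix_closed(R), prefix_closed(R[::-1]))             # True False
```

The ordering works because a prefix in value-descending order always holds the largest values of the itemset, so its average can only be at least the average of the whole itemset.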

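Commonly Used Constraints: A General Picture

  Constraint                                                  | Anti-monotone | Monotone    | Succinct
  $v \in S$                                                   | no            | yes         | yes
  $S \subseteq V$                                             | yes           | no          | yes
  $\min(S) \geq v$                                            | yes           | no          | yes
  $\max(S) \geq v$                                            | no            | yes         | yes
  $\mathrm{count}(S) \leq v$                                  | yes           | no          | weakly
  $\mathrm{count}(S) \geq v$                                  | no            | yes         | weakly
  $\mathrm{sum}(S) \leq v$ ($\forall a \in S, a \geq 0$)      | yes           | no          | no
  $\mathrm{sum}(S) \geq v$ ($\forall a \in S, a \geq 0$)      | no            | yes         | no
  $\mathrm{range}(S) \leq v$                                  | yes           | no          | no
  $\mathrm{range}(S) \geq v$                                  | no            | yes         | no
  $\mathrm{avg}(S)\ \theta\ v$, $\theta \in \{=, \leq, \geq\}$ | convertible   | convertible | no
  $\mathrm{support}(S) \geq \xi$                              | yes           | no          | no
  $\mathrm{support}(S) \leq \xi$                              | no            | yes         | no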
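
The anti-monotone and monotone columns can be spot-checked by brute force. The Python sketch below tests every proper-subset pair of a tiny hypothetical item set with non-negative values (as the sum rows require). The helper `classify` and the values are assumptions for illustration; a finite check like this can refute a table entry, but agreeing with it is evidence, not a proof.

```python
from itertools import combinations

# Hypothetical, non-negative item values (the sum rows assume a >= 0).
VALUE = {"a": 5, "b": 10, "c": 14, "d": 20}
ITEMSETS = [frozenset(c) for r in range(1, len(VALUE) + 1)
            for c in combinations(VALUE, r)]
V = 10

def vals(s):
    return [VALUE[i] for i in s]

def classify(constraint):
    """Return (anti_monotone, monotone) judged over all non-empty itemsets."""
    anti = mono = True
    for small in ITEMSETS:
        for big in ITEMSETS:
            if small < big:                                   # proper subset
                if constraint(big) and not constraint(small):
                    anti = False      # a superset satisfies, a subset does not
                if constraint(small) and not constraint(big):
                    mono = False      # a subset satisfies, a superset does not
    return anti, mono

CONSTRAINTS = {
    "v in S (v = b)":  lambda s: "b" in s,
    "min(S) >= v":     lambda s: min(vals(s)) >= V,
    "max(S) >= v":     lambda s: max(vals(s)) >= V,
    "count(S) <= 2":   lambda s: len(s) <= 2,
    "sum(S) <= 2v":    lambda s: sum(vals(s)) <= 2 * V,
    "range(S) <= v/2": lambda s: max(vals(s)) - min(vals(s)) <= V / 2,
    "avg(S) >= v":     lambda s: sum(vals(s)) / len(s) >= V,
}

for name, constraint in CONSTRAINTS.items():
    anti, mono = classify(constraint)
    print(f"{name:16} anti-monotone={anti} monotone={mono}")
```

As the table predicts, avg(S) >= v comes out as neither anti-monotone nor monotone under the natural subset order; it only becomes anti-monotone once items are ordered, as in the convertible example.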

Optional Proof that $\min(S) \geq v$ Is Anti-monotone
• According to the table, $\min(S) \geq v$ is both anti-monotone and succinct.
• Due to time limitations, only anti-monotonicity is proved here.
• Something special…
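
The proof itself is not in this transcript. One short way to fill it in, written as a sketch against Definition 1 (with $S \supseteq S'$):

```latex
\begin{align*}
&\text{Let } S \supseteq S' \neq \emptyset \text{ and suppose } S \text{ satisfies } \min(S) \geq v.\\
&\text{Every item of } S' \text{ is also an item of } S, \text{ so } \min(S') \geq \min(S).\\
&\text{Hence } \min(S') \geq \min(S) \geq v, \text{ i.e. } S' \text{ satisfies the constraint.}\\
&\text{By Definition 1, } \min(S) \geq v \text{ is anti-monotone.}
\end{align*}
```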

Constraint Classification
[Figure: diagram relating monotone, anti-monotone, succinct, strongly convertible, convertible anti-monotone, convertible monotone, and inconvertible constraints]

Summary of Approach: Recapitulation
• The basic idea of mining frequent itemsets with constraints.
• Several important classes of constraints were introduced.

Algorithms
• There are many algorithms for constraint-based association rule mining:
  • Algorithm Direct
  • Algorithms MultiJoins & Reorder
  • Algorithm Apriori†
  • Algorithm Hybrid(m)
  • Algorithm CAP (main focus)

Design of Algorithm
• Sound: an algorithm is sound provided it only finds frequent sets that satisfy the given constraints.
• Complete: an algorithm is complete provided all frequent sets satisfying the given constraints are found.

Algorithm Apriori†
• Main idea: use the Apriori algorithm to get the frequent itemsets, then apply the constraints to the itemsets found.
  • Step 1) Run Apriori with $C_{freq}$ (the frequency constraint).
  • Step 2) Apply $C - C_{freq}$ (the remaining constraints) to get the final answer Ans.

Algorithm Apriori† (Pseudocode)
1. C1 consists of sets of size 1; k = 1; Ans = ∅;
2. while (Ck not empty) {
   2.1 conduct db scan to form Lk from Ck;
   2.2 form Ck+1 from Lk based on Cfreq; k++;
   }
3. for each set S in some Lk: add S to Ans if S satisfies (C - Cfreq).

The Apriori† Algorithm: An Example (minimum support count = 2)

  Database TDB
  Tid | Items
  10  | A, C, D
  20  | B, C, E
  30  | A, B, C, E
  40  | B, E

• 1st scan → C1: {A}: 2, {B}: 3, {C}: 3, {D}: 1, {E}: 3
• L1: {A}: 2, {B}: 3, {C}: 3, {E}: 3
• C2 (from L1): {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
• 2nd scan → C2: {A, B}: 1, {A, C}: 2, {A, E}: 1, {B, C}: 2, {B, E}: 3, {C, E}: 2
• L2: {A, C}: 2, {B, C}: 2, {B, E}: 3, {C, E}: 2
• C3: {B, C, E}
• 3rd scan → L3: {B, C, E}: 2

The Apriori† Algorithm: An Example (cont.)
• Database TDB: 10: A, C, D; 20: B, C, E; 30: A, B, C, E; 40: B, E
• L1: {A}: 2, {B}: 3, {C}: 3, {E}: 3
• L2: {A, C}: 2, {B, C}: 2, {B, E}: 3, {C, E}: 2
• L3: {B, C, E}: 2
• Constraint: $T.\mathit{Item} \subseteq \{A, C, E\}$
• Ans: {A}, {C}, {E}, {A, C}, {C, E}
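
The Apriori† pseudocode and this example can be reproduced with a short Python sketch: a plain level-wise Apriori (join Lk with itself, prune candidates that have an infrequent k-subset, count) followed by a final constraint test. The function names `apriori` and `apriori_dagger` are hypothetical, and transactions are modelled as frozensets of single-letter items.

```python
from itertools import combinations

def apriori(tdb, min_sup):
    """Level-wise Apriori: returns {frozenset itemset: support count}."""
    def count(candidates):
        return {c: sum(1 for t in tdb if c <= t) for c in candidates}

    items = {frozenset([i]) for t in tdb for i in t}
    frequent = {}
    level = {c: s for c, s in count(items).items() if s >= min_sup}   # L1
    k = 1
    while level:
        frequent.update(level)
        # Join Lk with itself, then prune candidates with an infrequent k-subset.
        prev = list(level)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        level = {c: s for c, s in count(candidates).items() if s >= min_sup}
        k += 1
    return frequent

def apriori_dagger(tdb, min_sup, constraint):
    """Apriori-dagger: mine all frequent sets first, then test the constraint."""
    return {s for s in apriori(tdb, min_sup) if constraint(s)}

TDB = [frozenset("ACD"), frozenset("BCE"), frozenset("ABCE"), frozenset("BE")]
allowed = frozenset("ACE")
ans = apriori_dagger(TDB, 2, lambda s: s <= allowed)   # T.Item ⊆ {A, C, E}
print(sorted("".join(sorted(s)) for s in ans))         # ['A', 'AC', 'C', 'CE', 'E']
```

All nine frequent sets, including {B}, {B, C}, {B, E}, and {B, C, E}, are counted before the constraint discards them; that wasted counting is what CAP sets out to avoid.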

Algorithm CAP
• Succinct and anti-monotone:
  • Strategy I: replace $C_1$ in the Apriori algorithm by $C_1^C$, the set of 1-itemsets in $C_1$ that satisfy $C$.
• Anti-monotone but non-succinct:
  • Strategy II: define $C_k$ as in the Apriori algorithm. Drop a set $S \in C_k$ from counting if $S$ fails $C$; i.e., constraint satisfaction is tested before counting is done.

Algorithm CAP (cont.)
• Succinct but non-anti-monotone:
  • Strategy III: too complicated; to be discussed later…
• Non-succinct and non-anti-monotone:
  • Strategy IV: induce a weaker constraint $C'$ from $C$. Depending on whether $C'$ is anti-monotone and/or succinct, use one of Strategies I to III above to generate the frequent sets; $C$ itself is then checked on the resulting sets.

Algorithm CAP (Pseudocode)
(Csam, Csuc, Cam, and Cnone hold the succinct anti-monotone, succinct non-anti-monotone, anti-monotone non-succinct, and remaining constraints, respectively.)
1. if Csam ∪ Csuc ∪ Cnone is non-empty, prepare C1 as indicated in Strategies I, III, and IV; k = 1;
2. if Csuc is non-empty {
   2.1 conduct db scan to form L1 as indicated in Strategy III;
   2.2 form C2 as indicated in Strategy III; k = 2;
   }
3. while (Ck not empty) {
   3.1 conduct db scan to form Lk from Ck;
   3.2 form Ck+1 from Lk based on Strategy III if Csuc is non-empty, and Strategy II for constraints in Cam; k++;
   }
4. if Cnone is empty, Ans = ∪k Lk. Otherwise, for each set S in some Lk, add S to Ans iff S satisfies Cnone.

The Algorithm CAP: An Example
• Constraint: $T.\mathit{Item} \subseteq \{A, C, E\}$; min support count = 2
• Question: which strategy should we apply?

  Database TDB
  Tid | Items
  10  | A, C, D
  20  | B, C, E
  30  | A, B, C, E
  40  | B, E

The Algorithm CAP: An Example (cont.)
• The constraint is succinct and anti-monotone, so apply Strategy I!
• C1 (only the items satisfying the constraint): {A}, {C}, {E}
• 1st scan → L1: {A}: 2, {C}: 3, {E}: 3
• C2: {A, C}, {A, E}, {C, E}
• 2nd scan → {A, C}: 2, {A, E}: 1, {C, E}: 2, so L2: {A, C}: 2, {C, E}: 2
• C3: {} (empty), because {A, E} was pruned earlier
• Ans: {A}, {C}, {E}, {A, C}, {C, E}
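
Strategies I and II can be sketched as two hooks on a level-wise miner. In the Python below, the function `cap` and its parameters `item_ok` (Strategy I) and `am_constraint` (Strategy II) are hypothetical names; Strategies III and IV are not implemented. The sketch also reports how many candidates were counted, which makes the saving over Apriori† visible. The prices in the Strategy II example are made up.

```python
from itertools import combinations

def cap(tdb, min_sup, item_ok=None, am_constraint=None):
    """Level-wise miner with two CAP-style pushdowns (a sketch).

    item_ok       -- Strategy I: keep in C1 only single items satisfying a
                     succinct anti-monotone constraint.
    am_constraint -- Strategy II: drop candidates failing an anti-monotone
                     constraint *before* the database scan counts them.
    Returns (frequent sets with supports, number of candidates counted).
    """
    def count(cands):
        return {c: sum(1 for t in tdb if c <= t) for c in cands}

    cands = {frozenset([i]) for t in tdb for i in t}
    if item_ok is not None:
        cands = {c for c in cands if item_ok(next(iter(c)))}   # Strategy I
    result, counted, k = {}, 0, 1
    while cands:
        if am_constraint is not None:
            cands = {c for c in cands if am_constraint(c)}     # Strategy II
        counted += len(cands)
        level = {c: s for c, s in count(cands).items() if s >= min_sup}
        result.update(level)
        # Apriori join and prune to form C_{k+1} from L_k.
        prev = list(level)
        cands = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        cands = {c for c in cands
                 if all(frozenset(s) in level for s in combinations(c, k))}
        k += 1
    return result, counted

TDB = [frozenset("ACD"), frozenset("BCE"), frozenset("ABCE"), frozenset("BE")]

# Constraint T.Item ⊆ {A, C, E}: succinct and anti-monotone -> Strategy I.
res, counted = cap(TDB, 2, item_ok=lambda i: i in "ACE")
print(sorted("".join(sorted(s)) for s in res), counted)   # ['A', 'AC', 'C', 'CE', 'E'] 6
print(cap(TDB, 2)[1])                                     # 12 (no pushdown)

# Hypothetical prices for Strategy II with sum(S) <= 35 (anti-monotone, not succinct).
PRICE = {"A": 10, "B": 40, "C": 20, "D": 5, "E": 15}
res2, counted2 = cap(TDB, 2, am_constraint=lambda s: sum(PRICE[i] for i in s) <= 35)
```

With Strategy I, only 6 candidates are counted instead of the 12 that plain Apriori counts on this database, and {A, C, E} is never generated because {A, E} is infrequent.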

Case 3: Succinct but Not Anti-monotone, Revisited
• Items: {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}, {10}; constraint: $\min(S) < 5$.
• The items satisfying the constraint are {1}, {2}, {3}, {4}.
• Apriori started only from these items generates {1, 2}, {2, 3}, …, {3, 4}, …, {1, 2, 3, 4}.
• Some frequent sets that satisfy the constraint may be lost, e.g. {1, 8} and {1, 2, 10}.
**Information extracted from a past presentation.

Case 3: Succinct but Not Anti-monotone, Continued
• Algorithm Direct
  • Idea: play it safe. Generate $C^C_{k+1}$ as $L^C_k \times F$, where $F$ is the set of all frequent items.
• Algorithm MultiJoins
• Algorithm Reorder
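
Direct's candidate step can be sketched in a few lines of Python: each satisfying k-set is extended with every frequent item, not only with the items that satisfy the constraint. The function `direct_candidates` is a hypothetical name. It assumes, as the revisited example implies, that items 1 to 10 are all frequent, and it omits the support counting that the real algorithm still performs on these candidates.

```python
def direct_candidates(lc_k, frequent_items):
    """Algorithm Direct (sketch): C^C_{k+1} = L^C_k x F.

    Each satisfying k-set is extended by every frequent item, not only by the
    items that satisfy the constraint, so sets such as {1, 8} are not lost.
    """
    return {s | {i} for s in lc_k for i in frequent_items if i not in s}

def satisfies(s):
    """Case 3 constraint: min(S) < 5 (succinct, not anti-monotone)."""
    return min(s) < 5

F = set(range(1, 11))                       # assumption: items 1..10 are all frequent
L1_C = [frozenset([i]) for i in sorted(F) if satisfies({i})]   # {1}, {2}, {3}, {4}
C2_C = direct_candidates(L1_C, F)
print(frozenset({1, 8}) in C2_C, len(C2_C))                    # True 30
C3_C = direct_candidates([frozenset({1, 2})], F)               # suppose {1, 2} is in L2^C
print(frozenset({1, 2, 10}) in C3_C)                           # True
```

Every generated set contains an item below 5, so all candidates still satisfy $\min(S) < 5$; the price of "playing it safe" is more candidates (30 pairs here instead of the 6 pairs built from {1, 2, 3, 4} alone).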

Performance Analysis (Specification)
• Programs written in C
• Transactional databases generated using the program from IBM Almaden Research Center
  • 100,000 records, domain of 1,000 items
• Page size 4 KB
• SPARC-10 environment

Performance Analysis (Terminology)
• Speedup: comparison of the execution times of two algorithms (here, how many times faster CAP runs than Apriori†).
• Item selectivity: x% of the items satisfy the constraints.
• Support threshold: a lower support threshold means more frequent sets to process.

Performance Analysis
• Note: support threshold set at 0.5%.
• For 10% selectivity, CAP runs 80 times faster than Apriori†!
• For 30% selectivity, the speedup is about 10 times.

Performance Analysis
• Note: item selectivity fixed at 30%.
• As the support threshold goes up, the number of frequent itemsets goes down, and Apriori† improves.
• CAP is still at least 8 times faster.

Performance Analysis

  Support | L1      | L2     | L3      | L4     | L5     | L6     | L7     | L8
  0.2%    | 174/582 | 79/969 | 29/1140 | 8/1250 | 1/934  | 0/451  | 0/132  | 0/20
  0.6%    | 98/313  | 1/12   | 0/1     | 0      | 0      | 0      |        |

• Each entry is of the form a/b:
  • a is the number of frequent sets satisfying the constraint;
  • b is the total number of frequent sets.
• For L4 at 0.2% support, Apriori† finds 1250 frequent sets, of which only 8 are found by CAP.
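
To put the 0.2% row in perspective, a few lines of Python can total it across all levels. The numbers are copied from the table; the script only compares result sizes and says nothing about runtime, which the previous slides report.

```python
# (a, b) entries of the 0.2% row for L1..L8, copied from the table above.
ROW_0_2 = [(174, 582), (79, 969), (29, 1140), (8, 1250),
           (1, 934), (0, 451), (0, 132), (0, 20)]

satisfying = sum(a for a, _ in ROW_0_2)     # frequent sets that satisfy the constraint
total = sum(b for _, b in ROW_0_2)          # all frequent sets
print(satisfying, total, f"{satisfying / total:.1%}")   # 291 5478 5.3%
```

So at 0.2% support, only about 5% of the frequent sets that Apriori† materializes are part of the constrained answer.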

Conclusion
• The ideas of anti-monotonicity, succinctness, and convertibility are introduced in the referenced papers.
• Sound, complete, and efficient algorithms are introduced for constraint-based association rule mining.

References
• R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD'97.
• R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. SIGMOD'98.
• J. Pei and J. Han. Can we push more constraints into frequent pattern mining? KDD'00.