# Mining Non-Derivable Association Rules


Bart Goethals, Juho Muhonen, Hannu Toivonen. Proceedings of SIAM 2005.

Speaker: Pei-Min Chou. Date: 05/12/30.

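## Introduction

- Association rule X=>Y, where X and Y are itemsets
  - Support: supp(X∪Y) as a fraction of all transactions, e.g. A=>C: 2/4 = 50%
  - Confidence: supp(X∪Y)/supp(X), e.g. A=>C: supp(AC)/supp(A) = 2/3 = 67%
  - Frequent: X∪Y is frequent
  - Confident: supp(X∪Y)/supp(X) ≥ confidence threshold
- Typically, the set of association rules is large and redundant

Example database:

| TID | Items |
|---|---|
| 1 | A, B, C |
| 2 | A, C |
| 3 | A, D |
| 4 | B, D |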
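A minimal Python sketch of the two measures on the four-transaction example. The names `DB`, `support_count`, `support` and `confidence` are hypothetical and chosen only for illustration; `support_count` returns absolute counts, while `support` divides by the number of transactions.

```python
# Toy transaction database from the example (TID -> set of items).
DB = {
    1: {"A", "B", "C"},
    2: {"A", "C"},
    3: {"A", "D"},
    4: {"B", "D"},
}


def support_count(itemset, db=DB):
    """Absolute support: number of transactions containing every item."""
    return sum(1 for items in db.values() if set(itemset) <= items)


def support(itemset, db=DB):
    """Relative support: fraction of transactions containing the itemset."""
    return support_count(itemset, db) / len(db)


def confidence(condition, consequent, db=DB):
    """conf(X => Y) = supp(X ∪ Y) / supp(X)."""
    both = set(condition) | set(consequent)
    return support_count(both, db) / support_count(condition, db)


print(support({"A", "C"}))                  # 0.5  (rule A=>C: 2/4)
print(round(confidence({"A"}, {"C"}), 2))   # 0.67 (supp(AC)/supp(A) = 2/3)
```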

## Introduction (cont.)

- Related work
  - Prunes rules that have the same confidence as other rules
  - Uses a specific inference system to prune
  - Does not give error bounds
- Mining non-derivable association rules
  - Find tight bounds on the confidence of a rule from its subrules
  - If the lower bound equals the upper bound, the rule is derivable

## Non-derivable set property

- Downward closed
  - All supersets of a derivable set are derivable
  - All subsets of a non-derivable set are non-derivable
- Given all subrules of X=>Y:
  - X=>Y is derivable if and only if X∪Y is a derivable set

## Method

- Goal: remove all derivable association rules
- Different cases:
  - Rules with exactly the same condition and consequent
  - Fixed consequent
    - Single item, e.g. abc=>d
    - Multiple items, e.g. abc=>de
  - Fixed condition or consequent: use the method above
  - Condition:
    - Use the inclusion-exclusion principle
    - Some subrules

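## Example

Consider the rule abc=>d. All its subrules, and the itemsets whose supports each one provides:

| Subrule | Itemsets |
|---|---|
| ab=>d | ab, abd |
| ac=>d | ac, acd |
| bc=>d | bc, bcd |
| a=>d | a, ad |
| b=>d | b, bd |
| c=>d | c, cd |
| {}=>d | {}, d |

We miss the information on abc and abcd, which are exactly the supports needed for conf(abc=>d) = supp(abcd)/supp(abc).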
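The subrules with a fixed consequent, and the itemsets they leave uncovered, can be enumerated mechanically. A short Python sketch; `fixed_consequent_subrules`, `missing_itemsets` and `fmt` are hypothetical names, and the listing covers only subrules that keep the consequent fixed, as in this example.

```python
from itertools import combinations


def fixed_consequent_subrules(condition, consequent):
    """For every proper subset X' of the condition X, return (X', X' ∪ Y):
    the two itemsets whose supports the subrule X' => Y reveals."""
    cond = sorted(condition)
    cons = frozenset(consequent)
    pairs = []
    # Largest sub-conditions first, matching the order on the slide.
    for size in range(len(cond) - 1, -1, -1):
        for sub in combinations(cond, size):
            x = frozenset(sub)
            pairs.append((x, x | cons))
    return pairs


def missing_itemsets(condition, consequent):
    """Itemsets needed for conf(X => Y) that no subrule reveals."""
    x = frozenset(condition)
    needed = {x, x | frozenset(consequent)}
    known = {s for pair in fixed_consequent_subrules(condition, consequent)
             for s in pair}
    return needed - known


def fmt(itemset):
    """Render a frozenset of single-letter items as e.g. 'abd', or '{}'."""
    return "".join(sorted(itemset)) or "{}"


for x, xy in fixed_consequent_subrules("abc", "d"):
    print(f"{fmt(x)}=>d: {fmt(x)}, {fmt(xy)}")
print(sorted(fmt(s) for s in missing_itemsets("abc", "d")))  # ['abc', 'abcd']
```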

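## Bounds on supp(abc)

Using the inclusion-exclusion principle:

$$
\begin{aligned}
\mathrm{supp}(abc) &\le \mathrm{supp}(ab) + \mathrm{supp}(bc) + \mathrm{supp}(ac) - \mathrm{supp}(a) - \mathrm{supp}(b) - \mathrm{supp}(c) + \mathrm{supp}(\{\})\\
\mathrm{supp}(abc) &\ge \mathrm{supp}(ab) + \mathrm{supp}(ac) - \mathrm{supp}(a)\\
\mathrm{supp}(abc) &\ge \mathrm{supp}(ab) + \mathrm{supp}(bc) - \mathrm{supp}(b)\\
\mathrm{supp}(abc) &\ge \mathrm{supp}(bc) + \mathrm{supp}(ac) - \mathrm{supp}(c)\\
\mathrm{supp}(abc) &\le \mathrm{supp}(ab)\\
\mathrm{supp}(abc) &\le \mathrm{supp}(bc)\\
\mathrm{supp}(abc) &\le \mathrm{supp}(ac)\\
\mathrm{supp}(abc) &\ge 0
\end{aligned}
$$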
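Each of these eight inequalities is one instance of a single pattern, the inclusion-exclusion deduction rules that are also used for mining non-derivable itemsets. Writing I for the itemset being bounded and J for any subset of I:

$$
\sigma_J(I) = \sum_{J \subseteq X \subsetneq I} (-1)^{|I \setminus X| + 1}\,\mathrm{supp}(X),
\qquad
\begin{cases}
\mathrm{supp}(I) \le \sigma_J(I) & \text{if } |I \setminus J| \text{ is odd},\\
\mathrm{supp}(I) \ge \sigma_J(I) & \text{if } |I \setminus J| \text{ is even}.
\end{cases}
$$

For I = abc, J = {} gives the first (upper) bound, J = a gives the lower bound ending in -supp(a), J = ab gives supp(abc) ≤ supp(ab), and J = abc gives an empty sum, i.e. supp(abc) ≥ 0.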

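## Example (cont.)

Rule ab=>c with supp(ac) = 3, supp(bc) = 3, supp(a) = 7, supp(b) = 7, supp(c) = 5, supp({}) = 10. The supports of ab and abc are unknown.

First bound supp(ab):

$$
\mathrm{supp}(ab) \ge \mathrm{supp}(a) + \mathrm{supp}(b) - \mathrm{supp}(\{\}) = 7 + 7 - 10 = 4 \ (\text{lower}),
\qquad
\mathrm{supp}(ab) \le \mathrm{supp}(a) = \mathrm{supp}(b) = 7 \ (\text{upper}).
$$

Then, for each value s = supp(ab) in {4, 5, 6, 7}, bound supp(abc):

$$
\begin{aligned}
\mathrm{supp}(abc) &\le \mathrm{supp}(ab) + \mathrm{supp}(bc) + \mathrm{supp}(ac) - \mathrm{supp}(a) - \mathrm{supp}(b) - \mathrm{supp}(c) + \mathrm{supp}(\{\}) = s + 3 + 3 - 7 - 7 - 5 + 10 = s - 3\\
\mathrm{supp}(abc) &\ge \mathrm{supp}(ab) + \mathrm{supp}(ac) - \mathrm{supp}(a) = s + 3 - 7 = s - 4\\
\mathrm{supp}(abc) &\ge \mathrm{supp}(ab) + \mathrm{supp}(bc) - \mathrm{supp}(b) = s + 3 - 7 = s - 4\\
\mathrm{supp}(abc) &\ge \mathrm{supp}(bc) + \mathrm{supp}(ac) - \mathrm{supp}(c) = 3 + 3 - 5 = 1\\
\mathrm{supp}(abc) &\le \mathrm{supp}(ab) = s\\
\mathrm{supp}(abc) &\le \mathrm{supp}(bc) = 3\\
\mathrm{supp}(abc) &\le \mathrm{supp}(ac) = 3\\
\mathrm{supp}(abc) &\ge 0
\end{aligned}
$$

| supp(ab) | supp(abc) | conf(ab=>c) |
|---|---|---|
| 4 | [1, 1] | [1/4, 1/4] |
| 5 | [1, 2] | [1/5, 2/5] |
| 6 | [2, 3] | [2/6, 3/6] |
| 7 | [3, 3] | [3/7, 3/7] |

Confidence interval of ab=>c: [1/5, 1/2]. Since the lower and upper bounds differ, ab=>c is not derivable.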
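The two-step procedure of this example (bound the unknown supp(ab), then bound supp(abc) for each feasible value and keep the extreme confidences) can be sketched in Python. This is an illustrative sketch, not the authors' implementation: `ie_bounds` applies the inclusion-exclusion rules to an itemset whose proper subsets all have known supports, and `confidence_interval` is a hypothetical helper hard-wired to the rule ab=>c. It skips supp(ab) = 0 (confidence undefined) and any value of supp(ab) that leaves no consistent supp(abc).

```python
from fractions import Fraction
from itertools import combinations


def subsets(items):
    """All subsets of `items`, as frozensets."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def ie_bounds(itemset, supp):
    """(lower, upper) bounds on supp(I) from the inclusion-exclusion rules.

    `supp` must map every proper subset of I (as a frozenset) to its
    support count."""
    target = frozenset(itemset)
    lowers, uppers = [], []
    for j in subsets(target):
        sigma = sum((-1) ** (len(target - x) + 1) * supp[x]
                    for x in subsets(target) if j <= x and x != target)
        # An odd |I \ J| gives an upper bound, an even one a lower bound.
        (uppers if len(target - j) % 2 else lowers).append(sigma)
    return max(lowers), min(uppers)


def confidence_interval(known):
    """Hypothetical helper for ab=>c when supp(ab) and supp(abc) are unknown.

    Sweeps every feasible supp(ab), bounds supp(abc) for it, and returns the
    per-value rows plus the overall minimum and maximum confidence."""
    ab, abc = frozenset("ab"), frozenset("abc")
    lo_ab, hi_ab = ie_bounds(ab, known)
    rows = []
    for s in range(max(lo_ab, 1), hi_ab + 1):  # s = 0 would divide by zero
        low, high = ie_bounds(abc, {**known, ab: s})
        if low > high:
            continue  # no consistent supp(abc) for this supp(ab)
        rows.append((s, low, high, Fraction(low, s), Fraction(high, s)))
    return rows, min(r[3] for r in rows), max(r[4] for r in rows)


# Supports from the example; frozenset() stands for the empty set {}.
known = {frozenset(): 10, frozenset("a"): 7, frozenset("b"): 7,
         frozenset("c"): 5, frozenset("ac"): 3, frozenset("bc"): 3}
rows, lo, hi = confidence_interval(known)
for s, low, high, c_low, c_high in rows:
    print(f"supp(ab)={s}: supp(abc) in [{low}, {high}], conf in [{c_low}, {c_high}]")
print(f"conf(ab=>c) in [{lo}, {hi}]")  # conf(ab=>c) in [1/5, 1/2]
```

`Fraction` keeps the confidences exact, so a collapsed interval compares equal rather than nearly equal. With the supports of the next example (supp(c) = supp({}) = 10, supp(ac) = supp(bc) = 7) the same call returns the interval [1, 1].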

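## Example (cont.)

Rule ab=>c with supp(ac) = 7, supp(bc) = 7, supp(a) = 7, supp(b) = 7, supp(c) = 10, supp({}) = 10.

$$
\mathrm{supp}(ab) \ge \mathrm{supp}(a) + \mathrm{supp}(b) - \mathrm{supp}(\{\}) = 7 + 7 - 10 = 4,
\qquad
\mathrm{supp}(ab) \le \mathrm{supp}(a) = \mathrm{supp}(b) = 7.
$$

For supp(ab) = 4:

$$
\begin{aligned}
\mathrm{supp}(abc) &\le \mathrm{supp}(ab) + \mathrm{supp}(bc) + \mathrm{supp}(ac) - \mathrm{supp}(a) - \mathrm{supp}(b) - \mathrm{supp}(c) + \mathrm{supp}(\{\}) = 4 + 7 + 7 - 7 - 7 - 10 + 10 = 4\\
\mathrm{supp}(abc) &\ge \mathrm{supp}(ab) + \mathrm{supp}(ac) - \mathrm{supp}(a) = 4 + 7 - 7 = 4\\
\mathrm{supp}(abc) &\ge \mathrm{supp}(ab) + \mathrm{supp}(bc) - \mathrm{supp}(b) = 4 + 7 - 7 = 4\\
\mathrm{supp}(abc) &\ge \mathrm{supp}(bc) + \mathrm{supp}(ac) - \mathrm{supp}(c) = 7 + 7 - 10 = 4\\
\mathrm{supp}(abc) &\le \mathrm{supp}(ab) = 4\\
\mathrm{supp}(abc) &\le \mathrm{supp}(bc) = 7\\
\mathrm{supp}(abc) &\le \mathrm{supp}(ac) = 7\\
\mathrm{supp}(abc) &\ge 0
\end{aligned}
$$

| supp(ab) | supp(abc) | conf(ab=>c) |
|---|---|---|
| 4 | [4, 4] | [1, 1] |
| 5 | [5, 5] | [1, 1] |
| 6 | [6, 6] | [1, 1] |
| 7 | [7, 7] | [1, 1] |

supp(ab) lies in [4, 7], so ab is non-derivable, but the confidence interval of ab=>c is [1, 1], so ab=>c is derivable.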
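Why every row collapses to a single value: here supp(ac) = supp(a), supp(bc) = supp(b) and supp(c) = supp({}), so with s = supp(ab) the inclusion-exclusion bounds reduce to

$$
\begin{aligned}
\mathrm{supp}(abc) &\le s + 7 + 7 - 7 - 7 - 10 + 10 = s,\\
\mathrm{supp}(abc) &\ge \mathrm{supp}(ab) + \mathrm{supp}(ac) - \mathrm{supp}(a) = s + 7 - 7 = s,
\end{aligned}
$$

hence supp(abc) = s and conf(ab=>c) = s/s = 1 for every s in [4, 7]. The exact value of supp(ab) is not needed to fix the confidence.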

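## Use subrules

- Use the rule for every subset J ⊆ I such that |I \ J| ≤ k - 1, where k > 0 is a user-given depth parameter
- Example, depth = 4 (all rules for I = abc):

$$
\begin{aligned}
\mathrm{supp}(abc) &\le \mathrm{supp}(ab) + \mathrm{supp}(bc) + \mathrm{supp}(ac) - \mathrm{supp}(a) - \mathrm{supp}(b) - \mathrm{supp}(c) + \mathrm{supp}(\{\})\\
\mathrm{supp}(abc) &\ge \mathrm{supp}(ab) + \mathrm{supp}(ac) - \mathrm{supp}(a)\\
\mathrm{supp}(abc) &\ge \mathrm{supp}(ab) + \mathrm{supp}(bc) - \mathrm{supp}(b)\\
\mathrm{supp}(abc) &\ge \mathrm{supp}(bc) + \mathrm{supp}(ac) - \mathrm{supp}(c)\\
\mathrm{supp}(abc) &\le \mathrm{supp}(ab)\\
\mathrm{supp}(abc) &\le \mathrm{supp}(bc)\\
\mathrm{supp}(abc) &\le \mathrm{supp}(ac)\\
\mathrm{supp}(abc) &\ge 0
\end{aligned}
$$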
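A small sketch of which rules a depth limit keeps. It assumes the condition reads |I \ J| ≤ k - 1, the reading under which depth = 4 keeps all eight rules for abc as in the example; `rules_up_to_depth` is a hypothetical name. Lower depths keep only the rules built from the largest subsets: depth = 2, for instance, keeps supp(abc) ≥ 0 and the three bounds supp(abc) ≤ supp(ab), supp(bc), supp(ac).

```python
from itertools import combinations


def rules_up_to_depth(itemset, k):
    """Yield (J, kind) for every J ⊆ I with |I \\ J| <= k - 1, assuming k > 0.

    kind is "upper" when |I \\ J| is odd and "lower" when it is even, matching
    the sign pattern of the inclusion-exclusion rules."""
    items = sorted(itemset)
    for size in range(len(items), -1, -1):
        gap = len(items) - size  # |I \ J|
        if gap > k - 1:
            continue
        for j in combinations(items, size):
            yield frozenset(j), ("upper" if gap % 2 else "lower")


for depth in (1, 2, 3, 4):
    print(depth, len(list(rules_up_to_depth("abc", depth))))
# prints one line each: "1 1", "2 4", "3 7", "4 8"
```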

## Experiments

- Dataset characteristics
- Number of rules after different pruning methods

## Exp(1)

- Non-derivable rules
- Minimal closed association rules

## Exp(2)

- Non-derivable rules
- Basic association rules
- Maximum entropy method

## Exp(3)

Non-derivable rules with a single-item consequent

## Exp(4)

Non-derivable rules at different support thresholds