
732A02 Data Mining: Clustering and Association Analysis
Constrained frequent itemset mining
Jose M. Peña, jospe@ida.liu.se

Constraints
• A constraint C(.) is
  • Monotone: if C(A) then C(B) for all A ⊆ B.
    • E.g. A' ⊆ A, i.e. the itemset A must contain a given itemset A'.
  • Antimonotone: if C(A) then C(B) for all B ⊆ A.
    • Or, if not C(B) then not C(A) for all B ⊆ A.
    • E.g. support ≥ min_support.
• The apriori property applies to any antimonotone constraint.

Constraints
• sum(S.Price) ≥ v is monotone (positive prices).
• min(S.Price) ≤ v is monotone.
• range(S.Price) ≥ 15 is monotone.
  • Itemset ab satisfies C.
  • So does every superset of ab.

Item  Price
a      40
b       0
c     -20
d      10
e     -30
f      30
g      20
h     -10
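The monotone claims above can be checked by brute force on the price table. The following is a minimal sketch, not part of the original slides: the helper names and the thresholds 0 and 30 are assumptions chosen for illustration. It also shows why the sum constraint needs the "positive prices" caveat.

```python
from itertools import combinations

# Item prices copied from the table above.
PRICE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

def price_range(itemset):
    prices = [PRICE[i] for i in itemset]
    return max(prices) - min(prices)

def all_itemsets(items):
    """Yield every non-empty itemset over `items` as a frozenset."""
    for k in range(1, len(items) + 1):
        for combo in combinations(sorted(items), k):
            yield frozenset(combo)

def is_monotone(constraint, items):
    """Brute force: C(A) must imply C(B) for every superset B of A."""
    sets = list(all_itemsets(items))
    return all(constraint(b) for a in sets if constraint(a) for b in sets if a <= b)

# Hypothetical constraint instances (thresholds 0 and 30 are illustrative).
range_at_least_15 = lambda s: price_range(s) >= 15
min_at_most_0 = lambda s: min(PRICE[i] for i in s) <= 0
sum_at_least_30 = lambda s: sum(PRICE[i] for i in s) >= 30

POSITIVE = [i for i, p in PRICE.items() if p > 0]  # items a, d, f, g

print(range_at_least_15(frozenset("ab")))        # ab satisfies C: range is 40 - 0
print(is_monotone(range_at_least_15, PRICE))
print(is_monotone(min_at_most_0, PRICE))
print(is_monotone(sum_at_least_30, POSITIVE))    # holds when all prices are positive
print(is_monotone(sum_at_least_30, PRICE))       # fails once negative prices are allowed
```

With negative prices, {a} has sum 40 but {a, e} has sum 10, so the sum constraint stops being monotone.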

Constraints
• sum(S.Price) ≤ v is antimonotone (positive prices).
• sum(S.Price) ≥ v is not antimonotone.
• range(S.Price) ≤ 15 is antimonotone.
  • Itemset ab violates C.
  • So does every superset of ab.

Item  Price
a      40
b       0
c     -20
d      10
e     -30
f      30
g      20
h     -10
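A small search for counterexamples illustrates the antimonotone claims on the same price table. This is a sketch under assumptions: the function names are hypothetical and the threshold 30 for the sum constraint is chosen only for illustration.

```python
from itertools import combinations

# Item prices copied from the table above.
PRICE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

def price_range(itemset):
    prices = [PRICE[i] for i in itemset]
    return max(prices) - min(prices)

def supersets_of(base, items):
    """Yield every superset of `base` drawn from `items` (including `base`)."""
    rest = sorted(set(items) - set(base))
    for k in range(len(rest) + 1):
        for extra in combinations(rest, k):
            yield frozenset(base) | frozenset(extra)

def antimonotone_counterexample(constraint, items):
    """Return (B, A) with B a subset of A, C(A) true and C(B) false, else None."""
    for k in range(1, len(items) + 1):
        for combo in combinations(sorted(items), k):
            b = frozenset(combo)
            if constraint(b):
                continue
            for a in supersets_of(b, items):
                if constraint(a):
                    return b, a
    return None

range_at_most_15 = lambda s: price_range(s) <= 15
sum_at_least_30 = lambda s: sum(PRICE[i] for i in s) >= 30  # illustrative threshold

# ab violates range <= 15, and so does every superset of ab.
print(range_at_most_15(frozenset("ab")))
print(any(range_at_most_15(s) for s in supersets_of("ab", PRICE)))

# No counterexample exists for the range constraint, but one exists for the sum.
print(antimonotone_counterexample(range_at_most_15, PRICE))
print(antimonotone_counterexample(sum_at_least_30, PRICE))
```

A returned pair (B, A) means that B violates the constraint although its superset A satisfies it, which is exactly what an antimonotone constraint forbids.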

Constraints

Constraint                      Antimonotone        Monotone
v ∈ S                           no                  yes
S ⊇ V                           no                  yes
S ⊆ V                           yes                 no
min(S) ≤ v                      no                  yes
min(S) ≥ v                      yes                 no
max(S) ≤ v                      yes                 no
max(S) ≥ v                      no                  yes
count(S) ≤ v                    yes                 no
count(S) ≥ v                    no                  yes
sum(S) ≤ v (∀a ∈ S, a ≥ 0)      yes                 no
sum(S) ≥ v (∀a ∈ S, a ≥ 0)      no                  yes
range(S) ≤ v                    yes                 no
range(S) ≥ v                    no                  yes
avg(S) θ v, θ ∈ {=, ≤, ≥}       no but convertible  no but convertible
support(S) ≥ min_support        yes                 no
support(S) ≤ min_support        no                  yes

Apriori algorithm + any constraint
[Figure: Apriori run on database D. Scan D to count C1 and obtain L1, then C2 and L2, then C3 and L3.]
Constraint: Sum{S.price} < 5, where item price equals item id.

Apriori algorithm + antimonotone constraint
• Prune the search space.
[Figure: Apriori run on database D. Scan D to count C1 and obtain L1, then C2 and L2, then C3 and L3.]
Constraint: Sum{S.price} < 5, where item price equals item id.
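The pruning idea can be sketched as follows. The database in the slide's figure is not recoverable from the text, so the transactions and the support threshold below are hypothetical; only the constraint (sum of prices < 5, with item price equal to item id) comes from the slide. This is a minimal sketch, not the author's implementation.

```python
from itertools import combinations

# Hypothetical database and threshold (assumptions, not from the slides).
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
MIN_SUPPORT = 2  # absolute support count

def constraint(itemset):
    # Item price equals item id, so the price sum is the sum of the ids.
    return sum(itemset) < 5

def support(itemset, db):
    return sum(1 for t in db if itemset <= t)

def apriori_antimonotone(db, min_support, constraint):
    """Apriori that discards candidates violating an antimonotone constraint."""
    items = sorted({i for t in db for i in t})
    # Level 1: drop violating items before their support is ever counted.
    candidates = [frozenset([i]) for i in items if constraint(frozenset([i]))]
    result = {}
    k = 1
    while candidates:
        counted = {c: support(c, db) for c in candidates}
        frequent = {c: s for c, s in counted.items() if s >= min_support}
        result.update(frequent)
        k += 1
        # Join step: unions of two surviving itemsets that have exactly k items.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-subset must have survived, and the candidate
        # itself must satisfy the constraint; otherwise no superset can.
        candidates = [
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and constraint(c)
        ]
    return result

for itemset, count in sorted(apriori_antimonotone(D, MIN_SUPPORT, constraint).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Because a violating itemset is never extended, the surviving sets at each level are both frequent and constraint-satisfying, so the same subset test prunes on support and on the constraint at once.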

Apriori algorithm + monotone constraint
• Does not prune the search space but avoids constraint checking.
[Figure: Apriori run on database D (C1, L1, C2, L2, C3, L3), with some itemsets marked and annotated "Not in the output, since they don't satisfy the constraint".]
Constraint: Sum{S.price} ≥ 5, where item price equals item id.
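One way to realise "avoids constraint checking" is to let a frequent itemset inherit satisfaction from any subset already known to satisfy the constraint. The sketch below assumes a hypothetical database and support threshold (the figure's database is not recoverable); the constraint is the slide's sum ≥ 5 with item price equal to item id. The `checks` counter is added only to make the saving visible.

```python
from itertools import combinations

# Hypothetical database and threshold (assumptions, not from the slides).
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
MIN_SUPPORT = 2  # absolute support count

def constraint(itemset):
    # Item price equals item id, so the price sum is the sum of the ids.
    return sum(itemset) >= 5

def support(itemset, db):
    return sum(1 for t in db if itemset <= t)

def apriori_monotone(db, min_support, constraint):
    """Plain Apriori search; constraint checks are skipped when inherited."""
    items = sorted({i for t in db for i in t})
    candidates = [frozenset([i]) for i in items]
    satisfied = set()   # frequent itemsets known to satisfy the constraint
    output = {}
    checks = 0          # number of explicit constraint evaluations
    k = 1
    while candidates:
        frequent = {}
        for c in candidates:
            s = support(c, db)
            if s < min_support:
                continue
            frequent[c] = s
            # A satisfying (k-1)-subset guarantees that c satisfies as well.
            if any(frozenset(sub) in satisfied for sub in combinations(c, k - 1)):
                ok = True
            else:
                checks += 1
                ok = constraint(c)
            if ok:
                satisfied.add(c)
                output[c] = s
        k += 1
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Only the usual support-based pruning applies; violating itemsets are
        # kept as building blocks because their supersets may still satisfy C.
        candidates = [
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
    return output, checks

output, checks = apriori_monotone(D, MIN_SUPPORT, constraint)
for itemset, count in sorted(output.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
print("explicit constraint checks:", checks)
```

Frequent itemsets that violate the constraint still take part in candidate generation; they are simply left out of the output.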

FP grow algorithm + antimonotone constraint
• Similar to Apriori: prunes the search space.
• Specific to FP grow: avoids constraint checks.

FP grow algorithm + monotone constraint
• If C(α) then do not check C(.) in TDB|α.

Constraints
• avg(S.Price) ≤ v and avg(S.Price) ≥ v are neither monotone nor antimonotone.
• Convertible monotone
  • If there exists an item order R such that
    • If C(A) then C(B) for all A and B respecting R such that A is a suffix of B.
  • E.g. avg(S.Price) ≥ v wrt decreasing price order.
• Convertible antimonotone
  • If there exists an item order R such that
    • If C(A) then C(B) for all A and B respecting R such that B is a suffix of A.
    • Or, if not C(B) then not C(A) for all A and B respecting R such that B is a suffix of A.
  • E.g. avg(S.Price) ≥ v wrt increasing price order.

Constraints
• avg(X) ≥ 25 is convertible monotone wrt the descending item price order R: <a, f, g, d, b, h, c, e>.
  • If an itemset d satisfies a constraint C, so do itemsets fd and afd, which have d as a suffix.
• avg(X) ≥ 25 is convertible antimonotone wrt the ascending item price order R⁻¹: <e, c, h, b, d, g, f, a>.
  • If an itemset dfa satisfies a constraint C, so do itemsets fa and a, which are suffixes of dfa.
• Thus, avg(X) ≥ 25 is strongly convertible.
• Check that avg(X) ≤ 25 is also strongly convertible.
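The suffix-based definitions can be tested exhaustively on the eight items of the price table used earlier in these slides. The sketch below is illustrative only: the function names are hypothetical, and the average test is written in integer form (sum ≥ 25 · count) to avoid floating-point division.

```python
from itertools import combinations

# Item prices from the table used earlier in the slides.
PRICE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

def avg_at_least_25(itemset):
    # Integer form of avg(X) >= 25.
    return sum(PRICE[i] for i in itemset) >= 25 * len(itemset)

def ordered_itemsets(order):
    """Every non-empty itemset, written as a tuple that respects `order`."""
    for k in range(1, len(order) + 1):
        yield from combinations(order, k)

def is_convertible_monotone(constraint, order):
    """C(A) implies C(B) whenever A is a proper suffix of B."""
    return all(constraint(b)
               for b in ordered_itemsets(order)
               for start in range(1, len(b))
               if constraint(b[start:]))

def is_convertible_antimonotone(constraint, order):
    """C(A) implies C(B) whenever B is a proper suffix of A."""
    return all(constraint(a[start:])
               for a in ordered_itemsets(order) if constraint(a)
               for start in range(1, len(a)))

R = sorted(PRICE, key=PRICE.get, reverse=True)  # descending price: a, f, g, d, b, h, c, e
R_INV = R[::-1]                                 # ascending price

print(R)
print(is_convertible_monotone(avg_at_least_25, R))          # holds wrt R
print(is_convertible_antimonotone(avg_at_least_25, R_INV))  # holds wrt the reversed order
print(is_convertible_monotone(avg_at_least_25, R_INV))      # the order matters
print([avg_at_least_25(s) for s in (("d", "f", "a"), ("f", "a"), ("a",))])
```

The third check fails because, in ascending order, prepending a cheaper item lowers the average: a satisfies the constraint but ea does not.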

Constraints

Constraint                                          Convertible antimonotone  Convertible monotone  Strongly convertible
avg(S) ≤ v, avg(S) ≥ v                              Yes                       Yes                   Yes
median(S) ≤ v, median(S) ≥ v                        Yes                       Yes                   Yes
sum(S) ≤ v (items could be of any value, v ≥ 0)     Yes                       No                    No
sum(S) ≤ v (items could be of any value, v ≤ 0)     No                        Yes                   No
sum(S) ≥ v (items could be of any value, v ≤ 0)     Yes                       No                    No
……

Constraints
[Figure: classes of constraints: monotone, antimonotone, convertible antimonotone, convertible monotone, strongly convertible, and inconvertible (e.g. avg(S) - median(S) = 0).]

FP grow algorithm + convertible antimonotone constraint
• Instead of ordering the items by decreasing frequency, the items are now ordered according to the order R of the constraint.
• False: Such items can appear not only as a suffix.
• False: No check is needed for those itemsets that are a suffix of α ∪ β. The check is needed for the rest of the items.
• True: α will be added as a suffix to any itemset derived from TDB|α, and the result respects R.

FP grow algorithm + convertible monotone constraint
• With a monotone constraint
  • If C(α) then do not check C(.) in TDB|α.
• With a convertible monotone constraint
  • Instead of ordering the items by decreasing frequency, the items are now ordered according to the order R of the constraint.
  • If C(α) then do not check C(.) in TDB|α, because α will be added as a suffix to any itemset derived from TDB|α and the result respects R.
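The rule above can be sketched with a simplified pattern-growth search. For brevity it works on projected transaction lists rather than on an actual FP-tree, so it is a stand-in for FP grow, not the algorithm itself. The transactions and the support threshold are hypothetical; the prices come from the table used earlier in the slides, and the constraint is avg ≥ 25 with R the descending price order.

```python
from itertools import combinations

# Item prices from the table used earlier in the slides.
PRICE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}
R = sorted(PRICE, key=PRICE.get, reverse=True)  # descending price order

# Hypothetical transaction database and threshold (assumptions).
TDB = [{"a", "f", "d"}, {"a", "g", "d", "h"}, {"f", "g", "b"}, {"a", "f", "g"}]
MIN_SUPPORT = 2

def avg_at_least_25(itemset):
    # Integer form of avg >= 25.
    return sum(PRICE[i] for i in itemset) >= 25 * len(itemset)

def pattern_growth(db, order, min_support, constraint):
    """Grow patterns by prepending items; skip checks once the suffix satisfies C."""
    rank = {item: r for r, item in enumerate(order)}
    ordered_db = [tuple(sorted(t, key=rank.get)) for t in db]
    output, checks = [], 0

    def grow(projected, alpha, alpha_ok):
        nonlocal checks
        counts = {}
        for t in projected:
            for item in t:
                counts[item] = counts.get(item, 0) + 1
        for item, count in counts.items():
            if count < min_support:
                continue
            pattern = (item,) + alpha  # alpha stays a suffix and R is respected
            if alpha_ok:
                ok = True              # C(alpha) holds, so no check in TDB|alpha
            else:
                checks += 1
                ok = constraint(pattern)
            if ok:
                output.append((pattern, count))
            # TDB|pattern: for each transaction with `item`, the items before it in R.
            conditional = [t[:t.index(item)] for t in projected if item in t]
            grow(conditional, pattern, ok)

    grow(ordered_db, (), False)
    return output, checks

patterns, checks = pattern_growth(TDB, R, MIN_SUPPORT, avg_at_least_25)
for pattern, count in patterns:
    print("".join(pattern), count)
print("explicit constraint checks:", checks)
```

Passing `ok` down the recursion is the whole trick: every pattern grown from TDB|α has α as a suffix, so once C(α) holds the convertible monotone property covers all of them.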

Exercise
• How would you incorporate convertible constraints in the Apriori algorithm?