Frequent-Pattern Tree

Bottleneck of Frequent-Pattern Mining
• Multiple database scans are costly
• Mining long patterns needs many scanning passes and generates a huge number of candidates
  - To find the frequent itemset $i_1 i_2 \ldots i_{100}$:
    # of scans: 100
    # of candidates: $\binom{100}{1} + \binom{100}{2} + \cdots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30}$
• Bottleneck: candidate generation and test
• Can we avoid candidate generation?
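The candidate count on this slide can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable names are illustrative and not part of the slides.

```python
import math

n = 100  # length of the frequent itemset i1 i2 ... i100

# Every non-empty subset of a frequent itemset is itself frequent, so a
# candidate-generation-and-test method has to consider all of them.
candidates = sum(math.comb(n, k) for k in range(1, n + 1))

# The sum of binomial coefficients collapses to 2^n - 1.
assert candidates == 2**n - 1
print(f"{float(candidates):.2e}")
```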

Mining Frequent Patterns Without Candidate Generation
• Grow long patterns from short ones using local frequent items
  - “abc” is a frequent pattern
  - Get all transactions having “abc”: DB|abc (the projected database on abc)
  - If “d” is a local frequent item in DB|abc, then abcd is a frequent pattern
  - Get all transactions having “abcd” (the projected database on “abcd”) and find longer itemsets
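A projected database can be sketched directly as a filter over the transactions. The example below is only an illustration: it assumes the five-transaction database used in the FP-tree construction slides and a minimum count of 3 (50% of 5, rounded up), and the helper names `project` and `local_frequent_items` are hypothetical.

```python
from collections import Counter

# Assumption: the five transactions from the FP-tree construction slides.
DB = [set("facdgimp"), set("abcflmo"), set("bfhjo"), set("bcksp"), set("afcelpmn")]
MIN_COUNT = 3  # min_sup = 50% of 5 transactions, rounded up


def project(db, pattern):
    """Return DB|pattern: the transactions that contain every item of pattern."""
    wanted = set(pattern)
    return [t for t in db if wanted <= t]


def local_frequent_items(db, pattern, min_count=MIN_COUNT):
    """Count the remaining items inside DB|pattern and keep the frequent ones."""
    wanted = set(pattern)
    counts = Counter(item for t in project(db, pattern) for item in t - wanted)
    return {item: c for item, c in counts.items() if c >= min_count}


# Grow the frequent pattern "fc" by each of its local frequent items.
grown = {"fc" + item: c for item, c in local_frequent_items(DB, "fc").items()}
print(sorted(grown.items()))
```

Each grown pattern can be projected again in the same way, which is the "grow long patterns from short ones" idea without ever generating a candidate that is absent from the data.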

Mining Frequent Patterns Without Candidate Generation
• Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure
  - Highly condensed, but complete for frequent pattern mining
  - Avoids costly database scans
• Develop an efficient, FP-tree-based frequent pattern mining method
  - A divide-and-conquer methodology: decompose mining tasks into smaller ones
  - Avoid candidate generation: examine the sub-database (conditional pattern base) only!

Construct FP-tree from a Transaction DB

min_sup = 50%

TID   Items bought                 (Ordered) frequent items
100   {f, a, c, d, g, i, m, p}     {f, c, a, m, p}
200   {a, b, c, f, l, m, o}        {f, c, a, b, m}
300   {b, f, h, j, o}              {f, b}
400   {b, c, k, s, p}              {c, b, p}
500   {a, f, c, e, l, p, m, n}     {f, c, a, m, p}

Steps:
1. Scan DB once, find frequent 1-itemsets (single-item patterns)
2. Order frequent items in frequency-descending order: f, c, a, b, m, p (L-order)
3. Process DB based on L-order

Item counts from the first scan:
a 3   b 3   c 4   d 1   e 1   f 4   g 1   h 1
i 1   j 1   k 1   l 2   m 3   n 1   o 2   p 3
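The three steps can be sketched in Python as follows. This is an illustration under assumptions, not the slides' own code: items are single characters, min_sup is converted to a count with a ceiling, and ties between equally frequent items are broken by taking the slide's L-order as given (a plain sort by count would not necessarily put f before c).

```python
import math
from collections import Counter

DB = {100: "facdgimp", 200: "abcflmo", 300: "bfhjo", 400: "bcksp", 500: "afcelpmn"}
MIN_SUP = 0.5

# Step 1: one scan over the database to count every item.
counts = Counter(item for items in DB.values() for item in items)
min_count = math.ceil(MIN_SUP * len(DB))
frequent = {item: c for item, c in counts.items() if c >= min_count}

# Step 2: frequency-descending order. Assumption: tie-breaking follows the slide.
L_ORDER = "fcabmp"
rank = {item: pos for pos, item in enumerate(L_ORDER)}
assert set(L_ORDER) == set(frequent)
assert all(counts[a] >= counts[b] for a, b in zip(L_ORDER, L_ORDER[1:]))

# Step 3: drop infrequent items and sort each transaction by L-order.
ordered = {
    tid: sorted((i for i in items if i in rank), key=rank.get)
    for tid, items in DB.items()
}
for tid, items in ordered.items():
    print(tid, items)
```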

Construct FP-tree from a Transaction DB
Initial FP-tree

Header table
Item   Frequency   Head
f      0           nil
c      0           nil
a      0           nil
b      0           nil
m      0           nil
p      0           nil

FP-tree
{}

Construct FP-tree from a Transaction DB
Insert {f, c, a, m, p} (TID 100)

Header table
Item   Frequency   Head
f      1           -> f:1
c      1           -> c:1
a      1           -> a:1
b      0           nil
m      1           -> m:1
p      1           -> p:1

FP-tree
{}
  f:1
    c:1
      a:1
        m:1
          p:1

Construct FP-tree from a Transaction DB
Insert {f, c, a, b, m} (TID 200)

Header table
Item   Frequency
f      2
c      2
a      2
b      1
m      2
p      1

FP-tree
{}
  f:2
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1

Construct FP-tree from a Transaction DB
Insert {f, b} (TID 300)

Header table
Item   Frequency
f      3
c      2
a      2
b      2
m      2
p      1

FP-tree
{}
  f:3
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1
    b:1

Construct FP-tree from a Transaction DB
Insert {c, b, p} (TID 400)

Header table
Item   Frequency
f      3
c      3
a      2
b      3
m      2
p      2

FP-tree
{}
  f:3
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

Construct FP-tree from a Transaction DB
Insert {f, c, a, m, p} (TID 500)

Header table
Item   Frequency
f      4
c      4
a      3
b      3
m      3
p      3

FP-tree
{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
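The insertion procedure walked through in these slides can be sketched as a small prefix tree. This is a minimal illustration, with assumptions: the class and function names are hypothetical, the transactions are already filtered and sorted in L-order, and the header table is simplified to a list of nodes per item instead of a chain of node-links.

```python
class Node:
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}  # item -> Node


def build_fp_tree(transactions):
    """Insert each L-ordered transaction as a path, sharing common prefixes."""
    root = Node(None, None)
    header = {}  # item -> list of nodes carrying that item (simplified node-links)
    for items in transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header


def show(node, depth=0):
    """Return the tree as indented 'item:count' lines, children in insertion order."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(show(child, depth + 1))
    return lines


ORDERED_DB = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]  # TIDs 100 to 500
root, header = build_fp_tree(ORDERED_DB)
print("\n".join(show(root)))
```

Summing the counts of the nodes listed under each header entry gives the item frequencies shown in the header table after the last insertion.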

Benefits of FP-tree Structure
• Completeness
  - Preserves complete DB information for frequent pattern mining (given a prior minimum support)
  - Each transaction is mapped to one FP-tree path; counts are stored at each node
• Compactness
  - One FP-tree path may correspond to multiple transactions; the tree is never larger than the original database (not counting node-links and counts)
  - Reduces irrelevant information: infrequent items are gone
  - Frequency-descending ordering: more frequent items are closer to the top of the tree and more likely to be shared

How Effective Is FP-tree?
Dataset: Connect-4 (a dense dataset)

Mining Frequent Patterns Using FP-tree
• General idea (divide-and-conquer)
  - Recursively grow frequent pattern paths using the FP-tree
• Frequent patterns can be partitioned into subsets according to L-order
  - L-order = f-c-a-b-m-p
  - Patterns containing p
  - Patterns having m but no p
  - Patterns having b but neither m nor p
  - …
  - Patterns having c but none of a, b, m, p
  - Pattern f
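One way to read this partition: every pattern belongs to the subset named after its last item in L-order, so the subsets are disjoint and together cover all patterns. A small sketch of that reading, with a hypothetical helper `partition_of` and a hand-picked list of example patterns:

```python
L_ORDER = "fcabmp"


def partition_of(pattern):
    """Return the item of the pattern that comes last in L-order."""
    return max(pattern, key=L_ORDER.index)


# Example patterns chosen for illustration only.
patterns = ["f", "fc", "fca", "cp", "fcam", "am", "p"]

groups = {}
for pat in patterns:
    groups.setdefault(partition_of(pat), []).append(pat)

for item in reversed(L_ORDER):
    print(item, groups.get(item, []))
```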

Mining Frequent Patterns Using FP-tree
• Step 1: Construct a conditional pattern base for each item in the header table
• Step 2: Construct a conditional FP-tree from each conditional pattern base
• Step 3: Recursively mine the conditional FP-trees and grow the frequent patterns obtained so far
  - If a conditional FP-tree contains a single path, simply enumerate all the patterns
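The single-path shortcut in Step 3 can be sketched as enumerating every non-empty combination of the nodes on the path and appending the suffix. The example uses m's conditional FP-tree (f:3, c:3, a:3), which appears later in these slides; the function name and the string encoding of patterns are assumptions made for illustration.

```python
from itertools import combinations


def enumerate_single_path(path, suffix):
    """path: (item, count) pairs from root to leaf; suffix: items already fixed.

    Each non-empty combination of path nodes, joined with the suffix, is a
    frequent pattern whose support is the smallest count in the combination.
    """
    patterns = {}
    for size in range(1, len(path) + 1):
        for combo in combinations(path, size):
            items = "".join(item for item, _ in combo) + suffix
            patterns[items] = min(count for _, count in combo)
    return patterns


m_patterns = enumerate_single_path([("f", 3), ("c", 3), ("a", 3)], "m")
for pattern, support in m_patterns.items():
    print(pattern, support)
```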

Step 1: Construct Conditional Pattern Base
• Start at the header table of the FP-tree
• Traverse the FP-tree by following the node-links of each frequent item
• Accumulate all transformed prefix paths of that item to form its conditional pattern base

Header table
Item   Frequency
f      4
c      4
a      3
b      3
m      3
p      3

FP-tree
{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

Conditional pattern bases
Item   Conditional pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1
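Collecting prefix paths amounts to walking parent pointers from every node of an item up to the root. The sketch below rebuilds the FP-tree from the L-ordered transactions so that it is self-contained; the names are hypothetical, and the header table is again simplified to a list of nodes per item rather than a linked chain.

```python
class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions):
    root, header = Node(None, None), {}
    for items in transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header


def conditional_pattern_base(header, item):
    """Return (prefix path, count) for every node of item; empty paths are skipped."""
    base = []
    for node in header.get(item, []):
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            # The prefix path inherits the count of the item's node.
            base.append(("".join(reversed(path)), node.count))
    return base


ORDERED_DB = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
root, header = build_fp_tree(ORDERED_DB)
bases = {item: conditional_pattern_base(header, item) for item in "cabmp"}
for item, base in bases.items():
    print(item, base)
```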

Step 2: Construct Conditional FP-tree
• For each pattern base
  - Accumulate the count for each item in the base
  - Construct the FP-tree for the frequent items of the pattern base

min_sup = 50%, # transactions = 5

Conditional pattern bases
Item   Conditional pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1

Example: p's conditional FP-tree
Conditional pattern base of p: fcam:2, cb:1
Item counts in the base: f 2, c 3, a 2, m 2, b 1

Header table
Item   Frequency
c      3

FP-tree
{}
  c:3
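The two sub-steps can be sketched on the pattern bases of p and m. This is an illustration only: the helper names are hypothetical, a pattern base is represented as a list of (path, count) pairs, and the minimum count of 3 is 50% of the 5 transactions, rounded up.

```python
from collections import Counter

MIN_COUNT = 3  # min_sup = 50% of 5 transactions, rounded up

P_BASE = [("fcam", 2), ("cb", 1)]
M_BASE = [("fca", 2), ("fcab", 1)]


def conditional_counts(base):
    """Accumulate the count of each item over the weighted prefix paths."""
    counts = Counter()
    for path, count in base:
        for item in path:
            counts[item] += count
    return counts


def prune_base(base, min_count=MIN_COUNT):
    """Keep only the locally frequent items in each path, dropping empty paths."""
    counts = conditional_counts(base)
    pruned = []
    for path, count in base:
        kept = "".join(item for item in path if counts[item] >= min_count)
        if kept:
            pruned.append((kept, count))
    return pruned


print(dict(conditional_counts(P_BASE)))
print(prune_base(P_BASE))
print(prune_base(M_BASE))
```

Inserting the pruned paths into a fresh tree, with their counts as weights, merges them into the conditional FP-tree: the single node c:3 for p, and the single path f:3, c:3, a:3 for m.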

Mining Frequent Patterns by Creating Conditional Pattern Bases

Item   Conditional pattern base        Conditional FP-tree
p      {(fcam:2), (cb:1)}              {(c:3)}|p
m      {(fca:2), (fcab:1)}             {(f:3, c:3, a:3)}|m
b      {(fca:1), (f:1), (c:1)}         Empty
a      {(fc:3)}                        {(f:3, c:3)}|a
c      {(f:3)}                         {(f:3)}|c
f      Empty                           Empty

Step 3: Recursively mine conditional FP-tree
• Collect all patterns that end at p
  (FP: frequent pattern found; CPB: conditional pattern base)

suffix: p(3)    FP: p(3)    CPB: fcam:2, cb:1    FP-tree: c(3)
  suffix: cp(3)    FP: cp(3)    CPB: nil

Step 3: Recursively mine conditional FP-tree
• Collect all patterns that end at m

suffix: m(3)    FP: m(3)    CPB: fca:2, fcab:1    FP-tree: f(3) - c(3) - a(3)
  suffix: am(3)    (continued on the next slide)
  suffix: cm(3)    FP: cm(3)    CPB: f:3    FP-tree: f(3)
    suffix: fcm(3)    FP: fcm(3)    CPB: nil
  suffix: fm(3)    FP: fm(3)    CPB: nil

Collect all patterns that end at m (cont’d)

suffix: am(3)    FP: am(3)    CPB: fc:3    FP-tree: f(3) - c(3)
  suffix: cam(3)    FP: cam(3)    CPB: f:3    FP-tree: f(3)
    suffix: fcam(3)    FP: fcam(3)    CPB: nil
  suffix: fam(3)    FP: fam(3)    CPB: nil
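The recursion traced on these slides can be put together into one compact sketch. It is an illustrative implementation under assumptions, not the reference FP-growth code: names are hypothetical, patterns are represented as frozensets, every level rebuilds its tree from weighted paths, the single-path shortcut is omitted, and ties in frequency are broken alphabetically (so the tree shape may differ from the slides while the mined patterns stay the same).

```python
from collections import Counter


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def prefix_path(node):
    """Items on the path from the root down to the parent of node."""
    path = []
    node = node.parent
    while node is not None and node.item is not None:
        path.append(node.item)
        node = node.parent
    return path[::-1]


def fp_growth(weighted, min_count, suffix=frozenset()):
    """weighted: list of (items, count) pairs. Returns {frozenset: support}."""
    counts = Counter()
    for items, weight in weighted:
        for item in items:
            counts[item] += weight
    order = sorted((i for i in counts if counts[i] >= min_count),
                   key=lambda i: (-counts[i], i))
    rank = {item: pos for pos, item in enumerate(order)}

    # Build the (conditional) FP-tree for this level.
    root, header = Node(None, None), {}
    for items, weight in weighted:
        node = root
        for item in sorted((i for i in items if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header.setdefault(item, []).append(child)
            child.count += weight
            node = child

    patterns = {}
    for item in reversed(order):  # least frequent item first
        pattern = suffix | {item}
        patterns[pattern] = counts[item]
        # Conditional pattern base of the item, mined recursively.
        base = [(prefix_path(n), n.count) for n in header[item]]
        patterns.update(fp_growth(base, min_count, pattern))
    return patterns


DB = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
result = fp_growth([(t, 1) for t in DB], min_count=3)
for pattern, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(pattern)), support)
```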

FP-growth vs. Apriori: Scalability With the Support Threshold
Data set: T25I20D10K

Why Is Frequent Pattern Growth Fast?
• Performance study shows
  - FP-growth is an order of magnitude faster than Apriori
• Reasoning
  - No candidate generation, no candidate test
  - Uses a compact data structure
  - Eliminates repeated database scans
  - Basic operations are counting and FP-tree building

Weaknesses of FP-growth
• Support dependent; cannot accommodate a dynamic support threshold
• Cannot accommodate incremental DB updates
• Mining requires recursive operations