Data Mining Techniques Association Rule



What Is Association Mining?
• Association Rule Mining
  – Finding frequent patterns, associations, correlations, or causal structures among item sets in transaction databases, relational databases, and other information repositories
• Applications
  – Market basket analysis (marketing strategy: items to put on sale at reduced prices), cross-marketing, catalog design, shelf space layout design, etc.
• Examples
  – Rule form: Body → Head [Support, Confidence]
  – buys(x, “Computer”) → buys(x, “Software”) [2%, 60%]
  – major(x, “CS”) ∧ takes(x, “DB”) → grade(x, “A”) [1%, 75%]


Market Basket Analysis
• Typically, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.


Rule Measures: Support and Confidence
• With a minimum support of 50% and a minimum confidence of 50%, we have
  – A → C [50%, 66.6%]
  – C → A [50%, 100%]

Support & Confidence


Association Rule: Basic Concepts
• Given
  – (1) a database of transactions
  – (2) each transaction is a list of items (purchased by a customer in a visit)
• Find all rules that correlate the presence of one set of items with that of another set of items
• Find all the rules A → B with minimum confidence and support
  – support, s: P(A ∪ B), the probability that a transaction contains every item of both A and B
  – confidence, c: P(B|A), the probability that a transaction containing A also contains B
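The two measures can be computed directly from their definitions. The following is a minimal sketch, not code from the slides: the four-transaction database and the function names `support` and `confidence` are assumptions, with the data chosen so that A → C and C → A come out at [50%, 66.6%] and [50%, 100%].

```python
from typing import FrozenSet, Iterable, List

# Hypothetical database (an assumption, not taken from the slides).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "C"}),
    frozenset({"A", "D"}),
    frozenset({"B", "E", "F"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(body: Iterable[str], head: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimate P(head | body) as support(body ∪ head) / support(body)."""
    both = frozenset(body) | frozenset(head)
    return support(both, transactions) / support(body, transactions)


print(support({"A", "C"}, TRANSACTIONS))        # 0.5
print(confidence({"A"}, {"C"}, TRANSACTIONS))   # about 0.667
print(confidence({"C"}, {"A"}, TRANSACTIONS))   # 1.0
```

`confidence` divides by the support of the body, so this sketch assumes the body occurs in at least one transaction.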


Terminologies
• Item
  – I1, I2, I3, …
  – A, B, C, …
• Itemset
  – {I1}, {I1, I7}, {I2, I3, I5}, …
  – {A}, {A, G}, {B, C, E}, …
• 1-Itemset
  – {I1}, {I2}, {A}, …
• 2-Itemset
  – {I1, I7}, {I3, I5}, {A, G}, …


Terminologies
• K-Itemset
  – An itemset of length K
• Frequent (Large) K-Itemset
  – An itemset of length K that satisfies a minimum support threshold
• Association Rule
  – A rule that satisfies both a minimum support threshold and a minimum confidence threshold


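Analysis
• With n distinct items there are C(n, k) possible k-itemsets and 2^n − 1 non-empty itemsets overall, so the number of candidate itemsets grows exponentially with the number of items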
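To make this growth concrete, the short sketch below counts the possible itemsets for a few catalogue sizes. The function names are illustrative assumptions; the counts are the standard binomial coefficient C(n, k) and the number of non-empty subsets, 2^n − 1.

```python
from math import comb


def count_k_itemsets(n_items: int, k: int) -> int:
    """Number of distinct k-itemsets that can be formed from n_items items."""
    return comb(n_items, k)


def count_all_itemsets(n_items: int) -> int:
    """Number of non-empty itemsets over n_items items."""
    return 2 ** n_items - 1


for n in (10, 20, 100):
    print(n, count_k_itemsets(n, 2), count_all_itemsets(n))
```

Even at 100 items the full search space is far too large to enumerate, which is why pruning matters.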

Fast Algorithms for Mining Association Rules


Mining Association Rules: Apriori Principle
Min. support 50%, min. confidence 50%
• For rule A → C:
  – support = support({A, C}) = 50%
  – confidence = support({A, C}) / support({A}) = 66.6%
• The Apriori principle:
  – Any subset of a frequent itemset must be frequent


Mining Frequent Itemsets: the Key Step
• Find the frequent itemsets: the sets of items that have minimum support
  – A subset of a frequent itemset must also be a frequent itemset
    • i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets
  – Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
• Use the frequent itemsets to generate association rules
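The second step, turning frequent itemsets into rules, can be sketched as follows. This is one straightforward way to do it, not necessarily the procedure used in the slides; `generate_rules` and the sample `SUPPORTS` table are hypothetical. The lookup `supports[body]` relies on the property stated above: every subset of a frequent itemset is itself frequent, so its support is already in the table.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]  # body, head, support, confidence


def generate_rules(supports: Dict[FrozenSet[str], float], min_conf: float) -> List[Rule]:
    """Split every frequent itemset into body → head and keep confident rules."""
    rules: List[Rule] = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty body and a non-empty head
        for size in range(1, len(itemset)):
            for body_items in combinations(sorted(itemset), size):
                body = frozenset(body_items)
                head = itemset - body
                conf = sup / supports[body]  # support(body ∪ head) / support(body)
                if conf >= min_conf:
                    rules.append((body, head, sup, conf))
    return rules


# Hypothetical supports of the frequent itemsets (assumed values).
SUPPORTS = {
    frozenset({"A"}): 0.75,
    frozenset({"B"}): 0.5,
    frozenset({"C"}): 0.5,
    frozenset({"A", "C"}): 0.5,
}

for body, head, sup, conf in generate_rules(SUPPORTS, min_conf=0.5):
    print(sorted(body), "->", sorted(head), f"[{sup:.0%}, {conf:.1%}]")
```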


Example (itemsets with a count of at least 2 are kept)

Database D (one transaction per line)
  1 3 4
  2 3 5
  1 2 3 5
  2 5

Scan D, count C1, generate L1
  C1 itemset   count
  1            2
  2            3
  3            3
  4            1
  5            3
  L1 = {1, 2, 3, 5}

Generate C2, scan D, count C2, generate L2
  C2 itemset   count
  12           1
  13           2
  15           1
  23           2
  25           3
  35           2
  L2 = {13, 23, 25, 35}

Generate C3, scan D, count C3, generate L3
  C3 itemset   count
  235          2
  L3 = {235}


Example of Generating Candidates
• L3 = {abc, abd, acd, ace, bcd}
• Self-joining: L3 * L3
  – abcd from abc and abd
  – acde from acd and ace
• Pruning:
  – acde is removed because ade is not in L3
• C4 = {abcd}
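The join and prune steps can be sketched as below. The function name `apriori_gen` and the representation of itemsets as sorted tuples are assumptions made for illustration. L3 is taken here as {abc, abd, acd, ace, bcd}; acd is included because the self-join step joins acd with ace.

```python
from itertools import combinations
from typing import Set, Tuple

Itemset = Tuple[str, ...]  # items kept in sorted order


def apriori_gen(frequent: Set[Itemset]) -> Set[Itemset]:
    """Build candidate (k+1)-itemsets from the frequent k-itemsets."""
    candidates: Set[Itemset] = set()
    ordered = sorted(frequent)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Self-join: join two itemsets that agree on their first k-1 items.
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # Prune: drop the candidate if any k-subset is not frequent.
            if all(sub in frequent for sub in combinations(candidate, len(a))):
                candidates.add(candidate)
    return candidates


L3 = {tuple("abc"), tuple("abd"), tuple("acd"), tuple("ace"), tuple("bcd")}
print(apriori_gen(L3))  # {('a', 'b', 'c', 'd')}
```

Joining only on a shared prefix means each candidate is produced once, and the prune step discards acde because ade (and cde) are missing from L3.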

Example


Apriori Algorithm
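The level-wise search can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pseudocode from the slides: the function name `apriori`, the use of absolute support counts, and the simple union-based candidate generation are all choices made here. The sample database is the four-transaction database D from the worked example ({1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}) with a minimum count of 2.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori(transactions: List[FrozenSet[int]], min_count: int) -> Dict[FrozenSet[int], int]:
    """Return every frequent itemset together with its support count."""
    # Pass 1: count single items and keep the frequent ones (L1).
    counts: Dict[FrozenSet[int], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Generate candidates of size k from pairs of frequent (k-1)-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Scan the database once to count the surviving candidates.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent


D = [frozenset({1, 3, 4}), frozenset({2, 3, 5}), frozenset({1, 2, 3, 5}), frozenset({2, 5})]
for itemset, count in sorted(apriori(D, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this data the search stops after the third pass with the single frequent 3-itemset {2, 3, 5}.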


Exercise 4 min-sup = 20% min-conf = 80%


Demo-IBM Intelligent Miner


Demo Database


Multi-Dimensional Association
• Single-Dimensional (Intra-Dimension) Rules: a single dimension (predicate) with multiple occurrences
  – buys(X, “milk”) → buys(X, “bread”)
• Multi-Dimensional Rules: two or more dimensions (predicates)
  – Inter-dimension association rules (no repeated predicates)
    age(X, “19-25”) ∧ occupation(X, “student”) → buys(X, “coke”)
  – Hybrid-dimension association rules (repeated predicates)
    age(X, “19-25”) ∧ buys(X, “popcorn”) → buys(X, “coke”)
• Categorical (Nominal) Attributes
  – finite number of possible values, no ordering among values
• Quantitative Attributes
  – numeric, implicit ordering among values
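The three rule kinds differ only in how often each predicate appears, which a short sketch can make explicit. The `(dimension, value)` pair representation and the function name `classify_rule` are assumptions introduced for illustration.

```python
from typing import List, Tuple

Predicate = Tuple[str, str]  # (dimension, value), e.g. ("buys", "milk")


def classify_rule(body: List[Predicate], head: List[Predicate]) -> str:
    """Classify a rule by the dimensions (predicate names) it uses."""
    dims = [name for name, _ in list(body) + list(head)]
    distinct = set(dims)
    if len(distinct) == 1:
        return "single-dimensional"   # one predicate, occurring several times
    if len(distinct) == len(dims):
        return "inter-dimension"      # several predicates, none repeated
    return "hybrid-dimension"         # several predicates, at least one repeated


print(classify_rule([("buys", "milk")], [("buys", "bread")]))
print(classify_rule([("age", "19-25"), ("occupation", "student")], [("buys", "coke")]))
print(classify_rule([("age", "19-25"), ("buys", "popcorn")], [("buys", "coke")]))
```

The three calls correspond to the three example rules in the slide and print single-dimensional, inter-dimension, and hybrid-dimension in that order.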

Exercise 5 min-sup = 20% min-conf = 80%


Research Topics
• Quantitative Association Rules
  – buys(bread, 5) → buys(milk, 3)
• Weighted Association Rules
• High Utility Association Rules
• Non-redundant Association Rules
• Constrained Association Rules
• Mining Multi-dimensional Association Rules
• Generalized Association Rules
• Negative Association Rules
• Incremental Mining of Association Rules
• Data Stream Association Rule Mining
• Interactive Mining of Association Rules