Data Mining Techniques: Association Rule
- Slides: 27
What Is Association Mining?
• Association Rule Mining
  – Finding frequent patterns, associations, correlations, or causal structures among item sets in transaction databases, relational databases, and other information repositories
• Applications
  – Market basket analysis (marketing strategy: items to put on sale at reduced prices), cross-marketing, catalog design, shelf space layout design, etc.
• Examples
  – Rule form: Body → Head [Support, Confidence]
  – buys(x, “Computer”) → buys(x, “Software”) [2%, 60%]
  – major(x, “CS”) ^ takes(x, “DB”) → grade(x, “A”) [1%, 75%]
Market Basket Analysis
• Typically, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
Rule Measures: Support and Confidence
• With minimum support 50% and minimum confidence 50%, we have
  – A → C [50%, 66.6%]
  – C → A [50%, 100%]
Support & Confidence
Association Rule: Basic Concepts
• Given
  – (1) a database of transactions,
  – (2) each transaction is a list of items (purchased by a customer in a visit)
• Find all rules that correlate the presence of one set of items with that of another set of items
• Find all the rules A → B with minimum confidence and support
  – support, s: P(A ^ B), the probability that a transaction contains both A and B
  – confidence, c: P(B|A), the probability that a transaction containing A also contains B
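The two measures can be computed directly from a list of transactions. The sketch below is a minimal illustration, not part of the original slides: the four transactions are hypothetical, chosen so that the results agree with the figures quoted for A → C and C → A, and the function names `support` and `confidence` are assumptions made for this example.

```python
# Hypothetical transactions (not taken from the slides), chosen so that the
# results agree with the quoted figures for A -> C and C -> A.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(body, head, transactions):
    """Estimate P(head | body) as support(body and head) / support(body)."""
    both = set(body) | set(head)
    return support(both, transactions) / support(body, transactions)

print(support({"A", "C"}, transactions))       # 0.5
print(confidence({"A"}, {"C"}, transactions))  # about 0.667
print(confidence({"C"}, {"A"}, transactions))  # 1.0
```

Note that the support of a rule is symmetric in body and head, while confidence is not, which is why the two rules share a support value but differ in confidence.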
Terminologies
• Item
  – I1, I2, I3, …
  – A, B, C, …
• Itemset
  – {I1}, {I1, I7}, {I2, I3, I5}, …
  – {A}, {A, G}, {B, C, E}, …
• 1-Itemset
  – {I1}, {I2}, {A}, …
• 2-Itemset
  – {I1, I7}, {I3, I5}, {A, G}, …
Terminologies
• K-Itemset
  – An itemset that contains K items
• Frequent (Large) K-Itemset
  – A K-itemset that satisfies a minimum support threshold
• Association Rule
  – A rule that satisfies both a minimum support threshold and a minimum confidence threshold
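Analysis
• The number of candidate itemsets grows very quickly with the number of items: n items give C(n, k) itemsets of cardinality k and 2^n - 1 non-empty itemsets in total, which is exponential in n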
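To make the growth concrete, the following sketch counts itemsets for a few item-universe sizes. It is only an illustration; the helper names are assumptions introduced here, and the counts follow from standard combinatorics rather than from the slides.

```python
from math import comb

def count_k_itemsets(n_items, k):
    """Number of distinct itemsets of cardinality k over n_items items."""
    return comb(n_items, k)

def count_all_itemsets(n_items):
    """Number of non-empty itemsets over n_items items."""
    return 2 ** n_items - 1

for n in (5, 10, 20):
    print(n, count_k_itemsets(n, 2), count_all_itemsets(n))
```

With only 20 items there are already more than a million possible itemsets, which is why exhaustive counting is avoided.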
Fast Algorithms for Mining Association Rules
Mining Association Rules: Apriori Principle
• Min. support 50%, min. confidence 50%
• For rule A → C:
  – support = support({A, C}) = 50%
  – confidence = support({A, C}) / support({A}) = 66.6%
• The Apriori principle:
  – Any subset of a frequent itemset must be frequent
Mining Frequent Itemsets: the Key Step
• Find the frequent itemsets: the sets of items that have minimum support
  – A subset of a frequent itemset must also be a frequent itemset
    • i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets
  – Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
• Use the frequent itemsets to generate association rules
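The last step, turning a frequent itemset into rules, can be sketched as follows. This is an illustrative sketch under stated assumptions: the transactions are hypothetical, and `rules_from_itemset` is a name invented for this example. Each non-empty proper subset of the itemset is tried as a rule body, and the rule is kept if its confidence reaches the threshold.

```python
from itertools import combinations

# Hypothetical transactions, used only for illustration.
transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rules_from_itemset(itemset, min_conf):
    """Return (body, head, support, confidence) for every rule that meets min_conf."""
    itemset = frozenset(itemset)
    rule_support = support(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for body in combinations(sorted(itemset), size):
            body = frozenset(body)
            head = itemset - body
            conf = rule_support / support(body)
            if conf >= min_conf:
                rules.append((set(body), set(head), rule_support, conf))
    return rules

for body, head, sup, conf in rules_from_itemset({"A", "C"}, 0.5):
    print(sorted(body), "->", sorted(head), f"[{sup:.0%}, {conf:.1%}]")
```

Because the itemset is already known to be frequent, every rule built from it automatically meets the support threshold; only confidence has to be checked.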
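Example
• Database D (one transaction per entry): {1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}
• Scan D, count C1: {1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 3
• Generate L1: {1}, {2}, {3}, {5}
• Generate C2: {1, 2}, {1, 3}, {1, 5}, {2, 3}, {2, 5}, {3, 5}
• Scan D, count C2: {1, 2}: 1, {1, 3}: 2, {1, 5}: 1, {2, 3}: 2, {2, 5}: 3, {3, 5}: 2
• Generate L2: {1, 3}, {2, 3}, {2, 5}, {3, 5}
• Generate C3: {2, 3, 5}
• Scan D, count C3: {2, 3, 5}: 2
• Generate L3: {2, 3, 5}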
Example of Generating Candidates
• L3 = {abc, abd, acd, ace, bcd}
• Self-joining: L3 * L3
  – abcd from abc and abd
  – acde from acd and ace
• Pruning:
  – acde is removed because ade is not in L3
• C4 = {abcd}
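The join and prune steps can be sketched in a few lines. This is a minimal sketch, assuming itemsets are stored as sorted tuples; the name `apriori_gen` is chosen here for illustration. L3 is taken to include acd, which the join producing acde relies on.

```python
from itertools import combinations

def apriori_gen(frequent_k):
    """Build candidate (k+1)-itemsets from frequent k-itemsets (sorted tuples)."""
    frequent_k = set(frequent_k)
    if not frequent_k:
        return set()
    k = len(next(iter(frequent_k)))
    # Self-join: merge two k-itemsets that share their first k-1 items.
    joined = set()
    for a in frequent_k:
        for b in frequent_k:
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                joined.add(a + (b[-1],))
    # Prune: drop any candidate that has a k-subset which is not frequent.
    return {
        c for c in joined
        if all(sub in frequent_k for sub in combinations(c, k))
    }

L3 = {tuple(s) for s in ("abc", "abd", "acd", "ace", "bcd")}
C4 = apriori_gen(L3)
print(sorted("".join(c) for c in C4))  # ['abcd']
```

The join yields abcd and acde; acde is pruned because its subset ade is not in L3, leaving abcd as the only candidate.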
Example
Apriori Algorithm
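The level-wise search can be sketched as below. This is a simplified illustration rather than the exact pseudocode of the original slide: candidates are formed by taking unions of pairs of frequent k-itemsets instead of a prefix-based join, and the function name `apriori` and its `min_count` parameter are assumptions made here. It is run on the four-transaction database from the Example slide with a minimum support count of 2 (50%).

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset appearing in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(current)
    k = 1
    while current:
        # Candidate generation: unions of size k+1 whose k-subsets are all frequent.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                candidates.add(union)
        # Scan the database once to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(D, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The loop stops as soon as a level produces no frequent itemsets, since by the Apriori principle no larger itemset can then be frequent.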
Exercise 4 min-sup = 20% min-conf = 80%
Demo: IBM Intelligent Miner
Demo Database
Multi-Dimensional Association
• Single-Dimensional (Intra-Dimension) Rules: a single dimension (predicate) with multiple occurrences
  – buys(X, “milk”) → buys(X, “bread”)
• Multi-Dimensional Rules: 2 or more dimensions (predicates)
  – Inter-dimension association rules (no repeated predicates)
    age(X, “19-25”) ^ occupation(X, “student”) → buys(X, “coke”)
  – Hybrid-dimension association rules (repeated predicates)
    age(X, “19-25”) ^ buys(X, “popcorn”) → buys(X, “coke”)
• Categorical (Nominal) Attributes
  – finite number of possible values, no ordering among values
• Quantitative Attributes
  – numeric, implicit ordering among values
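The three rule types differ only in which predicates appear and how often. The sketch below is a hypothetical illustration: it represents each predicate as a (name, value) pair and classifies a rule by counting predicate names; `classify_rule` is a name invented for this example.

```python
from collections import Counter

def classify_rule(body, head):
    """Classify a rule given as lists of (predicate, value) pairs."""
    predicates = Counter(name for name, _ in list(body) + list(head))
    if len(predicates) == 1:
        return "single-dimensional"   # one predicate, repeated
    if all(n == 1 for n in predicates.values()):
        return "inter-dimension"      # several predicates, none repeated
    return "hybrid-dimension"         # several predicates, at least one repeated

print(classify_rule([("buys", "milk")], [("buys", "bread")]))
print(classify_rule([("age", "19-25"), ("occupation", "student")], [("buys", "coke")]))
print(classify_rule([("age", "19-25"), ("buys", "popcorn")], [("buys", "coke")]))
```

Treating each (predicate, value) pair as an item is one common way to reuse a single-dimensional miner on multi-dimensional data, provided quantitative attributes are first discretized into ranges such as “19-25”.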
Exercise 5 min-sup = 20% min-conf = 80%
Research Topics
• Quantitative Association Rules
  – buys(bread, 5) → buys(milk, 3)
• Weighted Association Rules
• High Utility Association Rules
• Non-redundant Association Rules
• Constrained Association Rules
• Mining Multi-dimensional Association Rules
• Generalized Association Rules
• Negative Association Rules
• Incremental Mining of Association Rules
• Data Stream Association Rule Mining
• Interactive Mining of Association Rules