Fast Algorithms for Mining Association Rules Rakesh Agrawal
![Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Modified from slides by: Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Modified from slides by:](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-1.jpg)
![Origin of Problem • Basket data: new possibility of customized data-driven strategies for retail Origin of Problem • Basket data: new possibility of customized data-driven strategies for retail](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-2.jpg)
![Usage of Data Mining (general) • Clustering, predictive modeling, dependency modeling, data summarization, change Usage of Data Mining (general) • Clustering, predictive modeling, dependency modeling, data summarization, change](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-3.jpg)
![Association Rule • Formal Definition • • , where , and Confidence c, c% Association Rule • Formal Definition • • , where , and Confidence c, c%](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-4.jpg)
![Mining Association Rules q The problem of finding association rules falls within the purview Mining Association Rules q The problem of finding association rules falls within the purview](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-5.jpg)
![Problem Decomposition l Find all sets of items (itemseis) that have transaction support above Problem Decomposition l Find all sets of items (itemseis) that have transaction support above](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-6.jpg)
![Discovering Large Itemsets Intuition: any subset of a large itemset must be large. Algorithms Discovering Large Itemsets Intuition: any subset of a large itemset must be large. Algorithms](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-7.jpg)
![Algorithm Apriori 8 Algorithm Apriori 8](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-8.jpg)
![Itemset Generation • The apriori-gen function has two steps • Join • Prune • Itemset Generation • The apriori-gen function has two steps • Join • Prune •](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-9.jpg)
![Example for Apriori 10 Example for Apriori 10](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-10.jpg)
![Example for Apriori (cont’) 11 Example for Apriori (cont’) 11](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-11.jpg)
![Apriori TID • Candidate generation is same as Apriori, but DB is used for Apriori TID • Candidate generation is same as Apriori, but DB is used for](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-12.jpg)
![Performance • Average size of transactions 5~20; • Average size of maximal potentially large Performance • Average size of transactions 5~20; • Average size of maximal potentially large](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-13.jpg)
![Results 14 Results 14](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-14.jpg)
![Smaller number of candidates 15 Smaller number of candidates 15](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-15.jpg)
![Discussion: • In year 1994, the author mentioned the dataset they used in scale-up Discussion: • In year 1994, the author mentioned the dataset they used in scale-up](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-16.jpg)
![Improving Apriori in efficiency • Apriori. Hybrid: • • • Use Apriori in initial Improving Apriori in efficiency • Apriori. Hybrid: • • • Use Apriori in initial](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-17.jpg)
![Apriori. Hybrid 18 Apriori. Hybrid 18](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-18.jpg)
![Later Work • Parallel versions • Quantitative association rules E. g. , "10% of Later Work • Parallel versions • Quantitative association rules E. g. , "10% of](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-19.jpg)
![Conclusion • Two new algorithms: Apriori and Apriori. Tid are discussed. • These algorithms Conclusion • Two new algorithms: Apriori and Apriori. Tid are discussed. • These algorithms](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-20.jpg)
![Discussion • This paper has spun off more similar algorithm in the database world Discussion • This paper has spun off more similar algorithm in the database world](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-21.jpg)
- Slides: 21
![Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Modified from slides by Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Modified from slides by:](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-1.jpg)
Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Modified from slides by: Dan Li Presenter: Jimmy Jiang Discussion Lead: Leo Li 1
![Origin of Problem Basket data new possibility of customized datadriven strategies for retail Origin of Problem • Basket data: new possibility of customized data-driven strategies for retail](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-2.jpg)
Origin of Problem • Basket data: new possibility of customized data-driven strategies for retail companies • Simple example: beer and diapers • Famous case for market basket analysis • If you buy diapers, you tend to buy beer 2
![Usage of Data Mining general Clustering predictive modeling dependency modeling data summarization change Usage of Data Mining (general) • Clustering, predictive modeling, dependency modeling, data summarization, change](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-3.jpg)
Usage of Data Mining (general) • Clustering, predictive modeling, dependency modeling, data summarization, change and deviation. • Association rules in dependency modeling • Applications: market basket analysis, direct/interactive marketing, fraud detection, science, sports stats , etc 3 Some sources from: http: //redbook. cs. berkeley. edu/redbook 3/lec 29. html
![Association Rule Formal Definition where and Confidence c c Association Rule • Formal Definition • • , where , and Confidence c, c%](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-4.jpg)
Association Rule • Formal Definition • • , where , and Confidence c, c% of transactions that contain X also contain Y Support s, s% of all transactions contain Transaction: set of items, where I is the total set of items l Example ¡buys(x, “computer”) →buys(x, “financial management software”) [0. 5%, 60%] 4
![Mining Association Rules q The problem of finding association rules falls within the purview Mining Association Rules q The problem of finding association rules falls within the purview](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-5.jpg)
Mining Association Rules q The problem of finding association rules falls within the purview of database mining. • Eg 1: Find proportion (confidence) of transactions • that purchase diapers also purchase beer. Eg 2: Find proportion (support) of all transactions purchase diaper and/or beer (among all transactions). Finding association rules is valuable for • Cross-marketing • Catalog design • Add-on sales • Store layout and so on 5
![Problem Decomposition l Find all sets of items itemseis that have transaction support above Problem Decomposition l Find all sets of items (itemseis) that have transaction support above](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-6.jpg)
Problem Decomposition l Find all sets of items (itemseis) that have transaction support above minimum support: Apriori and Apriori. Tid l Use the large itemsets to generate the desired rules: not discussed 6
![Discovering Large Itemsets Intuition any subset of a large itemset must be large Algorithms Discovering Large Itemsets Intuition: any subset of a large itemset must be large. Algorithms](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-7.jpg)
Discovering Large Itemsets Intuition: any subset of a large itemset must be large. Algorithms for discovering large itemsets make multiple passes over the data. l In the first pass, determine which individual item is large. l In each sequent pass: ¡ Previous large itemsets (seed sets) are used to generate candidate itemsets. ¡ Count actual support for the candidate itemsets. ¡ Determine which are the real large itemsets, pass to next pass l This process continues until no new large itemsets are found. 7
![Algorithm Apriori 8 Algorithm Apriori 8](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-8.jpg)
Algorithm Apriori 8
![Itemset Generation The apriorigen function has two steps Join Prune Itemset Generation • The apriori-gen function has two steps • Join • Prune •](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-9.jpg)
Itemset Generation • The apriori-gen function has two steps • Join • Prune • Efficiently Implementing subset functions • The hash tree 9
![Example for Apriori 10 Example for Apriori 10](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-10.jpg)
Example for Apriori 10
![Example for Apriori cont 11 Example for Apriori (cont’) 11](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-11.jpg)
Example for Apriori (cont’) 11
![Apriori TID Candidate generation is same as Apriori but DB is used for Apriori TID • Candidate generation is same as Apriori, but DB is used for](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-12.jpg)
Apriori TID • Candidate generation is same as Apriori, but DB is used for counting support only in the first pass. • More memory needed: storage set in memory containing frequent sets per transaction 12
![Performance Average size of transactions 520 Average size of maximal potentially large Performance • Average size of transactions 5~20; • Average size of maximal potentially large](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-13.jpg)
Performance • Average size of transactions 5~20; • Average size of maximal potentially large itemsets 2~6 • Dataset sizes 2. 4~8. 4 MB (on an IBM RS/SOOO 530 H workstation with a CPU clock rate of 33 MHz, 64 MB of main memory, and running AIX 3. 2. The data resided in the AIX file system and was stored on a 2 GB SCSI 3. 5” drive, with measured sequential throughput of about 2 MB/second. ) 13
![Results 14 Results 14](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-14.jpg)
Results 14
![Smaller number of candidates 15 Smaller number of candidates 15](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-15.jpg)
Smaller number of candidates 15
![Discussion In year 1994 the author mentioned the dataset they used in scaleup Discussion: • In year 1994, the author mentioned the dataset they used in scale-up](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-16.jpg)
Discussion: • In year 1994, the author mentioned the dataset they used in scale-up experiment are not even as large as 1 GB. • Nowadays, this seems to be not sufficient as scale-up (as in previous discussion, Hive and Dremel could handle TB and even PB level data). • Do you think Apriori algorithm is still suitable today? Are there any limitations or issues exposed due to the increased data size? 16
![Improving Apriori in efficiency Apriori Hybrid Use Apriori in initial Improving Apriori in efficiency • Apriori. Hybrid: • • • Use Apriori in initial](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-17.jpg)
Improving Apriori in efficiency • Apriori. Hybrid: • • • Use Apriori in initial passes; (Heuristic) Estimation on the size of C^k; Switch to Apriori. Tid when C^k is expected to fit in memory� 17
![Apriori Hybrid 18 Apriori. Hybrid 18](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-18.jpg)
Apriori. Hybrid 18
![Later Work Parallel versions Quantitative association rules E g 10 of Later Work • Parallel versions • Quantitative association rules E. g. , "10% of](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-19.jpg)
Later Work • Parallel versions • Quantitative association rules E. g. , "10% of married people between age 50 and 60 have at least 2 cars. " • Online association rules 19
![Conclusion Two new algorithms Apriori and Apriori Tid are discussed These algorithms Conclusion • Two new algorithms: Apriori and Apriori. Tid are discussed. • These algorithms](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-20.jpg)
Conclusion • Two new algorithms: Apriori and Apriori. Tid are discussed. • These algorithms outperform AIS and SETM. • Apriori and Apriori. Tid can be combined into Apriori. Hybird. • Apriori. Hybrid matches Apriori and Apriori. Tid when either one wins. 20
![Discussion This paper has spun off more similar algorithm in the database world Discussion • This paper has spun off more similar algorithm in the database world](https://slidetodoc.com/presentation_image/2c81ee04dab4955aaefc836ff3c9fd05/image-21.jpg)
Discussion • This paper has spun off more similar algorithm in the database world than any other data mining algorithm. • Why do you think this is the case? Is it the algorithm? The problem? The approach? Or something else. 21
Aprioti
Fast algorithms for mining association rules
Fast algorithms for mining association rules
Fast algorithms for mining association rules
Ca rakesh agrawal
Association rules in data mining
Association rules in data mining
Association rules in data mining
Association rules in data mining
Process mining algorithms
Virtual web view
Kelvin rodolfo
Uiuc cs 445
Dr rakesh biswas
Dr rakesh sharma ahmedabad
Rakesh kumar ntnu
Spinocerebellar ataxia type 3 symptoms
Rakesh mehta nsdl
Dr rakesh biswas
Dr rakesh biswas
Rakesh heer
Dr deepak agrawal