Practices of Business Intelligence Tamkang University Data Mining
Practices of Business Intelligence (商業智慧實務), Tamkang University
Data Mining for Business Intelligence
1022BI06, MI4, Wed 9, 10 (16:10-18:00) (B113)
Min-Yuh Day (戴敏育), Assistant Professor, Dept. of Information Management, Tamkang University
http://mail.tku.edu.tw/myday/
2014-03-26
Syllabus
Week  Date (ROC calendar; 103 = 2014)  Subject/Topics
1     103/02/19  Introduction to Business Intelligence
2     103/02/26  Management Decision Support System and Business Intelligence
3     103/03/05  Business Performance Management
4     103/03/12  Data Warehousing
5     103/03/19  Data Mining for Business Intelligence
6     103/03/26  Data Mining for Business Intelligence
7     103/04/02  Off-campus study
8     103/04/09  Data Science and Big Data Analytics
9     103/04/16  Midterm Project Presentation
10    103/04/23  Midterm Exam
11    103/04/30  Text and Web Mining
12    103/05/07  Opinion Mining and Sentiment Analysis
13    103/05/14  Social Network Analysis
14    103/05/21  Final Project Presentation
15    103/05/28  Final Exam
A Taxonomy for Data Mining Tasks. Source: Turban et al. (2011), Decision Support and Business Intelligence Systems
Market Basket Analysis. Source: Han & Kamber (2006)
Association Rule Mining • Apriori Algorithm. Source: Turban et al. (2011), Decision Support and Business Intelligence Systems
Basic Concepts: Frequent Patterns and Association Rules

Transaction-id  Items bought
10              A, B, D
20              A, C, D
30              A, D, E
40              B, E, F
50              B, C, D, E, F

(Figure: Venn diagram of "customer buys beer", "customer buys diaper", and "customer buys both".)

• Itemset X = {x1, …, xk}
• Find all the rules X → Y with minimum support and confidence
 – support, s: probability that a transaction contains X ∪ Y
 – confidence, c: conditional probability that a transaction having X also contains Y

Let supmin = 50%, confmin = 50%
Freq. Pat.: {A: 3, B: 3, D: 4, E: 3, AD: 3}
Association rules:
 A → D (support = 3/5 = 60%, confidence = 3/3 = 100%)
 D → A (support = 3/5 = 60%, confidence = 3/4 = 75%)
Source: Han & Kamber (2006)
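The two rules on this slide can be checked with a few lines of Python. This is a minimal sketch: the transactions are the five rows of the table above, while the helper names (`support_count`, `support`, `confidence`) are introduced here for illustration and do not come from the slides.

```python
# Transactions from the slide's table: transaction id -> set of items bought.
transactions = {
    10: {"A", "B", "D"},
    20: {"A", "C", "D"},
    30: {"A", "D", "E"},
    40: {"B", "E", "F"},
    50: {"B", "C", "D", "E", "F"},
}

def support_count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)

def support(x, y):
    """Support of the rule X -> Y: fraction of transactions containing X and Y."""
    return support_count(x | y) / len(transactions)

def confidence(x, y):
    """Confidence of the rule X -> Y: P(Y | X)."""
    return support_count(x | y) / support_count(x)

print(support({"A"}, {"D"}), confidence({"A"}, {"D"}))  # 0.6 1.0
print(support({"D"}, {"A"}), confidence({"D"}, {"A"}))  # 0.6 0.75
```

Both rules share the same support because support depends only on the combined itemset {A, D}; only the denominator of the confidence changes with the direction of the rule.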
Market basket analysis
• Example
 – Which groups or sets of items are customers likely to purchase on a given trip to the store?
• Association rule
 – computer → antivirus_software [support = 2%; confidence = 60%]
• A support of 2% means that 2% of all the transactions under analysis show that computer and antivirus software are purchased together.
• A confidence of 60% means that 60% of the customers who purchased a computer also bought the software.
Source: Han & Kamber (2006)
Association rules
• Association rules are considered interesting if they satisfy both
 – a minimum support threshold and
 – a minimum confidence threshold.
Source: Han & Kamber (2006)
Frequent Itemsets, Closed Itemsets, and Association Rules
Support(A → B) = P(A ∪ B)
Confidence(A → B) = P(B|A)
• The notation P(A ∪ B) indicates the probability that a transaction contains the union of set A and set B (i.e., it contains every item in A and in B).
• This should not be confused with P(A or B), which indicates the probability that a transaction contains either A or B.
Source: Han & Kamber (2006)
• Itemset
 – A set of items is referred to as an itemset.
• k-itemset
 – An itemset that contains k items is a k-itemset.
• Example:
 – The set {computer, antivirus software} is a 2-itemset.
Source: Han & Kamber (2006)
• If the relative support of an itemset I satisfies a prespecified minimum support threshold, then I is a frequent itemset.
 – i.e., the absolute support of I satisfies the corresponding minimum support count threshold.
• The set of frequent k-itemsets is commonly denoted by Lk.
Source: Han & Kamber (2006)
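The relationship between the relative threshold and the corresponding count threshold can be sketched as follows. The function name `min_support_count` is made up for this example, and the threshold is passed as a decimal string so that values such as 1% are represented exactly; the 1,000-transaction case is a hypothetical size, not a figure from the slides.

```python
from fractions import Fraction
from math import ceil

def min_support_count(min_support, num_transactions):
    """Smallest absolute support count that satisfies a relative threshold.

    `min_support` is a decimal string such as "0.5" (an assumption of this
    sketch) so that the fraction is exact and rounding errors cannot push
    the result up by one.
    """
    return ceil(Fraction(min_support) * num_transactions)

# 50% of the 5 transactions in the Basic Concepts example -> count of 3.
print(min_support_count("0.5", 5))      # 3
# 1% of a hypothetical database of 1,000 transactions -> count of 10.
print(min_support_count("0.01", 1000))  # 10
```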
• The confidence of rule A → B can be easily derived from the support counts of A and A ∪ B: confidence(A → B) = support_count(A ∪ B) / support_count(A).
• Once the support counts of A, B, and A ∪ B are found, it is straightforward to derive the corresponding association rules A → B and B → A and check whether they are strong.
• Thus the problem of mining association rules can be reduced to that of mining frequent itemsets.
Source: Han & Kamber (2006)
Transactional data for an AllElectronics branch. Source: Han & Kamber (2006)
Example: Apriori
• Let's look at a concrete example, based on the AllElectronics transaction database, D.
• There are nine transactions in this database, that is, |D| = 9.
• Apriori algorithm for finding frequent itemsets in D
Source: Han & Kamber (2006)
Example: Apriori Algorithm. Generation of candidate itemsets and frequent itemsets, where the minimum support count is 2 (figures: C1 and L1; C2 and L2; C3 and L3). Source: Han & Kamber (2006)
The Apriori algorithm for discovering frequent itemsets for mining Boolean association rules. Source: Han & Kamber (2006)
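The level-wise search can be sketched in Python as below. This is an illustrative sketch, not the textbook pseudocode: the join step simply unites any two frequent (k-1)-itemsets whose union has k items (the textbook joins on a shared prefix), and the prune step then discards candidates with an infrequent subset, so the resulting frequent itemsets are the same. Because the AllElectronics table is not reproduced in this text, the usage example runs on the five-transaction table from the Basic Concepts slide with a minimum support count of 3 (50% of 5, rounded up).

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Scan the database and keep the candidates that meet the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    # L1: frequent 1-itemsets.
    items = {item for t in transactions for item in t}
    level = count({frozenset([item]) for item in items})
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent

transactions = [
    {"A", "B", "D"},
    {"A", "C", "D"},
    {"A", "D", "E"},
    {"B", "E", "F"},
    {"B", "C", "D", "E", "F"},
]
frequent = apriori(transactions, min_count=3)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

On this data the sketch returns A: 3, B: 3, D: 4, E: 3 and AD: 3, which matches the frequent patterns listed on the Basic Concepts slide.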
Generating Association Rules from Frequent Itemsets. Source: Han & Kamber (2006)
Example: Generating association rules
• Frequent itemset l = {I1, I2, I5}
• If the minimum confidence threshold is, say, 70%, then only the second, third, and last rules above are output, because these are the only ones generated that are strong.
Source: Han & Kamber (2006)
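Rule generation for l = {I1, I2, I5} can be sketched as follows. The support counts in the dictionary are quoted from the AllElectronics example in Han & Kamber (2006); they are an assumption with respect to this text, since the slide's table is not reproduced here. The function name `generate_rules` is invented for the sketch. Antecedents are enumerated from largest to smallest so that the rules appear in the order the slide refers to.

```python
from itertools import combinations

# Support counts as given in Han & Kamber's AllElectronics example
# (assumed here; the table itself does not appear in this text).
support_count = {
    frozenset({"I1"}): 6,
    frozenset({"I2"}): 7,
    frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4,
    frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2,
    frozenset({"I1", "I2", "I5"}): 2,
}

def generate_rules(itemset, min_conf):
    """Return (antecedent, consequent, confidence) for each strong rule."""
    items = sorted(itemset)
    whole = frozenset(items)
    rules = []
    # Every nonempty proper subset of the itemset is a possible antecedent.
    for size in range(len(items) - 1, 0, -1):
        for antecedent in combinations(items, size):
            a = frozenset(antecedent)
            conf = support_count[whole] / support_count[a]
            if conf >= min_conf:
                rules.append((antecedent, tuple(sorted(whole - a)), conf))
    return rules

strong = generate_rules({"I1", "I2", "I5"}, min_conf=0.7)
for antecedent, consequent, conf in strong:
    print(antecedent, "->", consequent, f"{conf:.0%}")
```

With these counts, the three rules that pass the 70% threshold are I1 ∧ I5 → I2, I2 ∧ I5 → I1, and I5 → I1 ∧ I2, each with 100% confidence; the other three candidates have confidences of 50%, 33%, and 29%.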
Probability statistics measured in association analysis: Support & Confidence

Transactions:
 A, B, C
 A, C, D
 B, C, D
 A, D, E
 B, C, E

Rule        Support  Confidence
A → D       2/5      2/3
C → A       2/5      2/4
A → C       2/5      2/3
B & C → D   1/5      1/3
Source: SAS Enterprise Miner Course Notes, 2014, SAS
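Are association rules with high support and confidence necessarily useful?

                      Checking Account
                      No       Yes      Total
Saving Account  No    500      3,500    4,000
Saving Account  Yes   1,000    5,000    6,000
Total                 1,500    8,500    10,000

Support(SVG → CK) = 5,000/10,000 = 50%
Confidence(SVG → CK) = 5,000/6,000 = 83%
Expected Confidence(SVG → CK) = 8,500/10,000 = 85%
Lift(SVG → CK) = Confidence / Expected Confidence = 0.83/0.85 < 1
A lift below 1 means that customers with a saving account are slightly less likely to have a checking account than customers overall, so the rule is not useful despite its high support and confidence.
Source: SAS Enterprise Miner Course Notes, 2014, SAS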
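The arithmetic on this slide can be reproduced directly from the four counts it states. A minimal sketch; the variable names are chosen here for readability.

```python
total = 10_000     # all customers
saving = 6_000     # customers with a saving account (SVG)
checking = 8_500   # customers with a checking account (CK)
both = 5_000       # customers with both accounts

support = both / total                   # P(SVG and CK)
confidence = both / saving               # P(CK | SVG)
expected_confidence = checking / total   # P(CK), the baseline
lift = confidence / expected_confidence

print(f"support={support:.0%} confidence={confidence:.0%} "
      f"expected={expected_confidence:.0%} lift={lift:.2f}")
# support=50% confidence=83% expected=85% lift=0.98
```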
• Support(A → B)
• Confidence(A → B)
• Expected Confidence(A → B)
• Lift(A → B)
Support(A → B) = P(A ∪ B)
 = number of times A and B occur together / total number of transactions
 = Count(A&B) / Count(Total)
Confidence(A → B) = P(B|A) = Supp(A → B) / Supp(A)
 = number of times A and B occur together / number of times A occurs
 = Count(A&B) / Count(A)
Expected Confidence(A → B) = Support(B) = Count(B) / Count(Total)
Lift (Correlation)
Lift(A → B) = Confidence(A → B) / Expected Confidence(A → B)
 = Confidence(A → B) / Support(B)
 = Supp(A → B) / (Supp(A) × Supp(B))
Lift(A → B)
• Lift(A → B) = Confidence(A → B) / Expected Confidence(A → B)
 = Confidence(A → B) / Support(B)
 = (Supp(A&B) / Supp(A)) / Supp(B)
 = Supp(A&B) / (Supp(A) × Supp(B))
• Lift: Lift(A → B) = 2 means that the rule A → B has a lift of 2, i.e., the probability of buying B given that A has been bought is 2 times the probability of buying B unconditionally.
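The two forms of the lift formula can be compared on the five-basket example from the Support & Confidence slide. This is a sketch with helper names (`supp`, `lift`, `lift_symmetric`) invented for the purpose; exact fractions are used so that the two forms can be compared for equality without rounding.

```python
from fractions import Fraction

# The five baskets from the Support & Confidence slide.
baskets = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"A", "D", "E"},
    {"B", "C", "E"},
]

def supp(items):
    """Fraction of baskets containing every item in `items`."""
    return Fraction(sum(1 for b in baskets if items <= b), len(baskets))

def lift(a, b):
    """Lift as Confidence(A -> B) / Support(B)."""
    confidence = supp(a | b) / supp(a)
    return confidence / supp(b)

def lift_symmetric(a, b):
    """Lift as Supp(A&B) / (Supp(A) x Supp(B))."""
    return supp(a | b) / (supp(a) * supp(b))

print(lift({"A"}, {"D"}), lift_symmetric({"A"}, {"D"}))            # 10/9 10/9
print(lift({"B", "C"}, {"D"}), lift_symmetric({"B", "C"}, {"D"}))  # 5/9 5/9
```

Since the second form is symmetric in A and B, Lift(A → B) and Lift(B → A) are always equal, unlike confidence.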
Is my data suitable for market basket analysis? (Figure.) Source: SAS Enterprise Miner Course Notes, 2014, SAS
Case Study 2 (Association Analysis using SAS EM): Web Site Usage Associations
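Data field description
• Dataset name: webstation.sas7bdat
 ARCHIVE      Radio program archive (past broadcasts)
 EXTREF       Links to recommended sites
 LIVESTREAM   Listening to popular programs
 MUSICSTREAM  Pop music area
 NEWS         Latest news
 PODCAST      Music downloads
 SIMULCAST    Simulcast listening
 WEBSITE      Home page
Source: SAS Enterprise Miner Course Notes, 2014, SAS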
SAS Enterprise Miner (SAS EM) Case Study
• Four steps for importing data into SAS EM
 – Step 1. New Project
 – Step 2. New / Library
 – Step 3. Create Data Source
 – Step 4. Create Diagram
• SAS EM SEMMA modeling process
Download EM_Data.zip (SAS EM Datasets)
http://mail.tku.edu.tw/myday/teaching/1022/DM/Data/EM_Data.zip
http://mail.tku.edu.tw/myday/teaching.htm
Unzip EM_Data.zip to C:\DATA\EM_Data
VMware Horizon View Client (softcloud.tku.edu.tw): SAS Enterprise Miner
SAS Enterprise Guide (SAS EG)
SAS EG: New Project
SAS EG: Open Data
SAS EG: Open webstation.sas7bdat
SAS Enterprise Miner 12.1 (SAS EM)
Four steps for importing data into SAS EM
• Step 1. New Project
• Step 2. New / Library
• Step 3. Create Data Source
• Step 4. Create Diagram
Step 1. New Project
SAS Enterprise Miner (EM_Project 2)
Step 2. New / Library
Step 3. Create Data Source
Step 3. Create Data Source: the table is referenced as LibraryName.TableName (analogous to DatabaseName.TableName), here EM_LIB.WEBSTATION
Step 3. Create Data Source: Data Source Attribute, Role: Transaction
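A data source with the Transaction role is typically stored in long form, one row per (ID, item) pair, and has to be grouped into one basket per ID before itemsets can be counted. The sketch below shows that grouping outside SAS EM. The column layout of webstation.sas7bdat is not shown in this text, so the (visitor ID, page area) layout is an assumption, and the rows are made-up illustrations that only borrow the area names from the data field description.

```python
from collections import defaultdict

# Illustrative rows only; NOT taken from webstation.sas7bdat.
# Each row is (visitor ID, site area visited), an assumed layout.
rows = [
    (1, "WEBSITE"), (1, "NEWS"), (1, "PODCAST"),
    (2, "WEBSITE"), (2, "MUSICSTREAM"),
    (3, "LIVESTREAM"), (3, "WEBSITE"), (3, "NEWS"),
]

def to_baskets(rows):
    """Group long-form (id, item) rows into one set of items per id."""
    baskets = defaultdict(set)
    for visitor_id, area in rows:
        baskets[visitor_id].add(area)
    return dict(baskets)

baskets = to_baskets(rows)
for visitor_id, areas in baskets.items():
    print(visitor_id, sorted(areas))
```

Each resulting basket plays the role of one transaction in the support and confidence definitions given earlier.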
Step 4. Create Diagram
SAS Enterprise Miner (SAS EM) Case Study
• Four steps for importing data into SAS EM
 – Step 1. New Project
 – Step 2. New / Library
 – Step 3. Create Data Source
 – Step 4. Create Diagram
• SAS EM SEMMA modeling process
EM_Lib.Webstation
Sample data import (Sample): Edit Variable
Sample data import (Sample): Edit Variable - Explore …
Explore - Association
Association Analysis
Association Analysis: Support: 1% (Minimum Support = 1%)
Association Analysis: View / Rules / Rules Table
Association Analysis: Association Rules - Rules Table
Association Analysis: View / Rules / Link Graph
Association Analysis: Link Graph
Association Analysis: Maximum Number of Items: 3000000
Association Analysis: Association Rules - Rules Table
Association Analysis: Link Graph
References
• Efraim Turban, Ramesh Sharda, Dursun Delen, Decision Support and Business Intelligence Systems, Ninth Edition, 2011, Pearson.
• Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, 2006, Elsevier.
• Jim Georges, Jeff Thompson and Chip Wells, Applied Analytics Using SAS Enterprise Miner, 2010, SAS.
• SAS Enterprise Miner Course Notes, 2014, SAS.
• SAS Enterprise Miner Training Course, 2014, SAS.
• SAS Enterprise Guide Training Course, 2014, SAS.