Introduction to Data Mining
Rafal Lukawiecki
Strategic Consultant, Project Botticelli Ltd
rafal@projectbotticelli.co.uk


Objectives
• Overview Data Mining
• Introduce typical applications and scenarios
• Explain some DM concepts
• Review wider product platform

This seminar is partly based on the “Data Mining” book by ZhaoHui Tang and Jamie MacLennan, and also on Jamie’s presentations. Thank you to Jamie and to Donald Farmer for helping me in preparing this session. Thank you to Roni Karassik for a slide. Thank you to Mike Tsalidis, Olga Londer, and Marin Bezic for all the support. Thank you to Maciej Pilecki for assistance with demos.

The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation. © 2007 Project Botticelli Ltd & Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed. All rights reserved. Microsoft, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.

Before We Dive In...
• To help me select the most suitable examples and demonstrations, I would like to ask you about your background
• Who do you identify yourself with:
  • IT Professional,
  • Database Professional,
  • Software/System Developer?

The Essence of Data Mining as Part of Business Intelligence

Business Intelligence
Improving Business Insight
“A broad category of applications and technologies for gathering, storing, analyzing, sharing and providing access to data to help enterprise users make better business decisions.” – Gartner

Relationships And Acronyms...
• Data Mining (DM)
• Knowledge Discovery in Databases (KDD)
• Business Intelligence (BI)

Data Mining
• Technologies for analysis of data and discovery of (very) hidden patterns
• Uses a combination of statistics, probability analysis and database technologies
• Fairly young (<20 years old) but clever algorithms developed through database research

What does Data Mining Do?
• Explores Your Data
• Finds Patterns
• Performs Predictions

DM and BI
• BI is geared at an end user, such as a business owner, knowledge worker etc.
• DM is an IT technology generally geared towards a more advanced user – today
• By the way: who is qualified to use DM today?

DM Past and Present
• Traditional approaches from Microsoft’s competitors are for DM experts: “White-coat PhD statisticians”
• DM tools also fairly expensive
• Microsoft’s “full” approach is designed for those with some database skills
• Tools similar to T-SQL and Management Studio
• DM built into Microsoft SQL Server 2005 and 2008 at no extra cost
• DM “easy” is geared at any Excel-aware user

DM Enables Predictive Analysis
[Chart: Role of Software (Passive, Interactive, Proactive) against Business Insight (Presentation, Exploration, Discovery). Plotted, from passive to proactive: canned reporting, ad-hoc reporting, OLAP, data mining (predictive analysis).]

Application and Scenarios

Value of Predictive Analysis
Typical Applications
• Seek Profitable Customers
• Correct Data During ETL
• Detect and Prevent Fraud
• Understand Customer Needs
• Build Effective Marketing Campaigns
• Anticipate Customer Churn
• Predict Sales & Inventory

Data Mining Process
CRISP-DM
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Deployment
[Diagram: the phases cycle around the data; annotations read “Doing Data Mining” and “Putting Data Mining to Work”.]
www.crisp-dm.org

Customer Profitability
• Typically, you will:
  1. Segment or classify customers in a relevant way
     • Clustering
  2. Find a relationship between profit and customer characteristics
     • Decision Tree
  3. Understand customer preferences
     • Association Rules
  4. Study customer behaviour
     • Sequence Clustering
  and
  5. Predict profitability of potential new customers
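
The segmentation step can be illustrated with a small sketch. This is plain k-means written in standard-library Python, not the Microsoft Clustering algorithm the slide refers to; the customer attributes, the data and the function name kmeans are assumptions made up for illustration.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain k-means on numeric tuples; returns (centroids, labels)."""
    # Deterministic start for the sketch: use the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Hypothetical customers: (annual spend in thousands, orders per year).
customers = [(1.0, 2), (1.2, 3), (0.8, 1), (9.5, 20), (10.0, 22), (9.0, 18)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

The two labels separate the low-spend and high-spend groups; in practice the segments would then feed the later steps, for example as an input column for a decision tree on profit.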

Predict Sales and Inventory
• You may:
  1. Structure the sales or inventory data as a time series
     • Perhaps from a Data Warehouse
  2. Forecast future sales and needs
     • Time Series Regression and Prediction
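
As a rough illustration of forecasting from a time series, the sketch below fits a straight-line trend by ordinary least squares and extends it. This is a deliberately simple stand-in, not the Microsoft Time Series algorithm; the monthly figures and the function names are assumptions.

```python
def fit_linear_trend(series):
    """Least-squares line through (0, y0), (1, y1), ...; returns (slope, intercept)."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def forecast(series, steps):
    """Extend the fitted trend `steps` periods past the end of the series."""
    slope, intercept = fit_linear_trend(series)
    n = len(series)
    return [intercept + slope * (n + i) for i in range(steps)]

# Hypothetical monthly unit sales, e.g. aggregated from a data warehouse.
monthly_sales = [100, 110, 120, 130, 140]
print(forecast(monthly_sales, steps=2))  # prints [150.0, 160.0]
```

Real sales data usually has seasonality and noise that a straight line cannot capture, which is where dedicated time series algorithms come in.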

Build Effective Marketing Campaigns
• You would:
  1. Segment your existing customers
     • Clustering and Decision Trees
  2. Study what makes them respond to your campaigns
     • Decision Tree, Naive Bayes, Clustering, Neural Network
  3. Experiment with a campaign by focusing it
     • Lift Charts
  4. Run the campaign
     • Predict recipients
  5. Review your strategy as you get response
     • Update your models
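
The numbers behind a lift chart can be sketched as follows: rank customers by the model's predicted response probability, then compare the response rate in the top slice with the overall rate. The scores and outcomes below are invented test data, and cumulative_lift is a hypothetical helper, not part of any Microsoft tool.

```python
def cumulative_lift(scored, fraction):
    """Lift from targeting the top `fraction` of customers by predicted score.

    `scored` is a list of (predicted_probability, responded) pairs,
    where responded is 1 or 0 from a held-out test campaign.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    overall_rate = sum(responded for _, responded in ranked) / len(ranked)
    if overall_rate == 0:
        raise ValueError("no responders in the test data; lift is undefined")
    top_rate = sum(responded for _, responded in ranked[:top_n]) / top_n
    return top_rate / overall_rate

# Hypothetical scored test set: (model score, actual response).
scored_customers = [
    (0.91, 1), (0.85, 1), (0.70, 0), (0.64, 1), (0.52, 0),
    (0.40, 0), (0.33, 1), (0.21, 0), (0.15, 0), (0.05, 0),
]
for fraction in (0.2, 0.5, 1.0):
    print(fraction, round(cumulative_lift(scored_customers, fraction), 2))
```

A lift of 2.5 at 20% means that mailing only the top fifth of the ranked list reaches responders at two and a half times the rate of mailing at random; at 100% the lift is always 1.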

Detect and Prevent Fraud
• You could:
  1. Build a risk model for existing customers or transactions
     • Decision Trees, Clustering, Neural Networks
  2. Assess risk of a new transaction
     • Predict risk and its probability using the model
  Or
  1. Model transaction sequences
     • Sequence Clustering
  2. Find unusual ones (outliers)
     • Mine the mining model – neural networks, trees, clustering
  3. Assess new events as they happen
     • Predicting by means of the metamodel
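
To make the idea of "finding unusual ones" concrete, here is a minimal sketch that flags transactions far from a customer's usual amounts. It uses a simple standard-deviation rule as a stand-in for the clustering, tree and neural network models named on the slide; the amounts, the threshold of 3 and the function names are assumptions.

```python
from statistics import mean, pstdev

def deviation_scores(history, candidates):
    """How many standard deviations each candidate lies from the historical mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        raise ValueError("history has no variation; cannot scale deviations")
    return [abs(amount - mu) / sigma for amount in candidates]

def flag_unusual(history, candidates, threshold=3.0):
    """Return the candidates whose deviation score exceeds the threshold."""
    scores = deviation_scores(history, candidates)
    return [amount for amount, score in zip(candidates, scores) if score > threshold]

# Hypothetical past transaction amounts for one customer, and new events to assess.
past_amounts = [20, 22, 19, 21, 18, 20]
new_amounts = [21, 95, 17]
print(flag_unusual(past_amounts, new_amounts))  # prints [95]
```

A trained mining model would weigh many attributes at once and return a probability; the sketch only shows the shape of the workflow: learn what is normal from existing data, then score new events as they arrive.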

New Opportunity: Intelligent Applications
• Examples of Intelligent Applications:
  • Business Process Validation – early detection of failure
  • Adaptive User Interface based on past behaviour
  • Input Validation, based on accepted data, not on fixed rules
• Also known as Predictive Programming
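
Input validation "based on accepted data, not on fixed rules" might look like the sketch below: the validator learns which value combinations occur in previously accepted records and questions combinations it has rarely or never seen. The class name LearnedValidator, the min_support parameter and the sample records are all hypothetical.

```python
from collections import Counter

class LearnedValidator:
    """Judges a record plausible if it is common enough among accepted records."""

    def __init__(self, accepted_records, min_support=0.05):
        if not accepted_records:
            raise ValueError("need at least one accepted record to learn from")
        self.counts = Counter(accepted_records)
        self.total = len(accepted_records)
        self.min_support = min_support

    def support(self, record):
        # Share of accepted records that match this one exactly.
        return self.counts[record] / self.total

    def is_plausible(self, record):
        return self.support(record) >= self.min_support

# Hypothetical accepted (country, currency) pairs from past orders.
accepted = [("IE", "EUR")] * 12 + [("GB", "GBP")] * 8
validator = LearnedValidator(accepted)
print(validator.is_plausible(("IE", "EUR")))  # prints True
print(validator.is_plausible(("IE", "GBP")))  # prints False
```

No rule about currencies was written by hand; the behaviour comes from the data, so it shifts as more records are accepted. An application would typically ask the user to confirm an implausible entry rather than reject it outright.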

Technology Platform

SQL Server Predictive Analysis: Complete, Integrated, Extensible
• Delivery through Microsoft Office
• Enterprise Grade “-bilities”
• Rich and Innovative Algorithms
• Native Reporting Integration
• In-Flight Mining during Data Integration
• Comprehensive Development Environment
• Predictive Programming
• Custom Algorithms and Visualizations
• Insightful Analysis and Exploration
• Predictive KPIs

Better Strategy Execution With BI
Microsoft PerformancePoint Server
Strategy:
• Monitor: What happened? What is happening?
• Analyze: Why?
• Plan: What will happen? What do I want to happen?
Continuous business improvement, not just an annual exercise

Microsoft DM Competitors
• SAS, largest market share of DM, specialised product for traditional experts
• SPSS (Clementine), strength in statistical analysis
• IBM (Intelligent Miner) tied to DB2, interoperates with Microsoft through PMML
• Oracle (10g), supports Java APIs
• Angoss (KnowledgeSTUDIO), result visualisation, works with SQL Server
• KXEN, supports OLAP and Excel

DM Technologies in SQL Server 2005
• Strong, patented algorithms from Microsoft Research labs
• Interoperability
  • PMML (Predictive Model Markup Language) for SAS, SPSS, IBM and Oracle
• Multiple tools:
  • Business Intelligence Development Studio (BIDS)
  • Data Mining Extensions for Excel (and more)
  • DMX and OLE DB for Data Mining
  • XML for Analysis (XMLA)

What is New in SQL Server 2008?
Data Mining Enhancements
• In addition to other new aspects of SQL Server:
  • Enhanced Mining Structures
    • Easier to prepare and test your models
    • Models allow for cross-validation
    • Filtering
  • Algorithm Updates
    • Improved Time Series algorithm combining best of ARIMA and ARTXP
    • “What-If” analysis
  • Microsoft Data Mining Framework
    • Supplements CRISP-DM
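
For readers unfamiliar with cross-validation, the following sketch shows the general k-fold technique: the data is split into k parts, and each part in turn is held out for testing while the model is trained on the rest. It is a generic illustration, not how SQL Server 2008 implements the feature; the function names and the toy "model" (a simple average) are assumptions.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    # Deal row indices into k folds, round-robin.
    folds = [list(range(start, n_rows, k)) for start in range(k)]
    for held_out in range(k):
        train = [i for f in range(k) if f != held_out for i in folds[f]]
        yield train, folds[held_out]

def cross_validate(rows, k, fit, score):
    """Average the score of a model fitted k times, each time on k-1 folds."""
    results = []
    for train_idx, test_idx in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train_idx])
        results.append(score(model, [rows[i] for i in test_idx]))
    return sum(results) / len(results)

# Toy example: the "model" is the training mean, scored by mean absolute error.
values = [10, 12, 11, 13, 12, 14]
fit_mean = lambda train: sum(train) / len(train)
mean_abs_error = lambda model, test: sum(abs(v - model) for v in test) / len(test)
print(cross_validate(values, k=3, fit=fit_mean, score=mean_abs_error))
```

Because every row is used for testing exactly once, the averaged score is a fairer estimate of how a model behaves on unseen data than a single train/test split.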

DM Add-Ins for Microsoft Office 2007
• Define Data
• Identify Task
• Get Results

Demo
Using Data Mining Add-in Table Tools for Microsoft Excel 2007

Conclusions

ABS-CBN Interactive (ABSi)
Subsidiary of the largest integrated media and entertainment company in the Philippines
Wireless Services Firm Doubles Response Rates with SQL Server 2005 Data Mining

Challenge
• Selling custom ring tones and other downloadable content for mobile phone users requires staying in tune with the market.
• Searching transactional data for hints on what to offer users in cross-selling value-added mobile services took days and didn’t provide customer-specific prediction

Solution
• ABSi deployed Microsoft® SQL Server™ 2005 to use its data mining feature to determine product recommendations.

Benefit
• More accurate and personalized service recommendations to customers
• Doubling response rates from marketing campaigns
• Ad hoc reporting in minutes, not days
• Eight times faster data mining process
• Faster data mining

“Our management is very impressed that we could double our response rate through our recommendations. SQL Server 2005 data mining … managers of other services ask us to provide the same magic for them – which is what we will do with the full project rollout” - Grace Cunanan, Technical Specialist, ABS-CBN Interactive

Clalit Health Services
Data Mining Helps Clalit Preserve Health and Save Lives
Provides health care for 3.7 million insured members, representing about 60 percent of Israel’s population

Challenge
• Identify which members would most benefit from proactive intervention to prevent health deterioration

Solution
• Use sociodemographic and medical records to generate a predictive score, identifying elder members with highest risk for health deterioration
• Once identified, physicians can try to involve these patients in proactive treatment plans to prevent health deterioration

Benefit
• A chance to preserve life and enhance life quality
• Reduced health care costs
• Tightly integrated solution

“Providing physicians with a list of patients that the data mining model predicts are at risk of health deterioration over the next year, gives them the opportunity to intervene, and prevent what has been predicted.” - Mazal Tuchler, Data Warehouse Manager, Clalit Health Services

More Data Mining Customers
• 0.8 TB SS 2005 DW for Ring-Tone Marketing; uses Relational, OLAP and Data Mining
• 3 TB end-to-end BI decision support system; Oracle competitive win
• End-to-end DW on SQL Server, including OLAP; extensive use of Data Mining Decision Trees
• 1.2 TB, 20 billion records; large Brazilian grocery chain
• 0.8 TB DW at main TV network in Italy; increased viewership by understanding trends
• 0.5 TB DW at US cable company; end-to-end BI, Analysis and Reporting

Summary
• Data Mining is a powerful technology still undiscovered by many IT and database professionals
• Turns data into intelligence
• SQL Server 2005 and 2008 Analysis Services have been created with you in mind
• Let’s mine for valuable gems of knowledge in our databases!
