Overview of Data Mining Methods: Data Mining Techniques
What the techniques do, with examples, advantages, and disadvantages
Contents
- Reviews data mining tools
- Compares data mining perspectives
- Discusses data mining functions
- Presents four data sets used to demonstrate the tools in subsequent chapters
- Shows the Enterprise Miner structure for data mining analysis (appendix)
Data mining applications
- Automobile insurance company: fraud detection
- Business applications: loan evaluation, customer segmentation, employee evaluation, …
- Data mining tools are categorized by task: classification, estimation, prediction, clustering, and summarization.
- Classification, estimation, and prediction are predictive; clustering and summarization are descriptive.
History
- Statistics
- AI:
  - Genetic algorithms, neural networks
    - Analogies with biology
  - Memory-based reasoning
  - Link analysis from graph theory
See Table 4.1.
Data mining perspectives
Methods can be viewed from different perspectives. Data mining methods include:
- Cluster analysis (Chapter 5)
- Regression of various forms (best-fit methods, Chapter 6)
- Discriminant analysis (use of regression for classification, Chapter 6)
- Line fitting through multiple objective linear programming, an operations research tool (Chapter 9)
AI:
- Artificial neural networks (ANN, Chapter 7)
- Rule induction (decision trees, Chapter 8)
- Genetic algorithms (supplement)
See page 55 for more descriptions.
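Cluster analysis appears above only as a name. As one concrete illustration, here is a minimal k-means sketch; k-means is a common clustering method chosen here for illustration and is not necessarily the method Chapter 5 uses. The function name `kmeans`, the points, and the starting centroids are all hypothetical.

```python
import math

Point = tuple[float, float]

def kmeans(points: list[Point], centroids: list[Point], iterations: int = 100):
    """Plain k-means (hypothetical helper): alternate assignment and update steps."""
    centroids = list(centroids)
    labels: list[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(old)  # keep an empty cluster's old centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical 2-D data with two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Clustering is undirected: no outcome variable is supplied, and the groups emerge from distances alone. The result can depend on the starting centroids, which is why they are passed in explicitly here.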
Techniques
Statistical:
- Market-basket analysis: find groups of items that are bought together
- Memory-based reasoning: case based
- Cluster detection: undirected (quantitative)
Artificial intelligence:
- Link analysis: e.g., MCI's Friends & Family
- Decision trees, rule induction: production rules
- Neural networks: automatic pattern detection
- Genetic algorithms: keep the best parameters
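Market-basket analysis is usually scored with two measures: the support of an itemset (the share of baskets that contain it) and the confidence of a rule A → B (support of A ∪ B divided by support of A). The sketch below computes both on hypothetical baskets. The 60% support cutoff and the helper names `support` and `confidence` are illustrative assumptions, not values from the slides.

```python
from itertools import combinations

# Hypothetical shopping baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset: set[str]) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(lhs: set[str], rhs: set[str]) -> float:
    """Estimated P(rhs | lhs) for the rule lhs -> rhs."""
    return support(lhs | rhs) / support(lhs)

# Item pairs that appear in at least 60% of baskets (assumed cutoff).
items = sorted(set().union(*baskets))
frequent_pairs = [set(p) for p in combinations(items, 2) if support(set(p)) >= 0.6]

print(frequent_pairs)
print(confidence({"diapers"}, {"beer"}))  # about 0.75
```

Enumerating every pair is fine for a toy example; real market-basket tools prune candidates (for example with the Apriori property that every subset of a frequent itemset must itself be frequent).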
Models
- Regression: Y = a + bX
- Classification: assign a new record to a class
- Predictive: assign a value to a new record
- Clustering: form groups in the data
- Time series: assign a future value
- Links: find patterns in the data
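The regression model Y = a + bX is most often fitted by ordinary least squares, which the slide does not name explicitly, so treat that choice as an assumption. The estimates are b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. A minimal sketch (the helper `fit_line` and the data are hypothetical):

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares estimates (a, b) for Y = a + bX."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_mean - b * x_mean  # intercept
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

Once a and b are known, predicting for a new record is just a + b·x; for x = 5 this gives 11.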
Fitting
- Underfitting: not enough detail
  - Leaves out important variables
- Overfitting: too much detail
  - Memorizes the training set but does not help with new data
    - Data set too small
    - Redundancy in the data
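The "memorizes the training set" point can be made concrete. The sketch below compares two models on hypothetical data generated from y = 2x plus a small fixed disturbance. The first is a memorizer that returns the y of the nearest training point (1-nearest neighbour). The second is a least-squares line. All names and numbers are illustrative assumptions.

```python
# Hypothetical training data: y = 2x plus a small fixed disturbance.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [0.3, 1.8, 4.1, 5.7, 8.2, 9.9]
# New data that follows the underlying relationship y = 2x.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [2 * x for x in test_x]

def memorizer(x: float) -> float:
    """Overfit model: return the y of the closest training point."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

def fit_line(xs, ys):
    """Least-squares line Y = a + bX, returned as a prediction function."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    b = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / sum((x - xm) ** 2 for x in xs)
    a = ym - b * xm
    return lambda x: a + b * x

line = fit_line(train_x, train_y)

def mse(model, xs, ys) -> float:
    """Mean squared error of `model` on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

for name, model in [("memorizer", memorizer), ("line", line)]:
    print(name, round(mse(model, train_x, train_y), 3), round(mse(model, test_x, test_y), 3))
```

The memorizer has zero training error but a much larger error on the new points, while the simpler line has a small error on both sets: the training score alone would pick the wrong model.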
Comparison of features (columns: Rules, Neural net, Case base, Genetic)
- Noisy data: Good, Very good
- Missing data: Good, Very good, Poor, Good
- Different data types: Good, Numerical, Very good, Transform
- Accuracy: High, Very high, High
- Explanation: Very good, Poor, Very good, Good
- Integration: Good, Very good
- Ease: Easy, Difficult
- Large data sets
Data mining functions
- Classification: identify categories in data
- Prediction: a formula to predict future observations
- Association: rules using relationships among entities
- Detection: anomalies (unusual cases) and irregularities (e.g., fraud detection)
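Detection looks for unusual records. One simple approach, chosen here for illustration and not taken from the slides, flags any value whose z-score (distance from the mean in standard deviations) exceeds a threshold. The function name, the 2.5 threshold, and the claim amounts are hypothetical.

```python
import statistics

def zscore_anomalies(values: list[float], threshold: float = 2.5) -> list[int]:
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]

# Hypothetical claim amounts; the last one is unusual.
amounts = [100, 102, 98, 101, 99, 100, 97, 103, 100, 250]
print(zscore_anomalies(amounts))  # [9]
```

A z-score rule is easy to explain but sensitive to the outliers it is looking for, since they inflate the mean and standard deviation. Fraud detection in practice usually combines several signals rather than one threshold.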
Financial applications (technique: application (problem type))
- Neural net: forecast stock price (prediction)
- NN, rule: forecast bankruptcy (prediction); fraud detection (detection)
- NN, case: forecast interest rate (prediction)
- NN, visual: late loan detection (detection)
- Rule: credit assessment (prediction); risk classification (classification)
- Rule, case: corporate bond rate (prediction)
Telecom applications (technique: application (problem type))
- Neural net, rule induction: forecast network behavior (prediction)
- Rule induction: churn (classification); fraud detection (detection)
- Case based: call tracking (classification)
Marketing applications (technique: application (problem type))
- Rule induction, visual: market segmentation (classification); cross-selling (association)
- Rule induction, genetic, visual: lifestyle analysis, performance analysis, reaction to promotion (prediction)
- Case based: online sales support (classification)
Web applications (technique: application (problem type))
- Rule induction, visualization: user browsing similarity analysis (classification, association)
- Rule-based heuristics: web page content similarity (association)
Other applications (technique: application (problem type))
- Neural net: software cost (detection)
- Neural net, rule induction: litigation assessment (prediction)
- Rule induction: insurance fraud, healthcare exceptions (detection)
- Case based: insurance claim, software quality; genetic algorithm: budget spending (prediction; classification)
Data sets
- Loan applications: classification
- Job applications: classification
- Insurance fraud: detection
- Expenditure data: prediction
Loan data
- 650 observations
- Outcomes (binary):
  - On-time (cost of error: $300)
  - Late (default) (cost of error: $2,000)
- Variables: Age, Income, Assets, Debts, Want, Credit
  - Credit is ordinal
- Transform: Assets, Debts, and Want → Risk
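Unequal error costs mean that two models with the same number of mistakes can differ sharply in value. The sketch assumes one reading of the slide: $300 is the cost when an on-time loan is classified as late, and $2,000 is the cost when a late loan is classified as on-time. The error counts are made up for illustration and are not results from the 650-observation data set.

```python
# Assumed reading of the slide: each cost applies when a loan with that
# actual outcome is misclassified.
COST_ON_TIME_ERROR = 300    # on-time loan predicted late
COST_LATE_ERROR = 2_000     # late loan predicted on-time

def total_error_cost(on_time_predicted_late: int, late_predicted_on_time: int) -> int:
    """Total cost of a model's misclassifications (hypothetical helper)."""
    return (on_time_predicted_late * COST_ON_TIME_ERROR
            + late_predicted_on_time * COST_LATE_ERROR)

# Two hypothetical models, each making 50 errors in total.
model_a = total_error_cost(on_time_predicted_late=40, late_predicted_on_time=10)
model_b = total_error_cost(on_time_predicted_late=10, late_predicted_on_time=40)
print(model_a, model_b)  # 32000 83000
```

Both models have the same accuracy, yet model B costs more than twice as much, so cost rather than raw error count is the better yardstick for this data set.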
Job application data
- 500 observations
- Outcomes (ordinal): Unacceptable, Minimal, Acceptable, Excellent
- Variables: Age, State, Degree, Major, Experience
  - State is nominal; Degree and Major are ordinal
  - State is superfluous
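Ordinal variables can be mapped to ranks, while a superfluous nominal variable such as State can simply be dropped. The outcome levels below come from the slide. The degree levels, the field names, and the `encode` helper are hypothetical, because the slide does not list them. Major would be ranked the same way once its ordering is known.

```python
OUTCOME_LEVELS = ["Unacceptable", "Minimal", "Acceptable", "Excellent"]  # from the slide
DEGREE_LEVELS = ["Bachelor", "Master", "PhD"]  # hypothetical ordering

def encode(record: dict) -> dict:
    """Turn a raw job-application record into numeric model inputs (hypothetical fields)."""
    return {
        "age": record["age"],
        "experience": record["experience"],
        "degree": DEGREE_LEVELS.index(record["degree"]),  # ordinal -> rank 0, 1, 2
        # "state" is nominal and, per the slide, superfluous, so it is not used.
    }

applicant = {"age": 30, "state": "CA", "degree": "Master", "experience": 5}
print(encode(applicant))                     # {'age': 30, 'experience': 5, 'degree': 1}
print(OUTCOME_LEVELS.index("Acceptable"))    # 2
```

Ranks preserve order but imply equal spacing between levels. If that is not reasonable for a given variable, one-hot encoding is the usual alternative.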
Insurance claim data
- 5,000 observations
- Outcomes (binary):
  - OK (cost of error: $500)
  - Fraudulent (cost of error: $2,500)
- Variables: Age, Gender, Claim, Tickets, Prior claims, Attorney
  - Gender and Attorney are nominal; Tickets and Prior claims are categorical
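Error costs also determine where to set the decision cutoff. The sketch assumes that $500 is the cost of flagging an OK claim as fraudulent and $2,500 is the cost of accepting a fraudulent claim. Under that assumption, a claim should be flagged when p·2500 > (1 − p)·500, that is, when the estimated fraud probability p exceeds 500 / 3000 = 1/6. The probabilities passed to `decide` are hypothetical model outputs.

```python
# Assumed reading of the slide's costs:
COST_FLAG_OK_CLAIM = 500    # an OK claim wrongly flagged as fraudulent
COST_MISS_FRAUD = 2_500     # a fraudulent claim wrongly accepted as OK

def decide(p_fraud: float) -> str:
    """Choose the label with the lower expected cost of error."""
    expected_cost_if_flagged = (1 - p_fraud) * COST_FLAG_OK_CLAIM
    expected_cost_if_accepted = p_fraud * COST_MISS_FRAUD
    return "Fraudulent" if expected_cost_if_flagged < expected_cost_if_accepted else "OK"

# Break-even probability where both choices cost the same.
threshold = COST_FLAG_OK_CLAIM / (COST_FLAG_OK_CLAIM + COST_MISS_FRAUD)
print(round(threshold, 3), decide(0.10), decide(0.20))  # 0.167 OK Fraudulent
```

Because missing a fraud is assumed to cost five times as much as a false alarm, the cutoff sits well below the naive 0.5.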
Expenditure data
- 10,000 observations
- Outcomes:
  - Could predict response in a number of categories
  - Others
- Variables:
  - Age, Gender, Marital, Dependents, Income, Job years, Town years, Education years, Drivers license, Own home, Number of credit cards
  - Churn, proportion of income spent on seven categories