Overview of SQL Server Data Mining CSD 305
- Slides: 15
Overview of SQL Server Data Mining CSD 305 Advanced Databases
Data Mining Definition • Mining • Act of excavation in the earth from which ore or minerals can be extracted • Act of excavation in the data from which patterns can be extracted • Alternative name: Knowledge discovery in databases (KDD) • Multiple disciplines: database, statistics, artificial intelligence • Fast maturing technology • Unlimited applicability CSD 305 Advanced Databases • Data Mining 2
Data Mining Process Train the model Test the model Training Data Mining Management System (DMMS) Test Data Mining Model Prediction using the model Prediction Input Data CSD 305 Advanced Databases Define a model 3
CSD 305 Advanced Databases Data Mining Tasks 4
Data Mining Problems Classification (prediction) • Is this student going to go to a college? • Similar questions: • • • Is this a spam email? (spam filtering) How good/bad is your credit? (credit scoring) Recognition of hand-written letters (pen recognition) What is this gene like? (bioinformatics) Does this person behave like a terrorist? (TIA) CSD 305 Advanced Databases • Based on Gender, Parent. Income, Parent. Encouragement, IQ, etc. • E. g. , if Parent. Encouragement=Yes and IQ>100, College=Yes è Classification (prediction) 5
Decision Tree Attend College 55% Yes 45% No Attend College 79% Yes 21% No Encouragement = Encouraged Attend College 94% Yes 6% No Attend College 35% Yes 65% No Encouragement = Not Encouraged Attend College 69% Yes 31% No CSD 305 Advanced Databases IQ <= 100 IQ > 100 6
Data Mining Problems Regression (prediction) • Based on Hobby, Marital. Status, Number. Of. Children, Income, House. Ownership, Number. Of. Cars, … • E. g. , If Marital. Status=Yes, Age = 20+4*Number. Of. Children+0. 0001*Income+… Regression (prediction) • Similar questions: • What’s the sales amount of ice cream next month? (sales prediction) • What’s the stock price of MSFT next week? (stock prediction) • What’s the income of a customer? (marketing) • What’s the life-time of a software bug? (bug tracking) CSD 305 Advanced Databases • What is the age of a person? 7
Data Mining Problems Segmentation (Clustering) • Who are my Web visitors? • Similar questions: • Identify groups of genes (bioinformatics) • Identify groups of locations of Cholera incidents in London (spatial data mining) • Identify group of customers in merchants (Amazon, E-Bay, MSN, Wal. Mart etc) (target marketing) • Identify groups of documents. (text categorisation) CSD 305 Advanced Databases • Identify similar groups based on demographics, visiting patterns • E. g. , Daily news readers, email users, shoppers, short-stayers, etc è Segmentation (clustering) 8
Data Mining Problems Association Analysis (recommendation, market analysis) • Based on previous purchases (shopping cart) • E. g. , If a digital camera is purchased, flash memory, battery, printer are also purchased. èAssociation Analysis (recommendation, market basket analysis) • Similar questions: • What products to recommend in on-line stores such as Amazon. com. • What items should be displayed together in merchant. • What genes appear together in toxic mushrooms. CSD 305 Advanced Databases • What other products are purchased together with a digital camera? 9
Data Mining Problems Anomaly detection (outlier detection) • Predict likelihood of the network packet pattern è Anomaly detection (outlier detection) • Similar questions: • Are the hospital lab results normal (Adverse drug effect detection) • Is this credit transaction fraudulent? (fraud detection) • Does this person behave unusually, maybe worth highlevel of security clearance? (TIA) CSD 305 Advanced Databases • Could this network packet be from a virus attack? 10
• Classification • Regression • Segmentation • Association Analysis • Anomaly detection • Sequence Analysis • Time-series Analysis • Text categorization • Others CSD 305 Advanced Databases Data Mining Tasks - Summary 11
• Decision Trees • Naïve Bayesian • Clustering • Sequence Clustering • Association Rules • Neural Network • Time Series • Support Vector Machines • …. CSD 305 Advanced Databases Data Mining Algorithms 12
√ √ √ √ √ or k et w eu N √ √ √ Classification Regression Segmentaion Assoc. Analysis Anomaly Detect. Seq. Analysis √ √ - first choice ra l. N ia tio oc ss A e C Se lu rie s st er n ru in g √ √ √ Ti m st lu C √ √ Se q. er in ay B ve aï N √ g es ee s Tr is io n ec D √ √ le s Data Mining Algorithms √ - second choice CSD 305 Advanced Databases Time series 13
Data Mining Vendors • SAS (Analytics) • http: //www. sas. com/technologies/analytics/index. html • IBM (DB 2 Info. Sphere Warehouse) • Oracle (ODM option to Oracle 11 g) • • • http: //www. oracle. com/technetwork/database/options/odm/index. html SPSS (Clementine) Insightsful (Insightful Miner) KXEN (Analytic Framework) Prudsys (Discoverer and its family) Microsoft (SQL Server 2012) Angoss (Knowledge. Server and its family) DBMiner (DBMiner) … and many others CSD 305 Advanced Databases • http: //www 01. ibm. com/software/data/infosphere/warehouse/mining. html 14
• Pang-Ning Tan, Michael Steinbach, Vipin Kumar “Introduction to Data Mining” Pearson Education, 2006 • T. Marakas “Modern Data Warehousing, Mining and Visualization: Core Concepts” Prentice Hall 2003 • http: //msdn. microsoft. com/enus/library/ms 132058. aspx • Data mining extensions to SQL server • http: //troels. arvin. dk/db/rdbms/ • Comparison of different SQL implementations • Also see the wikibook here http: //en. wikibooks. org/wiki/SQL_dialects_reference CSD 305 Advanced Databases References 15
- Mining complex types of data
- Trajectory data mining an overview
- Mining multimedia databases in data mining
- Azure secure enclave
- Parallel data warehouse sql server 2012
- Master data management sql server 2016
- Sql server master data services
- Polybase data virtualization
- Varbinary in sql server
- Sql server change data capture vs temporal tables
- Cdc sql server 2014
- Sql server polybase
- Strip mining vs open pit mining
- Mineral resources and mining chapter 13
- Difference between strip mining and open pit mining
- Difference between text mining and web mining