Overview of Data Mining & The Knowledge Discovery Process

Overview of Data Mining & The Knowledge Discovery Process
Bamshad Mobasher
DePaul University

2019: What Happens in an Internet Minute

From Data to Wisdom
  • Data: the raw material of information
  • Information: data organized and presented by someone
  • Knowledge: information read, heard, or seen, and understood and integrated
  • Wisdom: distilled knowledge and understanding which can lead to decisions
[Figure: The Information Hierarchy (Wisdom, Knowledge, Information, Data)]

Why Data Mining / Machine Learning
  • What we want to do:
      • Extract interesting and useful knowledge from the data
      • Find rules, regularities, irregularities, patterns, constraints
      • Predict future outcomes based on past observations
      • Hopefully, this will help us better compete in business, do research, learn concepts, make money, etc.
  • Data Mining: A Definition
    The non-trivial extraction of implicit, previously unknown, and potentially useful knowledge from data in large data repositories
      • Non-trivial: obvious knowledge is not useful
      • Implicit: hidden, difficult-to-observe knowledge
      • Previously unknown
      • Potentially useful: actionable; easy to understand

The Knowledge Discovery Process
  • Data mining (DM) is only one part of the knowledge discovery in databases (KDD) process
  • The DM phase generally employs machine learning and statistical techniques
[Figure: The KDD Process]

Types of Knowledge Discovery
  • Two kinds of knowledge discovery: directed and undirected
  • Directed Knowledge Discovery
      • Purpose: explain the value of some field in terms of all the others (goal-oriented)
      • Method: select the target field based on some hypothesis about the data; ask the algorithm to tell us how to predict or classify new instances
      • Examples:
          • what products show increased sales when cream cheese is discounted
          • which banner ad to use on a web page for a given user coming to the site
  • Undirected Knowledge Discovery
      • Purpose: find patterns in the data that may be interesting (no target field)
      • Method: clustering, affinity grouping
      • Examples:
          • which products in the catalog often sell together
          • market segmentation (find groups of customers/users with similar characteristics or behavioral patterns)
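
The contrast between the two kinds of discovery can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records, the made-up `responded` target field, and the choice of a 1-nearest-neighbor rule and a tiny two-cluster k-means loop are not from the slides; they stand in for whatever classifier or clustering method is actually used.

```python
import math

# Hypothetical customer records: (visits per month, spend per visit).
customers = [(1, 10), (2, 12), (1, 8), (9, 50), (10, 55), (8, 48)]
# Directed discovery needs a target field; here a made-up "responded to ad" label.
responded = [False, False, False, True, True, True]


def nearest_neighbor_predict(point, examples, labels):
    """Directed: predict the target field of a new instance from the
    labeled example closest to it (1-nearest neighbor)."""
    distances = [math.dist(point, e) for e in examples]
    return labels[distances.index(min(distances))]


def two_means(points, iterations=10):
    """Undirected: split points into two groups with a tiny k-means loop.
    No target field is used; the groups come from the data alone."""
    centers = [points[0], points[-1]]  # naive initialization (assumption)
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            nearest = min((0, 1), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return groups


prediction = nearest_neighbor_predict((9, 52), customers, responded)
segments = two_means(customers)
```

The directed call needs the `responded` labels to answer a specific question about a new customer, while `two_means` never sees them and simply proposes two segments.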

From Data Mining to Data Science

What Kinds of Data?
  • Database-oriented data sets and applications
      • Relational database, data warehouse, transactional database
      • Object-relational databases, heterogeneous databases, and legacy databases
  • Advanced data sets and advanced applications
      • Data streams and sensor data
      • Time-series data, temporal data, sequence data (incl. bio-sequences)
      • Structure data, graphs, social networks and information networks
      • Spatial data and spatiotemporal data
      • Multimedia database
      • Text data and other semi-structured data
      • The World-Wide Web

What Kind of Data?
  • Structured Databases
      • relational, object-relational, etc.
      • can use SQL to perform parts of the process, e.g.,
        SELECT count(*) FROM Items WHERE type = 'video' GROUP BY category
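
To see what such a query returns, the sketch below runs it against a small in-memory SQLite database from Python. The `Items` schema and the sample rows are assumptions made up for the example, and the query also selects `category` (an addition to the slide's version) so that each count is labeled.

```python
import sqlite3

# In-memory database with a hypothetical Items table (schema and rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (name TEXT, type TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [
        ("Alien", "video", "sci-fi"),
        ("Solaris", "video", "sci-fi"),
        ("Heat", "video", "crime"),
        ("Dune", "book", "sci-fi"),
    ],
)

# Count video items per category. Selecting the category column alongside
# the count labels each row of the result; ORDER BY makes the order stable.
rows = conn.execute(
    "SELECT category, count(*) FROM Items "
    "WHERE type = 'video' GROUP BY category ORDER BY category"
).fetchall()
conn.close()

for category, n in rows:
    print(category, n)
```

Note that the string literal 'video' must be quoted; without quotes, SQL would try to read `video` as a column name.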

What Kind of Data?
  • Flat Files
      • most common data source
      • can be text (or HTML) or binary
      • may contain transactions, statistical data, measurements, etc.
  • Transactional databases
      • set of records, each with a transaction id, time stamp, and a set of items
      • may have an associated "description" file for the items
      • typical source of data used in market basket analysis
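
A transactional record of this shape, and the kind of co-occurrence counting that market basket analysis starts from, might look like the following Python sketch. The transactions, item names, and the `pair_support` helper are hypothetical; real market basket tools use more elaborate algorithms than this brute-force pair count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: (transaction id, time stamp, set of items).
transactions = [
    (1, "2019-03-01T09:15", {"bagels", "cream cheese", "coffee"}),
    (2, "2019-03-01T09:40", {"bagels", "cream cheese"}),
    (3, "2019-03-01T10:05", {"coffee", "milk"}),
    (4, "2019-03-01T10:30", {"bagels", "coffee"}),
]


def pair_support(records):
    """Return the fraction of transactions that contain each item pair."""
    counts = Counter()
    for _tid, _timestamp, items in records:
        # Sort so each pair is counted under a single canonical key.
        counts.update(combinations(sorted(items), 2))
    return {pair: n / len(records) for pair, n in counts.items()}


support = pair_support(transactions)
```

Here `support[("bagels", "cream cheese")]` is the share of baskets containing both items, which is the basic quantity behind "which products often sell together".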

Data Mining: What Kind of Data?
  • Other Types of Databases
      • legacy databases
      • multimedia databases (usually very high-dimensional)
      • spatial databases (containing geographical information, such as maps, or satellite imaging data, etc.)
      • time-series / temporal data (time-dependent information such as stock market data; usually very dynamic)
  • World Wide Web
      • basically a large, heterogeneous, distributed database
      • need for new or additional tools and techniques
          • information retrieval, filtering, and extraction
          • agents to assist in browsing and filtering
          • Web content, usage, and structure (linkage) mining tools
      • The "social Web"
          • user-generated metadata, social networks, shared resources, etc.

What Can Data Mining Do
  • Many Data Mining/Machine Learning Tasks
      • often inter-related
      • often need to try different techniques/algorithms for each task
      • each task may require different types of knowledge discovery
  • What are some data mining tasks?
      • Classification
      • Prediction
      • Clustering
      • Affinity Grouping / Association discovery
      • Sequence Analysis
      • Characterization
      • Discrimination

Some Applications of Data Mining
  • Business data analysis and decision support
      • Marketing focalization
          • Recognizing specific market segments that respond to particular characteristics
          • Return on mailing campaign (target marketing)
      • Customer Profiling
          • Segmentation of customers for marketing strategies and/or product offerings
          • Customer behavior understanding
          • Customer retention and loyalty
          • Mass customization / personalization

Some Applications of Data Mining
  • Business data analysis and decision support (cont.)
      • Market analysis and management
          • Provide summary information for decision-making
          • Market basket analysis, cross selling, market segmentation
          • Resource planning
      • Risk analysis and management
          • "What if" analysis
          • Forecasting
          • Pricing analysis, competitive analysis
          • Time-series analysis (e.g., stock market)

Some Applications of Data Mining
  • Fraud detection
      • Detecting telephone fraud:
          • Telephone call model: destination of the call, duration, time of day or week
          • Analyze patterns that deviate from an expected norm
          • British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion-dollar fraud scheme
      • Detection of credit-card fraud
      • Detecting suspicious money transactions (money laundering)
  • Text mining:
      • Message filtering (e-mail, newsgroups, etc.)
      • Newspaper article analysis
      • Text and document categorization
  • Personalization and Recommendation
      • Learn from user/customer preferences and predict their future interests
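
One simple way to read "patterns that deviate from an expected norm" is as an outlier test against a caller's own history. The Python sketch below uses a z-score on call duration; the sample durations, the `deviates_from_norm` helper, and the threshold of 3 standard deviations are all assumptions for illustration, not the method used in the British Telecom case.

```python
from statistics import mean, stdev

# Hypothetical call records: duration in minutes of one caller's past calls.
history = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 2.5, 3.0]


def deviates_from_norm(value, past_values, threshold=3.0):
    """Flag a value whose z-score against the caller's history exceeds
    the threshold. The default of 3 is a common convention, not a value
    taken from the slides."""
    mu = mean(past_values)
    sigma = stdev(past_values)
    if sigma == 0:
        # No variation in the history: anything different is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Check a typical call and an unusually long one.
flags = {d: deviates_from_norm(d, history) for d in (4.0, 45.0)}
```

A realistic call model would combine several fields from the slide (destination, duration, time of day or week) rather than a single number, but the idea of comparing new behavior to an expected baseline is the same.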