Overview of Data Mining & The Knowledge Discovery Process

Bamshad Mobasher, DePaul University

Why Data Mining?
  • The Explosive Growth of Data: from terabytes to petabytes
      • Data collection and data availability
          • Automated data collection tools, database systems, Web, computerized society
      • Major sources of abundant data
          • Business: Web, e-commerce, transactions, stocks, …
          • Science: Remote sensing, bioinformatics, scientific simulation, …
          • Society and everyone: news, images, video, documents, …

[Figure not reproduced] Source: Intel, 2012

From Data to Wisdom
  • Data
      • The raw material of information
  • Information
      • Data organized and presented by someone
  • Knowledge
      • Information read, heard or seen and understood and integrated
  • Wisdom
      • Distilled knowledge and understanding which can lead to decisions
[Figure: The Information Hierarchy (Data, Information, Knowledge, Wisdom)]

What is Data Mining
  • What do we need?
      • Extract interesting and useful knowledge from the data
      • Find rules, regularities, irregularities, patterns, constraints
      • Hopefully, this will help us better compete in business, do research, learn concepts, make money, etc.
  • Data Mining: A Definition
    The non-trivial extraction of implicit, previously unknown and potentially useful knowledge from data in large data repositories
      • Non-trivial: obvious knowledge is not useful
      • Implicit: hidden, difficult to observe knowledge
      • Previously unknown
      • Potentially useful: actionable; easy to understand

The Knowledge Discovery Process
  • Data Mining vs. Knowledge Discovery in Data (KDD)
      • DM and KDD are often used interchangeably
      • Actually, DM is only part of the KDD process
[Figure: The KDD Process]

Types of Knowledge Discovery
  • Two kinds of knowledge discovery: directed and undirected
  • Directed Knowledge Discovery
      • Purpose: Explain the value of some field in terms of all the others (goal-oriented)
      • Method: select the target field based on some hypothesis about the data; ask the algorithm to tell us how to predict or classify new instances
      • Examples:
          • what products show increased sales when cream cheese is discounted
          • which banner ad to use on a web page for a given user coming to the site
  • Undirected Knowledge Discovery
      • Purpose: Find patterns in the data that may be interesting (no target field)
      • Method: clustering, affinity grouping
      • Examples:
          • which products in the catalog often sell together
          • market segmentation (find groups of customers/users with similar characteristics or behavioral patterns)
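The contrast between the two kinds of discovery can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names, and both helper functions are hypothetical, and the slides do not prescribe any particular algorithm. The directed part fixes a target field (whether a visitor clicked) and picks the best banner ad per visitor segment; the undirected part has no target field and simply splits customers into two groups by spending, using a tiny one-dimensional 2-means clustering.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (ad_shown, visitor_segment, clicked)
AD_LOG = [
    ("banner_a", "student", True),
    ("banner_a", "student", True),
    ("banner_a", "retiree", False),
    ("banner_b", "retiree", True),
    ("banner_b", "retiree", True),
    ("banner_b", "student", False),
]

# Hypothetical monthly spend per customer (no target field attached)
MONTHLY_SPEND = [12, 15, 14, 90, 95, 88]


def directed_best_ad(rows):
    """Directed: the target field (clicked) is chosen up front.

    For each visitor segment, return the ad with the highest
    observed click rate in the log.
    """
    shown = defaultdict(Counter)
    clicked = defaultdict(Counter)
    for ad, segment, was_clicked in rows:
        shown[segment][ad] += 1
        if was_clicked:
            clicked[segment][ad] += 1
    return {
        segment: max(ads, key=lambda ad: clicked[segment][ad] / ads[ad])
        for segment, ads in shown.items()
    }


def undirected_two_segments(values, iterations=10):
    """Undirected: no target field; look for structure in the data.

    A minimal 1-D 2-means clustering, seeded with the smallest and
    largest value, that returns the two resulting groups.
    """
    centers = [float(min(values)), float(max(values))]
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Keep the old center if a group happens to be empty
        centers = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centers)
        ]
    return groups


if __name__ == "__main__":
    print(directed_best_ad(AD_LOG))
    print(undirected_two_segments(MONTHLY_SPEND))
```

In the directed case the question ("which ad gets clicked?") is asked before looking at the data; in the undirected case the low-spend and high-spend groups emerge without any field being singled out.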

From Data Mining to Data Science

Data Mining: On What Kinds of Data?
  • Database-oriented data sets and applications
      • Relational database, data warehouse, transactional database
      • Object-relational databases, heterogeneous databases and legacy databases
  • Advanced data sets and advanced applications
      • Data streams and sensor data
      • Time-series data, temporal data, sequence data (incl. bio-sequences)
      • Structure data, graphs, social networks and information networks
      • Spatial data and spatiotemporal data
      • Multimedia database
      • Text data and other semi-structured data
      • The World-Wide Web

Data Mining: What Kind of Data?
  • Structured Databases
      • relational, object-relational, etc.
      • can use SQL to perform parts of the process, e.g.,
        SELECT category, count(*) FROM Items WHERE type = 'video' GROUP BY category
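As a rough illustration of using SQL for part of the process, the slide's query can be run against an in-memory SQLite database from Python. The Items table layout (name, type, category) and the sample rows are assumptions made for this sketch; the slide only names the table and the two columns used in the query. The string literal is quoted, and the category column is selected so each count can be matched to its group.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical Items table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Items (name TEXT, type TEXT, category TEXT)")
    conn.executemany(
        "INSERT INTO Items VALUES (?, ?, ?)",
        [
            ("Alien", "video", "sci-fi"),
            ("Solaris", "video", "sci-fi"),
            ("Heat", "video", "crime"),
            ("Dune", "book", "sci-fi"),  # not a video, so it is filtered out
        ],
    )
    return conn


def count_videos_by_category(conn):
    """Count video items per category, ordered by category name."""
    query = """
        SELECT category, COUNT(*)
        FROM Items
        WHERE type = 'video'
        GROUP BY category
        ORDER BY category
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for category, n in count_videos_by_category(build_demo_db()):
        print(category, n)
```

A summary query like this covers the counting and filtering steps; the pattern-finding steps usually need tools beyond plain SQL.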

Data Mining: What Kind of Data?
  • Flat Files
      • most common data source
      • can be text (or HTML) or binary
      • may contain transactions, statistical data, measurements, etc.
  • Transactional databases
      • set of records each with a transaction id, time stamp, and a set of items
      • may have an associated “description” file for the items
      • typical source of data used in market basket analysis
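The transactional record layout described above (transaction id, time stamp, set of items) can be sketched as follows, together with the simplest step of a market basket analysis: counting how often pairs of items appear in the same transaction. The sample transactions and the function name are hypothetical, and real association-rule miners do considerably more than this pair count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: (transaction id, time stamp, set of items)
TRANSACTIONS = [
    (1, "2024-01-05T09:12", {"bagels", "cream cheese", "coffee"}),
    (2, "2024-01-05T09:40", {"bagels", "cream cheese"}),
    (3, "2024-01-05T10:03", {"coffee", "milk"}),
    (4, "2024-01-05T11:27", {"bagels", "cream cheese", "milk"}),
]


def pair_support(transactions):
    """Return the fraction of transactions containing each item pair.

    Pairs are stored as alphabetically sorted tuples so that
    (a, b) and (b, a) are counted as the same pair.
    """
    counts = Counter()
    for _tid, _timestamp, items in transactions:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in counts.items()}


if __name__ == "__main__":
    support = pair_support(TRANSACTIONS)
    for pair, value in sorted(support.items(), key=lambda kv: -kv[1]):
        print(pair, value)
```

Pairs with high support, such as bagels and cream cheese in this toy data, are the candidates a market basket analysis would examine further.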

Data Mining: What Kind of Data?
  • Other Types of Databases
      • legacy databases
      • multimedia databases (usually very high-dimensional)
      • spatial databases (containing geographical information, such as maps, or satellite imaging data, etc.)
      • time series / temporal data (time dependent information such as stock market data; usually very dynamic)
  • World Wide Web
      • basically a large, heterogeneous, distributed database
      • need for new or additional tools and techniques
          • information retrieval, filtering and extraction
          • agents to assist in browsing and filtering
          • Web content, usage, and structure (linkage) mining tools
      • The “social Web”
          • User generated meta-data, social networks, shared resources, etc.

What Can Data Mining Do
  • Many Data Mining Tasks
      • often inter-related
      • often need to try different techniques/algorithms for each task
      • each task may require different types of knowledge discovery
  • What are some data mining tasks?
      • Classification
      • Prediction
      • Clustering
      • Affinity Grouping / Association discovery
      • Sequence Analysis
      • Characterization
      • Discrimination

Some Applications of Data Mining
  • Business data analysis and decision support
      • Marketing focalization
          • Recognizing specific market segments that respond to particular characteristics
          • Return on mailing campaign (target marketing)
      • Customer Profiling
          • Segmentation of customers for marketing strategies and/or product offerings
          • Customer behavior understanding
          • Customer retention and loyalty
          • Mass customization / personalization

Some Applications of Data Mining
  • Business data analysis and decision support (cont.)
      • Market analysis and management
          • Provide summary information for decision-making
          • Market basket analysis, cross selling, market segmentation
          • Resource planning
      • Risk analysis and management
          • "What if" analysis
          • Forecasting
          • Pricing analysis, competitive analysis
          • Time-series analysis (e.g., stock market)

Some Applications of Data Mining
  • Fraud detection
      • Detecting telephone fraud:
          • Telephone call model: destination of the call, duration, time of day or week
          • Analyze patterns that deviate from an expected norm
          • British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion dollar fraud scheme
      • Detection of credit-card fraud
      • Detecting suspicious money transactions (money laundering)
  • Text mining:
      • Message filtering (e-mail, newsgroups, etc.)
      • Newspaper articles analysis
      • Text and document categorization
  • Web Mining
      • Mining patterns from the content, usage, and structure of Web resources
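The idea of flagging patterns that deviate from an expected norm, mentioned under telephone fraud above, can be sketched with a simple z-score test on call durations. This is an assumption made for illustration: the slides do not say which method was used in practice, and the function name, the threshold of 3 standard deviations, and the sample durations are all hypothetical.

```python
from statistics import mean, pstdev


def flag_unusual_calls(history, new_calls, threshold=3.0):
    """Return the new call durations that deviate strongly from the norm.

    The "norm" is the mean of a caller's past durations; a new call is
    flagged when it lies more than `threshold` standard deviations away.
    """
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # All past calls were identical: anything different is unusual
        return [d for d in new_calls if d != mu]
    return [d for d in new_calls if abs(d - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical call durations in minutes for one caller
    past_durations = [4, 5, 6, 5, 4, 6, 5, 5]
    todays_calls = [5, 6, 45]
    print(flag_unusual_calls(past_durations, todays_calls))
```

A real call model would combine several attributes (destination, duration, time of day or week) rather than a single number, but the principle of comparing new behavior against an established baseline is the same.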