Data Mining Solutions (Westphal & Blaxton, 1998)
Dr. K. Palaniappan
Dept. of Computer Engineering & Computer Science, UMC
Sep. 30, 1999
What is Data Mining?
- “Something old, something new”
  - Data mining vs. applied statistics
  - Data mining vs. pattern recognition
  - Data mining vs. machine intelligence
- Unique characteristics
  - Large information databases
  - Exploratory analysis vs. predefined hypothesis
What is Data Mining?
- Unique characteristics (cont’d)
  - Qualitative vs. quantitative tools
  - Visualization vs. numeric tests
  - Data massaging: cleaning, warehousing
  - Discover “interesting” patterns and trends that have business relevance (revenue and profits)
Data Mining Activities
- Marketing: predict response to direct mail or telephone solicitation using historical data
- Advertising: set ad rates based on the number of Internet viewers and patterns of ad viewing
- Production: relate product quality to manufacturing and process variables
- Financial: identify anomalous patterns in transactions related to fraud or criminal activity
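The financial task of spotting anomalous transactions can be illustrated with a simple statistical screen. The sketch below is one possible approach, not a method from the source: it flags amounts far from the median, measured with a robust (median absolute deviation) z-score. The 3.5 cutoff is a common rule of thumb, and the transaction amounts are hypothetical; real fraud screening uses far richer features.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return indices of amounts whose robust z-score exceeds threshold.

    Robust z-score = 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. The 3.5 default is a rule of thumb.
    """
    med = median(amounts)
    mad = median([abs(x - med) for x in amounts])
    if mad == 0:
        # Most values are identical: flag anything that differs from the median.
        return [i for i, x in enumerate(amounts) if x != med]
    return [i for i, x in enumerate(amounts)
            if abs(0.6745 * (x - med) / mad) > threshold]


# Hypothetical card transactions in dollars; one looks out of place.
transactions = [42.10, 38.75, 45.00, 40.20, 39.90, 1250.00, 41.30]
print(flag_anomalies(transactions))  # [5]
```

The median and MAD are used instead of the mean and standard deviation because a single large fraudulent amount would otherwise inflate the spread and partly hide itself.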
Data Mining Activities
- Insurance: compare property claims against estimated damage from natural disasters (e.g., Hurricane Floyd, earthquakes)
- E-commerce: product clusters, zShops
- Information: WWW, electronic journals
- Medical: associating a disease (e.g., cancer) with a causative agent
- Analysis vs. monitoring
  - Offline vs. online
  - “Good” vs. “bad” transaction or condition
Data Mining Tasks
- Classification or identification: automatically label input records
- Estimation or regression: predict the magnitude of a response, or another missing field, given the input records
- Segmentation or clustering: group the input records into meaningful subpopulations
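Segmentation can be made concrete with k-means, one widely used clustering algorithm (the slides do not name a specific method, so this choice is an assumption). The minimal sketch below runs Lloyd's iterations on hypothetical customer records of (annual visits, average spend) and groups them into two subpopulations.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2-D points into k groups with Lloyd's algorithm.

    Returns (centroids, labels) where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k),
                      key=lambda c: (p[0] - centroids[c][0]) ** 2
                      + (p[1] - centroids[c][1]) ** 2)
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels


# Hypothetical customers: (annual visits, average spend in dollars).
customers = [(2, 20), (3, 25), (2, 22), (20, 200), (22, 210), (21, 190)]
centroids, labels = kmeans(customers, 2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is that occasional low spenders and frequent high spenders end up in different groups.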
Data Mining Tasks
- Description or visualization: looking for gems and diamonds among pebbles
  - Exploit the power of human (visual) perception to detect interesting patterns in data, rather than scrolling through textual tables
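Even a crude picture can expose what a table hides. As a minimal sketch, using hypothetical order sizes and an arbitrary bin width of 10, the function below renders a text histogram. Empty bins are kept on purpose, because a visible gap is itself a pattern the eye picks up immediately.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Render integer values as a horizontal bar chart with fixed-width bins."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in range(min(counts), max(counts) + 1, bin_width):
        bar = "#" * counts[start]  # Counter returns 0 for empty bins
        lines.append(f"{start:>4}-{start + bin_width - 1:<4} {bar}")
    return "\n".join(lines)


# Hypothetical order sizes (units per order).
order_sizes = [12, 15, 18, 22, 25, 27, 29, 31, 33, 58]
print(text_histogram(order_sizes, 10))
```

Scanning the ten raw numbers, the isolated order of 58 is easy to miss; in the chart it stands apart after an empty 40-49 row.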
Predictive and Descriptive Goals
- Predictive: produce models for classification or estimation
- Descriptive: uncover patterns and relationships
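A predictive model learns from labeled historical records and then labels new ones. One of the simplest such models is a 1-nearest-neighbor classifier, sketched below under stated assumptions: the features (age, prior purchases) and the mailing-response labels are hypothetical, and the features are not rescaled, which a real application would usually need.

```python
import math


def nearest_neighbor_predict(training, query):
    """Predict the label of `query` as the label of its closest training record.

    training: list of (features, label) pairs; features are equal-length tuples.
    """
    _, label = min(training, key=lambda rec: math.dist(rec[0], query))
    return label


# Hypothetical history: (age, prior purchases) -> responded to direct mail?
history = [((25, 1), "no"), ((30, 0), "no"), ((45, 6), "yes"), ((50, 8), "yes")]
print(nearest_neighbor_predict(history, (47, 7)))  # yes
print(nearest_neighbor_predict(history, (27, 0)))  # no
```

A descriptive analysis of the same records would instead summarize them (for example, that responders tend to be older repeat buyers) without committing to a prediction rule.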
Structured and Unstructured Data
- Structured: fixed-length, fixed-format records with numeric values, character codes, strings, etc.
- “Unstructured”: images (e.g., aerial or satellite photos of damage for insurance claims), video (e.g., shopping pattern behavior)
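Structured records can be decoded mechanically because every field sits at a known position. The sketch below assumes a hypothetical 36-character layout (customer ID, space-padded name, state code, zero-padded amount in cents); the field names and widths are illustrative, not a real file format.

```python
# Hypothetical fixed-format layout: (field name, start column, end column).
LAYOUT = [
    ("customer_id", 0, 6),
    ("name", 6, 26),
    ("state", 26, 28),
    ("amount_cents", 28, 36),
]
RECORD_LENGTH = 36


def parse_record(line):
    """Slice a fixed-length record into named fields."""
    if len(line) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} characters, got {len(line)}")
    fields = {name: line[start:end].strip() for name, start, end in LAYOUT}
    fields["amount_cents"] = int(fields["amount_cents"])  # numeric field
    return fields


record = "000123" + "JANE DOE".ljust(20) + "MO" + "00012550"
print(parse_record(record))
```

The customer ID stays a string so its leading zeros survive; only the field that is genuinely numeric is converted. Unstructured data such as images offers no such fixed positions, which is why it needs feature extraction first.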
Data Modeling
- Object modeling
  - Object attributes: a value for each attribute, as extracted from the data record
  - Attribute assignment, e.g., notebook, cabinet, can, cup, case
    - Size, shape, material, purpose
- State-based analysis
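The object-modeling idea maps naturally onto a record type with one field per attribute. In the minimal sketch below, the five objects and the four attributes come from the slide, but every attribute value is a hypothetical assignment made for illustration; `select` is likewise a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeledObject:
    name: str
    size: str
    shape: str
    material: str
    purpose: str


# Hypothetical attribute values for the objects named on the slide.
OBJECTS = [
    ModeledObject("notebook", "small", "rectangular", "paper", "writing"),
    ModeledObject("cabinet", "large", "rectangular", "wood", "storage"),
    ModeledObject("can", "small", "cylindrical", "metal", "storage"),
    ModeledObject("cup", "small", "cylindrical", "ceramic", "drinking"),
    ModeledObject("case", "medium", "rectangular", "leather", "storage"),
]


def select(objects, **criteria):
    """Return names of objects whose attributes match every criterion."""
    return [o.name for o in objects
            if all(getattr(o, attr) == value for attr, value in criteria.items())]


print(select(OBJECTS, purpose="storage"))                  # ['cabinet', 'can', 'case']
print(select(OBJECTS, shape="cylindrical", size="small"))  # ['can', 'cup']
```

The same five objects fall into different groups depending on which attribute drives the query, which is the point of modeling them by attributes rather than by name.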
Data Modeling
- Flexibility in attribute modeling
  - E.g., franchise (object class) with city (attribute), OR city (object class) with franchise (attribute)
  - Analysis based on stores vs. analysis based on location
- Class relationship model: links describe relationships, modeled as objects with attributes
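Choosing franchise or city as the object class amounts to choosing the grouping key. The sketch below uses hypothetical store-level sales (the franchise names and figures are invented) and totals the same records both ways, giving a store-based view and a location-based view.

```python
from collections import defaultdict

FRANCHISE, CITY = 0, 1  # positions of the two candidate grouping fields

# Hypothetical store-level records: (franchise, city, monthly sales in $1000s).
SALES = [
    ("BurgerCo", "St. Louis", 120),
    ("BurgerCo", "Kansas City", 95),
    ("PizzaHub", "St. Louis", 80),
    ("PizzaHub", "Columbia", 60),
    ("TacoTime", "Columbia", 40),
]


def totals_by(records, key_index):
    """Sum the sales field (last position) grouped by the field at key_index."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[key_index]] += rec[-1]
    return dict(totals)


print(totals_by(SALES, FRANCHISE))  # franchise as object class, city as attribute
print(totals_by(SALES, CITY))       # city as object class, franchise as attribute
```

Neither view is more correct; the store-based totals answer "which chain sells most", while the location-based totals answer "which market is largest".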
Data Modeling
- Composite representations
  - Combining objects with similar user-selected characteristics
- Data abstraction
  - Metadata: data within data, data about data
  - Metadata from dates, numbers, addresses
  - Seasonality, warranty-related parts failure
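A single date field already contains several usable attributes. The sketch below derives some of them; the season mapping assumes Northern Hemisphere meteorological seasons, and the 365-day warranty period and the parts-failure dates are hypothetical.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
# Assumed Northern Hemisphere meteorological seasons; months 9-11 fall through to autumn.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer"}


def date_metadata(d, sale_date=None, warranty_days=365):
    """Derive attributes ("data about data") from a single date field."""
    meta = {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "weekday": WEEKDAYS[d.weekday()],
        "season": SEASONS.get(d.month, "autumn"),
    }
    if sale_date is not None:
        # For a parts-failure record: did the failure occur inside the warranty window?
        meta["days_in_service"] = (d - sale_date).days
        meta["under_warranty"] = meta["days_in_service"] <= warranty_days
    return meta


# Hypothetical failure report for a part sold on 1998-11-15.
print(date_metadata(date(1999, 9, 30), sale_date=date(1998, 11, 15)))
```

Once every failure record carries a season and an in-warranty flag, seasonality in warranty claims can be checked by simple counting rather than by reading raw dates.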
Data Modeling
- Descriptive vs. transactional model
  - Telephone calling patterns
- Intra- and inter-domain patterns
  - Horizontal vs. vertical
  - Communication, transportation, inventory
- Combining data sources
  - Spatial, temporal, structure-based (categorical clusters), value-based (discrete ranges)
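The step from a transactional model to a descriptive one can be shown with telephone data: many per-call records collapse into one profile per caller. In this sketch the call records are hypothetical, and the 10- and 60-minute cut points used for the value-based (discrete range) abstraction are arbitrary assumptions.

```python
from collections import defaultdict

# Hypothetical call-detail records (transactional): (caller, minutes, long_distance).
CALLS = [
    ("555-0101", 3.5, False),
    ("555-0101", 42.0, True),
    ("555-0102", 1.0, False),
    ("555-0102", 2.5, False),
    ("555-0101", 12.0, True),
]


def usage_band(total_minutes):
    """Value-based abstraction: map total minutes onto a discrete range."""
    if total_minutes < 10:
        return "light"
    if total_minutes < 60:
        return "moderate"
    return "heavy"


def caller_profiles(calls):
    """Collapse per-call transactions into one descriptive record per caller."""
    stats = defaultdict(lambda: {"calls": 0, "minutes": 0.0, "long_distance": 0})
    for caller, minutes, long_distance in calls:
        s = stats[caller]
        s["calls"] += 1
        s["minutes"] += minutes
        s["long_distance"] += int(long_distance)
    for s in stats.values():
        s["usage_band"] = usage_band(s["minutes"])
        s["long_distance_share"] = s["long_distance"] / s["calls"]
    return dict(stats)


for caller, profile in caller_profiles(CALLS).items():
    print(caller, profile)
```

The descriptive profile discards the order and timing of individual calls, so it suits comparing customers; the transactional records remain necessary for questions about when calls happen.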
Data Modeling
- E.g., tax compliance: tax returns, real-property assets, motor vehicle records, bank transfers
- E.g., Medicare filings, pharmacy product pricing
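The tax-compliance example depends on joining separate sources on a shared key. The sketch below assumes a common taxpayer ID across three hypothetical extracts (returns, property, vehicles) and flags records where declared assets dwarf reported income. The 10x ratio is an arbitrary illustrative threshold, not an actual compliance rule.

```python
# Hypothetical extracts from three separate sources, keyed by taxpayer ID.
TAX_RETURNS = {"T1": {"reported_income": 40_000}, "T2": {"reported_income": 250_000}}
PROPERTY = {"T1": [450_000, 300_000], "T2": [350_000], "T3": [200_000]}
VEHICLES = {"T1": [85_000], "T2": [30_000]}


def combine_sources(taxpayer_id):
    """Join the three sources into one record for a single taxpayer."""
    record = {"taxpayer_id": taxpayer_id, "reported_income": None}
    record.update(TAX_RETURNS.get(taxpayer_id, {}))
    record["property_value"] = sum(PROPERTY.get(taxpayer_id, []))
    record["vehicle_value"] = sum(VEHICLES.get(taxpayer_id, []))
    return record


def flag_for_review(record, ratio=10):
    """Flag large assets with no return, or assets exceeding ratio x income."""
    assets = record["property_value"] + record["vehicle_value"]
    income = record["reported_income"]
    if income is None:
        return assets > 0  # owns assets but filed no return
    return assets > ratio * income


for tid in ["T1", "T2", "T3"]:
    print(tid, flag_for_review(combine_sources(tid)))
```

No single source is suspicious on its own; the pattern only appears once the records are combined.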
Problem Definition
- Knowledge representation using hierarchical frameworks
  - Objects --> Relationships --> Networks --> Applications --> Systems
- Procedural vs. declarative knowledge
  - Episodic data tagged with temporal and spatial information
  - Semantic data more commonly analyzed
Data Preparation & Analysis
- Define data mining goals
- Accessing and preparing data
  - Capitalization, concatenation, representation format, augmentation, abstraction, unit conversion, exclusion
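Several of the preparation steps listed above can be shown on a few messy rows. This sketch covers exclusion, capitalization, concatenation, unit conversion, and representation format. The input rows, field names, and the choice of kilograms and ISO 8601 dates as target formats are all hypothetical.

```python
# Hypothetical raw rows from two source systems with inconsistent conventions.
RAW = [
    {"first": "jane", "last": "DOE", "state": "mo", "weight": "154", "unit": "lb", "joined": "09/30/1999"},
    {"first": "John", "last": "smith", "state": "KS", "weight": "80", "unit": "kg", "joined": "1999-10-02"},
    {"first": "", "last": "unknown", "state": "", "weight": "", "unit": "kg", "joined": ""},
]

LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound


def prepare(row):
    """Return a cleaned record, or None if the row should be excluded."""
    if not row["first"] or not row["weight"]:
        return None  # exclusion: drop incomplete records
    # Capitalization and concatenation into a single name field.
    name = f"{row['first'].capitalize()} {row['last'].capitalize()}"
    weight = float(row["weight"])
    if row["unit"] == "lb":
        weight *= LB_TO_KG  # unit conversion to kilograms
    joined = row["joined"]
    if "/" in joined:
        # Representation format: MM/DD/YYYY -> ISO 8601 YYYY-MM-DD.
        month, day, year = joined.split("/")
        joined = f"{year}-{month}-{day}"
    return {"name": name, "state": row["state"].upper(),
            "weight_kg": round(weight, 1), "joined": joined}


clean = [r for r in (prepare(row) for row in RAW) if r is not None]
for r in clean:
    print(r)
```

Each rule is kept small and explicit so that it can be reviewed against the data mining goals; augmentation and abstraction would be further steps added to the same function.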