CS 522 Advanced Database Systems
Course Overview
Chengyu Sun
California State University, Los Angeles
Why Data Mining? ©Tan, Steinbach, Kumar Introduction to Data Mining 2004
Data Mining
Extracting knowledge from large amounts of data
Origins of Data Mining
Traditional techniques may not be suitable due to
- Enormity of data
- High dimensionality of data
- Heterogeneous, distributed nature of the data
Figure: Data Mining shown together with Statistics/Probability, AI/Machine Learning, and Database systems.
Topics Covered
- Data warehouse and OLAP
- Mining frequent patterns
- Classification and regression
- Clustering
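OLAP
Online Analytical Processing
- Vs. OLTP

time  city  product  sales
Jan   LA    1        100
Feb   LA    2        50
Jan   NY    1        30
Mar   NY    1        200

Figure: the same data drawn as a cube with axes time (Jan, Feb, Mar, Apr), city (LA, NY), and product (1, 2, 3); the visible cells hold sales of 100, 30, and 200.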
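As a rough illustration of the kind of aggregation OLAP performs, the sketch below rolls up the four sales rows from the slide along different dimensions. The function name `roll_up` and the `DIMENSIONS` tuple are hypothetical names chosen for this example; a real OLAP system would do this inside the data warehouse rather than in application code.

```python
from collections import defaultdict

# Fact table rows from the slide: (time, city, product, sales).
facts = [
    ("Jan", "LA", 1, 100),
    ("Feb", "LA", 2, 50),
    ("Jan", "NY", 1, 30),
    ("Mar", "NY", 1, 200),
]

# Order of the dimension columns in each fact row (hypothetical helper).
DIMENSIONS = ("time", "city", "product")

def roll_up(rows, keep):
    """Sum sales over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]  # row[3] is the sales measure
    return dict(totals)

sales_by_city = roll_up(facts, ["city"])               # aggregate away time and product
sales_by_time_city = roll_up(facts, ["time", "city"])  # aggregate away product only
grand_total = roll_up(facts, [])                       # aggregate away everything

print(sales_by_city)
print(grand_total)
```

Each call corresponds to one face or edge of the cube: keeping fewer dimensions gives a coarser summary of the same facts.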
Mining Frequent Patterns
Frequent Itemsets:
- {Coke, Milk}
- {Beer, Diaper, Milk}
Association Rules:
- {Milk} --> {Coke}
- {Diaper, Milk} --> {Beer}
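To make the itemsets and rules above concrete, here is a minimal sketch that computes support and confidence by brute-force counting. The five transactions are an assumption made up for this example (the slide does not list the underlying data), and `count`, `support`, and `confidence` are hypothetical helper names.

```python
# Hypothetical market-basket transactions (not given on the slide).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    """Fraction of transactions that contain the itemset."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(both) / count(antecedent)

print(support({"Coke", "Milk"}))
print(support({"Beer", "Diaper", "Milk"}))
print(confidence({"Milk"}, {"Coke"}))
print(confidence({"Diaper", "Milk"}, {"Beer"}))
```

An itemset is usually called frequent when its support meets a chosen minimum threshold; the threshold itself is a parameter of the analysis, not something this sketch fixes.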
Classification
Figure: the Training Set is fed to Learn Classifier, which produces a Model; the Model is then applied to the Test Set.
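The training set / model / test set workflow can be sketched in a few lines. The example below uses a 1-nearest-neighbor classifier purely because it is short; the slide does not name a specific algorithm, and the records, labels, and the function name `learn_classifier` are all hypothetical.

```python
import math

# Hypothetical labeled records: (feature vector, class label).
training_set = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 5.5), "B"),
]
test_set = [
    ((1.2, 1.4), "A"),
    ((5.5, 5.0), "B"),
    ((2.0, 1.0), "A"),
]

def learn_classifier(training):
    """Build a 1-nearest-neighbor model; 'learning' here just stores the training records."""
    def model(x):
        # Predict the label of the closest training record (Euclidean distance).
        _, label = min(training, key=lambda rec: math.dist(rec[0], x))
        return label
    return model

# Learn on the training set, then evaluate on the held-out test set.
model = learn_classifier(training_set)
predictions = [model(x) for x, _ in test_set]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test_set)) / len(test_set)

print(predictions, accuracy)
```

The key point is the separation: the model never sees the test labels while it is being built, so test accuracy estimates how well it generalizes.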
Clustering
Euclidean distance-based clustering in 3-D space.
- Intracluster distances are minimized
- Intercluster distances are maximized
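The two goals of clustering can be checked numerically. The sketch below does not perform clustering itself; it takes two hypothetical, already-formed clusters of 3-D points and compares the average Euclidean distance within a cluster to the average distance between clusters. The point coordinates and function names are assumptions for illustration.

```python
import math
from itertools import combinations, product

# Hypothetical clusters of 3-D points (not taken from the slide).
clusters = {
    "c1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    "c2": [(10.0, 10.0, 10.0), (11.0, 10.0, 10.0), (10.0, 11.0, 10.0)],
}

def mean_intracluster_distance(points):
    """Average Euclidean distance over all pairs of points in one cluster."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def mean_intercluster_distance(a, b):
    """Average Euclidean distance over all pairs with one point from each cluster."""
    pairs = list(product(a, b))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

intra_c1 = mean_intracluster_distance(clusters["c1"])
intra_c2 = mean_intracluster_distance(clusters["c2"])
inter = mean_intercluster_distance(clusters["c1"], clusters["c2"])

print(intra_c1, intra_c2, inter)
```

A good clustering under this criterion has small intracluster values relative to the intercluster value, as these well-separated points do.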
Readings Textbook Chapter 1