Virtual University of Pakistan Data Warehousing Lecture31 Supervised
- Slides: 16
Virtual University of Pakistan
Data Warehousing
Lecture-31: Supervised vs. Unsupervised Learning
Ahsan Abdullah, Assoc. Prof. & Head, Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan101@yahoo.com
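Data Structures in Data Mining
• Data matrix
  – Table or database
  – n records and m attributes
  – n >> m
• Similarity matrix
  – Symmetric square matrix
  – n x n or m x m

Data matrix (n x m):
\[
C = \begin{pmatrix}
C_{1,1} & C_{1,2} & C_{1,3} & \cdots & C_{1,m} \\
C_{2,1} & C_{2,2} & C_{2,3} & \cdots & C_{2,m} \\
C_{3,1} & C_{3,2} & C_{3,3} & \cdots & C_{3,m} \\
\vdots  & \vdots  & \vdots  &        & \vdots  \\
C_{n,1} & C_{n,2} & C_{n,3} & \cdots & C_{n,m}
\end{pmatrix}
\]

Similarity matrix (n x n, ones on the diagonal):
\[
S = \begin{pmatrix}
1       & S_{1,2} & S_{1,3} & \cdots & S_{1,n} \\
S_{2,1} & 1       & S_{2,3} & \cdots & S_{2,n} \\
S_{3,1} & S_{3,2} & 1       & \cdots & S_{3,n} \\
\vdots  & \vdots  & \vdots  & \ddots & \vdots  \\
S_{n,1} & S_{n,2} & S_{n,3} & \cdots & 1
\end{pmatrix}
\]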
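To make the relationship between the two structures concrete, here is a minimal sketch that builds an n x n similarity matrix from an n x m data matrix. The slide does not say which similarity measure is used, so the choice of 1 / (1 + Euclidean distance) is an assumption made only for illustration, as are the function name `similarity_matrix` and the sample records.

```python
import math


def similarity_matrix(data):
    """Build an n x n similarity matrix from n records (rows of `data`).

    Assumption: similarity = 1 / (1 + Euclidean distance). The slide does
    not name a measure; this one is used only because it yields 1 on the
    diagonal and a symmetric matrix, as the slide shows.
    """
    n = len(data)
    sim = [[1.0] * n for _ in range(n)]  # diagonal entries stay 1
    for i in range(n):
        for j in range(i + 1, n):
            s = 1.0 / (1.0 + math.dist(data[i], data[j]))
            sim[i][j] = s
            sim[j][i] = s  # symmetry: S[i][j] == S[j][i]
    return sim


# Hypothetical data matrix: 3 records (rows), 2 attributes (columns).
data = [
    [25, 20],
    [27, 22],
    [55, 60],
]
S = similarity_matrix(data)
```

Here the first two records are close together, so `S[0][1]` comes out larger than `S[0][2]`. A similarity matrix over attributes (m x m) could be built the same way from the transposed data matrix.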
Main Types of Data Mining
Supervised: type and number of classes are known in advance
• Bayesian Modeling
• Decision Trees
• Neural Networks
• Etc.
Unsupervised: type and number of classes are NOT known in advance
• One-way Clustering
• Two-way Clustering
Clustering: Min-Max Distance
• Intra-cluster distances are minimized
• Inter-cluster distances are maximized
[Figure: scatter plot with axes Salary and Age, showing clusters and an outlier]
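The min-max idea can be checked numerically. The sketch below measures intra-cluster distance as the mean distance of points to their cluster centroid, and inter-cluster distance as the smallest distance between any two centroids. Both definitions are assumptions (the slide does not fix a formula), and the sample points and function names are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def mean_intra_distance(cluster):
    """Average distance from each point to its own cluster centroid."""
    c = centroid(cluster)
    return sum(math.dist(p, c) for p in cluster) / len(cluster)


def min_inter_distance(clusters):
    """Smallest centroid-to-centroid distance over all cluster pairs."""
    cents = [centroid(c) for c in clusters]
    return min(math.dist(a, b) for a, b in combinations(cents, 2))


# Hypothetical (age, salary) points, already grouped into two clusters.
clusters = [
    [(25, 20), (27, 22), (26, 21)],
    [(55, 60), (57, 58), (56, 62)],
]
intra = [mean_intra_distance(c) for c in clusters]
inter = min_inter_distance(clusters)
```

For a good clustering, every value in `intra` should be small relative to `inter`, which is the case for these two well-separated groups.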
How Does Clustering Work?
One-way Clustering Example
[Figure: INPUT and OUTPUT panels. Black spots are noise; white spots are missing data.]
Data Mining: Agriculture Data Clusters
[Figure: INPUT and clustered OUTPUT panels]
Classification
[Figure labels: Unseen Data; Classifier (model); Which class?]
How Does Classification Work?
[Figure labels: Inputs; Output; Confidence Level]
Classification Process (1): Model Construction
Relationship between shopping time and items bought
[Figure labels: Training Data; Classification Algorithms (observations, measurements, etc.); Classifier (Model)]
IF time/items >= 6 THEN gender = ‘F’
Classification Process (2): Use the Model in Prediction
[Figure labels: Classifier; Testing Data; Unseen Data]
Unseen data: (Firdous, Time = 15, Items = 1)
Gender?
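The two-step process can be sketched by applying the rule from the model-construction slide to the unseen record. The slide states only the condition for 'F'; returning 'M' when the condition fails is an assumption, and the function name `predict_gender` is hypothetical.

```python
def predict_gender(time, items):
    """Apply the slide's rule: IF time/items >= 6 THEN gender = 'F'.

    Assumption: the slide gives no ELSE branch, so 'M' is returned
    whenever the condition does not hold.
    """
    if items <= 0:
        raise ValueError("items must be positive")
    return "F" if time / items >= 6 else "M"


# Unseen record from the slide: (Firdous, Time = 15, Items = 1)
prediction = predict_gender(time=15, items=1)
print(prediction)  # prints F, since 15 / 1 = 15 >= 6
```

In practice the classifier would first be checked against the testing data, and only used on unseen data if its accuracy there is acceptable.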
Clustering vs. Cluster Detection
Clustering vs. Cluster Detection: Example
[Figure: panels A and B]
The K-Means Clustering
The K-Means Clustering: Example
[Figure: four scatter-plot panels A, B, C and D, with both axes running from 0 to 10]
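The slide text does not spell out the algorithm, so what follows is a minimal sketch of the standard K-means procedure (often called Lloyd's algorithm): assign each point to its nearest centroid, recompute each centroid as the mean of its points, and repeat until the centroids stop changing. The sample points, the initial centroids and all function names are hypothetical; the points were chosen to fall inside the 0 to 10 range of the figure.

```python
import math


def assign(points, centroids):
    """Return, for every point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
            )
        else:
            new_centroids.append(old)  # keep an empty cluster's centroid
    return new_centroids


def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update steps until centroids are stable."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical 2-D points forming two visible groups, with k = 2.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
centroids, labels = k_means(points, centroids=[(1, 1), (2, 1)])
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Even though both initial centroids start inside the lower-left group, the second one is pulled toward the upper-right points after the first update. With less clearly separated data the result can depend on the initial centroids.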
The K-Means Clustering: Comment