Virtual University of Pakistan Data Warehousing Lecture-31 Supervised vs. Unsupervised Learning
















- Slides: 16
Virtual University of Pakistan Data Warehousing Lecture-31 Supervised vs. Unsupervised Learning Ahsan Abdullah, Assoc. Prof. & Head, Center for Agro-Informatics Research, www.nu.edu.pk/cairindex.asp, National University of Computers & Emerging Sciences, Islamabad. Email: ahsan101@yahoo.com
Data Structures in Data Mining
• Data matrix
  – Table or database
  – n records and m attributes
  – n >> m
• Similarity matrix
  – Symmetric square matrix
  – n x n or m x m

Data matrix (n x m):
  C1,1  C1,2  C1,3  …  C1,m
  C2,1  C2,2  C2,3  …  C2,m
  C3,1  C3,2  C3,3  …  C3,m
  …
  Cn,1  Cn,2  Cn,3  …  Cn,m

Similarity matrix (n x n):
  1     S1,2  S1,3  …  S1,n
  S2,1  1     S2,3  …  S2,n
  S3,1  S3,2  1     …  S3,n
  …
  Sn,1  Sn,2  Sn,3  …  1
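The relationship between the two structures can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own method: the slide does not say which similarity measure is used, so the measure 1 / (1 + Euclidean distance) and the sample data below are assumptions chosen only because they give 1 on the diagonal and a symmetric result.

```python
import math

def similarity_matrix(data):
    """Build the n x n similarity matrix for the n rows of a data matrix.

    Assumed measure (not from the slides): 1 / (1 + Euclidean distance).
    """
    n = len(data)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dist = math.dist(data[i], data[j])
            # Fill both halves at once, since the matrix is symmetric.
            sim[i][j] = sim[j][i] = 1.0 / (1.0 + dist)
    return sim

# Hypothetical data matrix: n = 4 records (rows), m = 2 attributes (columns).
data = [[1.0, 2.0], [1.0, 2.0], [4.0, 6.0], [9.0, 9.0]]
S = similarity_matrix(data)

for row in S:
    print(["%.3f" % value for value in row])
```

Comparing records gives the n x n matrix; passing the transposed data matrix instead would compare attributes and give the m x m form.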
Main Types of Data Mining
• Supervised: type and number of classes are known in advance
  – Bayesian Modeling
  – Decision Trees
  – Neural Networks
  – Etc.
• Unsupervised: type and number of classes are NOT known in advance
  – One-way Clustering
  – Two-way Clustering
Clustering: Min-Max Distance
• Intra-cluster distances are minimized
• Inter-cluster distances are maximized
[Figure: scatter plot of Salary against Age showing the clusters and one outlier]
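The min-max idea can be checked numerically. The sketch below is only an illustration under assumptions: the (age, salary) points are made up, and mean pairwise Euclidean distance is one of several reasonable ways to measure intra-cluster and inter-cluster distance.

```python
import math
from itertools import combinations

def mean_intra_cluster_distance(cluster):
    """Average distance between all pairs of points inside one cluster.

    Needs at least two points in the cluster.
    """
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter_cluster_distance(cluster_a, cluster_b):
    """Average distance between points drawn from two different clusters."""
    total = sum(math.dist(p, q) for p in cluster_a for q in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

# Hypothetical (age, salary) points forming two well-separated groups.
young = [(22, 20), (25, 22), (24, 19)]
older = [(55, 40), (58, 43), (60, 41)]

intra_young = mean_intra_cluster_distance(young)
intra_older = mean_intra_cluster_distance(older)
inter = mean_inter_cluster_distance(young, older)

print("intra (young):", round(intra_young, 2))
print("intra (older):", round(intra_older, 2))
print("inter        :", round(inter, 2))
```

A good clustering keeps both intra-cluster values small relative to the inter-cluster value; an outlier would inflate the intra-cluster distance of whichever cluster it is forced into.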
How Does Clustering Work?
One-way clustering example
• Black spots are noise
• White spots are missing data
[Figure: INPUT and OUTPUT images]
Data Mining Agriculture data clusters
[Figure: INPUT and clustered OUTPUT]
Classification
[Figure: Unseen Data → Classifier (model) → Which class?]
How Does Classification Work?
[Figure: Inputs → Output, with a Confidence Level]
Classification Process (1): Model Construction
• Example: relationship between shopping time and items bought
• Training Data (observations, measurements, etc.) → Classification Algorithms → Classifier (Model)
• Resulting rule: IF time/items >= 6 THEN gender = ‘F’
Classification Process (2): Use the Model in Prediction
• Testing Data → Classifier
• Unseen Data (Firdous, Time = 15, Items = 1) → Classifier → Gender?
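Applying the constructed model to the unseen record can be sketched as follows. The rule itself comes from the model-construction slide; the function name and the 'M' result for the ELSE branch are assumptions, since the slides only state the 'F' case.

```python
def classify_gender(time, items):
    """Apply the rule: IF time/items >= 6 THEN gender = 'F'.

    The 'M' fallback is an assumption; the slides give only the 'F' branch.
    """
    if items <= 0:
        raise ValueError("items must be positive")
    return "F" if time / items >= 6 else "M"

# Unseen record from the slide: Firdous, Time = 15, Items = 1.
prediction = classify_gender(time=15, items=1)
print("Firdous ->", prediction)
```

For this record time/items = 15 / 1 = 15, which is at least 6, so the rule predicts 'F'.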
Clustering vs. Cluster Detection
Clustering vs. Cluster Detection Example A B
The K-Means Clustering
The K-Means Clustering: Example
[Figure: four scatter-plot panels labeled A, B, C and D; both axes of each panel run from 0 to 10]
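The iterations pictured in a K-Means example can be reproduced with a short sketch. This is the standard textbook procedure (assign each point to its nearest centroid, then move each centroid to the mean of its points), which may differ in detail from the variant presented in the lecture. The points, the initial centroids and the function name are all hypothetical.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Run K-Means from the given initial centroids; max_iter must be >= 1.

    Returns the final centroids and the clusters from the last assignment.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid

        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points on a 0 to 10 grid, with K = 2 initial centroids.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (9, 8)])

for center, members in zip(centroids, clusters):
    print(center, members)
```

The result depends on the initial centroids, so in practice the procedure is usually run from several starting positions.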
The K-Means Clustering: Comment