Classification Mining
Muhammad Ali Yousuf, ITM
(Based on: 1. Notes by David Squire, Monash University; 2. Notes by Mohammad Zaki, RPI)
Contents
- Classification
- What are Bayesian Classifiers?
- Bayes' Theorem
- Naïve Bayesian Classification
- Bayesian Belief Networks
- Training Bayesian Belief Networks
- Why use Bayesian Classifiers?
- Example Software: Netica
Classification
- The model can use neural nets, genetic algorithms, Bayesian, or decision tree techniques.
- Classification is learning from examples: inductive, supervised learning, in which the class labels are assigned beforehand by the user.
Example Training Set

Age   Car Type   Risk
23    Family     High
17    Sports     High
43    Sports     High
68    Family     Low
32    Truck      Low
20    Family     High

Age is a numeric domain, Car Type is a categorical domain, and Risk is the class.
Construct a model of the class in terms of Age and Car Type. Save a random 20% of the data for testing.

Database -> 80% of database -> MODEL (decision tree)
Database -> 20% of database -> Evaluation
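The 80/20 hold-out step can be sketched as follows. This is a minimal illustration, not the slides' own code: the function name `split_train_test`, the fixed `seed`, and the toy `records` list are all assumptions made for the example.

```python
import random

def split_train_test(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    rng = random.Random(seed)          # fixed seed only to make the sketch repeatable
    shuffled = list(records)           # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (training set, test set)

# Hypothetical stand-in for a database of ten records.
records = list(range(10))
train, test = split_train_test(records)
```

The model is built from `train` only; `test` is kept aside and used solely for evaluation.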
Accuracy = #correct / #total
Error = #incorrect / #total

        Real Value   Predicted Value
Data    H            H
Data    L            H
Data    H            L

Accuracy = 3/5 = 60%, Error = 2/5 = 40%

- Use classification to predict discrete classes
- Use regression to predict real-valued outputs
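The two formulas can be checked with a few lines of code. The label lists below are hypothetical, chosen only so that 3 of 5 predictions are correct as in the slide's arithmetic; they are not the slide's actual test records.

```python
def accuracy_and_error(actual, predicted):
    """Return (accuracy, error) as fractions of the total number of records."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty label lists of equal length")
    total = len(actual)
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / total, (total - correct) / total

# Hypothetical real and predicted class labels for five test records.
actual = ["H", "H", "L", "H", "L"]
predicted = ["H", "H", "H", "L", "L"]
acc, err = accuracy_and_error(actual, predicted)
```

By construction, accuracy and error always sum to 1.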
Given the following database D:

Age (numeric)   Car Type (categorical)   Risk (class)
23              F                        H
17              S                        H
43              S                        H
68              F                        L
32              T                        L
20              F                        H
Decision Tree

Age < 25?
  Y: H
  N: Car Type ∈ {Sports}?
       Y: H
       N: L

1) All internal nodes are decisions: A < x (numeric) or A ∈ subset (categorical)
2) All leaf nodes are class labels
Decision Tree
- Decision or classification rules:
  If (Age < 25) then class = H
  If (Age ≥ 25) and Car Type ∉ {Sports} then class = L
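As a quick check, the tree can be written as a function and run against database D. This is a sketch: the function name `classify` and the tuple layout of `DATABASE` are assumptions for illustration, and the Sports branch (class H) is read off the tree diagram.

```python
# Database D as (Age, Car Type, Risk) tuples; F = Family, S = Sports, T = Truck.
DATABASE = [
    (23, "F", "H"),
    (17, "S", "H"),
    (43, "S", "H"),
    (68, "F", "L"),
    (32, "T", "L"),
    (20, "F", "H"),
]

def classify(age, car_type):
    """Walk the two-level decision tree and return the predicted risk class."""
    if age < 25:                 # root decision on the numeric attribute
        return "H"
    if car_type in {"S"}:        # subset decision on the categorical attribute
        return "H"
    return "L"

predictions = [classify(age, car) for age, car, _ in DATABASE]
```

Every record in D is classified correctly by this tree, so its accuracy on the training data is 6/6.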
Hunt's Method

Hunts(D):
  If all examples in D have the same class, then return;
  For each attribute A, evaluate all split points / decisions for A;
  Choose the best attribute and its split point;
  Partition D into D1, D2, …, Dn (binary partitions);
  Call Hunts(Di) for each i.
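A minimal recursive sketch of the procedure is given below. The surviving slide text does not say how the "best" split is scored, so this sketch assumes weighted Gini impurity; that choice, the dict-based tree layout, and all function names (`gini`, `candidate_tests`, `hunts`, `predict`) are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def candidate_tests(rows, attr):
    """Yield (description, predicate) pairs for every binary decision on attr."""
    values = sorted({row[attr] for row in rows})
    if all(isinstance(v, (int, float)) for v in values):
        # Numeric attribute: A < x, with x midway between adjacent values.
        for lo, hi in zip(values, values[1:]):
            x = (lo + hi) / 2
            yield f"{attr} < {x}", (lambda row, x=x: row[attr] < x)
    else:
        # Categorical attribute: A in S, one S per distinct two-way partition.
        first, rest = values[0], values[1:]
        for k in range(len(rest)):
            for extra in combinations(rest, k):
                s = frozenset((first, *extra))
                yield f"{attr} in {sorted(s)}", (lambda row, s=s: row[attr] in s)

def hunts(rows, attrs, target):
    """Return a class label (leaf) or a dict describing a binary decision node."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:            # all examples have the same class
        return labels[0]
    best = None
    for attr in attrs:                   # evaluate every decision on every attribute
        for desc, test in candidate_tests(rows, attr):
            yes = [r for r in rows if test(r)]
            no = [r for r in rows if not test(r)]
            score = (len(yes) * gini([r[target] for r in yes])
                     + len(no) * gini([r[target] for r in no])) / len(rows)
            if best is None or score < best[0]:
                best = (score, desc, test, yes, no)
    if best is None:                     # no split possible: fall back to majority class
        return Counter(labels).most_common(1)[0][0]
    _, desc, test, yes, no = best
    return {"test": desc, "predicate": test,
            "yes": hunts(yes, attrs, target), "no": hunts(no, attrs, target)}

def predict(tree, row):
    """Follow decisions from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["yes"] if tree["predicate"](row) else tree["no"]
    return tree

# Database D from the slides, as dicts.
D = [{"Age": a, "CarType": c, "Risk": r} for a, c, r in
     [(23, "F", "H"), (17, "S", "H"), (43, "S", "H"),
      (68, "F", "L"), (32, "T", "L"), (20, "F", "H")]]
tree = hunts(D, ["Age", "CarType"], "Risk")
```

Note that `candidate_tests` re-sorts the numeric values at every node, which is exactly the cost that makes this naive form expensive on large data.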
Example: Car Type
- |Car Type| = 3: 3 binary decisions
- |Car Type| = 4: 7 possible splits
    A | BCD    B | ACD    C | ABD    D | ABC
    AB | CD    AC | BD    AD | BC
- General formula: 2^(n-1) - 1
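The count can be confirmed by enumerating the splits directly. In this sketch (the function name `binary_splits` is an assumption), the first value is pinned to the left side so that a split and its mirror image are counted once.

```python
from itertools import combinations

def binary_splits(values):
    """Return every way to divide the values into two non-empty groups."""
    values = list(values)
    if len(values) < 2:
        return []
    first, rest = values[0], values[1:]
    splits = []
    # The left group always contains `first`; it may take 0 .. len(rest)-1 others,
    # which leaves at least one value for the right group.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = {first, *extra}
            right = set(values) - left
            splits.append((left, right))
    return splits

splits3 = binary_splits("ABC")    # three categories
splits4 = binary_splits("ABCD")   # four categories
```

Because the number of splits grows exponentially with the number of categories, exhaustive enumeration is only practical for attributes with few distinct values.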
Given D:
- P_H = 4/6 = 2/3
- P_L = 2/6 = 1/3
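These class probabilities are just the relative frequencies of each Risk label in D. A small sketch using exact fractions (the variable names are assumptions):

```python
from collections import Counter
from fractions import Fraction

# Risk column of database D, in record order.
risk = ["H", "H", "H", "L", "L", "H"]

counts = Counter(risk)
# Relative frequency of each class, kept as an exact fraction.
probs = {label: Fraction(n, len(risk)) for label, n in counts.items()}
```

Here `probs["H"]` is 2/3 and `probs["L"]` is 1/3, matching the values above.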
Disadvantages of the Method
- We have to sort each numeric attribute separately.
- We have to sort at each level of the tree.
Because of this repeated sorting, the method is not feasible for large data sets. We therefore consider another algorithm, called SPRINT. Here we generate an "attribute class list" for each attribute and assign a record ID (RID) to each record. Each attribute class list is sorted on its attribute; the RID of a record does not change when the list is sorted.
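The attribute class lists can be sketched for database D as follows. This is a simplified illustration of the data structure only, not a full SPRINT implementation; the function name `build_attribute_lists` and the (value, class, RID) tuple layout are assumptions.

```python
# Database D as (Age, Car Type, Risk) tuples; a record's position is its RID.
DATABASE = [
    (23, "F", "H"),
    (17, "S", "H"),
    (43, "S", "H"),
    (68, "F", "L"),
    (32, "T", "L"),
    (20, "F", "H"),
]

def build_attribute_lists(records, attr_names):
    """Build one list of (value, class, RID) entries per attribute, sorted by value."""
    lists = {}
    for col, name in enumerate(attr_names):
        # The class label is the last field; the RID is fixed before sorting.
        entries = [(rec[col], rec[-1], rid) for rid, rec in enumerate(records)]
        entries.sort(key=lambda e: e[0])   # sorted once; RIDs travel with the entries
        lists[name] = entries
    return lists

attribute_lists = build_attribute_lists(DATABASE, ["Age", "CarType"])
age_rids = [rid for _, _, rid in attribute_lists["Age"]]
```

Since each entry carries its RID, entries in different lists can be matched back to the same record after sorting, so the lists do not need to be re-sorted at every level of the tree.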
Example Software: TMiner
- Download from: http://frontdb.ugr.es/