Some Classification Algorithms in Machine Learning

Presented by
Asst. Prof. Mahajan Uchita Vidyadhar
Department of Statistics, KCE’s Society PGCSTR, Jalgaon.
Content

Ø Introduction
Ø Machine learning methodologies
Ø What is classification?
Ø What is machine learning?
Ø Types of classification algorithms in machine learning
Ø Decision tree
Ø Naïve Bayes
Ø Applications
Ø References
Ø Acknowledgement
Introduction
What is Classification?

§ Classification is the process of predicting the class of given data points. Classes are sometimes called targets, labels, or categories.
§ Classification belongs to the category of supervised learning, where the target is provided along with the input data.
§ Here, we are going to deal with two important classification algorithms:
  § Decision Tree
  § Naïve Bayes

What is Machine Learning?

§ Machine learning explores the study and construction of algorithms that can learn from and make predictions on data.
§ It is closely related to computational statistics.
Machine Learning Methodologies

§ Supervised learning
  § Learning from labelled data.
  § Classification, regression, forecasting, prediction.
§ Unsupervised learning
  § Learning from unlabeled data.
  § Clustering, dimension reduction.
§ Reinforcement learning
  § The model learns from a series of actions by maximizing a reward function.
  § Example: training a self-driving car using feedback from the environment.
Types of Classification Algorithms in Machine Learning

1. Decision Trees
2. Naive Bayes Classifier
3. Support Vector Machines
4. Random Forest
5. Artificial Neural Networks
6. Boosted Trees
7. Nearest Neighbour
Decision Tree

§ It is a type of supervised algorithm.
§ We split the population into two or more homogeneous sets.
§ Decision trees classify instances or examples by starting at the root of the tree and moving through it until a leaf node is reached.
§ A decision tree is a classifier in the form of a tree structure:
  § Decision node: specifies a test on a single attribute.
  § Leaf node: indicates the value of the target attribute.
  § Branch: a split on one attribute.
  § Path: a conjunction of tests leading to the final decision.
§ Decision trees can handle both categorical and numerical data.
Example (Source: saedsayad.com)

Outlook   Temp  Humidity  Windy  Play Golf
Rainy     Hot   High      False  No
Rainy     Hot   High      True   No
Overcast  Hot   High      False  Yes
Sunny     Mild  High      False  Yes
Sunny     Cool  Normal    False  Yes
Sunny     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Rainy     Mild  High      False  No
Rainy     Cool  Normal    False  Yes
Sunny     Mild  Normal    False  Yes
Rainy     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Sunny     Mild  High      True   No
Entropy

§ A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous).
§ The ID3 algorithm uses entropy to calculate the homogeneity of a sample.
§ If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided, it has an entropy of one.
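For reference, the standard entropy definition that ID3 uses, for a sample S whose classes occur with proportions p_i, is:

    E(S) = Σ_i -p_i · log2(p_i)

so for two classes with proportions p and 1 - p, this is E(S) = -p·log2(p) - (1 - p)·log2(1 - p).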
To build a decision tree, we need to calculate two types of entropy using frequency tables, as follows.

a) Entropy using the frequency table of one attribute

Play Golf: Yes = 9, No = 5

Entropy(PlayGolf) = E(9, 5) = -(9/14)·log2(9/14) - (5/14)·log2(5/14) = 0.940
b) Entropy using the frequency table of two attributes

            Play Golf
            Yes   No
Outlook
  Sunny      3     2      5
  Overcast   4     0      4
  Rainy      2     3      5
                         14

Entropy(PlayGolf, Outlook)
= P(Sunny)·E(3, 2) + P(Overcast)·E(4, 0) + P(Rainy)·E(2, 3)
= (5/14)·0.971 + (4/14)·0.0 + (5/14)·0.971
= 0.693
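As a cross-check, here is a minimal Python sketch of the two entropy calculations above; it assumes nothing beyond the standard library, and the counts are read off the frequency tables.

    import math

    def entropy(*counts):
        # Entropy (in bits) of a class distribution given raw counts.
        total = sum(counts)
        return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

    # a) Entropy of the target alone: 9 Yes, 5 No
    print(round(entropy(9, 5), 3))  # 0.94

    # b) Entropy of the target given Outlook, weighted by branch size
    outlook = {"Sunny": (3, 2), "Overcast": (4, 0), "Rainy": (2, 3)}
    total = sum(sum(c) for c in outlook.values())  # 14
    e_outlook = sum(sum(c) / total * entropy(*c) for c in outlook.values())
    print(round(e_outlook, 3))  # 0.694 (the slide's 0.693 rounds 0.971 before summing)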
Information Gain
Step 2

§ The dataset is then split on the different attributes.
§ The entropy for each branch is calculated.
§ Then it is added proportionally, to get the total entropy for the split.
§ The resulting entropy is subtracted from the entropy before the split.
§ The result is the Information Gain, or decrease in entropy.

Gain(T, X) = E(T) - E(T, X)

Gain(PlayGolf, Outlook) = E(PlayGolf) - E(PlayGolf, Outlook) = 0.940 - 0.693 = 0.247
Continue…

            Play Golf
            Yes   No
Outlook
  Sunny      3     2
  Overcast   4     0
  Rainy      2     3
Gain = 0.247

            Play Golf
            Yes   No
Temp.
  Hot        2     2
  Mild       4     2
  Cool       3     1
Gain = 0.029

            Play Golf
            Yes   No
Humidity
  High       3     4
  Normal     6     1
Gain = 0.152

            Play Golf
            Yes   No
Windy
  False      6     2
  True       3     3
Gain = 0.048
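The four gains above can be recomputed the same way; a small sketch, with the (Yes, No) counts taken from the frequency tables:

    import math

    def entropy(*counts):
        # Entropy (in bits) of a class distribution given raw counts.
        total = sum(counts)
        return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

    tables = {
        "Outlook":  {"Sunny": (3, 2), "Overcast": (4, 0), "Rainy": (2, 3)},
        "Temp":     {"Hot": (2, 2), "Mild": (4, 2), "Cool": (3, 1)},
        "Humidity": {"High": (3, 4), "Normal": (6, 1)},
        "Windy":    {"False": (6, 2), "True": (3, 3)},
    }

    e_target = entropy(9, 5)  # entropy before any split: 0.940
    for attr, branches in tables.items():
        n = sum(sum(c) for c in branches.values())
        e_split = sum(sum(c) / n * entropy(*c) for c in branches.values())
        print(attr, round(e_target - e_split, 3))
    # Outlook 0.247, Temp 0.029, Humidity 0.152, Windy 0.048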
Step 3

§ Choose the attribute with the largest information gain as the decision node.

            Play Golf
            Yes   No
Outlook
  Sunny      3     2
  Overcast   4     0
  Rainy      2     3
Gain = 0.247
Step 4(a)

§ A branch with entropy of 0 is a leaf node.

Outlook = Overcast:
Temp  Humidity  Windy  Play Golf
Hot   High      False  Yes
Cool  Normal    True   Yes
Mild  High      True   Yes
Hot   Normal    False  Yes

Overcast → Play = Yes
Step 4(b)

§ A branch with entropy more than 0 needs further splitting.

Outlook = Sunny:
Temp  Humidity  Windy  Play Golf
Mild  High      False  Yes
Cool  Normal    False  Yes
Mild  Normal    False  Yes
Cool  Normal    True   No
Mild  High      True   No

Sunny → split on Windy: Windy = False → Play = Yes; Windy = True → Play = No
Step 5

§ The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified; a compact sketch of this recursion is shown below. So, our tree looks like this…
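A compact, illustrative Python sketch of this recursion (a minimal ID3, not necessarily the exact procedure behind the slides). It assumes the 14 Play Golf rows are loaded as a list of dicts named rows; that name is hypothetical.

    import math
    from collections import Counter

    def entropy(rows, target):
        # Entropy (in bits) of the target column over the given rows.
        counts = Counter(r[target] for r in rows)
        n = len(rows)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def id3(rows, attrs, target):
        classes = {r[target] for r in rows}
        if len(classes) == 1:              # pure branch: leaf node (Step 4a)
            return classes.pop()
        if not attrs:                      # no attributes left: majority class
            return Counter(r[target] for r in rows).most_common(1)[0][0]

        def split_entropy(a):
            # Weighted entropy of the branches created by splitting on a.
            return sum(
                len(sub) / len(rows) * entropy(sub, target)
                for v in {r[a] for r in rows}
                for sub in [[r for r in rows if r[a] == v]]
            )

        best = min(attrs, key=split_entropy)   # largest gain = smallest split entropy
        return {best: {v: id3([r for r in rows if r[best] == v],
                              [a for a in attrs if a != best], target)
                       for v in {r[best] for r in rows}}}

    # tree = id3(rows, ["Outlook", "Temp", "Humidity", "Windy"], "Play Golf")
    # Expected shape, matching the tree on the next slide:
    # {"Outlook": {"Overcast": "Yes",
    #              "Sunny":  {"Windy":    {"False": "Yes", "True": "No"}},
    #              "Rainy":  {"Humidity": {"High": "No", "Normal": "Yes"}}}}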
Decision Tree

Outlook
  Sunny → Windy
    False → Play = Yes
    True  → Play = No
  Overcast → Play = Yes
  Rainy → Humidity
    High   → Play = No
    Normal → Play = Yes
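Classifying a new instance then amounts to walking the tree from the root to a leaf. A hypothetical helper for the nested-dict format used in the sketch above:

    def classify(tree, instance):
        # Walk the nested-dict tree until a leaf (a plain class label) is reached.
        if not isinstance(tree, dict):
            return tree
        attr, branches = next(iter(tree.items()))
        return classify(branches[instance[attr]], instance)

    sample = {"Outlook": "Rainy", "Temp": "Mild", "Humidity": "High", "Windy": "False"}
    # classify(tree, sample) -> "No"  (Rainy branch, then Humidity = High)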
Naïve Bayes

§ It is a type of supervised algorithm.
§ It is a very simple algorithm to implement, and good results have been obtained in many cases.
§ The Naïve Bayes classifier is based on Bayes' theorem and assumes independence between every pair of features.
§ Naïve Bayes guesses the best class and assigns a probability to that best guess.
§ Naïve Bayes classifiers work well in many real-world situations, such as document classification and spam filtering.
Background

Naïve Bayes rests on Bayes' theorem. For a target class c and an attribute vector X:

    P(c | X) = P(X | c) · P(c) / P(X)

Continue….

With the naïve independence assumption, for a target c and attributes x, y:

    P(c | x, y) = P(x | c) · P(y | c) · P(c) / P(x, y)

where c is the target and x, y are attributes.

Continue…

More generally, for attributes x1, …, xn:

    P(c | x1, …, xn) ∝ P(c) · P(x1 | c) · … · P(xn | c)
Conditional Probability Model

    P(c | X) = P(X | c) · P(c) / P(X)

    posterior probability = (likelihood × prior probability) / normalizing constant

where P(c) is the prior probability, P(X | c) the likelihood, P(c | X) the posterior probability, and P(X) the normalizing constant.
Example

§ Given all the previous patients' symptoms and diagnoses:

Chills  Runny nose  Headache  Fever  Flu?
Yes     No          Mild      Yes    No
Yes     Yes         No        No     Yes
Yes     No          Strong    Yes    Yes
No      Yes         Mild      Yes    Yes
No      No          No        No     No
No      Yes         Strong    Yes    Yes
No      Yes         Strong    No     No
Yes     Yes         Mild      Yes    Yes

§ Does the patient with the following symptoms have the flu?

Chills  Runny nose  Headache  Fever  Flu?
Yes     No          Mild      No     ?
How Naïve Bayes Works?

First, we compute all possible individual probabilities conditioned on the target attribute (Flu).

P(Flu=Yes) = 0.625                        P(Flu=No) = 0.375
P(Chills=Yes | Flu=Yes) = 0.6             P(Chills=Yes | Flu=No) = 0.333
P(Chills=No | Flu=Yes) = 0.4              P(Chills=No | Flu=No) = 0.666
P(Runny nose=Yes | Flu=Yes) = 0.8         P(Runny nose=Yes | Flu=No) = 0.333
P(Runny nose=No | Flu=Yes) = 0.2          P(Runny nose=No | Flu=No) = 0.666
P(Headache=Mild | Flu=Yes) = 0.4          P(Headache=Mild | Flu=No) = 0.333
P(Headache=No | Flu=Yes) = 0.2            P(Headache=No | Flu=No) = 0.333
P(Headache=Strong | Flu=Yes) = 0.4        P(Headache=Strong | Flu=No) = 0.333
P(Fever=Yes | Flu=Yes) = 0.8              P(Fever=Yes | Flu=No) = 0.333
P(Fever=No | Flu=Yes) = 0.2               P(Fever=No | Flu=No) = 0.666
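A short sketch that recomputes these tables from the eight training rows on the example slide (tuple order: Chills, Runny nose, Headache, Fever, Flu):

    from collections import Counter

    rows = [
        ("Yes", "No",  "Mild",   "Yes", "No"),
        ("Yes", "Yes", "No",     "No",  "Yes"),
        ("Yes", "No",  "Strong", "Yes", "Yes"),
        ("No",  "Yes", "Mild",   "Yes", "Yes"),
        ("No",  "No",  "No",     "No",  "No"),
        ("No",  "Yes", "Strong", "Yes", "Yes"),
        ("No",  "Yes", "Strong", "No",  "No"),
        ("Yes", "Yes", "Mild",   "Yes", "Yes"),
    ]
    attrs = ["Chills", "Runny nose", "Headache", "Fever"]

    for flu in ("Yes", "No"):
        subset = [r for r in rows if r[-1] == flu]
        print(f"P(Flu={flu}) = {len(subset) / len(rows):.3f}")
        for i, attr in enumerate(attrs):
            for value, count in sorted(Counter(r[i] for r in subset).items()):
                print(f"  P({attr}={value} | Flu={flu}) = {count / len(subset):.3f}")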
And then decide…

P(Flu=Yes | attributes)
∝ P(Chills=Y | Flu=Y) · P(Runny nose=N | Flu=Y) · P(Headache=Mild | Flu=Y) · P(Fever=N | Flu=Y) · P(Flu=Y)
= 0.6 × 0.2 × 0.4 × 0.2 × 0.625
= 0.006

vs

P(Flu=No | attributes)
∝ P(Chills=Y | Flu=N) · P(Runny nose=N | Flu=N) · P(Headache=Mild | Flu=N) · P(Fever=N | Flu=N) · P(Flu=N)
= 0.333 × 0.666 × 0.333 × 0.666 × 0.375
= 0.0184

So, the Naïve Bayes classifier predicts that the patient doesn't have the flu.
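The final comparison in code, with the slide's rounded probabilities hard-coded; whichever unnormalized score is larger wins:

    # Attributes: Chills=Yes, Runny nose=No, Headache=Mild, Fever=No
    p_yes = 0.6 * 0.2 * 0.4 * 0.2 * 0.625          # product given Flu=Yes
    p_no = 0.333 * 0.666 * 0.333 * 0.666 * 0.375   # product given Flu=No
    print(round(p_yes, 4), round(p_no, 4))          # 0.006 0.0184
    print("Flu=Yes" if p_yes > p_no else "Flu=No")  # Flu=No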
Applications

§ Speech recognition
§ Effective web search
§ Fraud detection
§ Medical diagnosis
§ Stock market analysis
§ Spam filtering
§ Computational finance
§ Structural health monitoring
References

§ www.machinelearningplus.com
§ www.rischanlab.github.io
§ www.analyticsvidhya.com
§ www.researchgate.net
§ Sankara Subbu, Ramesh, "Brief Study of Classification Algorithms in Machine Learning" (2017). CUNY Academic Works. http://academicworks.cuny.edu/cc_etds_theses/679
THANK YOU