Decision Trees
Dr. Anil Maheshwari

Introduction
Decision trees are a simple, hierarchically structured way to guide one’s path to a decision. Decision tree learning is one of the most widely used techniques for classification: its classification accuracy is competitive with other methods, and it is very efficient. The classification model is a tree, called a decision tree.

Decision Trees – generic pseudocode
Employs the divide-and-conquer method: recursively divides the training set until each division consists of examples from one class (or another stopping criterion is met).
1. Create a root node and assign all of the training data to it.
2. Select the best splitting attribute.
3. Add a branch to the node for each value of the split, and split the data into mutually exclusive subsets along the lines of that split.
4. Repeat steps 2 and 3 for each leaf node until the stopping criterion is reached.
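The four steps above can be sketched as a small recursive Python function. This is a minimal illustration, not any specific published algorithm: it assumes categorical attributes, uses the slides’ misclassification count as the splitting criterion, and breaks ties by attribute order. The names `build_tree`, `split_error` and `majority` are hypothetical.

```python
from collections import Counter

# Weather/play training data from the exercise: (Outlook, Temp, Humidity, Windy, Play)
ATTRIBUTES = ["Outlook", "Temp", "Humidity", "Windy"]
DATA = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def majority(rows):
    """Most frequent class label among the rows (ties go to the first label seen)."""
    return Counter(row[-1] for row in rows).most_common(1)[0][0]


def split_error(rows, idx):
    """Training errors if every value of attribute `idx` predicts its majority class."""
    groups = {}
    for row in rows:
        groups.setdefault(row[idx], []).append(row)
    return sum(len(g) - Counter(r[-1] for r in g).most_common(1)[0][1]
               for g in groups.values())


def build_tree(rows, attrs):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    # Stopping criterion: the node is pure or no attributes are left to split on.
    if len({row[-1] for row in rows}) == 1 or not attrs:
        return majority(rows)
    # Step 2: pick the attribute with the lowest error (min keeps the first on ties).
    best = min(attrs, key=lambda a: split_error(rows, ATTRIBUTES.index(a)))
    idx = ATTRIBUTES.index(best)
    remaining = [a for a in attrs if a != best]
    # Step 3: one branch per value, each holding a mutually exclusive subset.
    # Step 4: recurse on every new node.
    return {best: {value: build_tree([r for r in rows if r[idx] == value], remaining)
                   for value in sorted({r[idx] for r in rows})}}


tree = build_tree(DATA, ATTRIBUTES)
print(tree)
```

With this criterion the sketch reproduces the tree built by hand later in the slides: Outlook at the root, Humidity under Sunny and Windy under Rainy.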

Decision Trees
Decision tree algorithms mainly differ on:
● Splitting criteria
  ● Which variable to split first? (e.g., information gain)
  ● What values to use to split?
  ● How many splits to form for each node?
● Stopping criteria
  ● When to stop building the tree (e.g., maximum tolerable error)
● Pruning (generalization method)
  ● Pre-pruning versus post-pruning
The most popular decision tree algorithms include ID3, C4.5, C5.0, CART, CHAID, and M5.
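Information gain, listed above as a splitting criterion, and the Gini index used by CART can both be computed from class counts alone. A minimal sketch, assuming each node is summarized as a list of class counts; the counts come from the weather exercise (9 Yes and 5 No overall; Outlook splits them into Sunny 2/3, Overcast 4/0, Rainy 3/2; Humidity into High 3/4, Normal 6/1).

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def gini(counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    n = sum(parent)
    return entropy(parent) - sum(sum(ch) / n * entropy(ch) for ch in children)


root = [9, 5]                          # [Yes, No] for all 14 instances
outlook = [[2, 3], [4, 0], [3, 2]]     # Sunny, Overcast, Rainy
humidity = [[3, 4], [6, 1]]            # High, Normal

print(round(entropy(root), 3))                     # 0.94
print(round(gini(root), 3))                        # 0.459
print(round(information_gain(root, outlook), 3))   # 0.247
print(round(information_gain(root, humidity), 3))  # 0.152
```

Where the error counts in the exercise leave Outlook and Humidity tied, information gain ranks Outlook clearly ahead.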

Exercise: Decision tree to predict ‘Play’

Outlook  | Temp | Humidity | Windy | Play
Sunny    | Hot  | High     | False | No
Sunny    | Hot  | High     | True  | No
Overcast | Hot  | High     | False | Yes
Rainy    | Mild | High     | False | Yes
Rainy    | Cool | Normal   | False | Yes
Rainy    | Cool | Normal   | True  | No
Overcast | Cool | Normal   | True  | Yes
Sunny    | Mild | High     | False | No
Sunny    | Cool | Normal   | False | Yes
Rainy    | Mild | Normal   | False | Yes
Sunny    | Mild | Normal   | True  | Yes
Overcast | Mild | High     | True  | Yes
Overcast | Hot  | Normal   | False | Yes
Rainy    | Mild | High     | True  | No

New instance to classify:
Sunny    | Hot  | Normal   | True  | ?

Question: How do we select the best splitting attribute?
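A first step toward answering the question is to cross-tabulate each candidate attribute against Play. A minimal sketch, assuming the table is stored as Python tuples with Windy kept as the strings "True"/"False"; `class_counts_by_value` is a hypothetical helper name.

```python
from collections import Counter

# Weather/play training data: (Outlook, Temp, Humidity, Windy, Play)
ATTRIBUTES = ["Outlook", "Temp", "Humidity", "Windy"]
DATA = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]
# The new instance from the exercise; its Play value is unknown.
TEST_INSTANCE = ("Sunny", "Hot", "Normal", "True")


def class_counts_by_value(rows, attr):
    """Map each value of `attr` to a Counter of Play labels."""
    idx = ATTRIBUTES.index(attr)
    table = {}
    for row in rows:
        table.setdefault(row[idx], Counter())[row[-1]] += 1
    return table


for attr in ATTRIBUTES:
    counts = class_counts_by_value(DATA, attr)
    print(attr, {value: dict(c) for value, c in counts.items()})
```

Each attribute value’s counts show how "pure" a branch on that value would be, which is what the splitting criteria on the following slides measure.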

Evaluating the weather attributes
Each rule predicts the majority Play value for one attribute value; its error is the number of the 14 training instances with that value that the rule misclassifies.

Attribute | Rule           | Error | Total error
Outlook   | Sunny → No     | 2/5   | 4/14
          | Overcast → Yes | 0/4   |
          | Rainy → Yes    | 2/5   |
Temp      | Hot → No       | 2/4   | 5/14
          | Mild → Yes     | 2/6   |
          | Cool → Yes     | 1/4   |
Humidity  | High → No      | 3/7   | 4/14
          | Normal → Yes   | 1/7   |
Windy     | False → Yes    | 2/8   | 5/14
          | True → No      | 3/6   |

Outlook and Humidity tie for the lowest total error (4/14); Outlook is chosen as the first split. (Hot → No and True → No are themselves ties between Yes and No.)
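The error table can be reproduced mechanically. This procedure is essentially Holte’s 1R ("one rule") method. The sketch below assumes that a tie in the majority vote goes to the class seen first (which happens to match the slide’s Hot → No and True → No) and that a tie between attributes goes to the one listed first; `one_rule` is a hypothetical name.

```python
from collections import Counter

ATTRIBUTES = ["Outlook", "Temp", "Humidity", "Windy"]
DATA = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def one_rule(rows, attr):
    """Return ({value: predicted class}, error count) for a single attribute."""
    idx = ATTRIBUTES.index(attr)
    groups = {}
    for row in rows:
        groups.setdefault(row[idx], Counter())[row[-1]] += 1
    rules, errors = {}, 0
    for value, counts in groups.items():
        label, hits = counts.most_common(1)[0]  # majority class; ties -> first seen
        rules[value] = label
        errors += sum(counts.values()) - hits   # instances the rule gets wrong
    return rules, errors


results = {attr: one_rule(DATA, attr) for attr in ATTRIBUTES}
for attr, (rules, errors) in results.items():
    print(f"{attr:<9} {rules} total error {errors}/{len(DATA)}")

best = min(ATTRIBUTES, key=lambda a: results[a][1])  # ties -> first listed
print("Best splitting attribute:", best)
```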

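Decision tree after Iteration 1 (for weather/play problem)

Outlook = Sunny → split further
Outlook = Overcast → YES
Outlook = Rainy → split further

Sunny branch (5 instances):
Attribute | Rule                 | Error | Total error
Temp      | Hot → No             | 0/2   | 1/5
          | Mild → No            | 1/2   |
          | Cool → Yes           | 0/1   |
Humidity  | High → No            | 0/3   | 0/5
          | Normal → Yes         | 0/2   |
Windy     | False → No           | 1/3   | 2/5
          | True → No/Yes (tie)  | 1/2   |

Rainy branch (5 instances):
Attribute | Rule                 | Error | Total error
Temp      | Mild → Yes           | 1/3   | 2/5
          | Cool → Yes           | 1/2   |
Humidity  | High → No            | 1/2   | 2/5
          | Normal → Yes         | 1/3   |
Windy     | False → Yes          | 0/3   | 0/5
          | True → No            | 0/2   |

Humidity (0/5) is chosen for the Sunny branch and Windy (0/5) for the Rainy branch.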
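The second iteration repeats the same error count inside each remaining branch. A minimal sketch, assuming the Sunny and Rainy subsets are written out with the Outlook column dropped; `total_error` and `branch_errors` are hypothetical helper names.

```python
from collections import Counter

# Subsets reaching each branch: (Temp, Humidity, Windy, Play)
ATTRIBUTES = ["Temp", "Humidity", "Windy"]
SUNNY = [
    ("Hot", "High", "False", "No"),
    ("Hot", "High", "True", "No"),
    ("Mild", "High", "False", "No"),
    ("Cool", "Normal", "False", "Yes"),
    ("Mild", "Normal", "True", "Yes"),
]
RAINY = [
    ("Mild", "High", "False", "Yes"),
    ("Cool", "Normal", "False", "Yes"),
    ("Cool", "Normal", "True", "No"),
    ("Mild", "Normal", "False", "Yes"),
    ("Mild", "High", "True", "No"),
]


def total_error(rows, idx):
    """Errors when each value of column `idx` predicts its majority class."""
    groups = {}
    for row in rows:
        groups.setdefault(row[idx], Counter())[row[-1]] += 1
    return sum(sum(c.values()) - c.most_common(1)[0][1] for c in groups.values())


def branch_errors(rows):
    """Total error for every remaining attribute within one branch."""
    return {attr: total_error(rows, i) for i, attr in enumerate(ATTRIBUTES)}


for name, rows in (("Sunny", SUNNY), ("Rainy", RAINY)):
    errors = branch_errors(rows)
    print(name, errors, "-> split on", min(errors, key=errors.get))
```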

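Decision tree (for weather/play problem)

Outlook = Overcast → YES
Outlook = Sunny → Humidity: High → NO, Normal → YES
Outlook = Rainy → Windy: False → YES, True → NO

Predict using the model:
Outlook | Temp | Humidity | Windy | Play
Sunny   | Hot  | Normal   | True  | YES

The new instance follows Outlook = Sunny and then Humidity = Normal, so the predicted Play is YES; Temp and Windy are never consulted on this path.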
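Prediction is a walk from the root to a leaf. A minimal sketch, assuming the tree is stored as nested dictionaries of the form {attribute: {value: subtree}} with class labels at the leaves; `TREE` and `predict` are hypothetical names. A value not seen during training would raise a `KeyError` here.

```python
# The final weather/play tree: {attribute: {value: subtree or class label}}
TREE = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rainy": {"Windy": {"False": "Yes", "True": "No"}},
    }
}


def predict(tree, instance):
    """Follow the branch matching the instance at each node until a leaf is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[instance[attribute]]
    return tree


new_day = {"Outlook": "Sunny", "Temp": "Hot", "Humidity": "Normal", "Windy": "True"}
print(predict(TREE, new_day))  # Yes
```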

Decision tree (for weather/play problem)
Notes:
● Not all leaves need to be pure; sometimes identical instances have different classes.
● Splitting stops when the data can’t be split any further.
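The two notes translate into a stopping test. A minimal sketch with hypothetical helper names: a node stops when it is pure, when no attributes remain, or when its instances are identical on every remaining attribute; a stopped but impure node becomes a leaf labeled with its majority class. The three conflicting rows below are invented for illustration and are not part of the weather data.

```python
from collections import Counter


def should_stop(rows, attr_indices):
    """True if the node cannot (or need not) be split any further."""
    if len({row[-1] for row in rows}) <= 1:   # pure node
        return True
    if not attr_indices:                      # no attributes left
        return True
    # Identical instances: every remaining attribute has a single value here.
    return all(len({row[i] for row in rows}) == 1 for i in attr_indices)


def leaf_label(rows):
    """Impure leaves predict their majority class (ties: first label seen)."""
    return Counter(row[-1] for row in rows).most_common(1)[0][0]


# Hypothetical node: identical attribute values, conflicting classes.
conflicting = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "False", "Yes"),
]
print(should_stop(conflicting, [0, 1, 2, 3]), leaf_label(conflicting))  # True No
```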

Decision Tree vs Table Lookup

Criterion  | Decision Tree                      | Table Lookup
Accuracy   | Varied level of accuracy           | 100% accurate
Generality | General; applies to all situations | Applies only when a similar case occurred before
Frugality  | Only three variables needed        | All four variables are needed
Simple     | Only one or two questions asked    | All four variable values are needed
Easy       | Logical and easy to understand     | Can be cumbersome to look up; no understanding of the logic behind the decision
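The comparison can be made concrete with the weather data. A minimal sketch, assuming the table lookup is an exact-match dictionary keyed on all four attribute values and the tree is hand-coded as nested questions; `LOOKUP` and `tree_predict` are hypothetical names, and the tree code assumes Outlook is always Sunny, Overcast or Rainy.

```python
# Weather/play training data: (Outlook, Temp, Humidity, Windy, Play)
DATA = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]

# Table lookup: the full four-attribute combination is the key.
LOOKUP = {row[:4]: row[4] for row in DATA}


def tree_predict(outlook, temp, humidity, windy):
    """The weather tree as nested questions; `temp` is accepted but never used."""
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    return "No" if windy == "True" else "Yes"  # Rainy


new_day = ("Sunny", "Hot", "Normal", "True")
print(LOOKUP.get(new_day))     # None: this exact case never occurred before
print(tree_predict(*new_day))  # Yes
```

Both agree with all 14 training instances, but only the tree answers for the unseen day, and it does so after asking at most two questions.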

Decision Tree Algorithms

Decision tree         | C4.5                                          | CART                                                      | CHAID
Full name             | Successor to Iterative Dichotomiser 3 (ID3)   | Classification and Regression Trees                       | Chi-square Automatic Interaction Detector
Basic algorithm       | Hunt’s algorithm                              | –                                                         | Adjusted significance testing
Developer             | Ross Quinlan                                  | Breiman                                                   | Gordon Kass
When developed        | 1986 (ID3); 1993 (C4.5)                       | 1984                                                      | 1980
Types of trees        | Classification                                | Classification & regression trees                         | Classification & regression
Serial implementation | –                                             | Tree-growth & tree-pruning                                | –
Type of data          | Discrete & continuous; incomplete data        | Discrete and continuous                                   | Non-normal data also accepted
Types of splits       | Multi-way splits                              | Binary splits only; surrogate splits handle missing values | Multi-way splits as default
Splitting criteria    | Information gain (gain ratio in C4.5)         | Gini index, and others                                    | Chi-square test
Pruning criteria      | Clever bottom-up technique avoids overfitting | –                                                         | Trees can become very large
Implementation        | Publicly available                            | Publicly available in most packages                       | Popular in market research, for segmentation
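CHAID’s splitting criterion is a chi-square test of independence between an attribute and the class. A minimal sketch of the Pearson chi-square statistic for Outlook versus Play, using counts from the weather exercise; full CHAID also merges similar categories and uses adjusted (Bonferroni) p-values, which this sketch omits. `chi_square` is a hypothetical name.

```python
def chi_square(table):
    """Pearson chi-square statistic; rows = attribute values, columns = classes."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the attribute and the class were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Outlook vs Play, columns [Yes, No]: Sunny, Overcast, Rainy
outlook = [[2, 3], [4, 0], [3, 2]]
print(round(chi_square(outlook), 3))  # 3.547
```

With 3 values and 2 classes this statistic has (3 - 1)(2 - 1) = 2 degrees of freedom, so comparing attributes with different numbers of values requires converting each statistic to a p-value rather than comparing raw statistics.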

Data Mining with Weka 3.4 – Making decision trees: https://www.youtube.com/watch?v=l7R9NHqvI0Y
Weka 3.5 – Pruning decision trees: https://www.youtube.com/watch?v=ncR_6UsuggY
Coursera videos 7-7 to 7-11
PLANET: algorithm for decision trees with MapReduce

Thank you.
