Decision Trees: 10-601 Recitation, 1/17/08. Mary McGlohon (mmcgloho+10601@cs.cmu.edu)

Announcements
• HW 1 out: decision trees and basic probability. Due Mon, Jan 28 at the start of class.
• Matlab
  • High-level language, specialized for matrices
  • Built-in plotting software, lots of math libraries
  • Available on campus lab machines
  • Interest in a tutorial?
• Smiley Award plug

Attend Class?

[Decision tree:
Raining?
  False -> Yes
  True -> Is10601?
    True -> Yes
    False -> Material?
      Old -> No
      New -> Before10?
        True -> No
        False -> Yes]

Represent it as a logical expression:

AttendClass = Yes if:
(Raining = False) OR (Is10601 = True) OR (Material = New AND Before10 = False)

Split decisions
• There are other trees that are logically equivalent.
• How do we know which one to use?
• It depends on what is important to us.

Information Gain
• Classically we rely on “information gain”, which uses the principle that we want to use the fewest bits, on average, to get our idea across.
• Suppose I want to send a weather forecast with 4 possible outcomes: Rain, Sun, Snow, and Tornado. 4 outcomes = 2 bits.
• In Pittsburgh there’s Rain 90% of the time, Snow 5%, Sun 4.9%, and Tornado 0.1%. So if you assign Rain to a 1-bit message, you rarely send more than 1 bit.
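To see the saving, here is a quick check (mine, not from the slides); the particular code words are an assumption, chosen so that Rain gets the 1-bit message:

```python
# One possible prefix code: common outcomes get short messages.
code = {"Rain": "0", "Snow": "10", "Sun": "110", "Tornado": "111"}
prob = {"Rain": 0.90, "Snow": 0.05, "Sun": 0.049, "Tornado": 0.001}

expected_bits = sum(prob[w] * len(code[w]) for w in code)
print(expected_bits)  # ~1.15 bits on average, versus 2 bits fixed-length
```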

Entropy

H(S) = -p+ log2(p+) - p- log2(p-), where p+ and p- are the fractions of positive and negative examples in S.

Set S has 6 positive, 2 negative examples:

H(S) = -0.75 log2(0.75) - 0.25 log2(0.25) = 0.8113

[Training data: 8 examples with attributes Rain, Is10601, Before10, Material and label Attend; 6 positive, 2 negative]
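A minimal Python sketch (not from the recitation) that reproduces this number:

```python
import math

def entropy(pos, neg):
    """Binary entropy of a set with `pos` positive and `neg` negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:  # treat 0 * log2(0) as 0
            h -= p * math.log2(p)
    return h

print(entropy(6, 2))  # 0.8112781244591328
```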

Conditional Entropy

“The average number of bits it would take to encode a message Y, given knowledge of X.”

Conditional Entropy

H(Attend | Rain) = H(Attend | Rain=T) * P(Rain=T) + H(Attend | Rain=F) * P(Rain=F)
                 = 1 * 0.5 + 0 * 0.5
                 = 0.5

(The entropy of the Rain=T subset is 1; the entropy of the Rain=F subset is 0.)

[Same 8-example training table as above]
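The same computation in Python. The helper functions are mine, and the `rows` dataset is hypothetical, chosen only to match the slide’s counts (the Rain=T half is an even +/- split, the Rain=F half is all positive):

```python
import math
from collections import Counter, defaultdict

def entropy_of(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(rows, attr, target):
    """H(target | attr): entropy of the target within each value of attr,
    weighted by how often that value occurs."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    n = len(rows)
    return sum(len(g) / n * entropy_of(g) for g in groups.values())

# Hypothetical rows matching the slide's numbers.
rows = ([{"Rain": True,  "Attend": "+"}] * 2 +
        [{"Rain": True,  "Attend": "-"}] * 2 +
        [{"Rain": False, "Attend": "+"}] * 4)
print(conditional_entropy(rows, "Rain", "Attend"))  # 1 * 0.5 + 0 * 0.5 = 0.5
```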

Information Gain

IG(S, A) = H(S) - H(S|A)

“How much conditioning on attribute A increases our knowledge (decreases entropy) of S.”

Information Gain

IG(Attend, Rain) = H(Attend) - H(Attend | Rain) = 0.8113 - 0.5 = 0.3113

[Same 8-example training table as above]
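Information gain is then just the difference; this continues directly from the previous sketch (same hypothetical `rows`, `entropy_of`, and `conditional_entropy`):

```python
def information_gain(rows, attr, target):
    """IG(S, A) = H(S) - H(S | A), where S is the set of target labels."""
    labels = [row[target] for row in rows]
    return entropy_of(labels) - conditional_entropy(rows, attr, target)

print(information_gain(rows, "Rain", "Attend"))  # 0.8113 - 0.5 = 0.3113
```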

What about this? For some dataset, could we ever build this DT?

[Decision tree:
Material?
  New -> Before10?
    True -> Raining?
      True -> Is10601? (True -> Yes, False -> No)
      False -> Yes
    False -> Yes
  Old -> Raining?
    False -> Yes
    True -> Is10601? (True -> Yes, False -> No)]

What if you were taking 20 classes, and it rains 90% of the time?

If most information is gained from Material or Before10, we won’t ever need to traverse to Is10601. So even a bigger tree (node-wise) may be “simpler” for some datasets.

Node-based pruning
• Until further pruning is harmful:
  • For each node n in the trained tree T:
    • Let Tn' be T without n (and its descendants). Assign the removed node to be the “best choice” under that traversal.
    • Record the error of Tn' on a validation set.
  • Let T = Tk', where Tk' is the pruned tree with the best performance on the validation set.
(A code sketch of this loop follows below.)
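A minimal sketch of this loop in Python, under some assumptions of mine: examples are dicts, the tree is a small `Node` class, and each node caches the majority label of the training examples that reached it, standing in for the slide’s “best choice”:

```python
class Node:
    """attr: splitting attribute (None for a leaf); children: value -> Node;
    label: prediction at a leaf; majority: majority class of the training
    examples that reached this node (stands in for the "best choice")."""
    def __init__(self, attr=None, children=None, label=None, majority=None):
        self.attr, self.children = attr, children or {}
        self.label, self.majority = label, majority

def classify(node, example):
    while node.attr is not None:
        node = node.children[example[node.attr]]
    return node.label

def accuracy(tree, validation, target):
    return sum(classify(tree, ex) == ex[target] for ex in validation) / len(validation)

def internal_nodes(node):
    if node.attr is not None:
        yield node
        for child in node.children.values():
            yield from internal_nodes(child)

def reduced_error_prune(tree, validation, target):
    """Repeatedly apply the single node removal that scores best on the
    validation set; stop when no removal beats the current tree."""
    while True:
        best_acc, best = accuracy(tree, validation, target), None
        for node in list(internal_nodes(tree)):
            saved = (node.attr, node.children, node.label)
            node.attr, node.children, node.label = None, {}, node.majority
            acc = accuracy(tree, validation, target)
            node.attr, node.children, node.label = saved  # undo trial prune
            if acc >= best_acc:
                best_acc, best = acc, node
        if best is None:
            return tree
        best.attr, best.children, best.label = None, {}, best.majority
```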

Node-based pruning

For each node, record the performance on the validation set of the tree without that node. Suppose our initial tree has 0.7 accuracy on the validation set:

[Decision tree:
Material?
  New -> Before10?
    True -> Raining?
      True -> Is10601? (True -> Yes, False -> No)
      False -> Yes
    False -> Yes
  Old -> Raining?
    False -> Yes
    True -> Is10601? (True -> Yes, False -> No)]

Let’s test this node: the Raining subtree under Material = New, Before10 = True...

Node-based pruning

Suppose that most examples where Material = New and Before10 = True are “Yes”. Our new subtree has a “Yes” leaf there:

[Decision tree:
Material?
  New -> Before10?
    True -> Yes
    False -> Yes
  Old -> Raining?
    False -> Yes
    True -> Is10601? (True -> Yes, False -> No)]

Now, test this tree! Suppose we get accuracy of 0.73 on the pruned tree. Repeat the test procedure by removing a different node from the original tree...

Node-based pruning

Try this tree (with a different node pruned), then test it and record its accuracy:

[Same original tree, with a different internal node replaced by a leaf]

Once we have tested all possible prunings, modify our tree T with the pruning that has the best performance. Repeat the entire pruning selection procedure on the new T, replacing T each time with the best-performing pruned tree, until we no longer gain anything by pruning.

Rule-based pruning

[Same tree as above]

1. Convert the tree to rules, one for each leaf:

IF Material=Old AND Raining=False THEN Attend=Yes
IF Material=Old AND Raining=True AND Is10601=True THEN Attend=Yes
. . .
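Step 1 might look like this in Python, reusing the hypothetical `Node` class from the pruning sketch above:

```python
def tree_to_rules(node, conditions=()):
    """Turn each root-to-leaf path into a (preconditions, label) pair."""
    if node.attr is None:  # leaf: one finished rule
        return [(list(conditions), node.label)]
    rules = []
    for value, child in node.children.items():
        rules += tree_to_rules(child, conditions + ((node.attr, value),))
    return rules
```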

Rule-based pruning

2. Prune each rule. For instance, to prune this rule:

IF Material=Old AND Raining=F THEN Attend=T

test each candidate rule with one precondition removed on the validation set, and compare against the performance of the original rule on that set:

IF Material=Old THEN Attend=T
IF Raining=F THEN Attend=T
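A sketch of step 2 under the same assumptions (examples as dicts, a rule as a list of preconditions); the greedy drop-one-precondition loop is my reading of the procedure the next slides walk through:

```python
def rule_accuracy(conditions, label, examples, target):
    """Accuracy of a rule on the validation examples its preconditions cover."""
    covered = [ex for ex in examples
               if all(ex[attr] == value for attr, value in conditions)]
    return sum(ex[target] == label for ex in covered) / len(covered) if covered else 0.0

def prune_rule(conditions, label, validation, target):
    """Greedily drop whichever precondition most improves validation accuracy."""
    while len(conditions) > 1:
        base = rule_accuracy(conditions, label, validation, target)
        candidates = [conditions[:i] + conditions[i + 1:]
                      for i in range(len(conditions))]
        best = max(candidates,
                   key=lambda cs: rule_accuracy(cs, label, validation, target))
        if rule_accuracy(best, label, validation, target) <= base:
            break  # no shorter rule does better
        conditions = best
    return conditions
```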

Rule-based pruning

Suppose we got the following accuracy for each rule:

IF Material=Old AND Raining=F THEN Attend=T -- 0.6
IF Material=Old THEN Attend=T -- 0.5
IF Raining=F THEN Attend=T -- 0.7

Then we would keep the best one and drop the others.

Rule-based pruning

Repeat for the next rule, comparing the original rule with each rule with one precondition removed:

IF Material=Old AND Raining=T AND Is10601=T THEN Attend=T -- 0.6
IF Material=Old AND Raining=T THEN Attend=T -- 0.75
IF Material=Old AND Is10601=T THEN Attend=T -- 0.3
IF Raining=T AND Is10601=T THEN Attend=T -- 0.65

If a shorter rule works better, we may also choose to prune further on this step before moving on to the next leaf:

IF Material=Old AND Raining=T THEN Attend=T -- 0.75
IF Material=Old THEN Attend=T -- 0.3
IF Raining=T THEN Attend=T -- 0.2

Well, maybe not this time!

Rule-based pruning

Once we have done the same pruning procedure for each rule in the tree...

3. Order the kept rules by their accuracy, and do all subsequent classification with that priority:

IF Material=Old AND Raining=T THEN Attend=T -- 0.75
IF Raining=F THEN Attend=T -- 0.7
. . . (and so on for the other pruned rules)

(Note that you may wind up with a differently structured DT than before, as discussed in class.)
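Step 3 as a sketch, assuming each pruned rule is kept as a (preconditions, label, validation accuracy) triple; the fall-through default is my addition:

```python
def classify_by_rules(rules, example, default="No"):
    """Apply pruned rules in decreasing order of validation accuracy;
    the first rule whose preconditions all match decides the label."""
    for conditions, label, acc in sorted(rules, key=lambda r: r[2], reverse=True):
        if all(example[attr] == value for attr, value in conditions):
            return label
    return default  # no rule matched (an assumption, not from the slides)
```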

Adding randomness

What if you didn’t know whether you had new material? For instance, you want to classify this example:

Rain=T, Is10601=F, Before10=F, Material=?, Attend=?

[Decision tree:
Raining?
  False -> Yes
  True -> Is10601?
    True -> Yes
    False -> Material?
      Old -> No
      New -> Before10? (True -> No, False -> Yes)]

When we reach the Material node, where do we go? You could look at the training set and see that, when Rain=T and Is10601=F, a fraction p of the examples had new material. Then flip a p-biased coin and descend the appropriate branch. But that might not be the best idea. Why not?
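A sketch of that p-biased descent (my code, with dict examples as before); it conditions the coin on the training rows consistent with the path taken so far, which is what the slide’s “when Rain=T and 10601=F” suggests:

```python
import random

def classify_with_missing(node, example, rows, rng=random):
    """Descend the tree; when the example is missing the node's attribute,
    pick a branch at random, weighted by the attribute's values among the
    training rows consistent with the path so far (the p-biased coin)."""
    while node.attr is not None:
        if example.get(node.attr) is not None:
            value = example[node.attr]
        else:
            seen = [r[node.attr] for r in rows if r.get(node.attr) is not None]
            options = sorted(set(seen))
            value = rng.choices(options,
                                weights=[seen.count(v) for v in options])[0]
        # Keep only training rows that could follow the same branch.
        rows = [r for r in rows if r.get(node.attr) in (value, None)]
        node = node.children[value]
    return node.label
```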

Adding randomness

You may also have missing data in the training set:

[Training rows with Rain=T, Is10601=F, and Material=? (missing)]

There are also methods to deal with this using probability: “Well, 60% of the time when Rain and not 601, there’s new material (among the examples where we do know the material). So we’ll just randomly select 60% of the rainy, non-601 examples where we don’t know the material to be new material.”
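That 60% trick, as a hedged sketch (the function and argument names are mine):

```python
import random
from collections import Counter

def impute_missing(rows, attr, context, rng=random):
    """Fill missing values of `attr` in place, sampling from its empirical
    distribution among rows that agree on the `context` attributes
    (e.g. 60% New among rainy, non-601 rows -> assign New 60% of the time)."""
    for row in rows:
        if row.get(attr) is not None:
            continue
        seen = [r[attr] for r in rows
                if r.get(attr) is not None
                and all(r.get(a) == row.get(a) for a in context)]
        if seen:
            counts = Counter(seen)
            values, weights = zip(*counts.items())
            row[attr] = rng.choices(values, weights=weights)[0]
```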

Adventures in Probability
• That approach tends to work well. Still, we may have the following trouble: what if there aren’t very many training examples where Rain=True and Is10601=False? Wouldn’t we still want to use examples where Rain=False to get the missing value?
• Well, it “depends”. Stay tuned for next week’s lecture!