Decision Trees with Numeric Tests

Industrial-strength algorithms
§ For an algorithm to be useful in a wide range of real-world applications, it must:
  § permit numeric attributes
  § allow missing values
  § be robust in the presence of noise
§ Basic schemes need to be extended to fulfill these requirements

C4.5 History
§ ID3, CHAID – 1960s
§ C4.5 innovations (Quinlan):
  § permit numeric attributes
  § deal sensibly with missing values
  § pruning to deal with noisy data
§ C4.5 – one of the best-known and most widely used learning algorithms
§ Last research version: C4.8, implemented in Weka as J4.8 (Java)
§ Commercial successor: C5.0 (available from Rulequest)

Numeric attributes
§ Standard method: binary splits
  § e.g. temp < 45
§ Unlike nominal attributes, every numeric attribute has many possible split points
§ Solution is a straightforward extension (see the sketch below):
  § evaluate info gain (or other measure) for every possible split point of the attribute
  § choose the “best” split point
  § info gain for the best split point is the info gain for the attribute
§ Computationally more demanding
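
A minimal sketch of this extension in Python, assuming a two-class label list and info gain as the measure; the function names and data layout are illustrative, not C4.5's actual code:

from collections import Counter
from math import log2

def entropy(labels):
    """info(...) in bits for a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels, threshold):
    """Info gain of the binary test `value < threshold`."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder

def best_split(values, labels):
    """Evaluate the midpoint between every pair of adjacent distinct values."""
    xs = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(midpoints, key=lambda t: info_gain(values, labels, t))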

Example
§ Split on temperature attribute:

  value: 64  65  68  69  70  71  72  72  75  75  80  81  83  85
  class: Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No

§ E.g. temperature < 71.5: yes/4, no/2
       temperature ≥ 71.5: yes/5, no/3
§ Info([4,2], [5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939 bits
§ Place split points halfway between values
§ Can evaluate all split points in one pass! (see the sketch below)
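
A quick check of the slide's arithmetic, followed by the one-pass scan: sort once, then sweep left to right with running class counts. This is a sketch over the data above; variable names are illustrative:

from collections import Counter
from math import log2

def info(counts):
    """Entropy in bits of a class-count list, e.g. info([4, 2])."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

# The slide's split at 71.5: [4 yes, 2 no] vs. [5 yes, 3 no]
print(round(6/14 * info([4, 2]) + 8/14 * info([5, 3]), 3))   # 0.939

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play  = ["Y","N","Y","Y","Y","N","N","Y","Y","Y","N","Y","Y","N"]

pairs = sorted(zip(temps, play))
total, left = Counter(play), Counter()
n = len(pairs)
for i in range(n - 1):
    left[pairs[i][1]] += 1
    if pairs[i][0] == pairs[i + 1][0]:
        continue                      # equal values: no split point between them
    mid = (pairs[i][0] + pairs[i + 1][0]) / 2
    right = total - left              # running counts make each step O(1)
    rem = ((i + 1) / n * info(list(left.values()))
           + (n - i - 1) / n * info(list(right.values())))
    print(mid, round(info(list(total.values())) - rem, 3))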

Example
§ Split on temperature attribute: info gain at each possible split point
  [figure: info gain curve over the split points between 64 and 85]

Speeding up
§ Entropy only needs to be evaluated between points of different classes (Fayyad & Irani, 1992)

  value: 64  65  68  69  70  71  72  72  75  75  80  81  83  85
  class: Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No
  [figure: X marks the potential optimal breakpoints – those where the class changes]

§ Breakpoints between values of the same class cannot be optimal
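
A minimal sketch of the Fayyad & Irani shortcut over the data above. groupby collapses the tied value 72; its mixed classes make both neighbouring midpoints candidates:

from itertools import groupby

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play  = ["Y","N","Y","Y","Y","N","N","Y","Y","Y","N","Y","Y","N"]

pairs = sorted(zip(temps, play))
# One entry per distinct value, with the set of classes observed at it.
groups = [(v, {y for _, y in g}) for v, g in groupby(pairs, key=lambda p: p[0])]
candidates = [
    (a + b) / 2
    for (a, ca), (b, cb) in zip(groups, groups[1:])
    if ca != cb or len(ca) > 1     # the class changes somewhere across this midpoint
]
print(candidates)   # 8 of the 11 midpoints survive; the rest cannot be optimal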

Missing as a separate value
§ Missing value denoted “?” in C4.5
§ Simple idea: treat missing as a separate value
§ Q: When is this not appropriate?
§ A: When values are missing for different reasons
  § Example 1: gene expression could be missing when it is very high or very low
  § Example 2: field IsPregnant=missing for a male patient should be treated differently (no) than for a female patient of age 25 (unknown)

Missing values – advanced
§ Split instances with missing values into pieces
  § a piece going down a branch receives a weight proportional to the popularity of the branch
  § weights sum to 1
§ Info gain works with fractional instances
  § use sums of weights instead of counts
§ During classification, split the instance into pieces in the same way (see the sketch below)
  § merge probability distributions using weights
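
A minimal sketch of the classification-time behaviour, assuming a toy dict-based tree; the node layout, branch weights, and leaf distributions are illustrative, not C4.5's actual data structures:

def classify(node, x):
    """Return a class -> probability dict; missing values fan out."""
    if "leaf" in node:
        return node["leaf"]
    value = x.get(node["attr"])            # None stands for missing ("?")
    if value is not None:
        return classify(node["branches"][value]["node"], x)
    # Missing: send a fractional piece down every branch, then merge
    # the resulting distributions using the branch weights.
    merged = {}
    for branch in node["branches"].values():
        for cls, p in classify(branch["node"], x).items():
            merged[cls] = merged.get(cls, 0.0) + branch["weight"] * p
    return merged

# Toy tree: among training instances with outlook known, 6 went down
# "sunny" and 3 down "rainy", so the branch weights are 6/9 and 3/9.
tree = {
    "attr": "outlook",
    "branches": {
        "sunny": {"weight": 6/9, "node": {"leaf": {"yes": 0.8, "no": 0.2}}},
        "rainy": {"weight": 3/9, "node": {"leaf": {"yes": 0.3, "no": 0.7}}},
    },
}
print(classify(tree, {}))  # outlook missing -> weighted mix of the two leaves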

Application: Computer Vision 1

Application: Computer Vision 2 – feature extraction
§ color (RGB, hue, saturation)
§ edge, orientation
§ texture
§ XY coordinates
§ 3D information

Application: Computer Vision 3 – how grey? below horizon?

Application: Computer Vision 4 – prediction

Application: Computer Vision 4 – inverse perspective

Application: Computer Vision 5 – inverse perspective, path planning

Quiz 1
Q: If an attribute A has high info gain, does it always appear in a decision tree?
A: No. If it is highly correlated with another attribute B, and infoGain(B) > infoGain(A), then B will appear in the tree, and further splitting on A will not be useful.
[figure: scatter of +/- examples over attributes A and B]

Quiz 2
Q: Can an attribute appear more than once in a decision tree?
A: Yes. If a test is not at the root of the tree, it can appear in different branches.
Q: And on a single path in the tree (from root to leaf)?
A: Yes. Numeric attributes can appear more than once, but only with very different numeric conditions (see the sketch below).
[figure: +/- examples along numeric attribute A, separable only with two thresholds]
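
A tiny sketch of such a path: two tests on the same numeric attribute A form an interval. The thresholds here are illustrative:

def predict(a):
    """Root-to-leaf path that tests attribute A twice."""
    if a < 4:          # first test on A, near the root
        return "+"
    if a < 6:          # second test on A, further down the same path
        return "-"
    return "+"

print([predict(a) for a in (1, 3, 4.5, 5.5, 7)])   # ['+', '+', '-', '-', '+']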

Quiz 3
Q: If an attribute A has infoGain(A) = 0, can it ever appear in a decision tree?
A: Yes.
1. All attributes may have zero info gain.
2. Info gain often changes when splitting on another attribute – the XOR problem (see the sketch below).
[figure: XOR layout of +/- examples over attributes A and B]
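
A worked check of the XOR case; info_gain below is a generic nominal-attribute info gain written for this example:

from collections import Counter
from math import log2

def info_gain(rows, attr):
    """Info gain of splitting `rows` (dicts with class key 'y') on `attr`."""
    def H(rs):
        n = len(rs)
        return -sum(c / n * log2(c / n)
                    for c in Counter(r["y"] for r in rs).values())
    n = len(rows)
    gain = H(rows)
    for v in {r[attr] for r in rows}:
        part = [r for r in rows if r[attr] == v]
        gain -= len(part) / n * H(part)
    return gain

# XOR: neither attribute helps on its own...
xor = [{"A": a, "B": b, "y": a ^ b} for a in (0, 1) for b in (0, 1)]
print(info_gain(xor, "A"), info_gain(xor, "B"))   # 0.0 0.0
# ...but after splitting on B, splitting on A separates the classes perfectly.
b0 = [r for r in xor if r["B"] == 0]
print(info_gain(b0, "A"))                          # 1.0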