Induction of Decision Trees
Blaž Zupan, Ivan Bratko
magix.fri.uni-lj.si/predavanja/uisp
An Example Data Set and Decision Tree

outlook
  sunny: yes
  rainy: company
    no: no
    med: sailboat
      small: yes
      big: no
    big: yes
Classification

The induced tree classifies a new example: start at the root, follow the branch matching the example's attribute value, and repeat until a leaf is reached.

outlook
  sunny: yes
  rainy: company
    no: no
    med: sailboat
      small: yes
      big: no
    big: yes
Induction of Decision Trees
• Data Set (Learning Set)
  – each example = attributes + class
• Induced description = decision tree
• TDIDT: Top-Down Induction of Decision Trees
• Recursive Partitioning
Some TDIDT Systems
• ID3 (Quinlan 79)
• CART (Breiman et al. 84)
• Assistant (Cestnik et al. 87)
• C4.5 (Quinlan 93)
• See5 (Quinlan 97)
• ...
• Orange (Demšar, Zupan 98-03)
Analysis of Severe Trauma Patients Data

PH_ICU (the worst pH value at ICU)
  < 7.2: Death 0.0 (0/15)
  7.2 - 7.33: APPT_WORST (the worst activated partial thromboplastin time)
    < 78.7: Well 0.82 (9/11)
    >= 78.7: Death 0.0 (0/7)
  > 7.33: Well 0.88 (14/16)

PH_ICU and APPT_WORST are exactly the two factors (theoretically) advocated as the most important ones in the study by Rotondo et al., 1997.
Breast Cancer Recurrence

[Tree induced by Assistant Professional. Root: Degree of Malig (< 3 / >= 3); internal nodes: Tumor Size (< 15 / >= 15), Involved Nodes (< 3 / >= 3), Age. Leaves give class counts, e.g. no rec 125 / recurr 39, no rec 4 / recurr 1, no rec 30 / recurr 18, recurr 27 / no rec 10, no rec 32 / recurr 0.]

Interesting: accuracy of this tree compared to medical specialists.
Prostate Cancer Recurrence

[Tree for predicting recurrence (Yes / No). Root: Secondary Gleason Grade (1, 2 / 3 / 4 / 5), with Secondary Gleason Grade 1, 2 predicting No; internal nodes: PSA Level (threshold 14.9, with < 14.9 predicting No), Primary Gleason Grade (2, 3 predicting No; 4 and 5 predicting Yes), Stage (T1ab, T3 predicting No; T1c, T2a, T2b, T2c predicting Yes).]
TDIDT Algorithm
• Also known as ID3 (Quinlan)
• To construct decision tree T from learning set S:
  – If all examples in S belong to some class C, then make a leaf labeled C
  – Otherwise:
    • select the “most informative” attribute A
    • partition S according to A’s values
    • recursively construct subtrees T1, T2, ..., for the subsets of S
TDIDT Algorithm
• Resulting tree T is:

  A (the selected attribute)
    v1: T1
    v2: T2
    ...
    vn: Tn

  (v1 ... vn are A’s values; T1 ... Tn are the subtrees)
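The recursion is compact enough to sketch directly. A minimal sketch in modern Python (plain Python, not the Orange API shown later; tdidt, select and the (attribute-dict, class) example format are our own assumptions):

from collections import Counter

def tdidt(examples, attributes, select):
    # examples: list of (dict attribute -> value, class) pairs
    # select: scores an attribute on a set of examples, e.g. information gain
    classes = [c for _, c in examples]
    # If all examples belong to some class C, make a leaf labeled C.
    if len(set(classes)) == 1:
        return classes[0]
    # No attributes left to test: label the leaf with the majority class.
    if not attributes:
        return Counter(classes).most_common(1)[0][0]
    # Select the "most informative" attribute A ...
    a = max(attributes, key=lambda attr: select(attr, examples))
    # ... partition S according to A's values ...
    partitions = {}
    for x, c in examples:
        partitions.setdefault(x[a], []).append((x, c))
    # ... and recursively construct one subtree per subset.
    rest = [attr for attr in attributes if attr != a]
    return (a, {v: tdidt(subset, rest, select)
                for v, subset in partitions.items()})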
Another Example
Simple Tree

Outlook
  sunny: Humidity
    high: N
    normal: P
  overcast: P
  rainy: Windy
    yes: N
    no: P
Complicated Tree

[A much larger tree over the same data, rooted at Temperature (cold / moderate / hot), with repeated Outlook, Windy and Humidity nodes and even a null leaf.]
Attribute Selection Criteria
• Main principle: select the attribute which partitions the learning set into subsets that are as “pure” as possible
• Various measures of purity:
  – information-theoretic
  – Gini index
  – χ2
  – ReliefF
  – ...
• Various improvements:
  – probability estimates
  – normalization
  – binarization, subsetting
Information-Theoretic Approach
• To classify an object, a certain amount of information is needed
  – I, information
• After we have learned the value of attribute A, we only need some remaining amount of information to classify the object
  – Ires, residual information
• Gain(A) = I - Ires(A)
• The most “informative” attribute is the one that minimizes Ires, i.e., maximizes Gain
Entropy
• The average amount of information I needed to classify an object is given by the entropy measure:
  E(S) = - Σc p(c) log2 p(c)
• For a two-class problem:
  [plot: entropy as a function of p(c1); 0 when p(c1) is 0 or 1, maximal (1 bit) at p(c1) = 0.5]
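The measure is easy to state in code. A minimal sketch in modern Python (independent of the Orange examples later; entropy and class_counts are our own names):

from math import log2

def entropy(class_counts):
    # E(S) = -sum over classes c of p(c) * log2 p(c), in bits
    total = sum(class_counts)
    return -sum(n / total * log2(n / total) for n in class_counts if n)

print(entropy([7, 7]))  # two equally likely classes -> 1.0 bit
print(entropy([5, 9]))  # the shapes data set below -> 0.940... bits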
Residual Information
• After applying attribute A, S is partitioned into subsets according to values v of A
• Ires is the weighted sum of the amounts of information for the subsets:
  Ires(A) = Σv p(v) E(Sv) = Σv (|Sv| / |S|) E(Sv)
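Continuing the sketch, residual information and gain follow directly (again our own helpers, building on entropy above; partition_counts maps each value of A to the class counts of its subset):

def residual_info(partition_counts):
    # I_res(A) = sum over values v of p(v) * E(S_v)
    total = sum(sum(counts) for counts in partition_counts.values())
    return sum(sum(counts) / total * entropy(counts)
               for counts in partition_counts.values())

def gain(class_counts, partition_counts):
    # Gain(A) = I - I_res(A)
    return entropy(class_counts) - residual_info(partition_counts)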
Triangles and Squares
Triangles and Squares

Data set: a set of classified objects

[figure: 14 objects, triangles and squares, in red, green or yellow, with solid or dashed outline, with or without a dot]
Entropy...
• 5 triangles
• 9 squares
• class probabilities: p(triangle) = 5/14 = 0.357, p(square) = 9/14 = 0.643
• entropy: E(S) = -0.357 log2 0.357 - 0.643 log2 0.643 = 0.940 bits
Entropy reduction by data set partitioning

[figure: the 14 objects partitioned by Color into red, green and yellow subsets, each purer than the whole set]
Entropy of the attribute's values

[figure: the partition by Color, annotated with each subset's entropy: E(red) = 0.971, E(green) = 0.971, E(yellow) = 0]
Ires(Color) = (5/14) · 0.971 + (5/14) · 0.971 + (4/14) · 0 = 0.694 bits
Information Gain

Gain(Color) = E(S) - Ires(Color) = 0.940 - 0.694 = 0.246 bits
Information Gain of the Attributes
• Gain(Color) = 0.246
• Gain(Outline) = 0.151
• Gain(Dot) = 0.048
• Heuristic: the attribute with the highest gain is chosen
• This heuristic is local (local minimization of impurity)
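With the helpers above, the gain for Color can be reproduced from the class counts visible in the contingency vectors later in the deck (red: 2 triangles / 3 squares; green: 3 / 2; yellow: 0 / 4):

# subsets of the 14 shapes after splitting on Color: [triangles, squares]
color = {'red': [2, 3], 'green': [3, 2], 'yellow': [0, 4]}
print(round(gain([5, 9], color), 3))  # -> 0.247 (the 0.246 above is the same value, truncated)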
[figure: growing the tree; after Color is chosen for the root, the green subset (3 triangles, 2 squares; entropy 0.971 bits) is considered next]

Gain(Outline) = 0.971 - 0 = 0.971 bits
Gain(Dot) = 0.971 - 0.951 = 0.020 bits
[figure: Outline has been chosen for the green branch; now the red subset (2 triangles, 3 squares; entropy 0.971 bits) is considered]

Gain(Outline) = 0.971 - 0.951 = 0.020 bits
Gain(Dot) = 0.971 - 0 = 0.971 bits
[figure: the finished partitioning; Color at the root, Outline tested under green, Dot tested under red, yellow already pure]
Decision Tree...

Color
  red: Dot
    yes: triangle
    no: square
  yellow: square
  green: Outline
    dashed: triangle
    solid: square
A Defect of Ires
• Ires favors attributes with many values
• Such an attribute splits S into many subsets, and if these are small, they will tend to be pure anyway
• One way to rectify this is through a corrected measure, the information gain ratio
Information Gain Ratio
• I(A) is the amount of information needed to determine the value of attribute A:
  I(A) = - Σv p(v) log2 p(v)
• Information gain ratio:
  GainRatio(A) = Gain(A) / I(A)
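As a sketch, the correction is one extra line on top of the earlier helpers (gain_ratio is our own name; A's value distribution is just the subset sizes):

def gain_ratio(class_counts, partition_counts):
    # GainRatio(A) = Gain(A) / I(A), where I(A) is the entropy
    # of attribute A's value distribution
    value_counts = [sum(counts) for counts in partition_counts.values()]
    return gain(class_counts, partition_counts) / entropy(value_counts)

print(round(gain_ratio([5, 9], color), 3))  # -> 0.156, as on the next slide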
Information Gain Ratio

[figure: the partition by Color (5 red, 5 green, 4 yellow)]
I(Color) = -(5/14) log2 (5/14) - (5/14) log2 (5/14) - (4/14) log2 (4/14) = 1.577 bits
GainRatio(Color) = Gain(Color) / I(Color) = 0.246 / 1.577 = 0.156
Information Gain and Information Gain Ratio

  Attribute   Gain    Gain Ratio
  Color       0.247   0.156
  Outline     0.152   0.152
  Dot         0.048   0.049

(values as computed by Orange later in the deck)
Gini Index
• Another sensible measure of impurity (i and j are classes):
  Gini(S) = Σi≠j p(ci) p(cj) = 1 - Σi p(ci)²
• After applying attribute A, the resulting Gini index is:
  Gini(A) = Σv p(v) Gini(Sv)
• Gini can be interpreted as expected error rate
Gini Index...

Gini(S) = 1 - (5/14)² - (9/14)² = 0.459
Gini Index for Color

[figure: the partition by Color]
Gini(red) = Gini(green) = 1 - (2/5)² - (3/5)² = 0.48, Gini(yellow) = 0
Gini(Color) = (5/14) · 0.48 + (5/14) · 0.48 + (4/14) · 0 = 0.343
Gain of Gini Index

GiniGain(Color) = Gini(S) - Gini(Color) = 0.459 - 0.343 = 0.116
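A matching sketch for Gini, continuing the helpers above (gini and gini_gain are our own names; note that Orange's MeasureAttribute_gini in the code later reports 0.058 for Color, half of the 0.116 computed here, apparently summing p(ci) p(cj) over i < j only):

def gini(class_counts):
    # Gini(S) = 1 - sum of p(c)^2  (= sum of p(ci) p(cj) over i != j)
    total = sum(class_counts)
    return 1 - sum((n / total) ** 2 for n in class_counts)

def gini_gain(class_counts, partition_counts):
    # Gini(S) minus the weighted Gini of the subsets after splitting on A
    total = sum(class_counts)
    residual = sum(sum(counts) / total * gini(counts)
                   for counts in partition_counts.values())
    return gini(class_counts) - residual

print(round(gini_gain([5, 9], color), 3))  # -> 0.116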
Three Impurity Measures
• These impurity measures assess the effect of a single attribute
• The “most informative” criterion they define is local (and “myopic”)
• It does not reliably predict the effect of several attributes applied jointly
Orange: Shapes Data Set

shape.tab: the triangles-and-squares data set in Orange's tab-delimited format (attributes Color, Outline and Dot; class: triangle or square)
Orange: Impurity Measures

import orange
data = orange.ExampleTable('shape')
gain = orange.MeasureAttribute_info
gainRatio = orange.MeasureAttribute_gainRatio
gini = orange.MeasureAttribute_gini

print "%15s %-8s %-8s %-8s" % ("name", "gain", "g ratio", "gini")
for attr in data.domain.attributes:
    print "%15s %-8.3f %-8.3f %-8.3f" % \
        (attr.name, gain(attr, data), gainRatio(attr, data), gini(attr, data))

Output:

           name gain     g ratio  gini
          Color 0.247    0.156    0.058
        Outline 0.152    0.152    0.046
            Dot 0.048    0.049    0.015
Orange: orngTree

import orange, orngTree
data = orange.ExampleTable('shape')
tree = orngTree.TreeLearner(data)
orngTree.printTxt(tree)

print '\nWith contingency vector:'
orngTree.printTxt(tree, internalNodeFields=['contingency'],
                  leafFields=['contingency'])

Output:

Color green:
|    Outline dashed: triangle (100.0%)
|    Outline solid: square (100.0%)
Color yellow: square (100.0%)
Color red:
|    Dot no: square (100.0%)
|    Dot yes: triangle (100.0%)

With contingency vector:
Color (<5, 9>) green:
|    Outline (<3, 2>) dashed: triangle (<3, 0>)
|    Outline (<3, 2>) solid: square (<0, 2>)
Color (<5, 9>) yellow: square (<0, 4>)
Color (<5, 9>) red:
|    Dot (<2, 3>) no: square (<0, 3>)
|    Dot (<2, 3>) yes: triangle (<2, 0>)
Orange: Saving to DOT

import orange, orngTree
data = orange.ExampleTable('shape')
tree = orngTree.TreeLearner(data)
orngTree.printDot(tree, 'shape.dot', leafShape='box',
                  internalNodeShape='ellipse')

Then, from the command line:

> dot -Tgif shape.dot > shape.gif
DBMiner: visualization
SGI MineSet: visualization