Induction of Decision Trees

An Example Data Set and Decision Tree
[Decision tree diagram: root Outlook (sunny / rainy), inner nodes Company (no / med / big) and Sailboat (small / big), leaf classes yes / no]

Classification
[The same decision tree, shown while classifying an example by following the branches that match its attribute values]

Breast Cancer Recurrence
[Decision tree diagram: root Degree of Malig (< 3 / >= 3), inner nodes Tumor Size (< 15 / >= 15), Involved Nodes, and Age; each leaf carries the counts of no-recurrence vs. recurrence cases, e.g. no rec 125 / recurr 39]
Tree induced by Assistant Professional. Interesting: the accuracy of this tree compared to that of medical specialists.

Another Example

Simple Tree
Outlook
  sunny -> Humidity
    high -> N
    normal -> P
  overcast -> P
  rainy -> Windy
    yes -> N
    no -> P
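
As a minimal sketch (not part of the original slides), the simple tree above can be encoded as nested (attribute, branches) pairs and walked to classify an example; all names here are assumptions chosen to match the slide labels.

    # Hedged sketch: the "Simple Tree" as nested structures, plus a tiny classifier.
    simple_tree = ("Outlook", {
        "sunny":    ("Humidity", {"high": "N", "normal": "P"}),
        "overcast": "P",
        "rainy":    ("Windy",    {"yes": "N", "no": "P"}),
    })

    def classify(tree, example):
        # A leaf is a class label ("P" or "N"); an inner node is (attribute, branches).
        if not isinstance(tree, tuple):
            return tree
        attribute, branches = tree
        return classify(branches[example[attribute]], example)

    print(classify(simple_tree, {"Outlook": "rainy", "Humidity": "high", "Windy": "no"}))  # P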

Complicated Tree
[Diagram of a much larger tree for the same data: root Temperature (cold / moderate / hot), with repeated tests on Outlook, Windy, and Humidity at deeper levels]

Attribute Selection Criteria
• Main principle
  – Select the attribute which partitions the learning set into subsets that are as “pure” as possible
• Various measures of purity
  – information-theoretic
  – Gini index
  – χ²
  – ReliefF
  – . . .

Information-Theoretic Approach
• To classify an object, a certain amount of information is needed
  – I, the information
• After we have learned the value of attribute A, only a smaller amount of information is still needed to classify the object
  – Ires, the residual information
• Gain
  – Gain(A) = I – Ires(A)
• The most ‘informative’ attribute is the one that minimizes Ires, i.e., maximizes Gain
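
A hedged reconstruction of the residual-information term implied above, with S_v denoting the subset of training examples for which attribute A takes value v (the notation is mine, not taken verbatim from the slides):

    I_{res}(A) = \sum_{v} p(v)\, I(S_v), \qquad Gain(A) = I - I_{res}(A)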

Entropy
• The average amount of information I needed to classify an object is given by the entropy measure:
  I = – Σc p(c) · log2 p(c)
• For a two-class problem: [plot of the entropy as a function of p(c1), equal to 0 for a pure set and reaching its maximum of 1 bit at p(c1) = 0.5]
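
A minimal Python sketch of the entropy measure (function and argument names are my own):

    from math import log2

    def entropy(class_counts):
        # Average information, in bits, needed to classify an object,
        # given the number of objects in each class.
        total = sum(class_counts)
        return -sum((n / total) * log2(n / total) for n in class_counts if n > 0)

    print(entropy([7, 7]))   # 1.0  -- two equally likely classes, maximal uncertainty
    print(entropy([5, 9]))   # ~0.940 -- the triangles-and-squares set used below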

Triangles and Squares

Triangles and Squares Data Set: a set of classified objects, each described by the attributes Color, Outline, and Dot . . .

Entropy . . .
• 5 triangles, 9 squares
• class probabilities: p(triangle) = 5/14, p(square) = 9/14
• entropy: E = –(5/14) · log2(5/14) – (9/14) · log2(9/14) ≈ 0.940 bits

Entropy reduction by data set partitioning
[Diagram: the data set partitioned by the attribute Color into a red, a green, and a yellow subset]

Entropy of the attribute's values
[Diagram: the same partition by Color, with the entropy computed separately for the red, green, and yellow subsets]

Information Gain
[Diagram: the partition by Color, used to compute the information gain of the attribute]

Information Gain of the Attribute
• Attributes
  – Gain(Color) = 0.246
  – Gain(Outline) = 0.151
  – Gain(Dot) = 0.048
• Heuristic: the attribute with the highest gain is chosen (sketched below)
• This heuristic is local (local minimization of impurity)
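
A minimal sketch of gain-based attribute selection, assuming each example is a dict of attribute values with a parallel list of class labels (names are my own; on the slide's data this would reproduce values such as Gain(Color) = 0.246):

    from collections import Counter, defaultdict
    from math import log2

    def entropy(labels):
        total = len(labels)
        return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

    def information_gain(examples, labels, attribute):
        # Ires(A): entropy of each subset produced by a value of A, weighted by subset size.
        subsets = defaultdict(list)
        for example, label in zip(examples, labels):
            subsets[example[attribute]].append(label)
        i_res = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
        return entropy(labels) - i_res

    def best_attribute(examples, labels, attributes):
        # The heuristic above: pick the attribute with the highest gain.
        return max(attributes, key=lambda a: information_gain(examples, labels, a))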

[Diagram: the data set partitioned by Color (red / green / yellow); within one of the resulting subsets (subset entropy 0.971 bits):]
Gain(Outline) = 0.971 – 0 = 0.971 bits
Gain(Dot) = 0.971 – 0.951 = 0.020 bits

[Diagram: the partition by Color, with one subset further split by Outline (solid / dashed); within another subset (entropy 0.971 bits):]
Gain(Outline) = 0.971 – 0.951 = 0.020 bits
Gain(Dot) = 0.971 – 0 = 0.971 bits

[Diagram: the final partition: the red subset split by Dot (yes / no), the green subset split by Outline (solid / dashed), and the yellow subset left whole]

Decision Tree . . .
Color
  red -> Dot
    yes -> triangle
    no -> square
  yellow -> square
  green -> Outline
    dashed -> triangle
    solid -> square
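
For completeness, a hedged ID3-style sketch of the greedy induction loop the slides walk through: split on the best attribute, recurse on each subset, and stop when a subset is pure. It reuses best_attribute from the earlier sketch and returns trees in the same (attribute, branches) form used above.

    from collections import Counter

    def induce_tree(examples, labels, attributes):
        if len(set(labels)) == 1:            # pure subset -> leaf with that class
            return labels[0]
        if not attributes:                   # nothing left to test -> majority class
            return Counter(labels).most_common(1)[0][0]
        a = best_attribute(examples, labels, attributes)
        branches = {}
        for value in {e[a] for e in examples}:
            keep = [i for i, e in enumerate(examples) if e[a] == value]
            branches[value] = induce_tree([examples[i] for i in keep],
                                          [labels[i] for i in keep],
                                          [x for x in attributes if x != a])
        return (a, branches)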

Gini Index
• Another sensible measure of impurity (i and j are classes):
  Gini = Σi≠j p(i) · p(j) = 1 – Σi p(i)²
• After applying attribute A, the resulting Gini index is
  Gini(A) = Σv p(v) · Gini(Sv), where Sv is the subset of examples with A = v
• Gini can be interpreted as the expected error rate
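
A minimal sketch of the two quantities above (same assumed data layout as in the earlier sketches):

    from collections import Counter, defaultdict

    def gini(labels):
        # Gini impurity: 1 minus the sum of squared class probabilities.
        total = len(labels)
        return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

    def gini_after_split(examples, labels, attribute):
        # Gini(A): Gini of each subset produced by a value of A, weighted by subset size.
        subsets = defaultdict(list)
        for example, label in zip(examples, labels):
            subsets[example[attribute]].append(label)
        return sum(len(s) / len(labels) * gini(s) for s in subsets.values())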

Gini Index . . .
• For the triangles-and-squares set (5 triangles, 9 squares): Gini = 1 – (5/14)² – (9/14)² ≈ 0.459

Gini Index for Color
[Diagram: the partition by Color (red / green / yellow), with the Gini index computed for each subset]

Gain of Gini Index
GiniGain(A) = Gini – Gini(A), by analogy with the information gain above

Three Impurity Measures
• These impurity measures assess the effect of a single attribute
• The “most informative” criterion they define is local (and “myopic”)
• It does not reliably predict the effect of several attributes applied jointly (illustrated below)
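
A hedged illustration of that myopia on synthetic data (my own example, reusing information_gain from the earlier sketch): with an XOR-style concept, each attribute alone has zero gain, yet the two attributes together determine the class perfectly.

    # XOR concept: the class is A xor B.
    examples = [{"A": a, "B": b} for a in (0, 1) for b in (0, 1)]
    labels   = [e["A"] ^ e["B"] for e in examples]

    print(information_gain(examples, labels, "A"))  # 0.0 -- A alone looks useless
    print(information_gain(examples, labels, "B"))  # 0.0 -- and so does B
    # ...yet splitting on A and then on B classifies every example correctly.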