Bayesian Classifier
Review: Decision Tree

Query: <Jeff, age: 30, income: medium, student: yes, credit: fair> buys_computer?

[Decision tree figure: root splits on Age. Age 31…40 -> YES; Age <=30 -> Student? (yes -> YES, no -> NO); Age >40 -> Credit? (fair -> YES, excellent -> NO)]
Bayesian Classification

- Bayesian classifier vs. decision tree
  - Decision tree: predicts the class label
  - Bayesian classifier: statistical classifier; predicts class membership probabilities
- Based on Bayes' theorem; estimates the posterior probability
- Naïve Bayesian classifier:
  - Simple classifier that assumes attribute independence
  - High speed when applied to large databases
  - Comparable in performance to decision trees
Bayes' Theorem

- Let X be a data sample whose class label is unknown
- Let H_i be the hypothesis that X belongs to a particular class C_i
- P(H_i) is the class prior probability that X belongs to class C_i
  - Can be estimated by n_i / n from the training data
  - n is the total number of training samples
  - n_i is the number of training samples of class C_i
- Formula of Bayes' theorem:

  P(H_i | X) = P(X | H_i) P(H_i) / P(X)
| Example | Age    | Income | Student | Credit    | Buys_computer |
|---------|--------|--------|---------|-----------|---------------|
| P1      | 31…40  | high   | no      | fair      | no            |
| P2      | <=30   | high   | no      | excellent | no            |
| P3      | 31…40  | high   | no      | fair      | yes           |
| P4      | >40    | medium | no      | fair      | yes           |
| P5      | >40    | low    | yes     | fair      | yes           |
| P6      | >40    | low    | yes     | excellent | no            |
| P7      | 31…40  | low    | yes     | excellent | yes           |
| P8      | <=30   | medium | no      | fair      | no            |
| P9      | <=30   | low    | yes     | fair      | yes           |
| P10     | >40    | medium | yes     | fair      | yes           |

- H1: Buys_computer = yes; H0: Buys_computer = no
- P(H1) = 6/10 = 0.6
- P(H0) = 4/10 = 0.4
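The class priors above can be computed directly from the training labels. A minimal Python sketch (the label list is transcribed from the Buys_computer column of the table):

```python
from collections import Counter

# Buys_computer labels for training examples P1..P10 (from the table above)
labels = ["no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes"]

counts = Counter(labels)                      # n_i for each class
n = len(labels)                               # total number of training examples
priors = {c: counts[c] / n for c in counts}   # P(H_i) = n_i / n

print(priors["yes"])  # 0.6
print(priors["no"])   # 0.4
```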
Bayes' Theorem

- P(H_i|X) is the class posterior probability (of H_i conditioned on X)
  - Probability that data example X belongs to class C_i given the attribute values of X
  - e.g., given X = (age: 31…40, income: medium, student: yes, credit: fair), what is the probability that X buys a computer?
- To classify means to determine the class with the highest P(H_i|X) among all classes C_1, …, C_m
  - If P(H1|X) > P(H0|X), then X buys a computer
  - If P(H0|X) > P(H1|X), then X does not buy a computer
  - Calculate P(H_i|X) using Bayes' theorem
Bayes' Theorem

- P(X) is the descriptor prior probability of X
  - Probability of observing the attribute values of X
  - Suppose X = (x1, x2, …, xn) and the attributes are independent; then P(X) = P(x1) P(x2) … P(xn)
  - P(x_j) = n_j / n, where n_j is the number of training examples having value x_j for attribute A_j
  - n is the total number of training examples
  - Constant for all classes
(Training data as in the table above.)

- X = (age: 31…40, income: medium, student: yes, credit: fair)
- P(age = 31…40) = 3/10
- P(income = medium) = 3/10
- P(student = yes) = 5/10
- P(credit = fair) = 7/10
- P(X) = P(age = 31…40) P(income = medium) P(student = yes) P(credit = fair)
  = 0.3 · 0.3 · 0.5 · 0.7 = 0.0315
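The descriptor prior P(X) is the product of the individual attribute-value frequencies. A minimal Python sketch, with the data rows transcribed from the table above:

```python
# Training data transcribed from the table: (age, income, student, credit)
rows = [
    ("31...40", "high",   "no",  "fair"),
    ("<=30",    "high",   "no",  "excellent"),
    ("31...40", "high",   "no",  "fair"),
    (">40",     "medium", "no",  "fair"),
    (">40",     "low",    "yes", "fair"),
    (">40",     "low",    "yes", "excellent"),
    ("31...40", "low",    "yes", "excellent"),
    ("<=30",    "medium", "no",  "fair"),
    ("<=30",    "low",    "yes", "fair"),
    (">40",     "medium", "yes", "fair"),
]

x = ("31...40", "medium", "yes", "fair")
n = len(rows)

# P(X) = product over attributes j of n_j / n
p_x = 1.0
for j, value in enumerate(x):
    n_j = sum(1 for r in rows if r[j] == value)
    p_x *= n_j / n

print(round(p_x, 4))  # 0.3 * 0.3 * 0.5 * 0.7 = 0.0315
```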
Bayes' Theorem

- P(X|H_i) is the descriptor posterior probability
  - Probability of observing X in class C_i
  - Assume X = (x1, x2, …, xn) and the attributes are independent; then P(X|H_i) = P(x1|H_i) P(x2|H_i) … P(xn|H_i)
  - P(x_j|H_i) = n_{i,j} / n_i, where n_{i,j} is the number of training examples in class C_i having value x_j for attribute A_j
  - n_i is the number of training examples in C_i
(Training data as in the table above.)

- X = (age: 31…40, income: medium, student: yes, credit: fair)
- H1 = X buys a computer
- n1 = 6, n11 = 2, n21 = 2, n31 = 4, n41 = 5
- P(X|H1) = (2/6)(2/6)(4/6)(5/6) ≈ 0.0617
(Training data as in the table above.)

- X = (age: 31…40, income: medium, student: yes, credit: fair)
- H0 = X does not buy a computer
- n0 = 4, n10 = 1, n20 = 1, n30 = 1, n40 = 2
- P(X|H0) = (1/4)(1/4)(1/4)(2/4) ≈ 0.0078
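Putting the pieces together, the comparison of P(H1) P(X|H1) against P(H0) P(X|H0) can be sketched as follows (Python; data transcribed from the table above, likelihoods computed under the naive independence assumption):

```python
# Training data: (age, income, student, credit, buys_computer)
data = [
    ("31...40", "high",   "no",  "fair",      "no"),
    ("<=30",    "high",   "no",  "excellent", "no"),
    ("31...40", "high",   "no",  "fair",      "yes"),
    (">40",     "medium", "no",  "fair",      "yes"),
    (">40",     "low",    "yes", "fair",      "yes"),
    (">40",     "low",    "yes", "excellent", "no"),
    ("31...40", "low",    "yes", "excellent", "yes"),
    ("<=30",    "medium", "no",  "fair",      "no"),
    ("<=30",    "low",    "yes", "fair",      "yes"),
    (">40",     "medium", "yes", "fair",      "yes"),
]

x = ("31...40", "medium", "yes", "fair")

def score(cls):
    """P(H_i) * P(X | H_i) under the naive independence assumption."""
    in_class = [r for r in data if r[-1] == cls]
    n_i = len(in_class)
    prior = n_i / len(data)                   # P(H_i) = n_i / n
    likelihood = 1.0
    for j, value in enumerate(x):
        n_ij = sum(1 for r in in_class if r[j] == value)
        likelihood *= n_ij / n_i              # P(x_j | H_i) = n_ij / n_i
    return prior * likelihood

s_yes, s_no = score("yes"), score("no")
print(s_yes > s_no)  # True -> X is classified as buys_computer = yes
```

The two scores are 0.6 · 0.0617 ≈ 0.037 and 0.4 · 0.0078 ≈ 0.0031, so the classifier predicts buys_computer = yes.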
Bayesian Classifier – Basic Equation

  P(H_i | X) = P(X | H_i) P(H_i) / P(X)

where P(H_i|X) is the class posterior probability, P(X|H_i) is the descriptor posterior probability, P(H_i) is the class prior probability, and P(X) is the descriptor prior probability.

- To classify means to determine the class with the highest P(H_i|X) among all classes C_1, …, C_m
- P(X) is constant for all classes
- Only need to compare P(H_i) P(X|H_i)
Weather Dataset Example

[Weather dataset table: 14 training examples with class p (play) or n (don't play); table lost in extraction.]

X = <rain, hot, high, false>
Weather Dataset Example: Classifying X

- An unseen sample X = <rain, hot, high, false>
- P(p) P(X|p) = P(p) P(rain|p) P(hot|p) P(high|p) P(false|p)
  = 9/14 · 3/9 · 2/9 · 3/9 · 6/9 = 0.010582
- P(n) P(X|n) = P(n) P(rain|n) P(hot|n) P(high|n) P(false|n)
  = 5/14 · 2/5 · 4/5 · 2/5 · 2/5 = 0.018286
- Sample X is classified in class n (don't play)
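As a sanity check, the two products above can be reproduced with exact fractions (Python; the fractions are the ones shown on the slide, not recomputed from the full weather table):

```python
from fractions import Fraction as F

# P(p) * P(X|p) for X = <rain, hot, high, false>
p_yes = F(9, 14) * F(3, 9) * F(2, 9) * F(3, 9) * F(6, 9)
# P(n) * P(X|n)
p_no = F(5, 14) * F(2, 5) * F(4, 5) * F(2, 5) * F(2, 5)

print(round(float(p_yes), 6))  # 0.010582
print(round(float(p_no), 6))   # 0.018286
print(p_no > p_yes)            # True -> X classified as n (don't play)
```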
The Independence Hypothesis…

- …makes computation possible
- …yields optimal classifiers when satisfied
- …but is seldom satisfied in practice, as attributes (variables) are often correlated
- Attempts to overcome this limitation:
  - Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
  - Decision trees, which reason on one attribute at a time, considering the most important attributes first