Classification: Naïve Bayes (Business Intelligence)
Naïve Bayes: The concept
• Bayes' Theorem is used to calculate a conditional probability in the presence of some information.
• The conditional probability is typically of the following form:
Pr(C | X1, X2, X3, ...) = [Pr(X1, X2, X3, ... | C) * Pr(C)] / [Pr(X1, X2, X3, ... | C) * Pr(C) + Pr(X1, X2, X3, ... | C̄) * Pr(C̄)]
where Pr(C | X1, X2, X3, ...) is the probability of event C given the conditions/information X1, X2, X3, ..., and C̄ is the complement event of C.
• Example: Let C denote the event that the 405 is moving really slowly, without any prior information. From prior experience we might estimate Pr(C) at 30%. Now let X1, X2, and X3 denote the given information that it is raining, there is an accident, and one of the left lanes is closed. Clearly, the probability of C given this new information will change dramatically, and Bayes' Theorem provides a precise way to calculate it.
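A minimal sketch of this calculation in Python. The 30% prior comes from the slide; the two likelihood values are made-up numbers purely to illustrate how the posterior moves once the new information arrives:

```python
def bayes_posterior(prior_c, lik_given_c, lik_given_not_c):
    """Pr(C | X) via Bayes' Theorem for a binary event C."""
    num = lik_given_c * prior_c
    den = num + lik_given_not_c * (1 - prior_c)
    return num / den

prior = 0.30            # Pr(C): slow traffic with no information (from the slide)
lik_given_c = 0.60      # assumed: Pr(rain, accident, lane closed | slow)
lik_given_not_c = 0.05  # assumed: Pr(rain, accident, lane closed | not slow)

print(bayes_posterior(prior, lik_given_c, lik_given_not_c))  # ~0.84
```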
Naïve Bayes contd.
• However, with many pieces of conditional information present, the calculation of the posterior probability with Bayes' Theorem can be very involved.
• We then use a simplified version of Bayes' Theorem based on the assumption that the conditional probabilities are independent given the class.
• In this case we use the formula:
P(C | X1, X2, X3, ...) = [P(X1|C) * P(X2|C) * P(X3|C) * ... * P(C)] / [P(X1|C) * P(X2|C) * P(X3|C) * ... * P(C) + P(X1|C̄) * P(X2|C̄) * P(X3|C̄) * ... * P(C̄)]
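The same formula as a small helper function (a sketch; the function and argument names are mine, not from the slides or XLMiner):

```python
from math import prod

def naive_bayes_posterior(prior_c, liks_given_c, liks_given_not_c):
    """P(C | X1..Xn) under the naive assumption that the Xi are
    independent given the class.

    liks_given_c:     [P(X1|C), P(X2|C), ...]
    liks_given_not_c: [P(X1|Cbar), P(X2|Cbar), ...]
    """
    num = prod(liks_given_c) * prior_c
    den = num + prod(liks_given_not_c) * (1 - prior_c)
    return num / den
```

The fraud example on the next slides plugs concrete numbers into exactly this formula.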
Naïve Bayes: Example
• We have a list of 10 companies with the following information: the size of the company (small or large), whether charges were filed against it (y or n), and its audit status (fraudulent or truthful).
Naïve Bayes
• Suppose we want to know the probability that a company is fraudulent given that it is small in size and a charge has been filed against it, i.e., P(fraudulent | size = small, charges = y).
• From the crosstab/pivot tables we can see that this probability = 1/2 (there are 2 companies that are small and have charges filed against them, and 1 of them is fraudulent).
• Similarly, P(fraudulent | small, n) = 0/3 = 0
• P(fraudulent | large, y) = 2/2 = 1
• P(fraudulent | large, n) = 1/3 = 0.33
• Using Naïve Bayes we can get the following:
• P(fraudulent | small, y) = P(small|fraudulent) * P(y|fraudulent) * P(fraudulent) / [P(small|fraudulent) * P(y|fraudulent) * P(fraudulent) + P(small|truthful) * P(y|truthful) * P(truthful)]
• = (1/4) * (3/4) * (4/10) / [(1/4) * (3/4) * (4/10) + (4/6) * (1/6) * (6/10)]
• = 0.53, which is very close to the 0.5 value we had from the exact calculation!
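Plugging the slide's numbers in directly reproduces the 0.53 result:

```python
# Fraud example from the slide, computed step by step.
num = (1/4) * (3/4) * (4/10)        # P(small|fraud) * P(y|fraud) * P(fraud)
den = num + (4/6) * (1/6) * (6/10)  # plus the corresponding truthful-class term
print(round(num / den, 2))          # 0.53
```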
Naïve Bayes: Flight Delay Example
• Let us use XLMiner to create the conditional probabilities (each of the individual probability terms) for flight delay.
• We will use only: Carrier, Day of the week, Departure time (in one-hour blocks), Destination, Distance, Origin, and Weather.
• Run XLMiner and look at the conditional probabilities from the training set.
• For any record in the validation set, the probability for each class is computed by multiplying the corresponding conditional probabilities and the prior probability of that class.
• Let us do two examples.
Examples
• Example 1: Record details (row 633 in …NNBforlecture.xlsx)
• Multiply all the relevant conditional probabilities for ontime to get p1.
• Multiply all the relevant conditional probabilities for delayed to get p2.
• Weigh each with the corresponding prior class probability and add the two numbers (w1*p1 + w2*p2).
• Probability for class i = wi*pi / (w1*p1 + w2*p2)
• Classify the record as class i if the above probability exceeds the cut-off (a sketch of this scoring step follows below).
• Example 2: Record details (row 610 in …NNBforlecture.xlsx)
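A sketch of this scoring step (the dictionary layout and names are my own; in practice the conditional probabilities would be looked up from XLMiner's training output):

```python
from math import prod

def score_record(cond_probs, priors, cutoff=0.5):
    """cond_probs maps each class name to the list of conditional
    probabilities looked up for this record's feature values."""
    p = {c: prod(v) for c, v in cond_probs.items()}     # pi
    wp = {c: priors[c] * p[c] for c in p}               # wi*pi
    total = sum(wp.values())                            # w1*p1 + w2*p2
    posterior = {c: wp[c] / total for c in wp}          # wi*pi / sum
    label = "ontime" if posterior["ontime"] > cutoff else "delayed"
    return posterior, label
```

The record 1 walkthrough on the next slides is exactly this computation done by hand.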
Details
• Record 1. Let us list the conditions:

  CARRIER = DH, DEP_TIME = 1640, DEST = JFK, DISTANCE = 213, ORIGIN = DCA, Weather = 0, DAY_WEEK = 4

• The corresponding conditional probabilities for ontime (I used a VLOOKUP from the conditional probability tables given by XLMiner), extracted from the ontime side:

  Title      Condition   Ontime Conditional Prob
  CARRIER    DH          0.243192
  DEP_TIME   1640        0.004695
  DEST       JFK         0.176526
  DISTANCE   213         0.187793
  ORIGIN     DCA         0.635681
  Weather    0           1.000000
  DAY_WEEK   4           0.159624

• Calculate p1 by multiplying the numbers above: p1 = 3.84059E-06
• For p1*w1, multiply p1 by w1 = 0.80620.
(Day of the week is not listed to save space.)
Details contd.
• By following the exact same method, and the subsequent calculations for p1*w1 and p2*w2, we easily get the following results:

  Class     pi            pi*wi         Posterior probability (pi*wi / sum)
  Ontime    3.84059E-06   3.09631E-06   0.804686121
  Delayed   3.87805E-06   7.51538E-07   0.195313879
  Sum                     3.84785E-06

• Verify the results for record 2.
(Day of the week is not listed to save space.)
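The record 1 numbers above can be checked directly. The ontime conditional probabilities, p2, and w1 are copied from the slides; w2 is assumed to be 1 - w1:

```python
from math import prod

# Ontime conditional probabilities for record 1 (from the slide's table)
ontime = [0.243192, 0.004695, 0.176526, 0.187793, 0.635681, 1.000000, 0.159624]

p1 = prod(ontime)              # ~3.84059E-06, matching the slide
p2 = 3.87805e-06               # delayed product, taken from the slide
w1, w2 = 0.80620, 1 - 0.80620  # priors; w1 from the slide, w2 assumed = 1 - w1

total = w1 * p1 + w2 * p2      # ~3.84785E-06
print(w1 * p1 / total)         # P(ontime) ~0.8047 -> ontime at a 0.5 cut-off
```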
Notes
• Quite simple and useful.
• Better than the exact Bayes approach, because all combinations of conditions may not be present in the data (exact Bayes will fail, since there will be no conditional probability for that particular combination).
• However, it is dependent on the data and thus can give erroneous results for small data sets.
• If an association makes sense but is not present in the data, the classification scheme will not work.
  – Example: Yacht owners may be targets for high-value life insurance. However, the collected data has no incidence of high-value life insurance!
• Next: Other classification schemes!