國立雲林科技大學 National Yunlin University of Science and Technology
Some Effective Techniques for Naive Bayes Text Classification
Advisor: Dr. Hsu
Presenter: Ai-Chen Liao
Authors: Sang-Bum Kim, Kyoung-Soo Han, Hae-Chang Rim, and Sung Hyon Myaeng
2006. TKDE. Pages: 1457-1466
Intelligent Database Systems Lab

N. Y. U. S. T. I. M.
Outline
· Motivation
· Objective
· About Naïve Bayes
· Method
  · A per-document length normalization approach
  · Weight-enhancing method
· Experimental Result
· Conclusion
· Personal Opinions

Motivation
· While naïve Bayes is quite effective in various data mining tasks, it shows disappointing results in automatic text classification.
· Based on observations of naïve Bayes on natural language text, we found a serious problem in the parameter estimation process, which causes poor results in the text classification domain.

Objective
· We aim to propose methods that address these problems.

About Naive Bayes
· Multivariate Bernoulli naïve Bayes
  A document is treated as a binary feature vector indicating whether each word is present or absent. It cannot make use of term frequencies in documents.
· Multinomial model
  Two serious problems: (1) rough parameter estimation (2) handling rare categories
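The difference between the two document representations can be sketched in a few lines. This is only an illustration under assumed inputs: the function names, the toy vocabulary, and the toy document are hypothetical and do not come from the paper.

```python
from collections import Counter


def bernoulli_features(tokens, vocabulary):
    """Binary vector: 1 if the word occurs in the document, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def term_frequency_features(tokens, vocabulary):
    """Count vector: how many times each vocabulary word occurs."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical toy vocabulary and document, for illustration only.
vocabulary = ["bayes", "naive", "text"]
tokens = "naive bayes text text text".split()

binary_vector = bernoulli_features(tokens, vocabulary)
count_vector = term_frequency_features(tokens, vocabulary)
```

The binary vector records only that "text" occurs, while the count vector keeps the fact that it occurs three times; the second is the information a frequency-based model can use.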

Method ─ Multivariate Poisson Model for Text Classification
· λ denotes the average number of times a given event occurs within a given interval.
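For reference, a Poisson variable with mean λ takes the value k with probability e^(-λ) λ^k / k!. The sketch below evaluates that standard formula in log space; the function names are hypothetical, and how the paper combines such terms per word and per class is not shown on this slide, so none of that is reproduced here.

```python
import math


def poisson_log_pmf(k, lam):
    """Return log P(X = k) for a Poisson variable with mean lam > 0."""
    # log(e^-lam * lam^k / k!) = -lam + k*log(lam) - log(k!)
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def poisson_pmf(k, lam):
    """Return P(X = k) for a Poisson variable with mean lam > 0."""
    return math.exp(poisson_log_pmf(k, lam))


# Illustrative value: a word expected to occur twice per document (lam = 2.0).
probability_of_zero_occurrences = poisson_pmf(0, 2.0)
```

Working in log space avoids overflow in k! for large counts, which is why `math.lgamma` is used instead of `math.factorial`.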

Method ─ A per-document length normalization approach

Method ─ Feature Weighting Scheme

Experimental Results
· DS 1: Reuters-21578 (consists of 21,578 news articles)
· DS 2: 20 Newsgroups (consists of 19,997 Usenet articles collected from 20 different newsgroups)

Conclusion
· We propose a Poisson naive Bayes text classification model with a weight-enhancing method.
· We suggest per-document term frequency normalization to estimate the Poisson parameter, whereas the traditional multinomial classifier estimates its parameters by treating all the training documents as a single huge training document.
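The contrast between the two estimation styles can be illustrated with a toy example. This is a sketch under stated assumptions, not the paper's estimator: it assumes "per-document normalization" means dividing each term count by its document's length before averaging, and the function names and documents are hypothetical.

```python
def pooled_estimate(docs, term):
    """Treat all documents as one huge document: total count / total length."""
    total_count = sum(doc.count(term) for doc in docs)
    total_length = sum(len(doc) for doc in docs)
    return total_count / total_length


def per_document_estimate(docs, term):
    """Normalize the term frequency inside each document, then average.

    Assumes every document is non-empty.
    """
    return sum(doc.count(term) / len(doc) for doc in docs) / len(docs)


# Hypothetical training documents for one class, as token lists.
docs = [
    ["ball", "game"],          # short document: "ball" is 1 of 2 tokens
    ["ball"] + ["game"] * 7,   # long document: "ball" is 1 of 8 tokens
]

pooled = pooled_estimate(docs, "ball")
per_document = per_document_estimate(docs, "ball")
```

In the pooled estimate the long document dominates because it contributes more tokens, while the per-document average gives each document equal weight regardless of its length.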

Personal Opinions
· Advantage
  · …
· Drawback
  · …
· Application
  · Text classification…