A new feature selection based on comprehensive measurement both in inter-category and intra-category for text categorization
Presenter: Bo-Sheng Wang
Authors: Jieming Yang, Yuanning Liu, Xiaodong Zhu, Zhen Liu, Xiaoxu Zhang
IPM, 2012
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
Objectives
• The authors propose a new feature selection algorithm, named CMFS.
  – It comprehensively measures the significance of a term both across categories (inter-category) and within a category (intra-category).
Methodology
Methodology: Algorithm
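The scoring formula itself is not reproduced in these slides, so the following is only a minimal sketch of the stated idea: a term is scored by combining how strongly it is tied to one category relative to all others (inter-category) with how prominent it is inside that category (intra-category). The product form `P(t | c) * P(c | t)`, the add-one smoothing, the use of the maximum over categories as the global score, and the names `cmfs_scores` and `select_top_k` are all assumptions made for illustration and should be checked against the paper.

```python
from collections import Counter, defaultdict


def cmfs_scores(docs, labels):
    """Score every term from tokenized documents and their category labels.

    Assumed form (not taken from the slides):
        score(t, c) = P(t | c) * P(c | t), with add-one smoothing,
        score(t)    = max over categories c of score(t, c).
    """
    tf_tc = defaultdict(Counter)  # tf_tc[c][t]: occurrences of term t in category c
    tf_t = Counter()              # occurrences of term t over all categories
    for tokens, category in zip(docs, labels):
        tf_tc[category].update(tokens)
        tf_t.update(tokens)

    vocab_size = len(tf_t)
    n_categories = len(tf_tc)
    tf_c = {c: sum(counts.values()) for c, counts in tf_tc.items()}

    scores = {}
    for term, total in tf_t.items():
        best = 0.0
        for category, counts in tf_tc.items():
            # Intra-category: share of the category's term mass taken by this term.
            p_t_given_c = (counts[term] + 1) / (tf_c[category] + vocab_size)
            # Inter-category: share of the term's occurrences that fall in this category.
            p_c_given_t = (counts[term] + 1) / (total + n_categories)
            best = max(best, p_t_given_c * p_c_given_t)
        scores[term] = best
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [
    ["ball", "game", "the"],
    ["ball", "team", "the"],
    ["vote", "law", "the"],
    ["vote", "party", "the"],
]
labels = ["sport", "sport", "politics", "politics"]
scores = cmfs_scores(docs, labels)
top_terms = select_top_k(scores, 2)
```

In this toy corpus, "the" is frequent but spread evenly over both categories, so it scores lower than "ball" and "vote", which are concentrated in a single category.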
Experiments
• Classification algorithms
  – Naïve Bayes classifier
  – Support Vector Machines
• Datasets
  – 20-Newsgroups
  – Reuters-21578
  – WebKB
• Preprocessing
  – Text was converted to lower case, punctuation was removed, and a stop list was applied; no stemming was used.
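The preprocessing steps listed above can be sketched as follows. This is an illustration only: the slides do not say which stop list was used, so `STOP_WORDS` is a tiny hypothetical stand-in, and `preprocess` is a name introduced here.

```python
import string

# Hypothetical stop list for illustration; the slides do not name the one used.
STOP_WORDS = frozenset({"a", "an", "the", "is", "of", "and", "to", "in"})

# Map every ASCII punctuation character to a space so words stay separated.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text, stop_words=STOP_WORDS):
    """Lower-case, strip punctuation, drop stop words; no stemming is applied."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TO_SPACE)
    return [tok for tok in no_punct.split() if tok not in stop_words]


tokens = preprocess("The Cats, and the dogs: running!")
```

Because no stemming is applied, inflected forms such as "cats" and "running" are kept as distinct terms.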
Experiments
• The performance of text categorization was measured in terms of F1 and accuracy.
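As a reminder of what these measures compute, here is a minimal sketch using the standard definitions of accuracy and F1. The slide does not state which F1 averaging was used, so both micro- and macro-averaged F1 are shown; the function names are introduced here for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted category equals the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = dict.fromkeys(labels, 0)
    fp = dict.fromkeys(labels, 0)
    fn = dict.fromkeys(labels, 0)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted category gains a false positive
            fn[t] += 1  # true category gains a false negative

    def f1(tp_, fp_, fn_):
        # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN).
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all categories, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per category, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
acc = accuracy(y_true, y_pred)
micro_f1, macro_f1 = f1_scores(y_true, y_pred)
```

Micro-averaging weights every document equally, while macro-averaging weights every category equally, so the two can diverge on datasets with skewed category sizes.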
Experiments
• 20-Newsgroups
Experiments
• Reuters-21578
Experiments
• WebKB
Experiments
• Statistical
Conclusions
• The experiments above indicate that CMFS performs better than the other feature selection algorithms it was compared with.
Comments
• Advantages
  – It has high accuracy and performance.
• Applications
  – Feature selection
  – Text categorization