Text Classification: An Implementation Project
Prerak Sanghvi
Computer Science and Engineering Department
State University of New York at Buffalo

Algorithm to be used
• I intend to use a back-propagation artificial neural network (ANN).
• Each input indicates whether a particular keyword is present in a document.
• The output is the category into which the document should be classified.
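The input and output encoding described above can be sketched as follows. This is only an illustration: the function names, the tokenizer, the keyword list, and the category list are all assumptions made for the example and do not come from the project.

```python
import re

def keyword_presence_vector(document, keywords):
    """Return one 0/1 flag per keyword: 1 if the keyword occurs in the document."""
    # Assumed tokenizer: lowercase alphanumeric runs; the project may use another.
    tokens = set(re.findall(r"[a-z0-9]+", document.lower()))
    return [1 if kw.lower() in tokens else 0 for kw in keywords]

def one_hot_category(category, categories):
    """Encode the target category as a 0/1 vector, one slot per category."""
    return [1 if c == category else 0 for c in categories]

# Hypothetical keywords and categories, for illustration only.
keywords = ["goal", "election", "stock"]
categories = ["sports", "politics", "finance"]

x = keyword_presence_vector("The striker scored a late goal.", keywords)
y = one_hot_category("sports", categories)
```

Here `x` is the network input for one document and `y` is the training target. A one-hot target is one plausible way to express "the category" as network output; the slides do not specify the output encoding.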

What are the keywords?
• Choosing the keywords falls under a broader class of problems known as feature selection.
• A feature selection technique will be used to pick the keywords automatically or semi-automatically.
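The slides do not say which feature selection technique will be used. As one possible illustration, the sketch below ranks terms by the chi-square statistic between term presence and category, a commonly used criterion for text. The function names and the toy documents are assumptions made for the example.

```python
import re

def tokenize(text):
    # Assumed tokenizer: the set of lowercase alphanumeric runs in the text.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def chi_square_scores(docs, labels):
    """Score each term by its largest chi-square value over all categories."""
    n = len(docs)
    token_sets = [tokenize(d) for d in docs]
    vocab = set().union(*token_sets)
    scores = {}
    for term in vocab:
        best = 0.0
        for cat in set(labels):
            pairs = list(zip(token_sets, labels))
            a = sum(1 for t, l in pairs if term in t and l == cat)      # term, in cat
            b = sum(1 for t, l in pairs if term in t and l != cat)      # term, other cat
            c = sum(1 for t, l in pairs if term not in t and l == cat)  # no term, in cat
            d = n - a - b - c                                           # no term, other cat
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:  # skip degenerate tables (e.g. a term found in every document)
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores[term] = best
    return scores

def select_keywords(docs, labels, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Toy corpus, for illustration only.
docs = ["goal scored in the match", "the team won the match",
        "the election vote count", "vote in the election today"]
labels = ["sports", "sports", "politics", "politics"]
selected = select_keywords(docs, labels, 3)
```

Terms that occur in every document, such as "the" in the toy corpus, score zero and are never preferred. A semi-automatic variant could present the ranked list to a person who makes the final selection.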

Organization of the Project
• The project consists of two phases, both equally important for good results:
  – Feature Selection
  – Implementation of the ANN

Artificial Neural Network
[Figure: network diagram. Keyword inputs k1, k2, k3, …, kn feed a hidden layer, which feeds the classification output.]
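A network of this shape (keyword inputs, one hidden layer, classification outputs) trained by back-propagation might look like the minimal sketch below. The slides give no implementation details, so the class name, the sigmoid activations, the squared-error loss, the hidden layer size, the learning rate, and the toy data are all assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyBackpropNet:
    """One-hidden-layer network; the last weight in each row is the bias."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]  # append constant bias input
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hidden]
        hb = h + [1.0]
        o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w_out]
        return h, o

    def train_step(self, x, target, lr=0.5):
        h, o = self.forward(x)
        xb, hb = list(x) + [1.0], h + [1.0]
        # Output deltas: derivative of squared error through the sigmoid.
        d_out = [(o_k - t_k) * o_k * (1 - o_k) for o_k, t_k in zip(o, target)]
        # Hidden deltas: propagate output deltas back through w_out
        # (computed before w_out is updated).
        d_hid = [h_j * (1 - h_j) * sum(d_out[k] * self.w_out[k][j]
                                       for k in range(len(d_out)))
                 for j, h_j in enumerate(h)]
        for k, row in enumerate(self.w_out):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]

    def loss(self, data):
        total = 0.0
        for x, target in data:
            _, o = self.forward(x)
            total += 0.5 * sum((o_k - t_k) ** 2 for o_k, t_k in zip(o, target))
        return total

    def predict(self, x):
        o = self.forward(x)[1]
        return o.index(max(o))  # index of the most active output unit

# Toy data: 3 keyword flags per document, 2 categories (one-hot targets).
data = [([1, 0, 0], [1, 0]), ([1, 1, 0], [1, 0]),
        ([0, 0, 1], [0, 1]), ([0, 1, 1], [0, 1])]
net = TinyBackpropNet(n_in=3, n_hidden=4, n_out=2)
loss_before = net.loss(data)
for _ in range(2000):
    for x, target in data:
        net.train_step(x, target)
loss_after = net.loss(data)
```

Each call to `train_step` performs one stochastic gradient update. On real corpora the number of inputs equals the number of selected keywords, and the hidden layer size and learning rate would need tuning.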

Example data set
• One of the several corpora available on the web will be used.

After ANN
• Once the technique for extracting the feature set from the data set is implemented, any classification algorithm can be used.
• After the ANN is successfully implemented, other algorithms will be implemented, in particular the Naïve Bayes classification method.
• Results from the different methods will be compared.
• Another possibility is coupling two methods to improve overall performance.
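Because the feature vectors are independent of the classifier, the same keyword-presence vectors could be passed to a Naïve Bayes model. The sketch below assumes the Bernoulli variant with add-one (Laplace) smoothing, which suits binary features. The slides do not specify a variant, and the function names and toy data are illustrative.

```python
import math

def train_bernoulli_nb(vectors, labels):
    """Estimate class priors and per-feature presence probabilities."""
    model = {}
    n_features = len(vectors[0])
    for c in sorted(set(labels)):
        rows = [v for v, l in zip(vectors, labels) if l == c]
        prior = len(rows) / len(vectors)
        # Add-one smoothing keeps every probability strictly between 0 and 1.
        probs = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2)
                 for i in range(n_features)]
        model[c] = (prior, probs)
    return model

def predict_bernoulli_nb(model, vector):
    """Return the class with the highest log-posterior for a 0/1 vector."""
    def log_posterior(c):
        prior, probs = model[c]
        return math.log(prior) + sum(
            math.log(p if v else 1.0 - p) for p, v in zip(probs, vector))
    return max(model, key=log_posterior)

# Toy keyword-presence vectors, for illustration only.
vectors = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
labels = ["sports", "sports", "politics", "politics"]
nb_model = train_bernoulli_nb(vectors, labels)
prediction = predict_bernoulli_nb(nb_model, [1, 0, 0])
```

Running both classifiers on the same held-out documents and comparing their accuracy would be one straightforward way to carry out the planned comparison.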