Classification of Arabic Documents
Project Final Presentation, Dec. 6, 2012
CS 5604: Information Storage and Retrieval
Instructor: Prof. Edward Fox
GTA: Tarek Kanan
Proj. Arabic Team: Ahmed Elbery
Outline
- Arabic documents classification: Motivation
- Arabic documents classification: Challenges
- Model Details
- Results and Evaluation
Arabic documents classification: Motivation
- Rich set of Arabic documents
- Now more than 65 million Arabic-speaking Internet users
- Arabic NLP is needed for the increasing amount of Arabic Internet content
Arabic documents classification: Challenges
- Techniques built for English language processing may not apply to Arabic because:
  - Arabic has a very rich and complex morphology
  - Arabic syntax and grammar are very different and difficult
Project model
Arabic Documents -> Preprocessing -> Classification -> Classification Result
- Preprocessing: Tokenizer -> Tokens -> Stemmers -> Stems -> Feature Extractor -> Top Terms
- Classification: Naive Bayes, k-Nearest Neighbors, Support-vector machines, Decision tree
Data Set
- 100 Docs: Arabic Spring
  - 50 Docs: Politics
  - 50 Docs: Violence
Preprocessing
Tokenizer -> Tokens -> Stemmers -> Stems -> Feature Extractor -> Top Terms
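The tokenizer and stemmer stages can be sketched as follows. This is a minimal illustration only, not the project's actual tokenizer or stemmers: the affix lists `PREFIXES` and `SUFFIXES`, the function names, and the three-letter minimum are all assumptions made for the example, and a real Arabic stemmer uses far richer rules.

```python
import re

# Hypothetical, deliberately tiny affix lists (assumed for illustration only).
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")

ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")  # basic Arabic letter range
DIACRITICS = re.compile(r"[\u064B-\u0652]")    # short-vowel marks (tashkeel)


def tokenize(text):
    """Strip diacritics, then split the text into runs of Arabic letters."""
    return ARABIC_WORD.findall(DIACRITICS.sub("", text))


def light_stem(token):
    """Remove at most one prefix and one suffix, keeping at least 3 letters."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            token = token[len(prefix):]
            break
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[:-len(suffix)]
            break
    return token


def preprocess(text):
    """Tokenizer -> Tokens -> Stemmer -> Stems."""
    return [light_stem(token) for token in tokenize(text)]
```

Non-Arabic characters (digits, Latin letters, punctuation) are dropped by the tokenizer, so only Arabic stems reach the feature extractor.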
Example
Docs: P-1, P-2, V-1
Terms: Systems Politics nation area Liberty International Politics Government Politics Systems nation area Kill nation Politics Government Violence Systems Weapon Militias Violence Kill Government Burn Systems Weapon Militias Violence Kill Government
Example (cont.)
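Preprocessing
The output matrix: one row per document, one column per term, tf-idf values in the cells, and a final Class column.

        term 1    term 2    term 3    ...    Class
Doc 1   tf-idf    tf-idf    tf-idf    ...    ...
Doc 2   tf-idf    tf-idf    tf-idf    ...    ...
Doc 3   tf-idf    tf-idf    tf-idf    ...    ...
...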
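A document-by-term tf-idf matrix of this shape could be built roughly as below. The slides do not state which tf-idf variant was used, so this sketch assumes the common form tf = count / document length and idf = ln(N / df); the function name `tfidf_matrix` and the toy documents are invented for illustration.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """docs: list of token lists. Returns (sorted vocabulary, matrix rows)."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([
            (counts[term] / len(doc)) * math.log(n_docs / df[term]) if doc else 0.0
            for term in vocab
        ])
    return vocab, rows


# Invented toy documents (already tokenized and stemmed), with their classes.
toy_docs = [
    ["politics", "government", "nation"],
    ["politics", "systems", "liberty"],
    ["violence", "kill", "weapon", "government"],
]
toy_classes = ["P", "P", "V"]
vocab, rows = tfidf_matrix(toy_docs)
```

Each entry of `rows` is one line of the output matrix, and `toy_classes` plays the role of the Class column. A term that occurs in every document gets a weight of zero under this variant, which is why such terms carry no information for the classifier.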
Classifier
- Training Set -> Classification Algorithm -> Classifier (Model)
- Test Set -> Classifier (Model) -> predicted class per document

Doc   Class
1     P
2     V
3     V
...   ...
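As one concrete instance of the train-then-classify flow, here is a small k-Nearest Neighbors sketch over tf-idf rows. It is not the project's implementation: the use of cosine similarity, the default k = 3, the function names, and the toy vectors are all assumptions for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(train_rows, train_labels, row, k=3):
    """Label `row` by majority vote among its k most similar training rows."""
    ranked = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: cosine(pair[0], row),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy vectors over three terms; "P" = Politics, "V" = Violence.
train_rows = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.1], [0.0, 0.2, 0.9], [0.1, 0.0, 0.8]]
train_labels = ["P", "P", "V", "V"]
test_rows = [[0.7, 0.1, 0.0], [0.0, 0.1, 0.7]]
predicted = [knn_predict(train_rows, train_labels, row) for row in test_rows]
```

With two classes, an odd k avoids tied votes.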
Results and Evaluation
- 100 Docs (50+50)
- Repeated 10 times
- 80% training
- 20% test
[Figure: Accuracy]
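The evaluation protocol (10 runs, 80% training, 20% test) can be sketched as a repeated hold-out loop. The slides do not say how each split was drawn, so random shuffling with a fixed seed is an assumption here, as are the function name `repeated_holdout` and the `fit_predict` callback signature.

```python
import random


def repeated_holdout(rows, labels, fit_predict, runs=10, train_frac=0.8, seed=0):
    """Return (mean accuracy, per-run accuracies) over random train/test splits.

    fit_predict(train_rows, train_labels, test_rows) must return one
    predicted label per test row.
    """
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    cut = int(round(train_frac * len(rows)))
    accuracies = []
    for _ in range(runs):
        rng.shuffle(indices)
        train, test = indices[:cut], indices[cut:]
        predicted = fit_predict(
            [rows[i] for i in train],
            [labels[i] for i in train],
            [rows[i] for i in test],
        )
        correct = sum(p == labels[i] for p, i in zip(predicted, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / runs, accuracies
```

With 100 documents and `train_frac=0.8`, each run trains on 80 documents and tests on the remaining 20. Any classifier can be plugged in through `fit_predict`.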
Results and Evaluation
[Figure: Accuracy and Correlation coefficient]
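The slides do not specify which correlation coefficient was reported. One common choice for two-class problems is the Matthews correlation coefficient, sketched below purely as an assumption; treating "P" as the positive class is also arbitrary.

```python
import math


def matthews_corrcoef(actual, predicted, positive="P"):
    """Matthews correlation coefficient for a two-class problem, in [-1, 1]."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A value of 1 means perfect agreement between predicted and actual classes, 0 means no better than chance, and -1 means every prediction is inverted.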
Results and Evaluation
[Figure: Av. Accuracy]
Results and Evaluation
[Figure: Time and Av. Time]
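The slides do not say which phase the reported time covers (training, testing, or both). A generic way to obtain an average wall-clock time for any step is sketched below; the helper name `average_time` and the default of 10 runs are assumptions.

```python
import time


def average_time(fn, *args, runs=10):
    """Return the mean wall-clock seconds of fn(*args) over `runs` calls."""
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) / runs


# Usage: time a stand-in workload; a classifier's training call would go here.
elapsed = average_time(sorted, list(range(1000, 0, -1)), runs=5)
```

`time.perf_counter` is a monotonic, high-resolution clock, which makes it better suited to short measurements than `time.time`.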
Future work
- Test the different parameters of the classifier
  - Feature ratio
  - Feature selection parameters
  - Classifier parameters
- Statistically analyze the results
Ahmed Elbery