Introduction to Weka CS 4705 Natural Language Processing

Introduction to Weka CS 4705 – Natural Language Processing Thursday, September 28

What is Weka?
● Java-based machine learning tool
● 3 modes of operation
  – GUI
  – Command line
  – API (not discussed here)
● To run:
  – java -Xmx1024M -jar ~cs4705/bin/weka.jar &

Weka Homepage
● http://www.cs.waikato.ac.nz/ml/weka/

.arff file format
● http://www.cs.waikato.ac.nz/~ml/weka/arff.html
    @relation name
    @attribute attrName {numeric, string, <nominal>, date}
    ...
    @data
    a, b, c, d, e
● <nominal> := {class1, class2, ..., classN}
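As a concrete illustration of the layout above, the following shell snippet writes a tiny ARFF file and sanity-checks it. The file name, relation name, attribute names, and data rows are all invented for illustration; none of them come from the course data.

```shell
# Write a minimal ARFF file. Every name and value here is a made-up example.
cat > toy.arff <<'EOF'
@relation toy_weather

@attribute temperature numeric
@attribute outlook {sunny, overcast, rainy}
@attribute play {yes, no}

@data
85, sunny, no
70, overcast, yes
65, rainy, yes
EOF

# Sanity check: count the attribute declarations, then the non-empty data rows.
grep -c '^@attribute' toy.arff
awk 'seen && NF { rows++ } /^@data/ { seen = 1 } END { print rows }' toy.arff
```

Each data row lists one value per declared attribute, in declaration order. By default Weka treats the last attribute (here `play`) as the class.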

Example ARFF Files
● http://sourceforge.net/projects/weka
● iris.arff
● cmc.arff

To Classify with Weka GUI
1. Run Weka GUI
2. Click 'Explorer'
3. 'Open file...'
4. Select 'Classify' tab
5. 'Choose' a classifier
6. Confirm options
7. Click 'Start'
8. Wait...
9. Right-click on Result list entry
   a. 'Save result buffer'
   b. 'Save model'

Classify
● Some classifiers to start with:
  – NaiveBayes
  – JRip
  – J48
  – SMO
● Find references by selecting a classifier
● Use cross-validation!

Analyzing Results
● Important tools for Homework 2
  – Accuracy
    ● “Correctly classified instances”
  – Confusion matrix
  – Save model
  – Visualization
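Accuracy and the confusion matrix are two views of the same counts: accuracy is the sum of the matrix diagonal divided by the total number of instances. A minimal awk sketch of that arithmetic follows. The 3-class matrix is invented, and it assumes the orientation Weka prints (rows are actual classes, columns are predicted classes).

```shell
# Invented 3x3 confusion matrix: rows = actual class, columns = predicted class.
cat > confusion.txt <<'EOF'
48 2 0
3 45 2
0 5 45
EOF

# Accuracy = (sum of the diagonal) / (sum of all cells).
# Cell (NR, i) is on the diagonal when the column index equals the row number.
awk '{ for (i = 1; i <= NF; i++) { total += $i; if (i == NR) correct += $i } }
     END { printf "correct=%d total=%d accuracy=%.4f\n", correct, total, correct / total }' \
    confusion.txt > accuracy.txt

cat accuracy.txt
```

The off-diagonal cells show which classes get confused with each other, which a single accuracy figure hides.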

Running Weka from the Command Line
● Running an N-fold cross-validation experiment
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N
● Using a predefined test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff
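To compare several starter classifiers under the same cross-validation setup, the command above can be wrapped in a loop. The sketch below is a dry run that only prints the commands. The variables WEKA_JAR, TRAIN, and FOLDS and the output file name are illustrative, and the package-qualified names for JRip, J48, and SMO are those of standard Weka distributions, assumed (not confirmed) to match the course's jar.

```shell
# Dry run: build one cross-validation command per classifier and print it.
# WEKA_JAR, TRAIN and FOLDS are illustrative; adjust them to your setup.
WEKA_JAR=~cs4705/bin/weka.jar
TRAIN=trainingdata.arff
FOLDS=10

# Class names below follow standard Weka packaging (an assumption for this jar).
for cls in bayes.NaiveBayes rules.JRip trees.J48 functions.SMO; do
    echo "java -cp $WEKA_JAR weka.classifiers.$cls -t $TRAIN -x $FOLDS"
done > cv_commands.txt

cat cv_commands.txt
```

Once the printed commands look right, they can be executed with `sh cv_commands.txt`, redirecting each run's output to its own file for later comparison.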

● Saving the model
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
● Classifying a test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff

● Analyzing results
  – Get predictions from test data
    ● java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff -p range
  – Then DIY with scripts
    ● awk and sed will be your friends
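As one example of the "DIY with scripts" step, the sketch below tallies predictions with awk. It assumes the predictions have already been cleaned into a hypothetical three-column file (instance number, actual label, predicted label). The raw layout of the -p output depends on the Weka version, so a first pass with sed or awk to reach this shape is left to the reader. The data rows are invented.

```shell
# Hypothetical cleaned-up predictions: instance id, actual label, predicted label.
cat > predictions.txt <<'EOF'
1 yes yes
2 no yes
3 yes yes
4 no no
5 yes no
EOF

# Count each (actual -> predicted) pair and the number of correct predictions.
awk '{ pair[$2 " -> " $3]++; if ($2 == $3) correct++ }
     END {
         for (p in pair) print p ": " pair[p]
         printf "accuracy: %d/%d\n", correct, NR
     }' predictions.txt | sort > summary.txt

cat summary.txt
```

The pair counts are the cells of a confusion matrix in long form; the `sort` is there only because awk does not guarantee the iteration order of `for (p in pair)`.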

● Getting predictions from cross-validation
  – “Output Predictions” doesn't cut it.
  – export CLASSPATH=~cs4705/bin/:~cs4705/bin/weka.jar
  – java callClassifier weka.classifiers.bayes.NaiveBayes -t trainingdata.arff