Introduction to Weka CS 4705 Natural Language Processing

Introduction to Weka
CS 4705 – Natural Language Processing
Thursday, September 28

What is Weka?
● Java-based machine learning tool
● 3 modes of operation
  – GUI
  – Command line
  – API (not discussed here)
● To run:
  – java -Xmx1024M -jar ~cs4705/bin/weka.jar &
  – -Xmx1024M raises the Java heap limit to 1024 MB

Weka Homepage
● http://www.cs.waikato.ac.nz/ml/weka/

.arff file format
● http://www.cs.waikato.ac.nz/~ml/weka/arff.html
  @relation name
  @attribute attrName type
  ...
  @data
  a, b, c, d, e
● type is one of: numeric, string, <nominal>, date
● <nominal> := {class1, class2, ..., classN}
● Each @data line is one instance, with values in the same order as the @attribute lines
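
As a rough illustration of the format above, the Python sketch below writes ARFF text from a list of attributes and data rows. The helper names (to_arff, arff_type) and the toy "weather" data are made up for this example; it handles only numeric, string, and nominal attributes and does no quoting, so values containing spaces or commas would need extra handling.

```python
# Minimal ARFF writer sketch (hypothetical helpers, not part of Weka).
# A list/tuple type means a nominal attribute; otherwise the type string
# ("numeric", "string") is written as-is. Values are NOT quoted.


def arff_type(spec):
    """Render one attribute type; a list/tuple becomes {v1,v2,...}."""
    if isinstance(spec, (list, tuple)):
        return "{" + ",".join(spec) + "}"
    return spec


def to_arff(relation, attributes, rows):
    """Build ARFF text from (name, type) pairs and data rows."""
    lines = [f"@relation {relation}", ""]
    for name, spec in attributes:
        lines.append(f"@attribute {name} {arff_type(spec)}")
    lines += ["", "@data"]
    for row in rows:
        # One instance per line, values in attribute order.
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_arff(
        "weather",
        [("temperature", "numeric"),
         ("outlook", ["sunny", "rainy"]),
         ("play", ["yes", "no"])],
        [(21.5, "sunny", "yes"), (12.0, "rainy", "no")],
    ))
```

By default Weka treats the last attribute as the class, so in this toy file play is the attribute a classifier would predict.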

Example ARFF Files
● http://sourceforge.net/projects/weka
● iris.arff
● cmc.arff

To Classify with the Weka GUI
1. Run the Weka GUI
2. Click 'Explorer'
3. 'Open file...'
4. Select the 'Classify' tab
5. 'Choose' a classifier
6. Confirm options
7. Click 'Start'
8. Wait...
9. Right-click on the Result list entry
   a. 'Save result buffer'
   b. 'Save model'

Classify
● Some classifiers to start with:
  – NaiveBayes
  – JRip
  – J48
  – SMO
● Find references by selecting a classifier
● Use cross-validation!
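
To make the cross-validation advice concrete, here is a minimal sketch of how N-fold splits can be formed: instances are grouped by class and dealt out round-robin, so each fold keeps roughly the class proportions of the whole dataset (stratification). This illustrates the idea only and is not Weka's own code; real implementations also shuffle the data first, which this sketch skips to stay deterministic. The function names are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign instance indices to k folds, dealing each class round-robin
    so every fold gets a similar class distribution (no shuffling)."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for label in sorted(by_class):  # fixed class order for reproducibility
        for i in by_class[label]:
            folds[pos % k].append(i)
            pos += 1
    return folds


def train_test_splits(labels, k):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    folds = stratified_folds(labels, k)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


if __name__ == "__main__":
    labels = ["a", "a", "a", "a", "b", "b"]
    for train, test in train_test_splits(labels, 2):
        print("train", train, "test", test)
```

Every instance lands in exactly one test fold, so each one is evaluated by a model that never saw it during training.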

Analyzing Results
● Important tools for Homework 2
  – Accuracy
    ● “Correctly classified instances”
  – Confusion matrix
  – Save model
  – Visualization
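
Accuracy can be read straight off the confusion matrix. In Weka's printout each row is an actual class and each column a predicted class ("classified as"), so correct predictions lie on the diagonal. The sketch below assumes that orientation; the function names and the 3x3 matrix are illustrative, not real results.

```python
def accuracy(matrix):
    """Correctly classified instances / all instances (diagonal / total)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_recall(matrix):
    """For each actual class (row): share of its instances predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


def per_class_precision(matrix):
    """For each predicted class (column): share of those predictions that were right."""
    n = len(matrix)
    result = []
    for j in range(n):
        column_total = sum(matrix[i][j] for i in range(n))
        result.append(matrix[j][j] / column_total if column_total else 0.0)
    return result


if __name__ == "__main__":
    # Illustrative 3-class matrix: rows = actual, columns = predicted.
    m = [[50, 0, 0],
         [0, 47, 3],
         [0, 4, 46]]
    print(f"accuracy  = {accuracy(m):.4f}")  # 143 / 150
    print("recall    =", per_class_recall(m))
    print("precision =", [round(p, 3) for p in per_class_precision(m)])
```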

Running Weka from the Command Line
● Running an N-fold cross-validation experiment
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N
● Using a predefined test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff
● -t names the training file, -x the number of folds, -T a separate test file

● Saving the model
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
● Classifying a test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
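
When you run many experiments (several classifiers, several data files), it can help to generate these command lines from a script. The Python sketch below only assembles the argument list from the flags shown above (-t, -T, -x, -d, -l); the function name and the weka.jar default path are assumptions, and it does not start Java itself.

```python
import shlex

# Assumption: path to weka.jar (the course copy lives under ~cs4705/bin/).
WEKA_JAR = "weka.jar"


def weka_command(classifier, train=None, test=None, folds=None,
                 save_model=None, load_model=None, jar=WEKA_JAR):
    """Assemble (but do not run) one Weka command line as an argument list."""
    cmd = ["java", "-cp", jar, classifier]
    if load_model:
        cmd += ["-l", load_model]   # reuse a saved model
    if train:
        cmd += ["-t", train]        # training file
    if test:
        cmd += ["-T", test]         # predefined test set
    if folds is not None:
        cmd += ["-x", str(folds)]   # N-fold cross-validation
    if save_model:
        cmd += ["-d", save_model]   # save the trained model
    return cmd


if __name__ == "__main__":
    classifiers = [
        "weka.classifiers.bayes.NaiveBayes",
        "weka.classifiers.rules.JRip",
        "weka.classifiers.trees.J48",
        "weka.classifiers.functions.SMO",
    ]
    for name in classifiers:
        print(shlex.join(weka_command(name, train="trainingdata.arff", folds=10)))
```

To actually run one, pass the list to subprocess.run(cmd, capture_output=True, text=True). Without a shell, ~ is not expanded, so pass the jar path through os.path.expanduser first.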

● Analyzing results
  – Get predictions from test data
    ● java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff -p range
    ● range lists the attributes to print alongside each prediction (0 for none)
  – Then DIY with scripts
    ● awk and sed will be your friends
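
For the DIY step, awk and sed work fine; the same idea in Python is sketched below for comparison. It assumes each prediction line begins with the instance number, followed by the actual and predicted labels written as index:label. That is the layout of newer Weka releases; older versions order the columns differently, so check your own -p output before relying on it. The sample text and function names are invented for illustration.

```python
# Assumed layout of a prediction line (newer Weka releases; older ones differ):
#   inst#  actual  predicted  [+ if wrong]  probability  [(attribute values)]

SAMPLE = """
 inst#     actual  predicted error prediction
     1      1:yes      1:yes       0.9
     2       2:no      1:yes   +   0.7
     3       2:no       2:no       0.8
"""  # invented example, not real Weka output


def parse_predictions(text):
    """Return (actual, predicted) label pairs, one per prediction line."""
    pairs = []
    for line in text.splitlines():
        tokens = line.split()
        # Prediction lines start with an instance number; skip headers and blanks.
        if len(tokens) < 3 or not tokens[0].isdigit():
            continue
        # "2:no" -> "no": drop the class-index prefix before the colon.
        actual = tokens[1].split(":", 1)[-1]
        predicted = tokens[2].split(":", 1)[-1]
        pairs.append((actual, predicted))
    return pairs


def error_count(pairs):
    """Number of instances whose predicted label differs from the actual one."""
    return sum(1 for actual, predicted in pairs if actual != predicted)


if __name__ == "__main__":
    pairs = parse_predictions(SAMPLE)
    print(len(pairs), "predictions,", error_count(pairs), "errors")
```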

● Getting predictions from cross-validation
  – “Output predictions” doesn't cut it.
  – export CLASSPATH=~cs4705/bin/:~cs4705/bin/weka.jar
  – java callClassifier weka.classifiers.bayes.NaiveBayes -t trainingdata.arff
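
Per-instance cross-validation predictions mean that each instance is labeled by the model trained on the other folds. A minimal sketch of that bookkeeping follows; to stay self-contained it uses a majority-class stand-in (the same idea as Weka's ZeroR) instead of a real classifier, and assigns folds round-robin without shuffling. It does not reproduce what callClassifier does internally, and the function names are hypothetical.

```python
from collections import Counter


def majority_class(train_labels):
    """Stand-in learner (same idea as ZeroR): predict the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]


def cross_val_predictions(labels, k):
    """One out-of-fold prediction per instance, in the original instance order.

    Instance i goes to fold i % k (no shuffling). For each fold, the stand-in
    model is "trained" on the other folds and predicts the held-out instances.
    """
    n = len(labels)
    predictions = [None] * n
    for fold in range(k):
        train_labels = [labels[i] for i in range(n) if i % k != fold]
        guess = majority_class(train_labels)
        for i in range(n):
            if i % k == fold:
                predictions[i] = guess
    return predictions


if __name__ == "__main__":
    labels = ["yes", "no", "yes", "yes", "yes", "no"]
    preds = cross_val_predictions(labels, 2)
    for i, (actual, predicted) in enumerate(zip(labels, preds), start=1):
        print(i, actual, predicted)
```

Swapping majority_class for a real learner keeps the same structure: train on k-1 folds, predict the held-out fold, and write the prediction back at each instance's original position.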