A Short Introduction to Weka Natural Language Processing

A Short Introduction to Weka
Natural Language Processing
Thursday, September 25th

What is weka?
● Java-based Machine Learning Tool
● Implements numerous classifiers
● 3 modes of operation
  – GUI
  – Command Line
  – Java API (not discussed here)
● Google: ‘weka java’

weka Homepage
● http://www.cs.waikato.ac.nz/ml/weka/
● To run:
  – java -Xmx1024M -jar ~cs4705/bin/weka.jar &

.arff file format
● http://www.cs.waikato.ac.nz/~ml/weka/arff.html

  % 1. Title: Iris Plants Database
  %
  @RELATION iris

  @ATTRIBUTE sepallength NUMERIC
  @ATTRIBUTE sepalwidth NUMERIC
  @ATTRIBUTE petallength NUMERIC
  @ATTRIBUTE petalwidth NUMERIC
  @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

  @DATA
  5.1,3.5,1.4,0.2,Iris-setosa

.arff file format
@attribute attrName {numeric, string, <nominal>, date}
§ numeric: a number
§ nominal: a (finite) set of strings, e.g. {Iris-setosa, Iris-versicolor, Iris-virginica}
§ string: <arbitrary strings>
§ date: (default ISO-8601) yyyy-MM-dd'T'HH:mm:ss
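A feature extractor ultimately has to emit a file in this format. The following is a minimal Python sketch of one way to do that; it is not part of Weka. The function names `arff_value` and `to_arff` are hypothetical, and the quoting rule (single quotes with backslash escapes for values containing spaces, commas, or braces) is a simplifying assumption that should be checked against the ARFF page linked above.

```python
def arff_value(value):
    """Format one data cell: '?' marks a missing value in ARFF."""
    if value is None:
        return "?"
    text = str(value)
    # Assumption: quote any value containing characters that would
    # otherwise confuse the comma-separated @DATA section.
    if any(ch in text for ch in " ,{}%'\""):
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return text


def to_arff(relation, attributes, rows):
    """Build ARFF text.

    attributes: list of (name, kind) pairs, where kind is either a type
    keyword such as "NUMERIC" or "STRING", or a list of nominal values.
    rows: one sequence of cell values per instance, in attribute order.
    """
    lines = ["@RELATION " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            spec = "{" + ",".join(arff_value(v) for v in kind) + "}"
        else:
            spec = kind
        lines.append("@ATTRIBUTE {} {}".format(name, spec))
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join(arff_value(v) for v in row))
    return "\n".join(lines) + "\n"


# Rebuild the iris fragment shown on the slide.
attributes = [
    ("sepallength", "NUMERIC"),
    ("sepalwidth", "NUMERIC"),
    ("petallength", "NUMERIC"),
    ("petalwidth", "NUMERIC"),
    ("class", ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]),
]
rows = [(5.1, 3.5, 1.4, 0.2, "Iris-setosa")]
text = to_arff("iris", attributes, rows)
print(text)
```

Writing `text` to a file ending in .arff gives something the Explorer's 'Open file...' dialog should be able to load, assuming the attribute declarations match the data rows.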

Example Arff Files
● ~cs4705/bin/weka-3-4-11/data/
● iris.arff
● soybean.arff
● weather.arff

To Classify with weka GUI
1. Run weka GUI
2. Click 'Explorer'
3. 'Open file...'
4. Select 'Classify' tab
5. 'Choose' a classifier
6. Confirm options
7. Click 'Start'
8. Wait...
9. Right-click on Result list entry
   a. 'Save result buffer'
   b. 'Save model'

Classify
● Some classifiers to start with.
  – NaiveBayes
  – JRip
  – J48
  – SMO
● Find References by selecting a classifier
● Use Cross-Validation!
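To make the cross-validation advice concrete, here is a small Python sketch of how N-fold cross-validation partitions a data set: every instance is held out for testing exactly once, and the classifier is trained on the remaining folds each time. The function name `k_fold_splits` is hypothetical, and this only illustrates the idea; Weka's own procedure may differ in detail (for example, it may stratify folds by class).

```python
import random


def k_fold_splits(n_instances, k, seed=1):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatable folds
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# With 10 instances and 5 folds, each round trains on 8 and tests on 2.
for train, test in k_fold_splits(10, 5):
    print(len(train), len(test))
```

Averaging the per-fold scores gives an estimate that uses all the data for testing without ever testing on an instance the model was trained on.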

Analyzing Results
● Important tools for Homework 2
  – Accuracy
    ● “Correctly classified instances”
  – F-measure
  – Confusion matrix
  – Save model
  – Visualization
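Accuracy and F-measure can both be read off the confusion matrix. The Python sketch below shows the arithmetic, assuming rows are actual classes and columns are predicted classes. The function names are hypothetical and the 2x2 matrix is made-up illustration data, not output from any real run.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def f_measure(matrix, cls):
    """Harmonic mean of precision and recall for one class index."""
    tp = matrix[cls][cls]
    fp = sum(row[cls] for row in matrix) - tp  # predicted cls, actually other
    fn = sum(matrix[cls]) - tp                 # actually cls, predicted other
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example: 10 instances of each class.
#            predicted a  predicted b
matrix = [[8, 2],   # actual a
          [1, 9]]   # actual b

print(accuracy(matrix))      # 17 correct out of 20
print(f_measure(matrix, 0))  # precision 8/9, recall 8/10
```

Accuracy alone can look good on skewed data, which is one reason to check the per-class F-measure and the confusion matrix as well.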

Running weka from the Command Line
● Running an N-fold cross validation experiment
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N -i
● Using a predefined test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff

● Saving the model
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
● Classifying a test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
● Getting help
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -?
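When running many experiments, it can help to assemble these command lines in a script instead of retyping them. The Python sketch below only builds and prints the commands; it does not run Weka. The helper name `weka_command` is hypothetical, the jar path, class name, and flags are copied from the commands above, and the fold count of 10 is an arbitrary example value for N.

```python
WEKA_JAR = "~cs4705/bin/weka.jar"                 # path used on the slides
CLASSIFIER = "weka.classifiers.bayes.NaiveBayes"  # class used on the slides


def weka_command(*options, jar=WEKA_JAR, classifier=CLASSIFIER):
    """Return the argument list for one Weka command-line invocation."""
    return ["java", "-cp", jar, classifier] + [str(o) for o in options]


cross_validation = weka_command("-t", "trainingdata.arff", "-x", 10, "-i")
with_test_set = weka_command("-t", "trainingdata.arff", "-T", "testingdata.arff")
save_model = weka_command("-t", "trainingdata.arff", "-d", "output.model")
apply_model = weka_command("-l", "input.model", "-T", "testingdata.arff")

for command in (cross_validation, with_test_set, save_model, apply_model):
    print(" ".join(command))
```

To execute a command from Python, the list could be passed to `subprocess.run`; in that case the `~cs4705` prefix is not expanded by a shell, so the jar path would need `os.path.expanduser` or an absolute path.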

Homework 2 Weka Workflow
[Diagram. Labels: T1 … TN; S1, S2 … SN; Your Feature Extractor; .arff; Weka; best model; Test.arff; results. Stages: Preprocessing (you), Experimentation (you), Grading (us).]

Tips for Homework Success
● Start early
● Read instructions carefully
● Start simply
● Your system should always work
  – 80/20 Rule
  – Add features incrementally
  – This way, you always have something you can turn in.