A Short Introduction to Weka Natural Language Processing

A Short Introduction to Weka Natural Language Processing Thursday, November 5th

What is weka?
● Java-based Machine Learning Tool
● Implements numerous classifiers
● 3 modes of operation
  – GUI
  – Command Line
  – Java API (not discussed here)
● Google: 'weka java'

weka Homepage
● http://www.cs.waikato.ac.nz/ml/weka/
● To run:
  – java -Xmx1024M -jar ~cs4705/bin/weka.jar &

.arff file format
● http://www.cs.waikato.ac.nz/~ml/weka/arff.html

    % 1. Title: Iris Plants Database
    %
    @RELATION iris

    @ATTRIBUTE sepallength NUMERIC
    @ATTRIBUTE sepalwidth NUMERIC
    @ATTRIBUTE petallength NUMERIC
    @ATTRIBUTE petalwidth NUMERIC
    @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

    @DATA
    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    …

.arff file format
@attribute attrName {numeric, string, <nominal>, date}
§ numeric: a number
§ nominal: a (finite) set of strings, e.g. {Iris-setosa, Iris-versicolor, Iris-virginica}
§ string: <arbitrary strings>
§ date: (default ISO-8601) yyyy-MM-dd'T'HH:mm:ss
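As an illustration of the format described above, here is a minimal Python sketch of a writer that emits the header and data sections for numeric, string, and nominal attributes. The function names `quote` and `write_arff` are hypothetical helpers invented for this example, not part of weka. The quoting rule (single quotes with backslash escapes for values containing spaces, commas, or braces) is an assumption about what the ARFF reader accepts, and date attributes and missing values are left out.

```python
from io import StringIO


def quote(value):
    """Return value as text, single-quoted if it contains special characters.

    Assumption: single quotes with backslash escapes are an acceptable way
    to protect spaces, commas, braces, percent signs, and quotes.
    """
    text = str(value)
    if text == "" or any(ch in text for ch in " ,{}%'\""):
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return text


def write_arff(out, relation, attributes, rows):
    """Write a small ARFF document to the file-like object `out`.

    attributes: list of (name, kind) pairs; kind is "numeric", "string",
                or a list of the allowed nominal values.
    rows:       list of value lists, one value per attribute, in order.
    """
    out.write("@RELATION " + quote(relation) + "\n\n")
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attribute: enumerate the finite set of allowed strings.
            spec = "{" + ",".join(quote(v) for v in kind) + "}"
        else:
            spec = kind.upper()
        out.write("@ATTRIBUTE " + quote(name) + " " + spec + "\n")
    out.write("\n@DATA\n")
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("row length does not match attribute count")
        out.write(",".join(quote(v) for v in row) + "\n")


# Usage: rebuild the first data row of the iris example.
buffer = StringIO()
write_arff(
    buffer,
    "iris",
    [
        ("sepallength", "numeric"),
        ("sepalwidth", "numeric"),
        ("petallength", "numeric"),
        ("petalwidth", "numeric"),
        ("class", ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]),
    ],
    [[5.1, 3.5, 1.4, 0.2, "Iris-setosa"]],
)
print(buffer.getvalue())
```

A feature extractor can build its rows as plain Python lists and hand them to a helper like this, which keeps the attribute order in the header and in the data section in sync.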

Example Arff Files
● ~cs4705/bin/weka-3-4-11/data/
● iris.arff
● soybean.arff
● weather.arff

To Classify with weka GUI
1. Run weka GUI (in Unix: java -jar weka.jar)
2. Click 'Explorer'
3. 'Open file...'
4. Select 'Classify' tab
5. 'Choose' a classifier
6. Confirm options
7. Click 'Start'
8. Wait...
9. Right-click on Result list entry
   a. 'Save result buffer'
   b. 'Save model'

Classify
● Some classifiers to start with.
  – NaiveBayes
  – JRip
  – J48
  – SMO
● Find References by selecting a classifier
● Use Cross-Validation!
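To make the cross-validation advice concrete, here is a minimal Python sketch of how N-fold cross-validation partitions a data set: every instance is held out for testing exactly once and used for training in the other folds. The function `k_fold_indices` is a hypothetical helper for illustration. It does no shuffling or stratification, so it only shows the general idea and is not a description of what weka does internally.

```python
def k_fold_indices(n_instances, n_folds):
    """Yield (train_indices, test_indices) for each of n_folds folds.

    Instance i is assigned to fold i % n_folds, so fold sizes differ
    by at most one.
    """
    if not 2 <= n_folds <= n_instances:
        raise ValueError("need 2 <= n_folds <= n_instances")
    for fold in range(n_folds):
        test = [i for i in range(n_instances) if i % n_folds == fold]
        train = [i for i in range(n_instances) if i % n_folds != fold]
        yield train, test


# Usage: 10 instances split into 5 folds.
for train, test in k_fold_indices(10, 5):
    print("train on", train, "test on", test)
```

Averaging the evaluation over all folds uses every instance for testing without ever testing on data the model was trained on, which is why it gives a steadier estimate than a single train/test split on a small data set.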

Analyzing Results
● Important tools for Homework 3
  – Accuracy
    ● "Correctly classified instances"
  – F-measure
  – Confusion matrix
  – Save model
  – Visualization
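The relationship between the confusion matrix, accuracy, and F-measure can be sketched in a few lines of Python. This assumes the matrix has one row per actual class and one column per predicted class. The function names and the numbers in the example matrix are invented for illustration and are not output from any real experiment.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def precision_recall_f(matrix, index):
    """Return (precision, recall, F-measure) for the class at `index`.

    Assumes matrix[actual][predicted] holds instance counts.
    """
    true_positives = matrix[index][index]
    actual = sum(matrix[index])                    # row sum
    predicted = sum(row[index] for row in matrix)  # column sum
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Usage with a made-up two-class matrix: rows are actual, columns predicted.
example = [
    [8, 2],
    [1, 9],
]
print("accuracy:", accuracy(example))
print("class 0 (precision, recall, F):", precision_recall_f(example, 0))
```

Accuracy alone can hide poor performance on a rare class, so it is worth checking the per-class F-measure alongside it.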

Running weka from the Command Line
● http://weka.wikispaces.com/Primer
● Running an N-fold cross validation experiment
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N -i
● Using a predefined test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff

● Saving the model
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
● Classifying a test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
● Getting help
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -?
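When running many experiments, it can help to assemble these command lines in a script. The Python sketch below only builds the argument list, using the flags shown on these slides (-t, -T, -x, -i, -d, -l). The helper `weka_command` and its parameter names are hypothetical, and it does not check that a given combination of flags makes sense for weka.

```python
def weka_command(jar, classifier, train=None, test=None, folds=None,
                 save_model=None, load_model=None, extra=()):
    """Build the argument list for one weka command-line run.

    Only flags that appear on the slides are generated; anything else
    can be passed through unchanged in `extra`.
    """
    args = ["java", "-cp", jar, classifier]
    if train is not None:
        args += ["-t", train]          # training data
    if test is not None:
        args += ["-T", test]           # predefined test set
    if folds is not None:
        args += ["-x", str(folds)]     # N-fold cross-validation
    if save_model is not None:
        args += ["-d", save_model]     # write the trained model
    if load_model is not None:
        args += ["-l", load_model]     # read a saved model
    args += list(extra)
    return args


# Usage: a 10-fold cross-validation run with NaiveBayes.
command = weka_command(
    "weka.jar",
    "weka.classifiers.bayes.NaiveBayes",
    train="trainingdata.arff",
    folds=10,
    extra=["-i"],
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command)`. Note that a path such as ~cs4705/bin/weka.jar is expanded by the shell, so when no shell is involved it would need `os.path.expanduser` first.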

Homework 3 Weka Workflow
[Workflow diagram. Labels: T1 … TN; S1 … SN; Your Feature Extractor; .arff; Weka; best model; Test.arff; results; Weka results. Stages: Preprocessing (you), Experimentation (you), Grading (us).]

Tips for Homework Success
● Start early
● Read instructions carefully
● Start simply
● Your system should always work
  – 80/20 Rule
  – Add features incrementally
  – This way, you always have something you can turn in.