A Short Introduction to Weka Natural Language Processing

A Short Introduction to Weka
Natural Language Processing
Thursday, September 27
Frank Enos and Andrew Rosenberg

What is Weka?
● Java-based machine learning tool
● Implements numerous classifiers
● 3 modes of operation
  – GUI
  – Command line
  – Java API (not discussed here)
● Google: ‘weka java’

Homework 2 Weka Workflow
[Figure: Preprocessing (you): texts T1 … TN pass through your feature extractor, producing an .arff file. Experimentation (you): Weka is used to find the best model. Grading (us): the best model is run on Test.arff to produce results.]
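The preprocessing step ends with an .arff file that Weka can load. The Java sketch below shows one way a feature extractor might write that file. The class FeatureExtractor, the two features (token count and average token length), and the class labels pos/neg are hypothetical placeholders; the real features and labels are whatever Homework 2 asks for.

```java
import java.util.List;
import java.util.Locale;

// Minimal sketch of a feature extractor that writes ARFF text.
// The features (token count, average token length) and the labels
// "pos"/"neg" are hypothetical placeholders, not the homework's.
class FeatureExtractor {

    // One labeled input text.
    record Example(String text, String label) {}

    // Turn one text into a comma-separated ARFF data row.
    static String toRow(Example ex) {
        String[] tokens = ex.text().trim().split("\\s+");
        int count = ex.text().isBlank() ? 0 : tokens.length;
        double totalChars = 0;
        for (String t : tokens) totalChars += t.length();
        double avgLen = count == 0 ? 0.0 : totalChars / count;
        // Locale.ROOT guarantees '.' as the decimal separator.
        return String.format(Locale.ROOT, "%d,%.3f,%s", count, avgLen, ex.label());
    }

    // Build the complete ARFF document: header first, then @DATA rows.
    static String toArff(String relation, List<Example> examples) {
        StringBuilder sb = new StringBuilder();
        sb.append("@RELATION ").append(relation).append("\n\n");
        sb.append("@ATTRIBUTE tokencount NUMERIC\n");
        sb.append("@ATTRIBUTE avgtokenlength NUMERIC\n");
        sb.append("@ATTRIBUTE class {pos,neg}\n\n");
        sb.append("@DATA\n");
        for (Example ex : examples) sb.append(toRow(ex)).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Example> data = List.of(
            new Example("a fine movie", "pos"),
            new Example("dull", "neg"));
        System.out.print(toArff("hw2", data));
    }
}
```

Writing the header once and one row per text keeps the extractor simple; redirecting its output to a file gives an .arff that the Explorer's 'Open file...' or the -t option can read.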

Weka Homepage
● http://www.cs.waikato.ac.nz/ml/weka/
● To run:
  – java -Xmx1024M -jar ~cs4705/bin/weka.jar &
  – (-Xmx1024M raises the Java heap limit to 1024 MB; the trailing & runs Weka in the background.)

.arff file format
● http://www.cs.waikato.ac.nz/~ml/weka/arff.html
● Example:
    % 1. Title: Iris Plants Database
    %
    @RELATION iris
    @ATTRIBUTE sepallength NUMERIC
    @ATTRIBUTE sepalwidth NUMERIC
    @ATTRIBUTE petallength NUMERIC
    @ATTRIBUTE petalwidth NUMERIC
    @ATTRIBUTE class {Iris-setosa, Iris-versicolor, Iris-virginica}
    @DATA
    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    …
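To see how the header and the @DATA section fit together, here is a minimal Java sketch that reads the iris fragment above, collecting attribute names and splitting each data row on commas. It assumes a simple dense file with no quoted values, sparse rows, or missing values; the class TinyArff is hypothetical, and Weka's own loader handles the full format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Minimal reader for simple, dense ARFF text such as the iris example.
// Assumptions: no quoted values, no sparse rows, no missing-value handling.
class TinyArff {
    final List<String> attributeNames = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static TinyArff parse(String text) {
        TinyArff arff = new TinyArff();
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            // Skip blank lines and '%' comment lines.
            if (line.isEmpty() || line.startsWith("%")) continue;
            // ARFF keywords are case-insensitive.
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@attribute")) {
                // The second whitespace-separated token is the attribute name.
                arff.attributeNames.add(line.split("\\s+")[1]);
            } else if (lower.startsWith("@data")) {
                inData = true;
            } else if (inData) {
                // Each data line holds one comma-separated value per attribute.
                String[] values = line.split(",");
                for (int i = 0; i < values.length; i++) values[i] = values[i].trim();
                arff.rows.add(values);
            }
            // Anything else (e.g. @RELATION) is ignored in this sketch.
        }
        return arff;
    }

    public static void main(String[] args) {
        String iris = String.join("\n",
            "% 1. Title: Iris Plants Database",
            "@RELATION iris",
            "@ATTRIBUTE sepallength NUMERIC",
            "@ATTRIBUTE sepalwidth NUMERIC",
            "@ATTRIBUTE petallength NUMERIC",
            "@ATTRIBUTE petalwidth NUMERIC",
            "@ATTRIBUTE class {Iris-setosa, Iris-versicolor, Iris-virginica}",
            "@DATA",
            "5.1,3.5,1.4,0.2,Iris-setosa",
            "4.9,3.0,1.4,0.2,Iris-setosa");
        TinyArff arff = parse(iris);
        // prints [sepallength, sepalwidth, petallength, petalwidth, class]
        System.out.println(arff.attributeNames);
        // prints 2 instances
        System.out.println(arff.rows.size() + " instances");
    }
}
```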

.arff file format
@attribute <attrName> <type>, where <type> is one of:
§ numeric: a number
§ <nominal>: a finite set of strings listed in braces, e.g. {Iris-setosa, Iris-versicolor, Iris-virginica}
§ string: arbitrary strings
§ date: default format is ISO-8601, yyyy-MM-dd'T'HH:mm:ss
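A small Java sketch that writes one declaration of each type and formats a timestamp with the default date pattern given above. The class ArffAttributes and the attribute names sentence and posted are hypothetical examples chosen for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Sketch: one @attribute declaration per ARFF type, plus a date value
// formatted with the default pattern yyyy-MM-dd'T'HH:mm:ss.
class ArffAttributes {
    static final DateTimeFormatter ARFF_DATE =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss");

    static String numeric(String name) { return "@attribute " + name + " numeric"; }

    static String string(String name) { return "@attribute " + name + " string"; }

    // A nominal type lists its finite set of values in braces.
    static String nominal(String name, List<String> values) {
        return "@attribute " + name + " {" + String.join(",", values) + "}";
    }

    static String date(String name) { return "@attribute " + name + " date"; }

    // Format a timestamp the way a default date attribute expects it.
    static String formatDate(LocalDateTime t) { return ARFF_DATE.format(t); }

    public static void main(String[] args) {
        System.out.println(numeric("petalwidth"));
        System.out.println(nominal("class",
            List.of("Iris-setosa", "Iris-versicolor", "Iris-virginica")));
        System.out.println(string("sentence"));
        System.out.println(date("posted"));
        // prints 2007-09-27T14:30:00
        System.out.println(formatDate(LocalDateTime.of(2007, 9, 27, 14, 30, 0)));
    }
}
```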

Example ARFF Files
● ~cs4705/bin/weka-3-4-11/data/
  – iris.arff
  – soybean.arff
  – weather.arff

To Classify with the Weka GUI
1. Run the Weka GUI
2. Click 'Explorer'
3. 'Open file...'
4. Select the 'Classify' tab
5. 'Choose' a classifier
6. Confirm options
7. Click 'Start'
8. Wait...
9. Right-click on the Result list entry
   a. 'Save result buffer'
   b. 'Save model'

Classify
● Some classifiers to start with:
  – NaiveBayes
  – JRip
  – J48
  – SMO
● Find references by selecting a classifier
● Use cross-validation!
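Cross-validation splits the data into N folds, trains on N-1 of them, and tests on the one held out, rotating until every fold has served as the test set. The following Java sketch (hypothetical class CrossValidation) shows only the splitting step, in its plain unstratified form; Weka's built-in cross-validation also randomizes the data and, for a nominal class, stratifies the folds so each keeps roughly the original class proportions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of plain (unstratified) N-fold cross-validation splitting.
// Each instance index lands in exactly one test fold; a model would be
// trained on the other N-1 folds and evaluated on the held-out fold.
class CrossValidation {

    // Shuffle indices 0..numInstances-1 and deal them round-robin into folds.
    static List<List<Integer>> folds(int numInstances, int numFolds, long seed) {
        if (numFolds < 2 || numFolds > numInstances)
            throw new IllegalArgumentException("need 2 <= folds <= instances");
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < numFolds; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < idx.size(); i++) folds.get(i % numFolds).add(idx.get(i));
        return folds;
    }

    // Training set for fold f: every index that is not in fold f.
    static List<Integer> trainingIndices(List<List<Integer>> folds, int f) {
        List<Integer> train = new ArrayList<>();
        for (int g = 0; g < folds.size(); g++)
            if (g != f) train.addAll(folds.get(g));
        return train;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(10, 3, 42L);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("fold " + f + ": test=" + folds.get(f)
                + " train=" + trainingIndices(folds, f));
        }
    }
}
```

Because every instance is tested exactly once, the pooled result uses all of the data for evaluation without ever testing a model on instances it was trained on.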

Analyzing Results
● Important tools for Homework 2
  – Accuracy
    ● “Correctly classified instances”
  – F-measure
  – Confusion matrix
  – Save model
  – Visualization
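Accuracy and F-measure can both be read off the confusion matrix. A Java sketch, assuming the matrix is stored as matrix[actual][predicted], which matches Weka's printout (rows are the actual class, columns are "classified as"); the class name Metrics and the 2x2 example counts are made up for illustration.

```java
import java.util.Locale;

// Sketch: accuracy and per-class precision, recall, and F-measure from a
// confusion matrix stored as matrix[actual][predicted].
class Metrics {

    // Accuracy = diagonal (correctly classified) / all instances.
    static double accuracy(int[][] m) {
        long correct = 0, total = 0;
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m.length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j];
            }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    // Precision for class c = correct c / everything predicted as c (column sum).
    static double precision(int[][] m, int c) {
        long predicted = 0;
        for (int i = 0; i < m.length; i++) predicted += m[i][c];
        return predicted == 0 ? 0.0 : (double) m[c][c] / predicted;
    }

    // Recall for class c = correct c / everything actually in c (row sum).
    static double recall(int[][] m, int c) {
        long actual = 0;
        for (int j = 0; j < m.length; j++) actual += m[c][j];
        return actual == 0 ? 0.0 : (double) m[c][c] / actual;
    }

    // F-measure = harmonic mean of precision and recall.
    static double fMeasure(int[][] m, int c) {
        double p = precision(m, c), r = recall(m, c);
        return p + r == 0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        int[][] m = {{8, 2}, {1, 9}}; // made-up counts
        // prints accuracy = 0.850
        System.out.printf(Locale.ROOT, "accuracy = %.3f%n", accuracy(m));
        for (int c = 0; c < m.length; c++) {
            System.out.printf(Locale.ROOT, "class %d: P=%.3f R=%.3f F=%.3f%n",
                c, precision(m, c), recall(m, c), fMeasure(m, c));
        }
    }
}
```

Per-class F-measure is often more informative than accuracy when one class dominates, since a classifier can score high accuracy by always predicting the majority class.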

Running Weka from the Command Line
● Running an N-fold cross-validation experiment
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N -i
● Using a predefined test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff
● Options used above: -t training file, -T test file, -x number of folds, -i per-class statistics (precision, recall, F-measure)
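To compare the starter classifiers (NaiveBayes, JRip, J48, SMO) under the same cross-validation setup, a Java sketch can generate the cross-validation command once per classifier. The class CvCommands, the placeholder paths weka.jar and trainingdata.arff, the choice of 10 folds for N, and the --run switch are all assumptions for illustration. ProcessBuilder does not expand ~, so pass an absolute jar path when actually running Weka.

```java
import java.util.List;

// Sketch: build the N-fold cross-validation command for each of the
// starter classifiers so they can be compared on the same data.
class CvCommands {
    // Fully qualified Weka class names for NaiveBayes, JRip, J48, and SMO.
    static final List<String> CLASSIFIERS = List.of(
        "weka.classifiers.bayes.NaiveBayes",
        "weka.classifiers.rules.JRip",
        "weka.classifiers.trees.J48",
        "weka.classifiers.functions.SMO");

    // Same flags as the command line shown: -t training file, -x folds, -i per-class stats.
    static List<String> cvCommand(String jar, String classifier, String trainArff, int folds) {
        return List.of("java", "-cp", jar, classifier,
                       "-t", trainArff, "-x", Integer.toString(folds), "-i");
    }

    public static void main(String[] args) throws Exception {
        // Placeholder defaults; pass real paths as arguments.
        String jar = args.length > 0 ? args[0] : "weka.jar";
        String train = args.length > 1 ? args[1] : "trainingdata.arff";
        boolean run = args.length > 2 && args[2].equals("--run");
        for (String classifier : CLASSIFIERS) {
            List<String> cmd = cvCommand(jar, classifier, train, 10);
            System.out.println(String.join(" ", cmd));
            if (run) {
                // Stream Weka's output to this console and wait for it to finish.
                new ProcessBuilder(cmd).inheritIO().start().waitFor();
            }
        }
    }
}
```

Without --run the sketch only prints the four commands; with it, for example `java CvCommands.java /path/to/weka.jar trainingdata.arff --run`, it would run the experiments one after another.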

● Saving the model
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
● Classifying a test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
● Getting help
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -?
● Options used above: -d saves the trained model to a file, -l loads a previously saved model

Tips for Homework Success
● Start early
● Read instructions carefully
● Start simply
● Your system should always work
  – 80/20 rule
  – Add features incrementally
  – This way, you always have something you can turn in.