A Short Introduction to Weka Natural Language Processing
A Short Introduction to Weka
Natural Language Processing
Thursday, November 5th
What is weka?
● Java-based Machine Learning Tool
● Implements numerous classifiers
● 3 modes of operation
  – GUI
  – Command Line
  – Java API (not discussed here)
● Google: ‘weka java’
weka Homepage
● http://www.cs.waikato.ac.nz/ml/weka/
● To run:
  – java -Xmx1024M -jar ~cs4705/bin/weka.jar &
.arff file format
● http://www.cs.waikato.ac.nz/~ml/weka/arff.html

  % 1. Title: Iris Plants Database
  %
  @RELATION iris

  @ATTRIBUTE sepallength NUMERIC
  @ATTRIBUTE sepalwidth NUMERIC
  @ATTRIBUTE petallength NUMERIC
  @ATTRIBUTE petalwidth NUMERIC
  @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

  @DATA
  5.1,3.5,1.4,0.2,Iris-setosa
  4.9,3.0,1.4,0.2,Iris-setosa
  4.7,3.2,1.3,0.2,Iris-setosa
  …
.arff file format
@attribute attrName {numeric, string, <nominal>, date}
§ numeric: a number
§ nominal: a (finite) set of strings, e.g. {Iris-setosa, Iris-versicolor, Iris-virginica}
§ string: <arbitrary strings>
§ date: (default ISO-8601) yyyy-MM-dd'T'HH:mm:ss
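The header and data layout described above can be generated by a few lines of code. The following is a minimal Python sketch, not part of Weka: the function name `to_arff` and its argument layout are made up for illustration, and it assumes values contain no spaces, commas, or quotes (real string attributes would need quoting).

```python
def to_arff(relation, attributes, rows):
    """Render an ARFF document as a string.

    attributes: list of (name, type) pairs; type is a type keyword such as
                "NUMERIC" or "STRING", or a list of nominal values.
    rows:       list of value lists, one per instance, in attribute order.
    Assumption: no value needs quoting (no spaces, commas, or quotes).
    """
    lines = ["@RELATION " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attribute: list the allowed values inside braces.
            kind = "{" + ",".join(kind) + "}"
        lines.append("@ATTRIBUTE {} {}".format(name, kind))
    lines.append("")
    lines.append("@DATA")
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("row length does not match attribute count")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


iris_attributes = [
    ("sepallength", "NUMERIC"),
    ("sepalwidth", "NUMERIC"),
    ("petallength", "NUMERIC"),
    ("petalwidth", "NUMERIC"),
    ("class", ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]),
]
iris_rows = [
    [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
    [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
]
arff_text = to_arff("iris", iris_attributes, iris_rows)
print(arff_text)
```

A feature extractor could build `rows` one instance at a time and write the returned string to a file ending in .arff.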
Example Arff Files
● ~cs4705/bin/weka-3-4-11/data/
● iris.arff
● soybean.arff
● weather.arff
To Classify with weka GUI
1. Run weka GUI (in Unix: java -jar weka.jar)
2. Click 'Explorer'
3. 'Open file...'
4. Select 'Classify' tab
5. 'Choose' a classifier
6. Confirm options
7. Click 'Start'
8. Wait...
9. Right-click on Result list entry
   a. 'Save result buffer'
   b. 'Save model'
Classify
● Some classifiers to start with.
  – NaiveBayes
  – JRip
  – J48
  – SMO
● Find References by selecting a classifier
● Use Cross-Validation!
Analyzing Results
● Important tools for Homework 3
  – Accuracy
    ● “Correctly classified instances”
  – F-measure
  – Confusion matrix
  – Save model
  – Visualization
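To make the three measures above concrete, here is a small Python sketch that computes accuracy, a confusion matrix, and the per-class F-measure from gold and predicted labels. It is an independent illustration using the standard definitions, not Weka's code; the function names and the toy label lists are invented for the example.

```python
from collections import Counter


def confusion_matrix(gold, predicted):
    """Count how often each (gold label, predicted label) pair occurs."""
    return Counter(zip(gold, predicted))


def accuracy(gold, predicted):
    """Fraction of instances whose predicted label equals the gold label."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def f_measure(gold, predicted, label):
    """Harmonic mean of precision and recall for one class label."""
    cm = confusion_matrix(gold, predicted)
    tp = cm[(label, label)]
    fp = sum(n for (g, p), n in cm.items() if p == label and g != label)
    fn = sum(n for (g, p), n in cm.items() if g == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: define F as 0 to avoid dividing by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration.
gold = ["yes", "yes", "no", "no", "yes"]
pred = ["yes", "no", "no", "yes", "yes"]

print("accuracy:", accuracy(gold, pred))
print("F(yes):", f_measure(gold, pred, "yes"))
print("confusion:", dict(confusion_matrix(gold, pred)))
```

Accuracy alone can hide poor performance on a rare class, which is why the per-class F-measure and the confusion matrix are worth checking alongside it.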
Running weka from the Command Line
● http://weka.wikispaces.com/Primer
● Running an N-fold cross validation experiment
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N -i
● Using a predefined test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff
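For readers unfamiliar with N-fold cross validation, the idea can be sketched in a few lines of Python: the data is divided into N folds, and each fold is held out once for testing while the rest is used for training. This is a plain illustration with an invented function name; it neither shuffles nor stratifies the data, which Weka's own implementation may do.

```python
def k_fold_splits(n_instances, n_folds):
    """Yield (train_indices, test_indices) pairs, one per fold.

    Instances are assigned to folds round-robin by index. No shuffling or
    stratification is done here (a simplification for illustration).
    """
    indices = list(range(n_instances))
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


for train, test in k_fold_splits(10, 5):
    print("train:", train, "test:", test)
```

Every instance is tested exactly once, so the averaged score uses all of the data without ever testing on an instance the model was trained on.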
● Saving the model
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
● Classifying a test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
● Getting help
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -?
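When running many experiments it can help to assemble these command lines in a script. The sketch below is a hypothetical Python helper (the name `weka_command` and its parameters are made up here) that uses only the flags shown on these slides: -t, -T, -x, -i, -d, and -l. It only builds and prints the command; it does not run Weka.

```python
WEKA_JAR = "~cs4705/bin/weka.jar"  # path as given on the slides


def weka_command(classifier, train=None, test=None, folds=None,
                 ir_stats=False, save_model=None, load_model=None,
                 jar=WEKA_JAR):
    """Build the argument list for one Weka command-line run."""
    cmd = ["java", "-cp", jar, classifier]
    if load_model:
        cmd += ["-l", load_model]      # load a previously saved model
    if train:
        cmd += ["-t", train]           # training .arff file
    if test:
        cmd += ["-T", test]            # predefined test .arff file
    if folds:
        cmd += ["-x", str(folds)]      # number of cross-validation folds
    if ir_stats:
        cmd.append("-i")               # flag used in the slides' cross-validation example
    if save_model:
        cmd += ["-d", save_model]      # save the trained model
    return cmd


cv = weka_command("weka.classifiers.bayes.NaiveBayes",
                  train="trainingdata.arff", folds=10, ir_stats=True)
print(" ".join(cv))
```

The leading `~cs4705` is expanded by the shell, so the printed line is meant to be pasted into a shell or run through one rather than passed to a process launcher as a raw argument list.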
Homework 3 Weka Workflow
[Figure: workflow diagram. Labels: T1 … TN, Your Feature Extractor, .arff, Weka, best model; S1, S2 … SN, Your Feature Extractor, Test.arff, Weka, results. Stages: Preprocessing (you), Experimentation (you), Grading (us).]
Tips for Homework Success
● Start early
● Read instructions carefully
● Start simply
● Your system should always work
  – 80/20 Rule
  – Add features incrementally
  – This way, you always have something you can turn in.