Machine Learning with Weka
Cornelia Caragea
Thanks to Eibe Frank for some of the slides
Outline
• Weka: A Machine Learning Toolkit
• Preparing Data
• Building Classifiers
WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Main features:
  • Comprehensive set of data pre-processing tools, learning algorithms, and evaluation methods
  • Graphical user interfaces (incl. data visualization)
  • Environment for comparing learning algorithms
WEKA: versions
• There are several versions of WEKA:
  • “book version”: compatible with the description in the data mining book
  • “GUI version”: adds graphical user interfaces (the book version is command-line only)
  • “development version”: with lots of improvements
WEKA: resources
• API documentation, tutorials, source code
• WEKA mailing list
• Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
• Weka-related projects:
  • Weka-Parallel - parallel processing for Weka
  • RWeka - linking R and Weka
  • YALE - Yet Another Learning Environment
  • Many others…
Weka: web site
• http://www.cs.waikato.ac.nz/ml/weka/
WEKA: launching
• java -jar weka.jar
WEKA only deals with “flat” files

@relation heart-disease-simplified

@attribute age numeric
@attribute sex {female, male}
@attribute chest_pain_type {typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina {no, yes}
@attribute class {present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
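To make the layout of such a flat file concrete, here is a minimal sketch of a reader for it in plain Java. It is not WEKA's own loader: the class name ArffSketch and its fields are hypothetical, and it only handles the pieces shown on the slide (@relation, @attribute, @data, comma-separated rows, and "?" for a missing value). The sample in main is a cut-down version of the heart-disease file, with two columns dropped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal ARFF reader; covers only the constructs shown on the slide. */
final class ArffSketch {
    String relation;
    final List<String> attributeNames = new ArrayList<>();
    final List<String> attributeTypes = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch arff = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) continue; // blank line or comment
            String lower = line.toLowerCase();
            if (inData) {
                // Every data row must supply one value per declared attribute.
                String[] values = line.split(",");
                for (int i = 0; i < values.length; i++) values[i] = values[i].trim();
                if (values.length != arff.attributeNames.size()) {
                    throw new IllegalArgumentException("Wrong number of values: " + line);
                }
                arff.rows.add(values);
            } else if (lower.startsWith("@relation")) {
                arff.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // Name first, then the type: "numeric" or a {nominal, value, list}.
                String[] parts = line.substring("@attribute".length()).trim().split("\\s+", 2);
                if (parts.length < 2) {
                    throw new IllegalArgumentException("Attribute without a type: " + line);
                }
                arff.attributeNames.add(parts[0]);
                arff.attributeTypes.add(parts[1].trim());
            } else if (lower.startsWith("@data")) {
                inData = true; // everything after this line is a data row
            }
        }
        return arff;
    }

    /** In ARFF, a question mark stands for a missing value. */
    static boolean isMissing(String value) {
        return "?".equals(value);
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "@relation heart-disease-simplified",
            "@attribute age numeric",
            "@attribute sex {female, male}",
            "@attribute cholesterol numeric",
            "@attribute class {present, not_present}",
            "@data",
            "67,male,286,present",
            "38,female,?,not_present");
        ArffSketch arff = parse(sample);
        System.out.println(arff.relation + ": " + arff.attributeNames + ", " + arff.rows.size() + " rows");
    }
}
```

The length check in the data section is the useful part: a row with fewer values than declared attributes is rejected instead of being silently misaligned.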
Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  • Discretization, normalization, resampling, attribute selection, …
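As a rough illustration of what two of these filters compute, the sketch below rescales a numeric attribute to [0, 1] and then cuts it into equal-width bins. It is plain Java, not WEKA's filter code; the class FilterSketch, its method names, and the use of NaN to stand for a missing value are all assumptions made for this example.

```java
import java.util.Arrays;

/** Hypothetical stand-ins for a normalization filter and an equal-width discretization filter. */
final class FilterSketch {

    /** Rescales values linearly to [0, 1]. NaN marks a missing value and is passed through. */
    static double[] normalize(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            if (Double.isNaN(v)) continue; // ignore missing values when finding the range
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            if (Double.isNaN(values[i])) {
                out[i] = Double.NaN;
            } else {
                // A constant attribute has no range; map it to 0 rather than divide by zero.
                out[i] = (max > min) ? (values[i] - min) / (max - min) : 0.0;
            }
        }
        return out;
    }

    /** Equal-width binning: returns a bin index in 0..bins-1, or -1 for a missing value. */
    static int[] discretize(double[] values, int bins) {
        double[] scaled = normalize(values);
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            if (Double.isNaN(scaled[i])) {
                out[i] = -1;
            } else {
                // The maximum scales to exactly 1.0, so clamp it into the last bin.
                out[i] = Math.min(bins - 1, (int) (scaled[i] * bins));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Cholesterol column from the heart-disease example; NaN stands for "?".
        double[] cholesterol = {233, 286, 229, Double.NaN};
        System.out.println(Arrays.toString(normalize(cholesterol)));
        System.out.println(Arrays.toString(discretize(cholesterol, 3)));
    }
}
```

Both functions take a whole column and return a new one, which mirrors how a filter transforms a dataset before any classifier sees it.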
Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  • Decision trees, support vector machines, perceptrons, neural networks, logistic regression, Bayes nets, …
• “Meta”-classifiers include:
  • Bagging, boosting, stacking, …
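To show what "a model for predicting a nominal quantity" can look like in code, here is a small Naïve Bayes sketch for nominal attributes with add-one (Laplace) smoothing. It is written in plain Java and is not WEKA's implementation; the class NaiveBayesSketch and its train/predict methods are hypothetical names, and the rows in main are toy data modeled on the heart-disease attributes, not taken from a real dataset.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical Naive Bayes for nominal attributes, with add-one smoothing. */
final class NaiveBayesSketch {
    private final Map<String, Integer> classCounts = new HashMap<>();
    // Count of (attribute index, value, class), keyed as "index=value|class".
    private final Map<String, Integer> valueCounts = new HashMap<>();
    // Distinct values seen per attribute, needed for the smoothing denominator.
    private final Map<Integer, Set<String>> distinctValues = new HashMap<>();
    private int total = 0;

    void train(String[][] rows, String[] labels) {
        for (int r = 0; r < rows.length; r++) {
            classCounts.merge(labels[r], 1, Integer::sum);
            total++;
            for (int a = 0; a < rows[r].length; a++) {
                valueCounts.merge(a + "=" + rows[r][a] + "|" + labels[r], 1, Integer::sum);
                distinctValues.computeIfAbsent(a, k -> new HashSet<>()).add(rows[r][a]);
            }
        }
    }

    /** Returns the class with the highest log-probability score, or null if untrained. */
    String predict(String[] row) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> entry : classCounts.entrySet()) {
            String label = entry.getKey();
            int n = entry.getValue();
            double score = Math.log((double) n / total); // log prior
            for (int a = 0; a < row.length; a++) {
                int count = valueCounts.getOrDefault(a + "=" + row[a] + "|" + label, 0);
                int k = distinctValues.getOrDefault(a, Collections.emptySet()).size();
                score += Math.log((count + 1.0) / (n + k)); // smoothed log likelihood
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy rows: chest_pain_type, exercise_induced_angina (made up for illustration).
        String[][] rows = {
            {"asympt", "yes"}, {"asympt", "yes"}, {"typ_angina", "no"}, {"non_anginal", "no"}
        };
        String[] labels = {"present", "present", "not_present", "not_present"};
        NaiveBayesSketch model = new NaiveBayesSketch();
        model.train(rows, labels);
        System.out.println(model.predict(new String[] {"asympt", "yes"}));
    }
}
```

Working in log space avoids underflow when many attribute likelihoods are multiplied, and the smoothing keeps an unseen attribute value from forcing a class probability to zero.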
To Do
• Try Naïve Bayes and Logistic Regression classifiers on a different Weka dataset
• Use various parameters
• Try Linear regression