Machine Learning with Weka
Cornelia Caragea
Thanks to Eibe Frank for some of the slides

Outline
• Weka: A Machine Learning Toolkit
• Preparing Data
• Building Classifiers

WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Main features:
  • Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  • Graphical user interfaces (incl. data visualization)
  • Environment for comparing learning algorithms

WEKA: versions
• There are several versions of WEKA:
  • “Book version”: compatible with the description in the data mining book
  • “GUI version”: adds graphical user interfaces (the book version is command-line only)
  • “Development version”: with lots of improvements

WEKA: resources
• API Documentation, Tutorials, Source code
• WEKA mailing list
• Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
• Weka-related Projects:
  • Weka-Parallel - parallel processing for Weka
  • RWeka - linking R and Weka
  • YALE - Yet Another Learning Environment
  • Many others…

Weka: web site
http://www.cs.waikato.ac.nz/ml/weka/

WEKA: launching
• java -jar weka.jar

Outline
• Weka: A Machine Learning Toolkit
• Preparing Data
• Building Classifiers

WEKA only deals with “flat” files

  @relation heart-disease-simplified

  @attribute age numeric
  @attribute sex { female, male}
  @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
  @attribute cholesterol numeric
  @attribute exercise_induced_angina { no, yes}
  @attribute class { present, not_present}

  @data
  63, male, typ_angina, 233, no, not_present
  67, male, asympt, 286, yes, present
  67, male, asympt, 229, yes, present
  38, female, non_anginal, ?, no, not_present
  ...
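To make the “flat” file layout concrete, the following is a minimal sketch of reading an ARFF-style file with plain Java. It is not WEKA's own parser: the class name `ArffSketch` and its fields are hypothetical, and it assumes unquoted attribute names, one instance per line, `%` comment lines, and no sparse or quoted values. It shows the one property that matters here: every data row must supply exactly one value (or `?` for a missing value) per declared attribute.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; WEKA ships its own ARFF loader.
class ArffSketch {
    String relation = "";
    final List<String> attributeNames = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    // Parses the header (@relation, @attribute, @data) and the comma-separated rows.
    static ArffSketch parse(String text) {
        ArffSketch arff = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            String lower = line.toLowerCase();
            if (!inData && lower.startsWith("@relation")) {
                arff.relation = line.substring("@relation".length()).trim();
            } else if (!inData && lower.startsWith("@attribute")) {
                // Assumption: the name is the first whitespace-delimited token.
                String rest = line.substring("@attribute".length()).trim();
                arff.attributeNames.add(rest.split("\\s+")[0]);
            } else if (!inData && lower.startsWith("@data")) {
                inData = true;
            } else if (inData) {
                String[] values = line.split("\\s*,\\s*");
                if (values.length != arff.attributeNames.size()) {
                    throw new IllegalArgumentException(
                        "Expected " + arff.attributeNames.size()
                        + " values but found " + values.length + ": " + line);
                }
                arff.rows.add(values);
            }
        }
        return arff;
    }

    public static void main(String[] args) {
        // A shortened variant of the heart-disease example (three attributes only).
        String text = String.join("\n",
            "@relation heart-disease-simplified",
            "@attribute age numeric",
            "@attribute sex { female, male}",
            "@attribute class { present, not_present}",
            "@data",
            "63, male, not_present",
            "67, male, present");
        ArffSketch arff = parse(text);
        System.out.println(arff.relation + ": " + arff.attributeNames
            + ", " + arff.rows.size() + " rows");
    }
}
```

A row with too few or too many values is rejected with an exception, which is a quick way to catch a hand-edited file whose rows no longer match the header.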

Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  • Discretization, normalization, resampling, attribute selection, …
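As a rough illustration of what two of these filters do to a numeric attribute, here is a plain-Java sketch of min-max normalization and equal-width discretization. It does not use or reproduce WEKA's filter classes: the class and method names are hypothetical, and it assumes missing values are encoded as `NaN` and are passed through untouched.

```java
// Hypothetical helper for illustration; not WEKA's filter implementation.
class FilterSketch {

    // Returns {min, max} over the non-missing (non-NaN) values.
    static double[] range(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            if (Double.isNaN(v)) {
                continue; // missing value
            }
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new double[] {min, max};
    }

    // Normalization: rescales each value to [0, 1]; a constant column maps to 0.
    static double[] normalize(double[] values) {
        double[] r = range(values);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            if (Double.isNaN(values[i])) {
                out[i] = Double.NaN;
            } else if (r[1] == r[0]) {
                out[i] = 0.0;
            } else {
                out[i] = (values[i] - r[0]) / (r[1] - r[0]);
            }
        }
        return out;
    }

    // Discretization: assigns each value to one of `bins` equal-width intervals,
    // numbered from 0; missing values get -1.
    static int[] discretize(double[] values, int bins) {
        double[] r = range(values);
        double width = (r[1] - r[0]) / bins;
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            if (Double.isNaN(values[i])) {
                out[i] = -1;
            } else if (width == 0.0) {
                out[i] = 0;
            } else {
                // The maximum falls on the upper edge, so clamp it into the last bin.
                out[i] = Math.min((int) ((values[i] - r[0]) / width), bins - 1);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Cholesterol column from the heart-disease example; NaN stands for "?".
        double[] cholesterol = {233, 286, 229, Double.NaN};
        System.out.println(java.util.Arrays.toString(normalize(cholesterol)));
        System.out.println(java.util.Arrays.toString(discretize(cholesterol, 3)));
    }
}
```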

Outline
• Weka: A Machine Learning Toolkit
• Preparing Data
• Building Classifiers

Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  • Decision trees, support vector machines, perceptrons, neural networks, logistic regression, Bayes nets, …
• “Meta”-classifiers include:
  • Bagging, boosting, stacking, …
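To show the train-then-predict shape that all of these schemes share, here is a plain-Java sketch of the simplest possible classifier for a nominal class: a majority-class baseline that always predicts the most frequent training label. WEKA includes a baseline of this kind (ZeroR), but the code below is not WEKA's implementation; the class and method names are hypothetical. A baseline like this is mainly useful as a reference point when judging the accuracy of the real learning schemes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not WEKA's classifier API.
class MajorityClassSketch {
    private String majority; // stays null until train() sees at least one label

    // "Training" only counts the class labels and remembers the most frequent one.
    // On a tie, the label that reaches the top count first wins.
    void train(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        int best = 0;
        for (String label : labels) {
            int count = counts.merge(label, 1, Integer::sum);
            if (count > best) {
                best = count;
                majority = label;
            }
        }
    }

    // The prediction ignores the attribute values entirely.
    String predict() {
        return majority;
    }

    // Fraction of the given labels that the constant prediction gets right.
    double accuracy(List<String> labels) {
        int correct = 0;
        for (String label : labels) {
            if (label.equals(majority)) {
                correct++;
            }
        }
        return labels.isEmpty() ? 0.0 : (double) correct / labels.size();
    }

    public static void main(String[] args) {
        // Class column of the heart-disease example.
        List<String> labels = Arrays.asList(
            "not_present", "present", "present", "not_present");
        MajorityClassSketch model = new MajorityClassSketch();
        model.train(labels);
        System.out.println(model.predict() + " " + model.accuracy(labels));
    }
}
```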

Outline
• Machine Learning Software
• Preparing Data
• Building Classifiers

To Do
• Try Naïve Bayes and Logistic Regression classifiers on a different Weka dataset
• Use various parameters
• Try Linear regression