Machine Learning with WEKA
Eibe Frank
Department of Computer Science, University of Waikato, New Zealand
• WEKA: A Machine Learning Toolkit
• The Explorer
  • Classification and Regression
  • Clustering
  • Association Rules
  • Attribute Selection
  • Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions

WEKA: the bird
Copyright: Martin Kramer (mkramer@wxs.nl)
12/18/2021 University of Waikato 2

WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  • Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  • Graphical user interfaces (incl. data visualization)
  • Environment for comparing learning algorithms

WEKA: versions
• There are several versions of WEKA:
  • WEKA 3.0: “book version” compatible with description in data mining book
  • WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
  • WEKA 3.3: “development version” with lots of improvements
• This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)

WEKA only deals with “flat” files

@relation heart-disease-simplified

@attribute age numeric
@attribute sex {female, male}
@attribute chest_pain_type {typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina {no, yes}
@attribute class {present, not_present}

@data
63, male, typ_angina, 233, no, not_present
67, male, asympt, 286, yes, present
67, male, asympt, 229, yes, present
38, female, non_anginal, ?, no, not_present
...
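
To make the flat-file layout concrete, here is a minimal Python sketch of a reader for the subset of ARFF shown on this slide. It is not WEKA's own loader: the function names `parse_arff` and `convert` are made up for illustration, and the sketch assumes only numeric and nominal attributes, comma-separated rows, `%` comments, and `?` for missing values (no quoting, sparse rows, string or date attributes).

```python
def convert(value, kind):
    """Turn one raw field into a Python value according to its attribute type."""
    if value == "?":                      # '?' marks a missing value
        return None
    if kind == "numeric":
        return float(value)
    if isinstance(kind, list):            # nominal attribute: must be a declared label
        if value not in kind:
            raise ValueError(f"unknown label {value!r}, expected one of {kind}")
        return value
    return value                          # any other type is kept as text


def parse_arff(text):
    """Parse a simplified ARFF document into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal attribute: keep the list of allowed labels.
                kind = [label.strip() for label in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
        elif in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(attributes):
                raise ValueError(
                    f"expected {len(attributes)} values, got {len(values)}: {line!r}"
                )
            rows.append([convert(v, kind) for v, (_, kind) in zip(values, attributes)])
    return relation, attributes, rows
```

Each row comes back as a list aligned with the declared attributes, so a row with too few or too many values is rejected rather than silently shifted.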

Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  • Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
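
As an illustration of what two of these filters do to a numeric attribute, the following Python sketch shows min-max normalization and equal-width discretization. These are generic textbook versions, not WEKA's filter classes; the function names and the `bins` parameter are assumptions, and missing values are not handled.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant attribute: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Equal-width discretization: map each value to a bin index 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]
```

For example, `normalize([0, 5, 10])` gives `[0.0, 0.5, 1.0]`, and `discretize([1, 2, 3, 4], 2)` puts the two smaller values in bin 0 and the two larger ones in bin 1.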

Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  • Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include:
  • Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
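
To show what an instance-based classifier amounts to, here is a minimal 1-nearest-neighbour sketch in Python. It is a generic illustration rather than WEKA's implementation; the function name and the `(features, label)` data layout are assumptions, and all features are taken to be numeric.

```python
import math


def nearest_neighbour_predict(train, query):
    """Return the label of the training instance closest to `query`.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    The label may be nominal (classification) or numeric (regression).
    """
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label
```

Because the prediction is simply the stored label of the closest instance, the same function covers both nominal and numeric targets.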

Explorer: clustering data
• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes are:
  • k-Means, EM, Cobweb, X-means, FarthestFirst
• Clusters can be visualized and compared to “true” clusters (if given)
• Evaluation based on log-likelihood if clustering scheme produces a probability distribution
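
The first scheme on the list, k-means, can be sketched in a few lines of Python. This is the plain textbook procedure (Lloyd's algorithm), not WEKA's clusterer; the function name, the caller-supplied initial centroids, and the `iterations` cap are assumptions made to keep the example deterministic.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means with caller-supplied initial centroids.

    Returns (centroids, assignment), where assignment[i] is the index of the
    cluster that points[i] belongs to.
    """
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centroids.append(old)   # empty cluster: leave centroid in place
        if new_centroids == centroids:      # converged
            break
        centroids = new_centroids
    return centroids, assignment
```

When “true” cluster labels are available, the returned assignment can be compared against them directly.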

Explorer: finding associations
• WEKA contains an implementation of the Apriori algorithm for learning association rules
  • Works only with discrete data
• Can identify statistical dependencies between groups of attributes:
  • milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
• Apriori can compute all rules that have a given minimum support and exceed a given confidence
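
The two measures used to filter rules can be written down directly. The Python sketch below computes support (as an absolute count, matching the "support 2000" in the example) and confidence for a single rule. It covers only the measures, not Apriori's candidate-generation search, and the function names are assumptions.

```python
def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also
    contain the consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base
```

A rule is reported when its support reaches the chosen minimum and its confidence exceeds the chosen threshold.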

Explorer: attribute selection
• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
• Attribute selection methods contain two parts:
  • A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  • An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
• Very flexible: WEKA allows (almost) arbitrary combinations of these two
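
One such combination, ranking as the search method with information gain as the evaluation method, can be sketched in Python as follows. This is a generic version for nominal attributes, not WEKA's evaluator; the function names and the list-of-dicts data layout are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in class entropy obtained by splitting on `attribute`."""
    gain = entropy([r[target] for r in rows])
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain


def rank_attributes(rows, attributes, target):
    """Order attributes from most to least informative about `target`."""
    return sorted(
        attributes,
        key=lambda a: information_gain(rows, a, target),
        reverse=True,
    )
```

Swapping `information_gain` for another scoring function changes the evaluation method while leaving the ranking search untouched, which is the separation the slide describes.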

Explorer: data visualization
• Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
• WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  • To do: rotating 3-d visualizations (Xgobi-style)
• Color-coded class values
• “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
• “Zoom-in” function
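
The idea behind jitter is simple enough to sketch: add a small random offset to each plotted coordinate so that points with identical values no longer sit exactly on top of each other. The Python function below is an illustration only, not WEKA's plotting code; its name, the uniform noise, and the `amount` and `seed` parameters are assumptions.

```python
import random


def jitter(values, amount, seed=0):
    """Return the values with uniform noise in [-amount, amount] added to each.

    A fixed seed keeps the displacement reproducible between redraws.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]
```

Only the plotted positions change; the underlying data values are left as they are.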

Performing experiments
• Experimenter makes it easy to compare the performance of different learning schemes
• For classification and regression problems
• Results can be written into file or database
• Evaluation options: cross-validation, learning curve, hold-out
• Can also iterate over different parameter settings
• Significance-testing built in!
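
Two of the ingredients named here, cross-validation and significance testing, can be sketched in Python. The fold splitter below is unshuffled and unstratified, and the test is a plain paired t statistic over per-fold scores; both are simplifications and may differ from what the Experimenter actually applies. The function names are assumptions.

```python
import math
import statistics


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k (train, test) pairs with disjoint test folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def paired_t_statistic(scores_a, scores_b):
    """t statistic for the paired per-fold scores of two learning schemes."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
```

Each scheme is trained and scored once per fold, and the resulting statistic is compared against a t distribution with one fewer degree of freedom than the number of folds.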

The Knowledge Flow GUI
• New graphical user interface for WEKA
• Java-Beans-based interface for setting up and running machine learning experiments
• Data sources, classifiers, etc. are beans and can be connected graphically
• Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
• Layouts can be saved and loaded again later
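
The flow of data through connected components can be mimicked with Python generators, each stage consuming the output of the previous one. This is only an analogy for the idea, not the Java Beans API: every component below is hypothetical, the data is invented for the example, and the "classifier" is a fixed threshold rather than a learned model.

```python
def data_source():
    """Hypothetical source: yields (features, label) instances."""
    yield from [((1.0,), "low"), ((2.0,), "low"), ((8.0,), "high"), ((9.0,), "high")]


def scale_filter(instances, factor):
    """Filter stage: multiply every feature by `factor`."""
    for features, label in instances:
        yield tuple(x * factor for x in features), label


def threshold_classifier(instances, threshold):
    """Classifier stage: yields (predicted, actual) pairs."""
    for features, label in instances:
        predicted = "high" if features[0] >= threshold else "low"
        yield predicted, label


def evaluator(predictions):
    """Evaluator stage: fraction of predictions that match the actual label."""
    pairs = list(predictions)
    return sum(p == a for p, a in pairs) / len(pairs)


# Connect the stages: data source -> filter -> classifier -> evaluator.
accuracy = evaluator(threshold_classifier(scale_filter(data_source(), 0.1), 0.5))
```

Replacing one stage, for instance a different filter, leaves the rest of the chain unchanged, which is the appeal of wiring components together.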

Conclusion: try it yourself!
• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
  • Also has a list of projects based on WEKA
  • Contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang