Machine Learning with WEKA
WEKA: A Machine Learning Toolkit

Eibe Frank
Department of Computer Science, University of Waikato, New Zealand

• The Explorer
  • Classification and Regression
  • Clustering
  • Association Rules
  • Attribute Selection
  • Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions

WEKA: the bird
Copyright: Martin Kramer (mkramer@wxs.nl)
2021/2/25 University of Waikato

WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  • A comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  • Graphical user interfaces (incl. data visualization)
  • Environment for comparing learning algorithms

WEKA: versions
• There are several versions of WEKA:
  • WEKA 3.0: “book version” compatible with description in data mining book
  • WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
  • WEKA 3.3 to WEKA 3.7
  • WEKA 3.8: the latest stable version
  • WEKA 3.9: “development version” with lots of improvements
• This talk is based on a snapshot of WEKA 3.4

WEKA only deals with “flat” files

age  sex     chest_pain_type  cholesterol  exercise_induced_angina  class
63   male    typ_angina       233          no                       not_present
67   male    asympt           286          yes                      present
67   male    asympt           229          yes                      present
38   female  non_anginal      ?            no                       not_present
...

WEKA only deals with “flat” files

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
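To make the ARFF layout concrete, here is a minimal sketch of a reader for the subset of the format shown on the slide: `@relation`, numeric and nominal `@attribute` lines, comma-separated `@data` rows, and `?` for a missing value. It is written in plain Python and is not WEKA's own loader; the function name `parse_arff` and the trimmed four-attribute sample are assumptions made for illustration, and any attribute type that is not a `{...}` list is simply treated as numeric.

```python
def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Illustrative only: handles numeric and nominal attributes,
    comma-separated rows, and '?' for missing values.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(attributes):
                raise ValueError(f"expected {len(attributes)} values: {line!r}")
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row[name] = None  # missing value
                elif kind == "numeric":
                    row[name] = float(value)
                elif value in kind:  # kind is the list of allowed labels
                    row[name] = value
                else:
                    raise ValueError(f"{value!r} is not a valid value for {name}")
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = "numeric"  # simplification: every other type is numeric
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# Trimmed, hypothetical variant of the slide's file (four attributes only).
SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute cholesterol numeric
@attribute class { present, not_present}
@data
63,male,233,not_present
38,female,?,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), rows[1]["cholesterol"])
```

Each row must supply exactly one value per declared attribute, which is why a row with a value missing from the file (as opposed to an explicit `?`) is rejected.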

Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  • Discretization, normalization, attribute selection, resampling, transforming and combining attributes, …

Open the iris.arff data set

The iris dataset
• Sources (perhaps the best-known dataset in pattern recognition)
  • Creator: R. A. Fisher
  • Donor: Michael Marshall
  • Date: July 1988
• Number of instances: 150 (50 in each of 3 classes)
• Number of input attributes: 4, numeric
• Missing attribute values: none
• Attribute information
  • sepal length (in cm)
  • sepal width (in cm)
  • petal length (in cm)
  • petal width (in cm)
  • class: Iris Setosa, Iris Versicolour, Iris Virginica (3 classes)

Save in another file format as needed

Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  • Decision trees, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …

Figure labels: Node 0, Node 1, Node 2, Node 3, Node 4, Node 5

ROC curve

Use a numeric attribute as output

Use M5P to predict the petal length of an iris flower
• M5 model trees and rules: combines a decision tree with linear regression
• Generates 3 models (Linear Model 1, Linear Model 2, Linear Model 3); the tree shows when to use model 1, 2, or 3
• Performance indices: right-click and choose “Visualize classifier error”

Click a data point to show the data window
• Color: class
• Size of X: error extent

Explorer: clustering data
• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes are:
  • k-Means, EM, Cobweb, X-means, FarthestFirst
• Clusters can be visualized and compared to “true” clusters (if classes are given)
  • Evaluation based on log likelihood if clustering scheme produces a probability distribution
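As a rough illustration of what a k-Means clusterer does, the sketch below alternates between assigning each instance to its nearest centroid and recomputing each centroid as the mean of its members. It is plain Python, not WEKA's implementation; the function name `kmeans`, the choice of the first k points as starting centroids, and the six (petal length, petal width) pairs are all assumptions made to keep the example small and deterministic.

```python
def kmeans(points, k, iterations=100):
    """Cluster numeric tuples into k groups; returns (centroids, assignment)."""
    # Assumption: start from the first k points so the run is deterministic.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:  # nothing moved, so we have converged
            break
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Made-up (petal length, petal width) values in cm, for illustration only.
petals = [(1.4, 0.2), (4.7, 1.4), (6.0, 2.5), (1.3, 0.2), (4.5, 1.5), (5.9, 2.1)]
centroids, assignment = kmeans(petals, k=3)
print(assignment)
```

With real data the result depends on the starting centroids, which is why clustering tools usually expose a random seed.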

Enter 3 clusters

Screenshot annotations: means of 4 attributes; smallest petals; largest petals; attribute order; flower species
Screenshot annotations (after right click): largest petals; smallest petals

Explorer: finding associations
• WEKA contains an implementation of the Apriori algorithm for learning association rules
  • Works only with discrete (categorical) data
• Can identify statistical dependencies between groups of attributes:
  • milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
• Apriori can compute all rules that have a given minimum support and exceed a given confidence
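The two measures named on this slide can be computed directly. The sketch below is plain Python and covers only support and confidence for a single rule, not Apriori's search over candidate itemsets; the function names and the five shopping baskets are hypothetical. Support is counted here as the number of transactions containing every item in the rule, and confidence as that count divided by the number of transactions containing the left-hand side.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    base = support_count(transactions, antecedent)
    return both / base if base else 0.0


# Made-up baskets, for illustration only.
baskets = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread"},
]

rule_support = support_count(baskets, {"milk", "butter", "bread", "eggs"})
rule_confidence = confidence(baskets, {"milk", "butter"}, {"bread", "eggs"})
print(rule_support, round(rule_confidence, 3))
```

Here three baskets contain milk and butter and two of those also contain bread and eggs, so the rule has support 2 and confidence 2/3. A rule miner would keep it only if both values clear the chosen minimums.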

Load the vote data set
Expand the window
Screenshot annotations: minimum support = 0.45; minimum confidence = 0.9; item support count; rule confidence; default: 10 rules
Change parameters: minimum support, minimum confidence; set number of rules = 15
Result: 15 rules

Explorer: attribute selection
• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
• Attribute selection methods contain two parts:
  • A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  • An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
• Very flexible: WEKA allows (almost) arbitrary combinations of these two
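One pairing from the lists above, ranking as the search method with information gain as the evaluation method, can be sketched in a few lines. This is plain Python rather than WEKA's attribute-selection classes, it handles nominal attributes only, and the function names and the four-row toy table are assumptions for illustration. Information gain is the entropy of the class minus the weighted entropy of the class within each attribute value.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Class entropy minus the weighted class entropy within each attribute value."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [label for x, label in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Made-up nominal data, for illustration only.
columns = {
    "sex": ["male", "female", "male", "female"],
    "exercise_induced_angina": ["yes", "yes", "no", "no"],
}
labels = ["present", "present", "not_present", "not_present"]

# "Ranking" search: score every attribute on its own and sort by score.
ranked = sorted(columns,
                key=lambda name: information_gain(columns[name], labels),
                reverse=True)
print(ranked)
```

In this toy table the angina attribute separates the classes perfectly (gain of 1 bit) while sex carries no information (gain of 0), so the ranking puts angina first. Subset search methods such as best-first differ in that they score groups of attributes together rather than one at a time.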

Explorer: data visualization
• Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
• WEKA can visualize single attributes (1D) and pairs of attributes (2D)
  • To do: rotating 3D visualizations (Xgobi-style)
• Color-coded class values
• “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
• “Zoom-in” function

Load the Glass data set
Change PointSize
Change PlotSize (PlotSize is changed)
Double-click to enlarge
Zoom in

Performing experiments
• Experimenter makes it easy to compare the performance of different learning schemes
• For classification and regression problems
• Results can be written into file or database
• Evaluation options: cross-validation, learning curve, hold-out
• Can also iterate over different parameter settings
• Significance-testing built in!
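Cross-validation, the first evaluation option listed, can be sketched as follows. This is a plain-Python illustration and not the Experimenter's own procedure: it omits shuffling, stratification, and significance testing, and the function names, the majority-class baseline learner, and the nine toy labels are all assumptions. Each fold is held out once for testing while the learner is fitted on the remaining folds.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i holds indices i, i+k, i+2k, ..."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = sorted(j for fold in folds[:i] + folds[i + 1:] for j in fold)
        yield train, test


def cross_validate(xs, ys, fit, predict, k):
    """Return accuracy over all held-out instances across k folds."""
    correct = 0
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(predict(model, xs[i]) == ys[i] for i in test)
    return correct / len(xs)


# Hypothetical baseline learner: always predict the most common training label.
def fit_majority(train_x, train_y):
    return Counter(train_y).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Made-up data: the inputs are ignored by the baseline, only the labels matter.
xs = list(range(9))
ys = ["a"] * 6 + ["b"] * 3
accuracy = cross_validate(xs, ys, fit_majority, predict_majority, k=3)
print(round(accuracy, 3))
```

Comparing two learning schemes then amounts to passing different `fit` and `predict` pairs to the same folds and comparing the resulting accuracies.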

1. Add a few data sets
2. Add a few algorithms

Screenshot annotations: result message; running status

The Knowledge Flow GUI
• New graphical user interface for WEKA
• Java-Beans-based interface for setting up and running machine learning experiments
• Data sources, classifiers, etc. are beans and can be connected graphically
• Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
• Layouts can be saved and loaded again later

Conclusion: try it yourself!
• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
  • Also has a list of projects based on WEKA
  • WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang