
  • Slides: 82

Machine Learning with WEKA

Eibe Frank
Department of Computer Science, University of Waikato, New Zealand

• WEKA: A Machine Learning Toolkit
• The Explorer
  - Classification and Regression
  - Clustering
  - Association Rules
  - Attribute Selection
  - Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions

Some slides updated 2/22/2020 by Dr. Gary Weiss

WEKA: the bird
Copyright: Martin Kramer (mkramer@wxs.nl)
10/29/2020 University of Waikato 2


WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  - Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  - Graphical user interfaces (incl. data visualization)
  - Environment for comparing learning algorithms


WEKA: versions
• There are several versions of WEKA
• As of Feb 2020 the stable version is 3.8.4, and that is the one you should be using. These slides, which come from an old tutorial, are based on WEKA 3.3
  - Dr. Weiss has added notes on significant differences, but for the most part things have not changed much.


Weka Documentation
• You can go to the main Weka page, click on documentation, and then download the full manual for 3.8.3
  - https://www.cs.waikato.ac.nz/ml/weka/documentation.html


WEKA only deals with “flat” files

    @relation heart-disease-simplified
    @attribute age numeric
    @attribute sex { female, male}
    @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
    @attribute cholesterol numeric
    @attribute exercise_induced_angina { no, yes}
    @attribute class { present, not_present}
    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present
    ...
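WEKA reads this format for you, but to make its structure concrete, here is a minimal Python sketch that parses a small ARFF string like the one above into the relation name, the attribute declarations, and the data rows, treating `?` as a missing value. It assumes dense rows with no quoted values and only numeric or nominal attributes; `parse_arff` is a hypothetical helper, not part of WEKA.

```python
def parse_arff(text):
    """Parse a simple dense ARFF string into (relation, attributes, rows).

    Each attribute is (name, kind): kind is "numeric" or a list of nominal values.
    Missing values ('?') become None; numeric values become floats.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(attributes):
                raise ValueError(f"expected {len(attributes)} values: {line}")
            row = []
            for (name, kind), v in zip(attributes, values):
                if v == "?":
                    row.append(None)  # missing value
                elif kind == "numeric":
                    row.append(float(v))
                elif v in kind:
                    row.append(v)
                else:
                    raise ValueError(f"{v!r} is not a legal value of {name}")
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            name, kind = line.split(None, 1)[1].split(None, 1)
            kind = kind.strip()
            if kind.startswith("{"):  # nominal: list the legal values
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            elif kind.lower() in ("numeric", "real", "integer"):
                kind = "numeric"
            else:
                raise ValueError(f"unsupported attribute type: {kind}")
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


sample = """@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""
relation, attrs, rows = parse_arff(sample)
print(relation, len(attrs), rows[2])
```

Validating each nominal value against its declaration mirrors why a row with the wrong number of values or an undeclared value is rejected when an ARFF file is loaded.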


Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  - Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
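As an illustration of what a filter does to the data, here is a minimal Python sketch of min-max normalization, the idea behind WEKA's unsupervised Normalize filter, which by default rescales each numeric attribute to the range [0, 1]. `normalize_columns` is a hypothetical helper; passing missing values (None) through unchanged and mapping a constant column to 0.0 are assumptions of this sketch.

```python
def normalize_columns(rows):
    """Min-max scale every column of a numeric table to [0, 1].

    rows: list of equal-length lists of floats or None (missing).
    Returns a new table; the input is not modified.
    """
    n_cols = len(rows[0])
    mins, maxs = [], []
    for c in range(n_cols):
        present = [r[c] for r in rows if r[c] is not None]
        mins.append(min(present) if present else 0.0)
        maxs.append(max(present) if present else 0.0)
    scaled = []
    for r in rows:
        new_row = []
        for c, v in enumerate(r):
            if v is None:
                new_row.append(None)  # keep missing values missing
            elif maxs[c] == mins[c]:
                new_row.append(0.0)  # constant column: assumption of this sketch
            else:
                new_row.append((v - mins[c]) / (maxs[c] - mins[c]))
        scaled.append(new_row)
    return scaled


# Age and cholesterol columns taken from the heart-disease example.
table = [[63.0, 233.0], [67.0, 286.0], [67.0, 229.0], [38.0, None]]
print(normalize_columns(table))
```

The minimum and maximum are computed from the data being filtered, which is why a normalized attribute always spans exactly 0 to 1 on that data.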


Reading in the Iris Dataset
• The tutorial accesses a copy of the iris dataset
  - The file is probably already on your machine, most likely in the data directory where the program resides, such as C:/Program Files/Weka-3-8-4/data. Otherwise search for iris.arff and use that directory; failing that, download it from the Internet:
    - https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/iris.arff
• Load it using “Open File” or “Open URL”
  - There are other datasets in the same directory, either on your machine or at the URL
    - https://storm.cis.fordham.edu/~gweiss/data-mining/datasets.html


Non-ARFF File Types
• By default WEKA expects ARFF format (“.arff”)
• If you select “Open File”, you will see that you can change to other file types
  - C4.5 (the format used by the old C4.5 decision tree learner)
  - .csv files
    - You can read in data from .csv files
• For your course projects you may need to use CSV
  - Note that without the ARFF header you will not get meaningful variable names, but if your .csv file includes the variable names as a header row, the tool will use those
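Because a CSV file carries no type information, attribute types have to be guessed when it is loaded. The Python sketch below shows the same basic idea under simple assumptions: the first row holds the column names, a column whose non-missing values all parse as numbers is declared numeric, and any other column becomes a nominal attribute listing its values in first-seen order. `csv_to_arff` is a hypothetical helper, not a WEKA tool, and it does not quote values that contain spaces or commas.

```python
import csv
import io


def _is_number(s):
    """True if the string parses as a float."""
    try:
        float(s)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation="converted"):
    """Convert CSV text with a header row into ARFF text.

    Empty cells and '?' are treated as missing and written as '?'.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = [h.strip() for h in next(reader)]  # header row gives attribute names
    rows = [[v.strip() for v in r] for r in reader if r]
    lines = [f"@relation {relation}", ""]
    for c, name in enumerate(header):
        present = [r[c] for r in rows if r[c] not in ("", "?")]
        if present and all(_is_number(v) for v in present):
            lines.append(f"@attribute {name} numeric")
        else:
            values = list(dict.fromkeys(present))  # distinct values, first-seen order
            lines.append(f"@attribute {name} {{{', '.join(values)}}}")
    lines += ["", "@data"]
    for r in rows:
        lines.append(",".join("?" if v in ("", "?") else v for v in r))
    return "\n".join(lines) + "\n"


sample = "age,sex,class\n63,male,not_present\n38,female,?\n"
print(csv_to_arff(sample, "demo"))
```

Without the header row, the sketch would have nothing but column positions to name the attributes by, which is the same reason a headerless CSV loses meaningful variable names.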


Discretization
• The next few slides involve discretizing features. There are major changes between Weka 3.6 and 3.8
• In 3.8 the Discretize tool is under the supervised attribute filters
  - The options are very different: the supervised tool uses the class labels to choose the intervals, so you do not need to set as many options. With 3.8 you need not set the number of bins or the type of discretization
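To see why the supervised version needs fewer settings, the Python sketch below contrasts two ideas. Equal-width binning (unsupervised) requires you to choose the number of bins, while a supervised method uses the class labels to place cut points. WEKA's supervised Discretize uses Fayyad & Irani's entropy-based MDL method by default; this sketch simplifies that to a single best binary cut with no MDL stopping rule, so treat it as an illustration of the idea rather than a reimplementation. The function names and toy values are hypothetical.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Unsupervised: split [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant data
    # Values equal to the maximum fall into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Class entropy in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Supervised: the midpoint cut that minimizes the weighted class entropy."""
    pairs = sorted(zip(values, labels))
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_cut, best_score = (pairs[i - 1][0] + pairs[i][0]) / 2, score
    return best_cut


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
print(equal_width_bins(values, 3))        # the middle bin ends up empty
print(best_entropy_cut(values, labels))   # cut placed between the two classes
```

With three equal-width bins the middle interval receives no values at all, while the class-aware cut lands exactly between the two groups without any bin count being specified.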


Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  - Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include:
  - Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
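To make “instance-based classifier” concrete: such a learner simply stores the training instances and classifies a new one by looking at its nearest neighbours. The Python sketch below is a plain k-nearest-neighbour classifier with Euclidean distance and a majority vote, in the spirit of WEKA's IBk but not a copy of it (IBk, for example, normalizes attributes before measuring distance). `knn_predict` and the toy points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the class of `query` from its k nearest training instances.

    train: list of (feature_vector, label) pairs with numeric features.
    Vote ties are broken in favour of the label of the nearest neighbour.
    """
    by_distance = sorted(train, key=lambda inst: math.dist(inst[0], query))
    neighbours = [label for _, label in by_distance[:k]]
    counts = Counter(neighbours)
    best = max(counts.values())
    # Neighbours are ordered nearest first, so the first label with the top count wins.
    return next(label for label in neighbours if counts[label] == best)


train = [((1.0, 1.0), "a"), ((1.2, 0.9), "a"), ((0.8, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.8), "b")]
print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (4.9, 5.1), k=1))
```

There is no training step beyond storing the data; all the work happens at prediction time, which is what distinguishes instance-based learners from trees or regression models.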


Let's start fresh without discretization
• Go back to the Preprocess tab, reopen the iris dataset, and use that. Do it now.


Right click on this


Visualizing Errors
• In the next slide the x-axis is petallength, the y-axis is petalwidth, the class is shown by color, and the errors are shown as boxes
  - In my run of Weka 3.8 the plot looked very different, but only because the x and y axes were set to different attributes. If this happens, manually set the x and y axes to petallength and petalwidth
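The boxes in the error plot mark instances whose predicted class differs from the actual class. The Python sketch below shows the bookkeeping behind such a plot and behind the confusion matrix that WEKA prints with the classifier output: it pairs actual with predicted labels, counts the pairs, and pulls out the misclassified points with their (petallength, petalwidth) coordinates. The four points and predictions are invented for illustration; in practice they come from the trained model.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def misclassified(points, actual, predicted):
    """Return (point, actual, predicted) for every error, i.e. the boxed marks."""
    return [(pt, a, p) for pt, a, p in zip(points, actual, predicted) if a != p]


classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
points = [(1.4, 0.2), (4.7, 1.4), (5.0, 1.7), (6.0, 2.5)]  # (petallength, petalwidth), invented
actual = ["Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica"]
predicted = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-virginica"]

for row in confusion_matrix(actual, predicted, classes):
    print(row)
print(misclassified(points, actual, predicted))
```

Every off-diagonal count in the matrix corresponds to one boxed point in the plot, so the two views always agree on the number of errors.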


ROC curves
• Weka provides the ability to generate ROC curves
• In this case, right click on the model you just built and, instead of selecting “Visualize Classifier Errors”, select “Visualize Threshold Curve”
  - You will need to specify one of the 3 class values, since ROC curves are only defined for 2 classes
    - The one you select becomes the positive class, and the other two are merged into the negative class
  - In this case the curves are “perfect” or near perfect, since there is only one error.
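The one-class-versus-the-rest construction can be sketched in Python: choose a positive class, treat the other two as negative, sort the instances by the model's predicted probability for the positive class, and lower the threshold step by step, recording a (false positive rate, true positive rate) point each time. The trapezoidal area under those points is the AUC. The scores below are made up, the functions are hypothetical, and the sketch assumes both positive and negative instances are present.

```python
def roc_points(scores, labels, positive):
    """ROC points for `positive` versus all other classes.

    scores: predicted probability of `positive` for each instance.
    labels: actual class of each instance.
    Returns (fpr, tpr) pairs from (0, 0) to (1, 1).
    """
    pos = sum(1 for lab in labels if lab == positive)
    neg = len(labels) - pos  # the other classes merged into one negative class
    ranked = sorted(zip(scores, labels), key=lambda x: x[0], reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Admit every instance with this score at once, so tied scores form one step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC points by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


labels = ["Iris-virginica", "Iris-virginica", "Iris-versicolor",
          "Iris-setosa", "Iris-virginica", "Iris-versicolor"]
scores = [0.95, 0.9, 0.6, 0.05, 0.4, 0.3]  # made-up P(Iris-virginica)
pts = roc_points(scores, labels, "Iris-virginica")
print(pts)
print(auc(pts))
```

A model that ranks every positive above every negative climbs straight up the y-axis before moving right, giving an area of 1; this is why a model with almost no errors produces the “perfect” curves described above.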


Now Try a Few More on Your Own
• Run Random Forest
  - It is a tree ensemble method, found under trees
  - After running it with the default options, change the maximum depth from 0 (unlimited) to 3, 2, and then 1. When do you see markedly different results?
• Run Bagging
  - It is an ensemble (meta-learning) method, so find it under classifiers > meta
    - Bagging runs a base method again and again, each time on a different sample of the training data
    - Run it with the default options (by default WEKA uses REPTree as the base method), then change the base method to J48
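Bagging's “different training data” are bootstrap samples: each round draws as many instances as the training set contains, with replacement, and trains the base method on that sample. The Python sketch below uses a one-split decision stump on a single numeric feature as a stand-in base learner and combines the models by majority vote; WEKA's Bagging instead averages the base models' class-probability estimates, which is closely related. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def train_stump(xs, ys):
    """Fit a decision stump on one numeric feature: the single best threshold."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != (left_label if x <= t else right_label) for x, y in zip(xs, ys))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


def bagging(xs, ys, n_models=10, seed=1):
    """Train n_models stumps, each on a bootstrap sample, and vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict


xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = ["a", "a", "a", "a", "b", "b", "b", "b"]
model = bagging(xs, ys)
print(model(1.0), model(13.0))
```

Because each bootstrap sample leaves out some instances and repeats others, the individual stumps place their thresholds differently, and voting smooths out that variation.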


Now Try the KnowledgeFlow Interface
• We will build a flow to do cross-validated J48
• This example is from the WEKA manual for 3.8.3: it is the first example in the KnowledgeFlow section
• From the console, choose the KnowledgeFlow interface


Add a DataSources Node
• Expand the DataSources entry in the Design panel and click on ArffLoader (the mouse pointer will change to cross hairs).
  - Next place the ArffLoader step on the layout area by clicking somewhere on the layout (a copy of the ArffLoader icon will appear).
  - Next specify an ARFF file to load by first right clicking the mouse over the ArffLoader icon on the layout. A pop-up menu will appear. Select Configure under Edit in the list from this menu and browse to the location of your ARFF file.
    - You may need to copy an ARFF file to the PC unless there are data files already under the data folder


Adding ClassAssigner to specify the class
• Next expand the “Evaluation” entry in the Design panel and choose the ClassAssigner step (it lets you choose which column is the class) from the toolbar. Place this on the layout.
• Now connect the ArffLoader to the ClassAssigner: first right click over the ArffLoader and select dataSet under Connections in the menu. A rubber band line will appear. Move the mouse over the ClassAssigner step and left click; a red line labeled dataSet will connect the two steps.
• Next right click over the ClassAssigner and choose Configure from the menu. This will pop up a window from which you can specify which column is the class in your data (the last column is the default).


Add CrossValidationFoldMaker
• Next grab a CrossValidationFoldMaker step from the Evaluation entry in the Design panel and place it on the layout. Connect the ClassAssigner to the CrossValidationFoldMaker by right clicking over the ClassAssigner and selecting dataSet from under Connections in the menu.
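What a cross-validation fold maker produces can be sketched in Python: k pairs of (training set, test set) in which every instance appears in exactly one test set. This sketch also stratifies, dealing each class's shuffled instances round-robin across the folds so that class proportions stay similar; whether the KnowledgeFlow step stratifies, and how it shuffles, is not stated in these slides, so treat those details as assumptions. `stratified_folds` is a hypothetical helper that works on instance indices.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Return k (train_indices, test_indices) pairs for stratified k-fold CV."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    test_sets = [[] for _ in range(k)]
    position = 0
    for lab in sorted(by_class):
        idx = by_class[lab]
        rng.shuffle(idx)
        for i in idx:
            test_sets[position % k].append(i)  # deal instances round-robin
            position += 1
    all_idx = set(range(len(labels)))
    # The training set for each fold is everything not in its test set.
    return [(sorted(all_idx - set(test)), sorted(test)) for test in test_sets]


labels = ["a"] * 6 + ["b"] * 4
folds = stratified_folds(labels, k=5)
for train, test in folds:
    print(len(train), test)
```

Each (train, test) pair corresponds to one trainingSet/testSet pair flowing out of the fold maker, which is why the next step connects it to the classifier twice.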


Add J48
• Next expand the Classifiers entry and then the trees sub-entry in the Design panel and choose the J48 step. Place a J48 step on the layout.
• Connect the CrossValidationFoldMaker to J48 TWICE, by first choosing trainingSet and then testSet from the pop-up menu for the CrossValidationFoldMaker.


Finish the Flow
• Next go back to the Evaluation entry and place a ClassifierPerformanceEvaluator step on the layout. Connect J48 to this step by selecting the batchClassifier entry from the pop-up menu for J48.
• Next go to the Visualization entry and place a TextViewer step on the layout. Connect the ClassifierPerformanceEvaluator to the TextViewer by selecting the text entry from the pop-up menu for ClassifierPerformanceEvaluator.
• Now start the flow executing by pressing the play button on the toolbar at the top of the window. Progress information for each step in the flow will appear in the Status area and Log at the bottom of the window.
• When finished, you can view the results by choosing Show results from the pop-up menu for the TextViewer step.
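The finished flow (split into folds, train on each trainingSet, evaluate on the matching testSet, and report one pooled result) can be mirrored in a short Python sketch. J48 is not available in the standard library, so a hypothetical majority-class learner stands in for it, and a simple index-modulo split stands in for the fold maker; the point is the data flow and how the evaluator pools predictions across all folds, not the classifier itself.

```python
from collections import Counter


def majority_learner(train_instances, train_labels):
    """Hypothetical stand-in for J48: always predict the most common training class."""
    label = Counter(train_labels).most_common(1)[0][0]
    return lambda instance: label


def cross_validate(instances, labels, learner, k=10):
    """Train on each trainingSet, test on the matching testSet, pool the results."""
    n = len(instances)
    correct = 0
    for f in range(k):
        test = [i for i in range(n) if i % k == f]   # this fold's testSet
        train = [i for i in range(n) if i % k != f]  # remaining folds form the trainingSet
        model = learner([instances[i] for i in train], [labels[i] for i in train])
        correct += sum(model(instances[i]) == labels[i] for i in test)
    return correct / n  # accuracy pooled over every fold's test predictions


instances = list(range(10))
labels = ["yes"] * 7 + ["no"] * 3
accuracy = cross_validate(instances, labels, majority_learner, k=5)
print(f"Correctly classified instances: {accuracy:.1%}")  # the TextViewer's role
```

Pooling the counts over all folds, rather than averaging per-fold percentages, gives each instance equal weight even when folds differ slightly in size.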


Seeing the Results for Each Fold
• Connect a TextViewer and/or a GraphViewer to J48 to view the textual or graphical representations of the trees produced for each fold of the cross-validation (this is something that is not possible in the Explorer).
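Inspecting one model per fold amounts to keeping each fold's trained model instead of only its score. In the Python sketch below a one-split decision stump on a single numeric feature stands in for J48's tree (an assumption made to stay within the standard library), and a text description of each fold's model is collected so the folds can be compared side by side. `fit_stump`, `models_per_fold`, and the toy data are hypothetical.

```python
from collections import Counter


def fit_stump(xs, ys):
    """Return (threshold, left_label, right_label) with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ll = Counter(left).most_common(1)[0][0]
        rl = Counter(right).most_common(1)[0][0] if right else ll
        errors = sum(y != ll for y in left) + sum(y != rl for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, ll, rl)
    return best[1:]


def models_per_fold(xs, ys, k=3):
    """Train one stump per cross-validation fold and describe each one as text."""
    descriptions = []
    for f in range(k):
        train = [i for i in range(len(xs)) if i % k != f]  # this fold's trainingSet
        t, ll, rl = fit_stump([xs[i] for i in train], [ys[i] for i in train])
        descriptions.append(f"fold {f + 1}: x <= {t} -> {ll}; x > {t} -> {rl}")
    return descriptions


xs = [1, 2, 3, 4, 5, 6]
ys = ["a", "a", "a", "b", "b", "b"]
for line in models_per_fold(xs, ys):
    print(line)
```

Even on this tiny example the threshold differs from fold to fold, which is the kind of variation that viewing each fold's tree makes visible.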