Running Clustering Algorithm in Weka
Presented by Rachsuda Jiamthapthaksin
Computer Science Department, University of Houston
What is Weka?
• Data mining software in Java
  – Supervised learning (classification)
  – Unsupervised learning (clustering)
• Tools
  – Exploration
  – Visualization
  – Experiment
  – Statistical summary
Download Weka
• http://www.cs.waikato.ac.nz/ml/weka/
  – Windows (weka-3-5-6jre.exe)
  – Linux
Getting Started
Memory Limitation in Weka
• Run the GUI Chooser from a DOS prompt to increase the memory available to Weka (-Xmx128m sets the maximum Java heap size to 128 MB)
• C:\> java -Xmx128m -classpath .;/progra~1/weka-3-5/weka.jar weka.gui.GUIChooser
Weka GUI
Explorer
Open Files (.csv, .arff)
• Dataset’s description
• Dataset’s statistics
• Attributes
Remove Class Attribute
• Non-class attributes
Select A Clustering Algorithm
Parameter Settings
Run A Clustering Algorithm
DBSCAN Results

=== Run information ===

Scheme:       weka.clusterers.DBScan -E 0.9 -M 6 -I weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase -D weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Relation:     iris-weka.filters.unsupervised.attribute.Remove-R5
Instances:    150
Attributes:   4
              sepallength
              sepalwidth
              petallength
              petalwidth
Test mode:    evaluate on training data

=== Model and evaluation on training set ===

DBScan clustering results
============================================

Clustered DataObjects: 150
Number of attributes: 4
Epsilon: 0.9; minPoints: 6
Index: weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase
Distance-type: weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Number of generated clusters: 1
Elapsed time: .06

(  0.) 5.1,3.5,1.4,0.2  -->  0
(  1.) 4.9,3,1.4,0.2    -->  0
(  2.) 4.7,3.2,1.3,0.2  -->  0
(  3.) 4.6,3.1,1.5,0.2  -->  0
(  4.) 5,3.6,1.4,0.2    -->  0
…
(146.) 6.3,2.5,5,1.9    -->  0
(147.) 6.5,3,5.2,2      -->  0
(148.) 6.2,3.4,5.4,2.3  -->  0
(149.) 5.9,3,5.1,1.8    -->  0

Clustered Instances

0      150 (100%)
Simplify A Tested Dataset
Parameter Settings
DBSCAN Clustering Results

=== Run information ===

Scheme:       weka.clusterers.DBScan -E 0.3 -M 50 -I weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase -D weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Relation:     iris-weka.filters.unsupervised.attribute.Remove-R1-2,5
Instances:    150
Attributes:   2
              petallength
              petalwidth
Test mode:    evaluate on training data

=== Model and evaluation on training set ===

DBScan clustering results
============================================

Clustered DataObjects: 150
Number of attributes: 2
Epsilon: 0.3; minPoints: 50
Index: weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase
Distance-type: weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Number of generated clusters: 2
Elapsed time: .03

(  0.) 1.4,0.2  -->  0
(  1.) 1.4,0.2  -->  0
(  2.) 1.3,0.2  -->  0
(  3.) 1.5,0.2  -->  0
…
(146.) 5,1.9    -->  1
(147.) 5.2,2    -->  1
(148.) 5.4,2.3  -->  1
(149.) 5.1,1.8  -->  1

Clustered Instances

0       50 ( 33%)
1      100 ( 67%)
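The DBSCAN runs above are controlled by two parameters, Epsilon and minPoints. As a rough illustration of what they do, here is a minimal pure-Python sketch of the textbook DBSCAN procedure. It is not Weka's DBScan implementation: the function names, the NOISE label, and the toy points are invented for this example, and it assumes plain, unscaled Euclidean distance.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # hypothetical label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_points):
    """Return one label per point: 0, 1, ... for clusters, NOISE otherwise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_points:
            labels[i] = NOISE  # not a core point; may become a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_points:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


# Toy data (made up): two tight groups and one isolated point.
demo_points = [
    (1.0, 0.2), (1.1, 0.2), (1.2, 0.3), (1.1, 0.3),
    (5.0, 1.8), (5.1, 1.9), (5.2, 2.0), (5.1, 2.0),
    (9.0, 9.0),
]

print(dbscan(demo_points, eps=0.3, min_points=3))
print(dbscan(demo_points, eps=10.0, min_points=3))
```

With the small radius the toy data splits into two clusters plus one noise point; with the large radius every point is reachable from every other and a single cluster results. This is the same kind of sensitivity to Epsilon and minPoints that the parameter slides ask you to experiment with.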
Run k-Means in Weka
Parameter Settings
k-Means Clustering Results

=== Run information ===

Scheme:       weka.clusterers.SimpleKMeans -N 2 -S 10
Relation:     iris-weka.filters.unsupervised.attribute.Remove-R1-2,5
Instances:    150
Attributes:   2
              petallength
              petalwidth
Test mode:    evaluate on training data

=== Model and evaluation on training set ===

kMeans
======

Number of iterations: 6
Within cluster sum of squared errors: 5.179687509974782

Cluster centroids:

Cluster 0
    Mean/Mode:  4.906  1.676
    Std Devs:   0.8256 0.4248
Cluster 1
    Mean/Mode:  1.464  0.244
    Std Devs:   0.1735 0.1072

Clustered Instances

0      100 ( 67%)
1       50 ( 33%)
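The k-Means report above lists a number of clusters, a seed, an iteration count, centroids, and a within-cluster sum of squared errors. The following minimal pure-Python sketch shows how those quantities relate in the standard (Lloyd's) k-means loop. It is only an illustration, not Weka's SimpleKMeans: the function name, its parameters, and the toy points are assumptions, and Python's random generator will not reproduce Weka's seeding or its exact numbers.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, k, seed=10, max_iterations=100):
    """Return (centroids, assignment, sse) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as a start
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    # Within-cluster sum of squared distances to the assigned centroid.
    sse = sum(dist(p, centroids[a]) ** 2 for p, a in zip(points, assignment))
    return centroids, assignment, sse


# Toy data (made up): two well-separated groups of 2-D points.
demo_points = [
    (1.0, 0.2), (1.1, 0.2), (1.2, 0.3), (1.1, 0.3),
    (5.0, 1.8), (5.1, 1.9), (5.2, 2.0), (5.1, 2.0),
]

centroids, assignment, sse = k_means(demo_points, k=2)
print(centroids)
print(assignment)
print(sse)
```

Unlike DBSCAN, this procedure always assigns every point to one of the k clusters, so the cluster sizes in the report sum to the number of instances.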
ArffViewer: Convert Dataset’s Extension
Open A Dataset File
Select A Dataset File
View the Dataset
Manipulate the Dataset (Optional)
Save As .arff File
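For readers who want to see what the saved file looks like, the sketch below converts a small CSV text into ARFF text in plain Python. It is not how ArffViewer works internally and it handles only the simple case: a header row, no missing values, and no attribute names or values that need quoting. The function names and the sample rows are assumptions made for the example.

```python
import csv
import io


def is_number(text):
    """True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Build ARFF text from CSV text whose first row holds attribute names."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for col, name in enumerate(header):
        values = [row[col] for row in data]
        if all(is_number(v) for v in values):
            lines.append("@attribute " + name + " numeric")
        else:
            # Non-numeric column: declare it nominal with its distinct values.
            labels = sorted(set(values))
            lines.append("@attribute " + name + " {" + ",".join(labels) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Sample rows (made up for illustration).
sample_csv = (
    "petallength,petalwidth,class\n"
    "1.4,0.2,Iris-setosa\n"
    "5.1,1.8,Iris-virginica\n"
)

print(csv_to_arff(sample_csv, "iris"))
```

The result has the three parts of an ARFF file: a `@relation` line, one `@attribute` line per column, and the rows after `@data`.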
Weka Documentation