WEKA: Waikato Environment for Knowledge Analysis
Branko Kavšek
MPŠ Jožef Stefan
November 2005
Goals
• Acquisition of functional knowledge about the WEKA platform
• Ability to process (your own) data in WEKA:
– identify a problem
– transform it into data
– choose an appropriate DM technique
– apply it to the data
– evaluate the results
– interpret them
What is WEKA?
Some basic facts about WEKA:
• WEKA(1) = a flightless bird with an inquisitive nature (found only on the islands of New Zealand)
• WEKA(2) = a software ‘workbench’ incorporating several standard ML/DM techniques
• Authors = Ian H. Witten, Eibe Frank (et al.)
• Programming language = JAVA
• Origin = The University of Waikato, New Zealand
• Literature = Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999
• Homepage = http://www.cs.waikato.ac.nz/~ml/weka
Objectives of WEKA
• make ML/DM techniques generally available
• apply them to practical problems (in agriculture)
• develop new ML/DM algorithms
• contribute to the theoretical framework of the field (ML/DM)
Versions of WEKA
• There are several versions of WEKA:
– WEKA 3.0: “book version”, compatible with the description in the data mining book
– WEKA 3.2: “first GUI version”, adds graphical user interfaces (the book version is command-line only)
– WEKA 3.5: “development version” with lots of improvements
• This workshop is based on WEKA 3.5 (version 3.5.2)
Outline
• WEKA on the WEB
• Transforming data into the “right” format
• Using the “Explorer”
• WEKA from the command line (Simple CLI)
• Knowledge flow in brief
• Performing the experiments
• Tips & tricks
• The PROs and the CONs of WEKA
WEKA on the WEB
The input to WEKA
• ARFF (Attribute-Relation File Format): “flat” files
• Example: the play-tennis domain in ARFF format:

    % this is an example of a knowledge
    % domain in ARFF format
    @relation weather

    @attribute outlook {sunny, overcast, rainy}
    @attribute temperature real
    @attribute humidity real
    @attribute windy {TRUE, FALSE}
    @attribute play {yes, no}

    @data
    sunny, 85, 85, FALSE, no
    sunny, 80, 90, TRUE, no
    overcast, 83, 86, FALSE, yes
    rainy, 70, 96, FALSE, yes
    rainy, 68, 80, FALSE, yes
    rainy, 65, 70, TRUE, no
    overcast, 64, 65, TRUE, yes
    sunny, 72, 95, FALSE, no
    sunny, 69, 70, FALSE, yes
    rainy, 75, 80, FALSE, yes
    sunny, 75, 70, TRUE, yes
    overcast, 72, 90, TRUE, yes
    overcast, 81, 75, FALSE, yes
    ...

Conversion to the ARFF format?
• Example: converting from MS-EXCEL to ARFF
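The slide only names the Excel-to-ARFF route (typically: save the sheet as CSV, then convert). To illustrate what such a conversion has to produce, here is a minimal Java sketch. It is not WEKA's own CSVLoader; the class and method names are hypothetical, and it assumes a well-formed comma-separated file whose first row holds the attribute names, with no quoting and no missing values. A column becomes `real` if every value parses as a number, otherwise it becomes a nominal attribute listing its values in first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical minimal CSV-to-ARFF converter (illustration only, not weka.core.converters.CSVLoader). */
public class CsvToArff {

    /** Converts CSV lines (header row first) into ARFF text for the given relation name. */
    public static String convert(String relation, List<String> csvLines) {
        String[] names = csvLines.get(0).split(",");
        List<String[]> rows = new ArrayList<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            if (line.trim().isEmpty()) continue;          // skip blank lines
            String[] cells = line.split(",");
            for (int i = 0; i < cells.length; i++) cells[i] = cells[i].trim();
            rows.add(cells);
        }
        StringBuilder out = new StringBuilder();
        out.append("@relation ").append(relation).append("\n\n");
        for (int col = 0; col < names.length; col++) {
            out.append("@attribute ").append(names[col].trim()).append(' ')
               .append(attributeType(rows, col)).append('\n');
        }
        out.append("\n@data\n");
        for (String[] cells : rows) out.append(String.join(",", cells)).append('\n');
        return out.toString();
    }

    /** Returns "real" if every value in the column is numeric, else a nominal {v1,v2,...} list. */
    private static String attributeType(List<String[]> rows, int col) {
        Set<String> values = new LinkedHashSet<>();      // preserves first-seen order
        boolean numeric = true;
        for (String[] cells : rows) {
            values.add(cells[col]);
            try {
                Double.parseDouble(cells[col]);
            } catch (NumberFormatException e) {
                numeric = false;
            }
        }
        return numeric ? "real" : "{" + String.join(",", values) + "}";
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList(
            "outlook,temperature,humidity,windy,play",
            "sunny,85,85,FALSE,no",
            "overcast,83,86,FALSE,yes",
            "rainy,70,96,FALSE,yes");
        System.out.print(convert("weather", csv));
    }
}
```

Because nominal values are collected only from the rows present, a small sample can yield an incomplete value list (here `windy` would only get `{FALSE}`), so in practice the header may need manual correction; WEKA's own tip for this is the CSVLoader command shown under "Tips & tricks".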
Starting WEKA – the GUI
A quick tour of the “explorer”
• Preprocess panel:
– Filters panel
– Domain info. panel
– Attributes panel
– Status bar
– Attribute visualization panel
– Log file
A quick tour of the “explorer”
• Classify panel:
– Classifier panel
– Test options panel
– Class attribute
– Result panel
– Output panel
A quick tour of the “explorer” • Visualize panel
The command line
• Example:

    C:\Temp>java weka.classifiers.trees.J48
    Weka exception: No training file and no object input file given.

    General options:

    -t <name of training file>
        Sets training file.
    -T <name of test file>
        Sets test file. If missing, a cross-validation will be performed on the training data.
    -c <class index>
        Sets index of class attribute (default: last).
    -x <number of folds>
        Sets number of folds for cross-validation (default: 10).
    -s <random number seed>
        Sets random number seed for cross-validation (default: 1).
    -m <name of file with cost matrix>
        Sets file with cost matrix.
    -l <name of input file>
        Sets model input file.
    -d <name of output file>
        Sets model output file.
    -v
        Outputs no statistics for training data.
    -o
        Outputs statistics only, not the classifier.
    -i
        Outputs detailed information-retrieval statistics for each class.
    -k
        Outputs information-theoretic statistics.
    -p
        Only outputs predictions for test instances.
    -r
        Only outputs cumulative margin distribution.
    -z <class name>
        Only outputs the source representation of the classifier, giving it the supplied name.
    -g
        Only outputs the graph representation of the classifier.

    Options specific to weka.classifiers.j48.J48:

    -U
        Use unpruned tree.
    -C <pruning confidence>
        Set confidence threshold for pruning. (default 0.25)
    -M <minimum number of instances>
        Set minimum number of instances per leaf. (default 2)
    -R
        Use reduced error pruning.
    -N <number of folds>
        Set number of folds for reduced error pruning. One fold is used as pruning set. (default 3)
    -B
        Use binary splits only.
    -S
        Don't perform subtree raising.
    -L
        Do not clean up after the tree has been built.
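The -x and -s options control the cross-validation that is performed when no test file (-T) is given. The basic idea can be sketched in Java as below: shuffle the instance indices with a seeded `java.util.Random`, deal them round-robin into k folds, and let each fold serve once as the test set while the rest form the training set. This is a simplified illustration with hypothetical names, not WEKA's implementation; WEKA's own procedure differs in detail (for example, it can stratify folds by class), so its fold contents will not match these.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative k-fold cross-validation split (not WEKA's implementation). */
public class CrossValidationSplit {

    /** Returns k folds of the instance indices 0..n-1, shuffled with the given seed. */
    public static List<List<Integer>> folds(int n, int k, long seed) {
        if (k < 2 || k > n) throw new IllegalArgumentException("need 2 <= k <= n");
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, new Random(seed));   // same seed -> same folds
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < k; f++) result.add(new ArrayList<>());
        for (int i = 0; i < n; i++) result.get(i % k).add(indices.get(i));  // deal round-robin
        return result;
    }

    public static void main(String[] args) {
        // 14 instances, 10 folds, seed 1 (10 and 1 are the -x and -s defaults listed above)
        List<List<Integer>> split = folds(14, 10, 1);
        for (int f = 0; f < split.size(); f++) {
            List<Integer> test = split.get(f);
            List<Integer> train = new ArrayList<>();
            for (int g = 0; g < split.size(); g++) {
                if (g != f) train.addAll(split.get(g));   // all other folds train the model
            }
            System.out.println("fold " + f + ": train=" + train.size() + " test=" + test);
        }
    }
}
```

Fixing the seed is what makes a cross-validation run repeatable; changing -s reshuffles the instances and usually changes the estimated accuracy slightly.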
Using the “Simple CLI”
The “flow of knowledge”
Performing the experiments
Tips & tricks
• More memory: java -mx10000 -oss10000 ... (on current JVMs the maximum heap size is set with -Xmx, e.g. java -Xmx1024m ...)
• Converting to ARFF & verifying the result:
– java weka.core.converters.CSVLoader filename.csv > filename.arff
– java weka.core.Instances filename.arff
• Checking available memory:
– right-click on the status bar
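The status-bar trick reports the memory available to the Explorer's JVM. The same figures can be read from any Java program through the standard `java.lang.Runtime` API, which is a quick way to check that a -mx/-Xmx setting actually took effect. This is a generic JVM sketch (the class name is hypothetical), not part of WEKA.

```java
/** Prints JVM heap figures; run e.g. as "java -Xmx1024m MemoryCheck" to see the limit change. */
public class MemoryCheck {

    /** Formats a byte count in whole megabytes (1 MB = 1024 * 1024 bytes). */
    static String mb(long bytes) {
        return (bytes / (1024 * 1024)) + " MB";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();     // heap limit (-Xmx or JVM default); Long.MAX_VALUE if unlimited
        long total = rt.totalMemory(); // heap currently reserved by the JVM
        long free = rt.freeMemory();   // unused part of the reserved heap
        System.out.println("max heap:  " + mb(max));
        System.out.println("allocated: " + mb(total));
        System.out.println("used:      " + mb(total - free));
    }
}
```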
GUI vs. command line
GUI (+):
• visualisation of data and (some) models
GUI (-):
• not all the parameters can be set (reduced functionality)
Command line (+):
• full functionality
• batch processing
Command line (-):
• only textual visualisation of models
• awkward to use
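Batch processing is where the command line pays off: the same classifier can be rerun over many data files or parameter values from a script. As a hedged sketch, the Java class below only builds the command lines for a small J48 sweep over the pruning confidence (-C), using options listed on the command-line slide (-t, -x, -C). The class name and the file names are hypothetical, and nothing is executed; the commands would still have to be run from a shell script (or via `ProcessBuilder`) with WEKA on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Builds J48 command lines for a batch run (hypothetical file names; nothing is executed). */
public class J48Batch {

    /** One command per (training file, pruning confidence) pair, each with 10-fold cross-validation. */
    public static List<String> commands(List<String> trainingFiles, double[] confidences) {
        List<String> cmds = new ArrayList<>();
        for (String file : trainingFiles) {
            for (double c : confidences) {
                cmds.add("java weka.classifiers.trees.J48 -t " + file + " -x 10 -C " + c);
            }
        }
        return cmds;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("weather.arff", "other.arff");
        for (String cmd : commands(files, new double[] {0.1, 0.25, 0.4})) {
            System.out.println(cmd);   // e.g. redirect into a batch/shell script
        }
    }
}
```

Generating the commands separately from running them keeps the sweep easy to inspect and to rerun, which is exactly what the GUI makes awkward.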
PROs & CONs of WEKA
PROs:
• open source (GNU licence)
• platform-independent (JAVA)
• easy to use
• (relatively) easy to modify
CONs:
• relatively slow (JAVA)
• ‘incomplete’ documentation (some GUI features could be explained better)
• some features available only from the command line
That’s it !!! Thanks