Machine Learning: Applications & Solving Problems using Weka Tool

Machine Learning: Applications & Solving Problems using Weka Tool
Prof. (Mr.) Rahul B. Diwate
Assistant Professor, Department of Information Technology, Vishwakarma Institute of Technology, Savitribai Phule Pune University, Pune
Web Profile: www.rahuldiwate.com
YouTube Channel: RBDiwate

The 5 Ws and H
• What is Machine Learning?
• Why is Machine Learning important?
• Where do we use Machine Learning?
• When do we use Machine Learning?
• Which learning method is used where?
• How does Machine Learning work?

Tools for Machine Learning
• Weka
• R Programming
• Python Programming
• Java Programming

What is Machine Learning?
• Machine Learning is a discipline of artificial intelligence concerned with building computer programs that automatically improve through experience and make predictions.

Evolution of Machine Learning
• The term "machine learning" was coined by Arthur Samuel in 1959, in the field of computer gaming and artificial intelligence.
• Later, in 1997, Tom Mitchell gave it a standard definition: “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with the experience E.”

AI vs ML vs DL
(Figure labels: Exam & Answer Delivery; Students & Learning Methods; Teachers & Teaching Methods)

Artificial Intelligence vs Machine Learning
What is Artificial Intelligence?
• Artificial Intelligence refers to intelligence displayed by machines that simulate human and animal intelligence.
What is Machine Learning?
• The capability of Artificial Intelligence systems to learn by extracting patterns from data is known as Machine Learning.
OR
• Machine learning is an application of artificial intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

Traditional vs. Machine Learning
• Traditional Programming: Data + Program → Computation/Execution → Results
• Machine Learning Approach: Data + Results → Computation/Execution → Program

Machine Learning Techniques
• Supervised Learning: Fraud Detection, Image Classification, Customer Retention, Diagnostics, Forecasting, Predictions, Process Optimization, Many More…
• Unsupervised Learning: Features Extraction, Structure Discovery, Meaningful Comparison, Big Data Visualization, Recommender System, Customer Segmentation, Targeted Marketing
• Reinforcement Learning: Real-Time Decision, Game AI, Learning Task, Skill Acquisition, Robot Navigation, Many More…

Machine Learning Algorithm
• A Machine Learning algorithm can learn from labeled data (known as supervised learning) or unlabelled data (known as unsupervised learning).
• Algorithms that work with unlabelled data (unsupervised learning) are generally more complicated than those that work with labeled data (supervised learning).

Supervised Learning
• Supervised machine learning algorithms can apply what has been learned in the past to new data, using labeled examples to predict future events.
• Starting from the analysis of a known training dataset, the learning algorithm produces an inferred function to make predictions about the output values.
• The system is able to provide targets for any new input after sufficient training.
• The learning algorithm can also compare its output with the correct, intended output and find errors in order to modify the model accordingly.
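The last point above, comparing the output with the intended output and modifying the model when they differ, can be sketched in a few lines of Python. This is only an illustration: it uses a perceptron, which is not one of the algorithms named in these slides, and the four labeled examples (the logical AND of two inputs) are made up for the sketch.

```python
def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """Learn weights from labeled examples of the form ((x1, x2), label)."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), label in examples:
            predicted = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
            error = label - predicted          # compare output with the intended output
            if error != 0:                     # modify the model only when it was wrong
                weights[0] += learning_rate * error * x1
                weights[1] += learning_rate * error * x2
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:                      # every training example is now correct
            break
    return weights, bias


def predict(model, x):
    """Apply the inferred function to a new input."""
    weights, bias = model
    return 1 if weights[0] * x[0] + weights[1] * x[1] + bias > 0 else 0


# Hypothetical labeled training data: the label is 1 only when both inputs are 1.
training_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
model = train_perceptron(training_data)
print([predict(model, x) for x, _ in training_data])
```

In Weka the same train-then-predict cycle is hidden behind the Classify tab; the sketch only makes the error-driven update visible.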

Supervised Algorithms
• Naïve Bayes Classifiers
• Decision Trees
• Support Vector Machine
• Many More

Classification: Digit Recognition
• Input (X_i): image features
• Output (Y): class labels {y_0, y_1, ..., y_9}
• Features (X_i): proportion of pixels in each of the 12 cells, where i = 1, 2, ..., 12

Speech Recognition
• Machine learning is useful when humans are unable to explain their expertise.

Unsupervised Learning
• Unsupervised machine learning algorithms are used when the information used to train is neither classified nor labeled.
• Unsupervised learning studies how systems can infer a function to describe a hidden structure from unlabeled data.
• The system doesn’t figure out the right output, but it explores the data and can draw inferences from datasets to describe hidden structures from unlabeled data.

Unsupervised Algorithms
• Hierarchical Clustering
• K-means Clustering
• K-Nearest Neighbors (strictly a supervised method; Weka provides it as the IBk classifier)
• Principal Components Analysis
• Association
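To make "discovering hidden structure in unlabeled data" concrete, here is a minimal sketch of k-means clustering on one-dimensional data in Python. It is not Weka's SimpleKMeans: the starting centroids are simply the first k points (an assumption made to keep the sketch deterministic) and the six data values are invented.

```python
def assign(points, centroids):
    """Put every point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans_1d(points, k, max_iterations=100):
    """Alternate between assigning points and moving centroids until stable."""
    centroids = list(points[:k])               # assumption: seed with the first k points
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        updated = [
            sum(c) / len(c) if c else centroids[i]   # an empty cluster keeps its centroid
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:               # nothing moved, so we have converged
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical unlabeled measurements with two obvious groups.
data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(data, k=2)
print(centroids)
print(clusters)
```

No labels are supplied anywhere; the two groups emerge from the distances alone, which is the point of the unsupervised setting.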

Good Dog OR Bad Dog?

Good Robot OR Bad Robot?

Supervised Vs Unsupervised
Parameters | Supervised machine learning technique | Unsupervised machine learning technique
Input Data | Algorithms are trained using labeled data. | Algorithms are used against data which is not labelled.
Computational Complexity | Supervised learning is a simpler method. | Unsupervised learning is computationally complex.
Accuracy | Highly accurate and trustworthy method. | Less accurate and trustworthy method.

Reinforcement Learning
• Reinforcement learning is a learning method in which an agent interacts with its environment by producing actions and discovering errors or rewards.
• Trial-and-error search and delayed reward are the most relevant characteristics of reinforcement learning.
• This method allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize their performance.
• Simple reward feedback is required for the agent to learn which action is best; this is known as the reinforcement signal.
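The trial-and-error idea and the reinforcement signal can be sketched with a very small Python example. It is a simplification under stated assumptions: the environment is a hypothetical two-action problem with fixed rewards, the agent uses an epsilon-greedy rule (mostly exploit, sometimes explore), and there is no delayed reward, so it illustrates only part of what the slide describes.

```python
import random


def learn_action_values(rewards, steps=200, epsilon=0.2, seed=0):
    """Estimate the value of each action from reward feedback alone."""
    rng = random.Random(seed)                  # seeded so repeated runs behave the same
    actions = list(rewards)
    value = {a: 0.0 for a in actions}          # current estimate per action
    count = {a: 0 for a in actions}            # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)       # explore: try something at random
        else:
            action = max(actions, key=lambda a: value[a])   # exploit the best estimate
        reward = rewards[action]               # the reinforcement signal
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]   # running average
    return value


# Hypothetical environment: "right" is rewarded, "left" is not.
estimates = learn_action_values({"left": 0.0, "right": 1.0})
best_action = max(estimates, key=estimates.get)
print(best_action)
```

The agent is never told which action is correct; it only sees the reward that follows each attempt and gradually prefers the action that pays off.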

Fire Place Condition

Where do we use Machine Learning?
• Spam Mail
• News
• Autonomous Vehicle
• Games
• And so on…

Spam Mail

News

Autonomous Ground Vehicle

Games

Applications of Machine Learning
Applications | Uses
Image Processing | Image tagging and recognition; Self-driving cars; Optical Character Recognition (OCR)
Robotics | Human simulation; Industrial robotics
Data Mining | Anomaly detection; Grouping and Predictions; Association rules
Video Games | Some games implement reinforcement learning
Text Analysis | Sentiment Analysis; Spam Filtering; Information Extraction
Healthcare | Healthcare Startups

Weka Tool
• Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called from your own Java code.
• Weka is tried and tested open source machine learning software that can be accessed through a graphical user interface, standard terminal applications, or a Java API.
• It is widely used for teaching, research, and industrial applications, contains a plethora of built-in tools for standard machine learning tasks, and additionally gives transparent access to well-known toolboxes such as scikit-learn, R, and Deeplearning4j.

Download Tool & Datasets
Download Weka
• Weka 3 - Data Mining with Open Source Machine Learning Software in Java (waikato.ac.nz)
Download Datasets
• http://rahuldiwate.com/Download/WEKA_DS.rar
• Auto-WEKA: Sample Datasets (ubc.ca)
• Weka - Browse /datasets at SourceForge.net
• WEKA Datasets, Classifier And J48 Algorithm For Decision Tree (softwaretestinghelp.com)

WEKA supports a large number of file formats for the data.
• arff
• arff.gz
• bsi
• csv
• dat
• data
• json
• json.gz
• libsvm
• m
• names
• xrff
• xrff.gz
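Of these formats, ARFF is Weka's native one: a plain-text file with an @relation line, one @attribute line per column, and the rows after @data. The Python sketch below shows that layout by reading a tiny inline sample. It is a deliberately simplified reader (no quoted values, no sparse rows, no date types), and the sample is a cut-down, weather-style fragment written for illustration rather than a file shipped with Weka.

```python
def parse_arff(text):
    """Read the relation name, attribute declarations and data rows of a simple ARFF file."""
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):   # skip blank lines and % comments
            continue
        keyword = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif keyword.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif keyword.startswith("@attribute"):
            _, name, kind = line.split(None, 2)   # kind is "numeric" or a {nominal, list}
            attributes.append((name, kind))
        elif keyword.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# Illustrative sample only: two attributes and three rows.
sample = """% a tiny weather-style example
@relation weather.sample
@attribute outlook {sunny, overcast, rainy}
@attribute play {yes, no}
@data
sunny,no
overcast,yes
rainy,yes
"""

relation, attributes, rows = parse_arff(sample)
print(relation)
print(attributes)
print(rows)
```

Opening any .arff file from Weka's data folder in a text editor shows the same three-part structure.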

Preprocessing the Data
• Using the Open file... option under the Preprocess tab, select the weather.nominal.arff file.
• Current relation
• Attributes
• Selected attribute
• Visualize All
• Remove attribute

Apply Filters
• Some machine learning techniques, such as association rule mining, require categorical data.
• To illustrate the use of filters, we will use the weather data. Note that the numeric attributes, temperature and humidity, are found in weather.numeric.arff; in weather.nominal.arff every attribute, including windy, is already categorical.
• Apply Filters (weka→filters→supervised→attribute→Discretize)
• Suppose you want to select the best attributes for deciding the play. Select and apply the following filter: weka→filters→supervised→attribute→AttributeSelection
• You will notice that it removes attributes from the dataset (temperature and windy in the run reported here).
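What a discretization filter does to a numeric column can be shown with a short Python sketch. Note the swap: Weka's supervised Discretize filter chooses its cut points using the class labels, whereas this sketch uses plain equal-width binning, a simpler unsupervised scheme picked only because it fits in a few lines. The temperature readings are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Replace each numeric value with the label of the equal-width bin it falls into."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    if width == 0:                             # all values identical: a single bin
        return ["bin1" for _ in values]
    labels = []
    for v in values:
        index = min(int((v - low) / width), n_bins - 1)   # the maximum goes in the last bin
        labels.append(f"bin{index + 1}")
    return labels


# Hypothetical temperature readings turned into three categories.
temperature = [64, 68, 72, 75, 80, 85]
print(equal_width_bins(temperature, 3))
```

After this step the column holds category labels instead of numbers, which is the form that techniques such as association rule mining expect.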

Setting Test Data
• Training set
• Supplied test set
• Cross-validation
• Percentage split
Unless you have your own training set or a client-supplied test set, you would use the cross-validation or percentage split options.
• Under cross-validation, you set the number of folds into which the entire data is split; in each iteration one fold is held out for testing and the remaining folds are used for training.
• In the percentage split, you split the data between training and testing using the set split percentage.
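The two options described above can be sketched in Python by working with instance indices only. This is a simplified picture, not Weka's implementation: no shuffling or stratification is done here, the fold assignment is a simple round-robin, and the rounding rule for the percentage split is an assumption of the sketch.

```python
def k_fold_indices(n_instances, n_folds):
    """Yield (train, test) index lists; every instance is tested exactly once."""
    folds = [list(range(start, n_instances, n_folds)) for start in range(n_folds)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n_instances) if i not in held_out]
        yield train, test


def percentage_split(n_instances, train_percent):
    """Use the first train_percent of the instances for training, the rest for testing."""
    cut = round(n_instances * train_percent / 100)   # rounding rule is an assumption
    indices = list(range(n_instances))
    return indices[:cut], indices[cut:]


# 10 instances, 5 folds: each iteration trains on 8 instances and tests on 2.
splits = list(k_fold_indices(10, 5))
for train, test in splits:
    print(len(train), len(test))

# 14 instances with a 66% split.
train, test = percentage_split(14, 66)
print(len(train), len(test))
```

Cross-validation uses every instance for testing once, at the cost of training the model once per fold; a percentage split trains only once but evaluates on a single held-out portion.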

Supervised Algorithms (Classifier)
• Logistic Regression (Classify -> Classifiers -> functions -> select Logistic)
• Support Vector Machines (Classify -> Classifiers -> functions -> select SMO)
• Naive Bayes (Classify -> Classifiers -> bayes -> select NaiveBayes)
• Decision Tree (Classify -> Classifiers -> trees -> select REPTree) (Classify -> Classifiers -> trees -> select J48) (You can review a visualization of a decision tree prepared on the entire training data set by right-clicking on the entry in the “Result list” and clicking “Visualize tree” and “Visualize classifier errors”.)
• k-Nearest Neighbors (Classify -> Classifiers -> lazy -> select IBk)

Unsupervised Algorithms (Cluster)
• Hierarchical Clustering (Cluster -> Clusterers -> select HierarchicalClusterer)
• K-means Clustering (Cluster -> Clusterers -> select SimpleKMeans)
• EM (Iris dataset) (Cluster -> Clusterers -> select EM)
• (To visualize the clusters, right-click on the EM result in the Result list and select Visualize cluster assignments.)
• Association (supermarket dataset) (Associate -> Associations -> select Apriori / FPGrowth)

Feature Selection
• When a dataset contains a large number of attributes, several of them will not be significant for the analysis you are currently carrying out.
• Thus, removing the unwanted attributes from the dataset becomes an important task in developing a good machine learning model.

Features Extraction
• Click on the Select attributes tab.
• Click Start.
• Right-click on the result in the Result list to view the visualization.
• Clicking on any of the squares will give you the data plot for further analysis.

Simple CLI
• You have seen so far the power of WEKA in quickly developing machine learning models. What we used is a graphical tool called Explorer for developing these models. WEKA also provides a command line interface that gives you more power than is provided in the Explorer.
• Clicking the Simple CLI button in the GUI Chooser application starts this command line interface.
• https://www.cs.waikato.ac.nz/ml/weka/documentation.html

Experimenter
Open the Weka GUI Chooser. Working with the Experimenter is divided into 3 steps:
• Design the experiment.
• Run the experiment.
• Review the experiment results.

Step 1 - Design The Experiment
• Open the Weka GUI Chooser.
• Click the “Experimenter” button to open the Weka Experimenter interface.
• On the “Setup” tab, click the “New” button to start a new experiment.
• In the “Dataset” pane, click the “Add new…” button and choose data/diabetes.arff (or any dataset).
• In the “Algorithms” pane, click the “Add new…” button, click the “Choose” button and select the “Logistic” algorithm under the “functions” group. Click the “OK” button to add it.
• Repeat this process to add the following algorithms:
  • NaiveBayes under the “bayes” group.
  • REPTree under the “trees” group.
  • IBk under the “lazy” group.
  • SMO under the “functions” group.
• Save this experiment definition by clicking the “Save” button at the top of the “Setup” panel.

Step 2 - Run The Experiment
• 1. Click the “Run” tab.
• There are few options here. All you can do is start an experiment or stop a running experiment.
• 2. Click the “Start” button and run the experiment. It should complete in a few seconds, because the dataset is small.

Step 3 - Review Experiment Results
• You can load results from:
  • A file, if you configured your experiment to save results to a file on the “Setup” tab.
  • A database, if you configured your experiment to save results to a database on the “Setup” tab.
  • An experiment, if you just ran an experiment in the Experiment Environment (which we just did).
• Load the results from the experiment we just executed by clicking the “Experiment” button in the “Source” pane.
• You will see that 500 results were loaded. This is because we had 5 algorithms that were each evaluated 100 times: 10-fold cross-validation multiplied by 10 repeats.
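The figure of 500 follows directly from the experiment design, and a few lines of Python make the counting explicit. The sketch only enumerates the combinations of dataset, algorithm, repeat and fold; it does not run any learning, and the variable names are illustrative.

```python
from itertools import product

datasets = ["diabetes"]
algorithms = ["Logistic", "NaiveBayes", "REPTree", "IBk", "SMO"]
repeats = range(1, 11)    # 10 repetitions of the whole cross-validation
folds = range(1, 11)      # 10 folds per repetition

# One result row per (dataset, algorithm, repeat, fold) combination.
result_rows = list(product(datasets, algorithms, repeats, folds))

evaluations_per_algorithm = len(repeats) * len(folds)
print(evaluations_per_algorithm)   # 100
print(len(result_rows))            # 500
```

Adding a second dataset or a sixth algorithm multiplies the number of rows accordingly, which is worth keeping in mind before running larger experiments.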

Best Performance & Rank of Algorithm?
• Which algorithm evaluated in the experiment had the best performance?
  This is useful to know if we want to create a well-performing model immediately.
• What is the rank of algorithms by performance?
  This is useful to know if we want to further investigate and tune the 2 or 3 algorithms that performed best on the problem.

Debugging Errors With Weka Experiments
• For example, the log in the Run tab may report “there was 1 error” and give no more information.
• You can easily find out what went wrong by reviewing the Weka log.
• In the Weka GUI Chooser, click the “Program” menu and then “LogWindow”.

Weka with Java (Eclipse)
• 1) Make sure you’ve downloaded Weka.
• 2) Create a new project in Eclipse. Find Java Build Path -> Libraries, either during project creation or afterwards under “Package Explorer” -> right-click the project -> Properties.
• 3) Click “Add External JARs…” and select the weka.jar from your download.
• 4) Create a class file under the “src” folder. This code is taken pretty much line for line from weka.wikispaces.

www.rahuldiwate.com
RBDIWATE
diwate.rahul@gmail.com
"LIFE and TIME are the world’s best Teachers. Life teaches us to make good use of TIME and TIME teaches us the value of LIFE." - Dr. A. P. J. Abdul Kalam
Stay Home Stay Safe