Fundamentals of Deep Learning, Lecture 1 (Chapter 1)

Fundamentals of Deep Learning, Lecture 1 (Chapter 1 of Deep Learning with Python, François Chollet, Manning Publications, ISBN 9781617294433, 2018). Asst. Prof. Dr. Anilkumar K. G

AI, Machine Learning and Deep Learning
• In the past few years, artificial intelligence (AI) has been a subject of intense media hype. Machine learning, deep learning, and AI come up in countless articles, often outside of technology-minded publications.
• First, we need to define clearly what we’re talking about when we mention AI.
• What are artificial intelligence, machine learning, and deep learning (see figure 1.1)? How do they relate to each other?

AI, Machine Learning and Deep Learning
(Figure 1.1: artificial intelligence, machine learning, and deep learning.)

Artificial Intelligence
• AI was born in the 1950s, when a handful of pioneers from the nascent field of computer science started asking whether computers could be made to “think”.
• A concise definition of AI: the effort to automate intellectual tasks normally performed by humans.
• AI is a general field that encompasses machine learning and deep learning, but that also includes many more approaches that don’t involve any learning.

Artificial Intelligence
• Symbolic AI handcrafts a sufficiently large set of explicit rules for manipulating knowledge.
• This approach, known as the expert-system approach, was the dominant paradigm in AI from the 1950s to the late 1980s.
– It turned out to be intractable to figure out explicit rules for solving more complex, fuzzy problems, such as image classification, speech recognition, and language translation.
• A new approach arose to take symbolic AI’s place: machine learning.

Machine Learning
• The Analytical Engine, designed by Charles Babbage in the 1830s and 1840s, was the first-known general-purpose mechanical computer.
• At that time the concept of general-purpose computation was yet to be invented.
• It was merely meant as a way to use mechanical operations to automate certain computations from the field of mathematical analysis, hence the name Analytical Engine.

Machine Learning
– In 1843, Ada Lovelace remarked, “The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform.”
• AI pioneer Alan Turing introduced the Turing test, as well as key concepts that would come to shape AI, in his landmark 1950 paper “Computing Machinery and Intelligence”.
– Turing was quoting Ada Lovelace while pondering whether general-purpose computers could be capable of learning and originality, and he came to the conclusion that they could.

Machine Learning
• Machine learning arises from this question: could a computer go beyond “what we know how to order it to perform” and learn on its own how to perform a specified task?
• Could a computer surprise us? Rather than programmers crafting data-processing rules by hand, could a computer automatically learn these rules by looking at data?
• This question opens the door to a new programming paradigm (see figure 1.2).

Machine Learning
(Figure 1.2: machine learning, a new programming paradigm: classical programming turns rules and data into answers; machine learning turns data and answers into rules.)

Machine Learning
• In classical programming, the paradigm of symbolic AI, humans input rules (a program) and data to be processed according to these rules, and out come answers (see figure 1.2).
• With machine learning, humans input data as well as the answers expected from the data, and out come the rules.
– These rules can then be applied to new data to produce original answers.
– A machine-learning system is trained rather than explicitly programmed.

Machine Learning
• For instance, if you wished to automate the task of tagging your vacation pictures, you could present a machine-learning system with many examples of pictures already tagged by humans, and the system would learn statistical rules for associating specific pictures with specific tags.
• Machine learning is tightly related to mathematical statistics, but it differs from statistics in several important ways. Unlike statistics, machine learning tends to deal with large, complex datasets (such as a dataset of millions of images, each consisting of tens of thousands of pixels) for which classical statistical analysis such as Bayesian analysis would be impractical.

Learning Representations from Data
• To define deep learning and understand the difference between deep learning and other machine-learning approaches, we first need some idea of what machine-learning algorithms do.
• Let us see how a machine-learning algorithm discovers rules to execute a data-processing task. To do machine learning, we need three things:
1. Input data points
2. Examples of expected output
3. A way to measure the performance of the algorithm

Learning Representations from Data
• Input data points: for instance, if the task is speech recognition, these data points could be sound files of people speaking. If the task is image tagging, they could be pictures.
• Examples of the expected output: in a speech-recognition task, these could be human-generated transcripts of sound files. In an image task, expected outputs could be tags such as “dog”, “cat”, and so on.
• A way to measure whether the algorithm is doing a good job: this is necessary to determine the distance between the algorithm’s current output and its expected output.
– The measurement is used as a feedback signal to adjust the way the algorithm works. This adjustment step is what we call learning.

Learning Representations from Data
• A machine-learning model transforms its input data into meaningful outputs, a process that is “learned” from exposure to known examples of inputs and outputs.
• Therefore, the central problem in machine learning and deep learning is to meaningfully transform data:
– in other words, to learn useful representations of the input data that get us closer to the expected output.

Learning Representations from Data
• What’s a data representation?
– It’s a different way to look at data, to represent or encode data. For instance, a color image can be encoded in the RGB (red-green-blue) format or in the HSV (hue-saturation-value) format: these are two different representations of the same data.
– Some tasks that may be difficult with one representation can become easy with another.
• For example, the task “select all red pixels in the image” is simpler in the RGB format, whereas “make the image less saturated” is simpler in the HSV format.
– Machine-learning models are all about finding appropriate representations for their input data.
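To make the two encodings concrete, here is a minimal Python sketch (ours, not from the slides) using the standard-library colorsys module; the pixel values are made up for illustration.

```python
# Minimal sketch: the same pixel in two representations, using Python's
# standard-library colorsys module (illustrative values).
import colorsys

r, g, b = 0.9, 0.2, 0.1                   # a reddish pixel, channels in [0, 1]
h, s, v = colorsys.rgb_to_hsv(r, g, b)    # the same pixel, HSV-encoded

# "Select all red pixels" is simple in RGB: just compare channels.
is_red = r > g and r > b

# "Make the image less saturated" is simple in HSV: scale a single number.
less_saturated_rgb = colorsys.hsv_to_rgb(h, s * 0.5, v)

print(is_red, less_saturated_rgb)
```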

Learning Representations from Data
• Let’s make finding appropriate data representations concrete. Consider an x-axis, a y-axis, and some points represented by their coordinates in the (x, y) system, as shown in figure 1.3.

Learning Representations from Data
• As you can see in figure 1.3, we have a few white points and a few black points. Let’s say we want to develop an algorithm that can take the coordinates (x, y) of a point and output whether that point is likely to be black or white. In this case:
1. The inputs are the coordinates of our points.
2. The expected outputs are the colors of our points.
3. A way to measure whether our algorithm is doing a good job could be, for instance, the percentage of points that are being correctly classified.

Learning Representations from Data
• We need a new representation of our data that clearly separates the white points from the black points. One transformation we could use is a coordinate change, illustrated in figure 1.4.

Learning Representations from Data
• With the better representation, the black/white classification problem can be expressed as a simple rule: “Black points are such that x > 0” or “White points are such that x < 0”.
– This new representation basically solves the classification problem.
• In this case, we defined the coordinate change by hand.
• But if instead we tried systematically searching for different possible coordinate changes, and used as feedback the percentage of points being correctly classified, then we would be doing machine learning (see the sketch below).
– Learning, in the context of machine learning, describes an automatic search process for better representations.
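A minimal sketch of this idea (ours, with made-up points and a hypothetical 45-degree rotation as the coordinate change): after the change, the simple rule “black iff new x > 0” works, and the accuracy is the feedback signal a search over rotations would maximize.

```python
# Sketch: a hand-picked coordinate change plus the rule "black iff new x > 0".
# Points, labels, and the rotation angle are illustrative assumptions.
import math

points = [(1.0, 2.0), (2.0, 1.0), (-1.0, -2.0), (-2.0, -1.0)]
labels = ["black", "black", "white", "white"]

def rotate(point, theta):
    """Coordinate change: rotate a 2-D point by angle theta."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def classify(point):
    """The simple rule in the new coordinates."""
    return "black" if point[0] > 0 else "white"

new_points = [rotate(p, -math.pi / 4) for p in points]
accuracy = sum(classify(p) == label
               for p, label in zip(new_points, labels)) / len(points)
print(accuracy)  # feedback signal: fraction correctly classified
```

Systematically trying many angles and keeping the one with the highest accuracy would be a (tiny) machine-learning algorithm.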

Learning Representations from Data
• All machine-learning algorithms consist of automatically finding such transformations that turn data into more useful representations for a given task.
• These operations can be coordinate changes, linear projections, translations, nonlinear operations (such as “select all points such that x > 0”), and so on.
– Machine-learning algorithms aren’t usually creative in finding these transformations; they’re merely searching through a predefined set of operations, called a hypothesis space.

Learning Representations from Data
• Machine learning is, technically: searching for useful representations of some input data, within a predefined space of possibilities, using guidance from a feedback signal.
– This simple idea allows for solving a remarkably broad range of intellectual tasks, from speech recognition to autonomous car driving.

The “Deep” in Deep Learning
• Deep learning is a specific subfield of machine learning: a new take on learning representations from data that puts an emphasis on learning successive layers of increasingly meaningful representations.
– The deep in deep learning isn’t a reference to any kind of deeper understanding achieved by the approach; rather, it stands for this idea of successive layers of representations.
– How many layers contribute to a model of the data is called the depth of the model.
• Modern deep learning often involves tens or even hundreds of successive layers of representations, and they’re all learned automatically from exposure to training data.

The “Deep” in Deep Learning
• Other approaches to machine learning tend to focus on learning only one or two layers of representations of the data; this is sometimes called shallow learning.
• In deep learning, these layered representations are learned via models called neural networks, structured in literal layers stacked on top of each other.
– The term neural network is a reference to neurobiology. Although some of the central concepts in deep learning were developed in part by drawing inspiration from our understanding of the brain, deep-learning models are not models of the brain.

The “Deep” in Deep Learning
• Deep-learning models are not models of the brain; deep learning is just a mathematical framework for learning representations from data.
• What do the representations learned by a deep-learning algorithm look like?
– Let’s examine how a network several layers deep (see figure 1.5) transforms an image of a digit in order to recognize what digit it is.

The “Deep” in Deep Learning
(Figure 1.5: a deep neural network for digit classification.)

The “Deep” in Deep Learning
• The neural network transforms the digit image into representations that are increasingly different from the original image and increasingly informative about the final result, as shown in figure 1.6.
– You can think of a deep neural network as a multistage information-distillation operation, where information goes through successive filters and comes out increasingly purified.
– So deep learning is, technically, a multistage way to learn data representations.
• It’s a simple idea, but, as it turns out, very simple mechanisms, sufficiently scaled, can end up looking like magic.
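For concreteness, a digit classifier in this spirit can be written in a few lines of Keras (the library used in the book); the sketch below is a hedged, minimal version, not the book’s exact listing, and assumes TensorFlow is installed; the layer sizes are illustrative.

```python
# Minimal sketch of a small digit-classification network; assumes
# TensorFlow/Keras. Layer sizes are illustrative.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(512, activation="relu", input_shape=(28 * 28,)),  # layer 1
    layers.Dense(10, activation="softmax"),                        # one score per digit
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5, batch_size=128)  # hypothetical data
```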

The “Deep” in Deep Learning
(Figure 1.6: the representations learned by a digit-classification network.)

How Does Deep Learning Work?
• We know that machine learning is about mapping inputs (such as images) to targets (such as the label “cat”), which is done by observing many examples of inputs and targets.
• Deep neural networks do this input-to-target mapping via a deep sequence of simple data transformations (layers), and these data transformations are learned by exposure to examples.

How Does Deep Learning Work?
• The specification of what a layer does to its input data is stored in the layer’s weights, which in essence are a bunch of numbers.
• In technical terms, we’d say that the transformation implemented by a layer is parameterized by its weights (weights are also sometimes called the parameters of a layer); see figure 1.7 and the sketch below.
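As a sketch of what “parameterized by its weights” means, the transformation implemented by a simple dense layer can be written in a few lines of NumPy (the shapes and values below are illustrative, not from the slides):

```python
# Sketch: a dense layer's transformation is fully determined by its weights.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)          # an input vector
W = rng.standard_normal((3, 4))     # the layer's weight matrix
b = rng.standard_normal(3)          # the layer's bias vector (also weights)

output = np.maximum(0.0, W @ x + b)  # relu(W . x + b)
print(output)                        # change W or b and the transformation changes
```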

How Does Deep Learning Work?
(Figure 1.7: a neural network is parameterized by its weights.)

How Does Deep Learning Work?
• In this context, learning means finding a set of values for the weights of all layers in a network, such that the network will correctly map example inputs to their associated target outputs.
• A deep neural network can contain tens of millions of parameters.
– Finding the correct value for all of them may seem like a daunting task, especially given that modifying the value of one parameter will affect the behavior of all the others!

How Does Deep Learning Work?
• To control the output of a neural network, you need to be able to measure how far this output is from what you expected.
• This is the job of the loss function of the network, also called the objective function.
– The loss function takes the predictions of the network and the true target (what you expected the network to output) and computes a distance score, capturing how well the network has done on this specific example (see figure 1.8 and the sketch below).
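As an illustration, one common distance score is the mean squared error; the numbers below are made up and the choice of loss is ours, not the slides’.

```python
# Sketch: a loss function as a distance score between prediction and target.
import numpy as np

predictions = np.array([0.8, 0.1, 0.1])  # the network's output (made up)
targets     = np.array([1.0, 0.0, 0.0])  # what we expected it to output

loss = np.mean((predictions - targets) ** 2)  # mean squared error
print(loss)  # lower means the network did better on this example
```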

How Does Deep Learning Work?
(Figure 1.8: a loss function measures the quality of the network’s output.)

How Does Deep Learning Work?
• The fundamental trick in deep learning is to use this score as a feedback signal to adjust the value of the weights a little, in a direction that will lower the loss score for the current example (see figure 1.9).
• This adjustment is the job of the optimizer, which implements what’s called the backpropagation algorithm: the central algorithm in deep learning.

How Does Deep Learning Work?
(Figure 1.9: the loss score is used as a feedback signal to adjust the weights.)

How Does Deep Learning Work?
• Initially, the weights of the network are assigned random values, so the network merely implements a series of random transformations.
• Naturally, its output is far from what it should ideally be, and the loss score is accordingly very high.
• But with every example the network processes, the weights are adjusted a little in the correct direction, and the loss score decreases.
• This is the training loop, which, repeated a sufficient number of times, yields weight values that minimize the loss function (a minimal sketch follows below).
• A network with a minimal loss is one for which the outputs are as close as they can be to the targets: a trained neural network.
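A minimal sketch (ours) of this training loop for a one-weight model y = w * x, with made-up data and learning rate:

```python
# Sketch of the training loop: random initial weight, repeated small
# adjustments in the direction that lowers the loss.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x                                      # targets; the ideal weight is 2.0

w = np.random.default_rng(0).standard_normal()   # random initial weight
learning_rate = 0.05

for step in range(100):
    predictions = w * x
    loss = np.mean((predictions - y) ** 2)       # distance score
    gradient = np.mean(2.0 * (predictions - y) * x)
    w -= learning_rate * gradient                # adjust a little, lower the loss

print(w, loss)  # w approaches 2.0 and the loss approaches 0: a trained model
```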

What Deep Learning Has Achieved
• Although deep learning is a fairly old subfield of machine learning, it only rose to prominence in the early 2010s.
• Deep learning has achieved the following breakthroughs:
– Near-human-level image classification
– Near-human-level speech recognition
– Near-human-level handwriting transcription
– Improved machine translation
– Improved text-to-speech conversion
– Digital assistants such as Google Now and Amazon Alexa
– Near-human-level autonomous driving
– Improved ad targeting, as used by Google, Baidu, and Bing
– Improved search results on the web
– Ability to answer natural-language questions
– Superhuman Go playing

A Brief History of Machine Learning
• Probabilistic modeling is the application of the principles of statistics to data analysis. It was one of the earliest forms of machine learning, and it’s still widely used to this day. One of the best-known algorithms in this category is the Naive Bayes algorithm.
• Kernel methods are a group of classification algorithms, the best known of which is the support vector machine (SVM). The modern formulation of an SVM was developed by Vladimir Vapnik and Corinna Cortes in the early 1990s at Bell Labs.

A Brief History of Machine Learning
• Support vector machine (SVM):
– SVMs aim at solving classification problems by finding good decision boundaries between two sets of points belonging to two different categories.
– A decision boundary can be thought of as a line or surface separating your training data into two spaces corresponding to two categories.
– To classify new data points, you just need to check which side of the decision boundary they fall on (see the sketch after this list).
• Decision trees are flowchart-like structures that let you classify input data points or predict output values given inputs. They’re easy to visualize and interpret.
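A hedged sketch of fitting an SVM decision boundary with scikit-learn (the toy points are made up; this is not code from the slides):

```python
# Sketch: fit a linear decision boundary between two clumps of toy points;
# assumes scikit-learn is installed.
from sklearn import svm

X = [[0, 0], [1, 1], [1, 0], [0, 1],     # category 0
     [3, 3], [4, 4], [3, 4], [4, 3]]     # category 1
y = [0, 0, 0, 0, 1, 1, 1, 1]

classifier = svm.SVC(kernel="linear").fit(X, y)
print(classifier.predict([[0.5, 0.5], [3.5, 3.5]]))  # which side of the boundary?
```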

Why deep learning? Why now?
• The two key ideas of deep learning for computer vision, convolutional neural networks (CNNs) and backpropagation, were already well understood in 1989.
• In general, three technical forces are driving advances in machine learning:
1. Hardware
2. Datasets and benchmarks
3. Algorithmic advances

Clustering
• In supervised learning, the target features that must be predicted from input features are observed in the training data.
• In clustering, or unsupervised learning, the target features are not given in the training examples.
– The aim is to construct a natural classification that can be used to cluster the data.
• For example, an intelligent tutoring system may want to cluster students’ learning behavior so that strategies that work for one member of a class may work for other members.

Clustering
• The general idea behind clustering is to partition the examples into clusters or classes.
– Each class predicts feature values for the examples in the class.
– Each clustering has a prediction error on its predictions.
– The best clustering is the one that minimizes the error.

Clustering
• In hard clustering, each example is placed definitively in a class.
– The class is then used to predict the feature values of the example.
• In soft clustering, each example has a probability distribution over its classes.
– The prediction of the values for the features of an example is the weighted average of the predictions of the classes the example is in, weighted by the probability of the example being in each class.

K-Means Clustering
• The K-Means algorithm is used for hard clustering.
– It is an unsupervised learning technique where you have a collection of items that you want to group together into various clusters.
– It is a very common technique in machine learning: you take a bunch of data and find interesting clusters of things just based on the attributes of the data itself.
– The training examples and the number of classes, k, are given as input.
– The output is a set of k classes, a prediction of a value for each feature for each class, and an assignment of examples to classes.

K-Means Clustering
• All we do in K-Means clustering is try to split our data into k groups; that’s where the k comes from: it’s how many different groups you’re trying to split your data into. The algorithm does this by finding k centroids (means). Which group a given data point belongs to is defined by which of these centroid points it’s closest to in your scatter plot. You can visualize this in the following image:

K-Means Clustering
• This shows an example of K-Means clustering with k = 3, where the squares represent data points in a scatter plot.
– The circles represent the centroids that the K-Means clustering algorithm came up with, and each point is assigned a cluster based on which centroid it’s closest to.
– That’s all there is to it, really. It’s an example of unsupervised learning.
– There is no training data with known cluster labels; you’re just given the data itself, and the algorithm converges on the clusters naturally, based on the attributes of the data alone.
(https://blog.floydhub.com/introduction-to-k-means-clustering-in-python-with-scikit-learn)

K-Means Clustering Algorithm
• Here’s the algorithm for K-Means clustering:
– Randomly pick k centroids (means): we start off with a randomly chosen set of centroids. If k is three, we look for three clusters in our group.
– Assign each data point to the centroid it is closest to: each data point is assigned to the (initially randomly placed) centroid nearest to it.
– Recompute the centroids based on the average position of each centroid’s points: we then recompute the centroid of each cluster that we come up with.
– Iterate until points stop changing assignment to centroids: we repeat the previous two steps until the centroids stop moving, we hit some threshold value, or we have converged.
– Predict the cluster for new points: a new point is assigned to the centroid it is closest to.
(https://blog.floydhub.com/introduction-to-k-means-clustering-in-python-with-scikit-learn)

K-Means Clustering Algorithm: Details
• K-Means clustering intends to partition n objects into k clusters in which each object belongs to the cluster with the nearest mean.
• This method produces exactly k different clusters of greatest possible distinction.
• The best number of clusters k leading to the greatest separation (distance) is not known a priori and must be computed from the data.
• The objective of K-Means clustering is to minimize the total intra-cluster variance, that is, the squared error function

J = \sum_{j=1}^{k} \sum_{x_i \in \text{cluster } j} \left( x_i - C_j \right)^2

where x_i is a data point in cluster j and C_j is the cluster mean (the center of cluster j).

K-Means Clustering Algorithm: Details
• K-Means clustering algorithm:
1. Choose a value of k, the number of clusters to be formed.
2. Randomly select k data points from the dataset as the initial cluster centroids/centers.
3. For each data point:
3.1. Compute the (Euclidean) distance between the data point and each cluster centroid.
3.2. Assign the data point to the closest centroid.
4. For each cluster, calculate the new mean based on the data points in the cluster.
5. Repeat steps 3 and 4 until the means of the clusters stop changing or the maximum number of iterations is reached.
• A from-scratch sketch of these steps is shown below.
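A from-scratch sketch of these five steps in NumPy (our illustration, not code from the cited blog; empty clusters are not handled):

```python
# Sketch of steps 1-5 above; works for 1-D or n-D data.
import numpy as np

def k_means(data, k, max_iters=100, seed=0):
    points = np.asarray(data, dtype=float)
    if points.ndim == 1:
        points = points[:, None]              # treat 1-D data as n x 1
    rng = np.random.default_rng(seed)
    # Steps 1-2: pick k random data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: Euclidean distance to each centroid, assign to the closest.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignment = distances.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([points[assignment == j].mean(axis=0)
                                  for j in range(k)])
        # Step 5: stop when the means no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assignment
```

Calling k_means([2, 4, 10, 12, 3, 20, 30, 11, 25], k=2) should typically recover the two groups worked out by hand in Example 1 below.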

K-Means Clustering Algorithm – Example 1
• Apply K-Means to the following one-dimensional dataset for k = 2 (two clusters): dataset = {2, 4, 10, 12, 3, 20, 30, 11, 25}
• Iteration 1: let M1 and M2 be the two randomly selected centroids (means), with M1 = 4 and M2 = 11, so the initial clusters are C1 = {4} and C2 = {11}.
• Calculate the Euclidean distance as D(x, a) = √((x − a)²) = |x − a|; D1 is the distance from M1 and D2 is the distance from M2.
(https://blog.floydhub.com/introduction-to-k-means-clustering-in-python-with-scikit-learn)

K-Means Clustering Algorithm – Example 1
As we can see in the table below, data points 2, 4, and 3 go to cluster C1 and the other data points to cluster C2: C1 = {2, 4, 3} and C2 = {10, 12, 20, 30, 11, 25}

Datapoint | D1 | D2 | Cluster
2         |  2 |  9 | C1
4         |  0 |  7 | C1
10        |  6 |  1 | C2
12        |  8 |  1 | C2
3         |  1 |  8 | C1
20        | 16 |  9 | C2
30        | 26 | 19 | C2
11        |  7 |  0 | C2
25        | 21 | 14 | C2

(https://blog.floydhub.com/introduction-to-k-means-clustering-in-python-with-scikit-learn)

K-Means Clustering Algorithm – Example 1
• Iteration 2: calculate the new means of the data points in C1 and C2:
M1 = (2 + 4 + 3)/3 = 3 and M2 = (10 + 12 + 20 + 30 + 11 + 25)/6 = 18
• The recalculated distances D1 and D2 and the updated clusters are shown in the table below:
(https://blog.floydhub.com/introduction-to-k-means-clustering-in-python-with-scikit-learn)

K-Means Clustering Algorithm – Example 1
As we can see in the table below, 4 data points now fall in cluster C1 and the other data points in cluster C2: C1 = {2, 4, 3, 10} and C2 = {12, 20, 30, 11, 25}

Datapoint | D1 | D2 | Cluster
2         |  1 | 16 | C1
4         |  1 | 14 | C1
3         |  0 | 15 | C1
10        |  7 |  8 | C1
12        |  9 |  6 | C2
20        | 17 |  2 | C2
30        | 27 | 12 | C2
11        |  8 |  7 | C2
25        | 22 |  7 | C2

(https://blog.floydhub.com/introduction-to-k-means-clustering-in-python-with-scikit-learn)

K-Means Clustering Algorithm – Example 1
• Iteration 3: calculate the new means of the data points in C1 and C2:
M1 = (2 + 4 + 3 + 10)/4 = 4.75 and M2 = (12 + 20 + 30 + 11 + 25)/5 = 19.6
• The recalculated distances D1 and D2 and the updated clusters are shown in the table below:
(https://blog.floydhub.com/introduction-to-k-means-clustering-in-python-with-scikit-learn)

K-Means Clustering Algorithm – Example 1
As we can see in the table below, 6 data points now fall in cluster C1 and the other data points in cluster C2: C1 = {2, 4, 3, 10, 12, 11} and C2 = {20, 30, 25}

Datapoint | D1    | D2   | Cluster
2         |  2.75 | 17.6 | C1
4         |  0.75 | 15.6 | C1
3         |  1.75 | 16.6 | C1
10        |  5.25 |  9.6 | C1
12        |  7.25 |  7.6 | C1
20        | 15.25 |  0.4 | C2
30        | 25.25 | 10.4 | C2
11        |  6.25 |  8.6 | C1
25        | 20.25 |  5.4 | C2

(https://blog.floydhub.com/introduction-to-k-means-clustering-in-python-with-scikit-learn)

K-Means Clustering Algorithm – Example 1
• Iteration 4: calculate the new means of the data points in C1 and C2:
M1 = (2 + 4 + 3 + 10 + 12 + 11)/6 = 7 and M2 = (20 + 30 + 25)/3 = 25
• The recalculated distances D1 and D2 and the updated clusters are shown in the table below:
(https://blog.floydhub.com/introduction-to-k-means-clustering-in-python-with-scikit-learn)

K-Means Clustering Algorithm – Example 1
As we can see in the table below, the same 6 data points fall in cluster C1 and the other data points in cluster C2: C1 = {2, 4, 3, 10, 12, 11} and C2 = {20, 30, 25}
• The clusters contain the same data points after iterations 3 and 4; this is the stopping criterion of the algorithm.

Datapoint | D1 | D2 | Cluster
2         |  5 | 23 | C1
4         |  3 | 21 | C1
3         |  4 | 22 | C1
10        |  3 | 15 | C1
12        |  5 | 13 | C1
11        |  4 | 14 | C1
20        | 13 |  5 | C2
30        | 23 |  5 | C2
25        | 18 |  0 | C2

(https://blog.floydhub.com/introduction-to-k-means-clustering-in-python-with-scikit-learn)
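The iterations above can be checked mechanically; a short snippet (ours, not from the cited blog) that replays Example 1 with the same initial means M1 = 4 and M2 = 11:

```python
# Replay Example 1: 1-D dataset, k = 2, initial means 4 and 11.
data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
m1, m2 = 4.0, 11.0

while True:
    c1 = [x for x in data if abs(x - m1) <= abs(x - m2)]  # closest to M1
    c2 = [x for x in data if abs(x - m1) > abs(x - m2)]   # closest to M2
    new_m1, new_m2 = sum(c1) / len(c1), sum(c2) / len(c2)
    if (new_m1, new_m2) == (m1, m2):      # means stopped changing: done
        break
    m1, m2 = new_m1, new_m2

print(c1, c2, m1, m2)  # C1 = {2, 4, 10, 12, 3, 11}, C2 = {20, 30, 25}, means 7 and 25
```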

K-Means Clustering Algorithm – Example 2
• Suppose we want to group the visitors to a website using just their age (a one-dimensional space), as follows (k = 2):
– Ages: 5, 16, 19, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65
• The randomly selected initial centroids are M1 = 16 and M2 = 22 (meaning the initial clusters are C1 = {16} and C2 = {22}).
• Based on the above parameters, show the iterations performed by the K-Means clustering algorithm.
(https://blog.floydhub.com/introduction-to-k-means-clustering-in-python-with-scikit-learn)
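As a hedged way to check your hand-worked iterations for this exercise, the same update loop from Example 1 can be run on the ages (the snippet is ours; working the iterations by hand is still the point of the exercise):

```python
# Check Example 2 by machine: ages, k = 2, initial means 16 and 22.
ages = [5, 16, 19, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65]
m1, m2 = 16.0, 22.0

for iteration in range(1, 20):
    c1 = [a for a in ages if abs(a - m1) <= abs(a - m2)]
    c2 = [a for a in ages if abs(a - m1) > abs(a - m2)]
    print(f"iteration {iteration}: C1 = {c1}, C2 = {c2}")
    new_m1, new_m2 = sum(c1) / len(c1), sum(c2) / len(c2)
    if (new_m1, new_m2) == (m1, m2):      # converged: means stopped changing
        break
    m1, m2 = new_m1, new_m2
```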

K-Means Clustering – Performance
• K-Means is a relatively efficient method.
• However, we need to specify the number of clusters in advance, the final results are sensitive to initialization, and the algorithm often terminates at a local optimum.
– Unfortunately, there is no global theoretical method to find the optimal number of clusters.
• A practical approach is to compare the outcomes of multiple runs with different k and choose the best one based on a predefined criterion (a sketch of this is shown below).
– In general, a large k probably decreases the error but increases the risk of overfitting.
(https://blog.floydhub.com/introduction-to-k-means-clustering-in-python-with-scikit-learn)
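A hedged sketch of this multiple-runs approach using scikit-learn’s KMeans (the data reuses Example 1; reading the “best” k off the error curve, e.g. at the elbow, is the predefined criterion here):

```python
# Sketch: compare runs with different k using the total intra-cluster
# squared error (inertia); assumes scikit-learn is installed.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[2], [4], [10], [12], [3], [20], [30], [11], [25]], dtype=float)

for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, model.inertia_)  # look for the "elbow" where the error flattens
```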