Machine Learning Lecture # 1


Contents
• Why is machine learning (ML) useful?
• What is ML?
• Key steps of learning
• Types of ML algorithms

Why Machine Learning
• Computational power is available (resource)
• Recent progress in algorithms and theory (resource)
• Growing flood of online data (requirement)
• Three niches of ML
  – Data mining: using historical data to improve decisions, e.g. medical records → medical knowledge
  – Software applications we can't program by hand, e.g. speech recognition, handwriting recognition, autonomous driving
  – Self-customizing programs, e.g. Amazon or newsreaders that learn user interests

Typical Data Mining Task


Credit Risk Analysis


Problems Too Difficult to Program by Hand
• Speech recognition
• Face recognition
• Robotics control

Problems Too Difficult to Program by Hand
• It is very hard to write programs that solve problems like recognizing a face.
  – We don't know what program to write because we don't know how our brain does it.
  – Even if we had a good idea about how to do it, the program might be horrendously complicated.
• Instead of writing a program by hand, we collect lots of examples that specify the correct output for a given input.
• A machine learning algorithm then takes these examples and produces a program that does the job.
  – The program produced by the learning algorithm may look very different from a typical hand-written program. It may contain millions of numbers.
  – If we do it right, the program works for new cases as well as the ones we trained it on.

Problems Too Difficult to Program by Hand: Classic Example
What makes a 2?

Software that Customizes to User
• www.Amazon.com
• www.Netflix.com

What is ML? (1/2)
• Field of study that gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959)
• Study of algorithms that improve their performance P at some task T with experience E (Tom Mitchell, 1997)
• Well-defined learning task: <P, T, E>
  – T: playing checkers
  – P: % of games won
  – E: playing against itself

What is ML? (2/2)
• Handwriting recognition
  – Task T: recognizing and classifying handwritten words within images
  – Performance P: percent of words correctly classified
  – Training experience E: a database of handwritten words with given classifications
• ML course grade prediction
  – Task T: predicting student grades for the ML course
  – Performance P: percent of grades correctly predicted
  – Training experience E: previous courses taken by the students and the corresponding grades

Learning: Key Steps (1/4)
• Data: what past experience can we rely on?
  – Names and grades of students in past ML courses
  – Academic records of past and current students

                 Course title
  Student name   ML   X    Y
  Peter          A    B    A    (training data)
  David          B    A    A    (training data)
  Jack           ?    C    A    (current data)
  Kate           ?    A    A    (current data)

Learning: Key Steps (2/4)
• Assumptions: to simplify the learning problem
  – The course has remained roughly the same over the years
  – Each student performs independently of the others
• Representation
  – Academic records are rather diverse, so we might limit the summaries to a select few courses.
  – For example, we summarize the i-th student (say Peter) with a vector Xi = [A C B], where each grade may correspond to a numeric value.
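
The representation step can be sketched in Python as below. The numeric grade scale (A = 4 down to F = 0) and the helper name encode_grades are assumptions made for illustration; the slides only say that grades may correspond to numeric values.

```python
# Hypothetical grade scale: the lecture does not fix the numeric values.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def encode_grades(grades):
    """Turn a list of letter grades into a numeric feature vector."""
    return [GRADE_POINTS[g] for g in grades]


# The example vector from the slide, Xi = [A C B].
x_i = encode_grades(["A", "C", "B"])
print(x_i)
```

Any monotone scale would serve here; what matters for a distance-based method is that closer grades map to closer numbers.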

Learning: Key Steps (3/4)
• Estimation: given the training data, we need to find a mapping from "input vectors" x to "labels" y encoding the grades for the ML course.
• Possible solution (nearest neighbor classifier)
  1. For any student x, find the "closest" student xi in the training set.
  2. Predict yi, the grade of the closest student.
• Evaluation: how can we tell how good our system's predictions are?
  – We can wait till the end of this course
  – We can try to assess the accuracy based on the available data
• Possible solution
  1. Divide the training set into training and test subsets
  2. Train the classifier on the training subset and evaluate it on the test subset
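
A minimal sketch of the nearest neighbor classifier and the train/test evaluation described above. The grade scale, the use of Euclidean distance as the notion of "closest", the function names, and every record beyond the two training rows are assumptions for illustration, not part of the lecture.

```python
import math

# Hypothetical grade scale (an assumption, not from the lecture).
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def encode(grades):
    """Map letter grades to a numeric input vector x."""
    return [GRADE_POINTS[g] for g in grades]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor_predict(train, x):
    """Return the label y_i of the training point x_i closest to x."""
    _, closest_y = min(train, key=lambda pair: euclidean(pair[0], x))
    return closest_y


def holdout_accuracy(data, n_test):
    """Hold out the last n_test records as the test subset and score on them."""
    train, test = data[:-n_test], data[-n_test:]
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)


# Each record: (grades in courses X and Y, grade in the ML course).
# The first two rows mirror Peter and David; the rest are made up.
records = [
    (encode(["B", "A"]), "A"),
    (encode(["A", "A"]), "B"),
    (encode(["C", "C"]), "C"),
    (encode(["D", "C"]), "C"),
    (encode(["B", "B"]), "A"),
    (encode(["C", "D"]), "C"),
]

# Predict Jack's ML grade (X = C, Y = A) from Peter and David only.
jack_prediction = nearest_neighbor_predict(records[:2], encode(["C", "A"]))
print(jack_prediction)

# Estimate accuracy by training on four records and testing on two.
print(holdout_accuracy(records, n_test=2))
```

In practice the split would usually be randomized rather than taken from the end of the list; a fixed split is used here only to keep the sketch deterministic.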

Learning: Key Steps (4/4)
• Model selection
• Refinement
  – Choose another classifier (instead of nearest neighbor)
  – Choose a different representation (e.g. base the summaries on a different set of courses)
  – Relax the assumptions (e.g. perhaps students work in groups)
• Analysing the performance: we have to rely on our method of evaluating prediction accuracy to select among the possible refinements

Types of ML Algorithms
The main types are:
• Supervised learning
• Unsupervised learning
• Reinforcement learning
• Semi-supervised learning

Supervised Learning
• A process of finding a model that describes and distinguishes data classes or concepts, so that it can predict the class of objects whose class label is unknown.
• Given a collection of records (training set)
  – Each record contains a set of attributes; one of the attributes is the class.
• Goal: previously unseen records should be assigned a class as accurately as possible.
  – A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.

Supervised Learning
[Figure: a data table labeled "Attributes of input data" and "Class variable or output", split into a "Training Set" and a "Test Set"; further labels: "Learn Classifier", "Model"]

Supervised Learning: Application
• Direct marketing
  – Goal: reduce the cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.
  – Approach:
    • Use the data for a similar product introduced before.
    • We know which customers decided to buy and which decided otherwise. This {buy, don't buy} decision forms the class attribute.
    • Collect various demographic, lifestyle, and company-interaction related information about all such customers.
      – Type of business, where they stay, how much they earn, etc.
    • Use this information as input attributes to learn a classifier model.

Regression
• Predict the value of a continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
• Extensively studied in statistics and in the neural network field.
• Examples:
  – Predicting sales amounts of a new product based on advertising expenditure.
  – Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
  – Time series prediction of stock market indices.
[Figure: plot with axis labels "Rupees" and "feet²"]
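
For the linear case, a least-squares line fit can be sketched as below. The data values and the function name fit_line are made up for illustration, and ordinary least squares is just one way to fit a linear model of dependency.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: advertising expenditure versus sales amount.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 5.0 + intercept  # prediction for a new expenditure
print(slope, intercept, predicted_sales)
```

The same idea extends to several input variables, where the fit is usually solved with linear algebra routines rather than these scalar sums.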

Unsupervised Learning
• Unlike supervised learning, which analyses class-labeled data objects, clustering analyses data objects without consulting a class label. In fact, class labels are not present in the data because they are not known.
• Major questions in clustering are:
  – Are there any "groups" in the data?
  – What is each group?
  – How many groups are there?
  – How can they be identified?

Clustering Definition
• Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that
  – Data points in one cluster are more similar to one another.
  – Data points in separate clusters are less similar to one another.
• Similarity measures:
  – Euclidean distance if attributes are continuous.
  – Other problem-specific measures.
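
One common way to find such clusters with Euclidean distance is k-means, sketched below. The slides do not name a specific algorithm, so k-means, the naive initialization, the function names, and the sample points are all assumptions for illustration.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def nearest_center(p, centers):
    return min(range(len(centers)), key=lambda j: euclidean(p, centers[j]))


def k_means(points, k, n_iter=10):
    # Naive initialization: use the first k points as centers.
    centers = [list(p) for p in points[:k]]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            mean_point(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


# Made-up 2-D points forming two visible groups.
points = [[1.0, 1.0], [5.0, 5.0], [1.2, 0.8], [5.2, 4.8], [0.8, 1.2], [4.8, 5.2]]
centers, labels = k_means(points, k=2)
print(labels)
print(centers)
```

Note that k must be chosen in advance here, which is exactly the "how many groups?" question that clustering has to answer by other means.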

Illustrating Clustering
• Euclidean distance based clustering in 3-D space
  – Intracluster distances are minimized
  – Intercluster distances are maximized

Clustering: Application
• Market segmentation:
  – Goal: subdivide a market into distinct subsets of customers, where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
  – Approach:
    • Collect different attributes of customers based on their geographical and lifestyle related information.
    • Find clusters of similar customers.
    • Measure the clustering quality by observing buying patterns of customers in the same cluster vs. those from different clusters.

Types of ML Algorithms: Reinforcement Learning
• Supervised learning:
  – The correct output for each training input is available
• Reinforcement learning:
  – Some evaluation of an input is available, but not the exact output
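
The difference can be illustrated with a small epsilon-greedy bandit sketch: the learner is never told which action is correct, it only receives a numeric evaluation (reward) for the action it tried. The reward values, the function name run_bandit, and the epsilon-greedy strategy are assumptions for illustration; the lecture does not prescribe a method.

```python
import random


def run_bandit(reward_fn, n_actions, n_steps=100, epsilon=0.1, seed=0):
    """Estimate the best action from reward feedback alone."""
    rng = random.Random(seed)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions
    for step in range(n_steps):
        if step < n_actions:
            action = step  # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        # The only feedback is a score for the chosen action,
        # never the identity of the correct action.
        reward = reward_fn(action)
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return max(range(n_actions), key=lambda a: estimates[a])


# Made-up, noise-free rewards hidden from the learner.
hidden_rewards = [0.2, 0.8, 0.5]
best_action = run_bandit(lambda a: hidden_rewards[a], n_actions=3)
print(best_action)
```

A supervised learner given the same problem would instead be handed the correct action for each training input and would have no need to explore.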

Reference Literature
• Textbook: Machine Learning by Tom Mitchell
• Reference book: Pattern Recognition and Machine Learning by C. Bishop