Machine Learning

Photo: CMU Machine Learning Department protests G20
Slides: Isabelle Guyon, Erik Sudderth, Mark Johnson, Derek Hoiem, Lana Lazebnik

Machine Learning is…
  • A branch of artificial intelligence concerned with the construction and study of systems that can learn from data.
  • The study of how to automatically learn to make accurate predictions based on past observations.

Machine Learning, a.k.a.
  • data mining: machine learning applied to databases, i.e. collections of data
  • inference and/or estimation in statistics
  • pattern recognition in engineering
  • signal processing in electrical engineering
  • optimization

Supervised vs Unsupervised Learning
  • In supervised learning, the categories are known.
  • In unsupervised learning, they are not, and the learning process attempts to discover the appropriate categories.

Classification…
  • Spam Detection: Given the email in an inbox, identify which messages are spam and which are not.
  • Credit Card Fraud Detection: Given a customer's credit card transactions for a month, identify which transactions were made by the customer and which were not.
  • Evil/Good…

Classification framework
  • f([image]) = “apple”
  • f([image]) = “tomato”
  • f([image]) = “cow”
Slide credit: L. Lazebnik

Classification, cont.
y = f(x), where y is the output, f is the prediction function, and x is the image feature.
  • Training: given a training set of labeled examples {(x_1, y_1), …, (x_N, y_N)}, estimate the prediction function f by minimizing the prediction error on the training set.
  • Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x).
Slide credit: L. Lazebnik
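
A minimal sketch of the training and testing steps, assuming a single numeric feature and a one-threshold prediction function. The toy data, the labels +1/-1, and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy training set: (feature x, label y) with y in {-1, +1}.
train = [(1.0, -1), (1.5, -1), (2.0, -1), (3.0, 1), (3.5, 1), (4.0, 1)]

def predict(threshold, x):
    """Prediction function f(x): +1 above the threshold, -1 otherwise."""
    return 1 if x > threshold else -1

def training_error(threshold, examples):
    """Fraction of labeled examples that f misclassifies."""
    wrong = sum(1 for x, y in examples if predict(threshold, x) != y)
    return wrong / len(examples)

def fit(examples):
    """Training: pick the candidate threshold with the lowest training error."""
    xs = sorted(x for x, _ in examples)
    # Candidate thresholds are the midpoints between neighboring feature values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: training_error(t, examples))

threshold = fit(train)

# Testing: apply the learned f to an example that was not in the training set.
test_prediction = predict(threshold, 3.2)
```

Real classifiers search a much richer family of functions, but the split is the same: `fit` only sees labeled training examples, and `predict` is then applied to unseen inputs.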

The process
  • Training: Training Images → Image Features, together with Training Labels → Training → Learned model
  • Testing: Test Image → Image Features → Learned model → Prediction
Slide credit: D. Hoiem and L. Lazebnik

Classifiers: Nearest neighbor
[Figure: training examples from class 1, training examples from class 2, and a test example]
  • f(x) = label of the training example nearest to x
  • All we need is a distance function for our inputs
  • No training required!
Slide credit: L. Lazebnik
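
The nearest-neighbor rule can be sketched in a few lines. This assumes Euclidean distance as the distance function (the slide does not name one) and uses hypothetical 2-D points.

```python
import math

# Hypothetical labeled training examples: (feature vector, class label).
train = [
    ((1.0, 1.0), "class 1"),
    ((1.5, 2.0), "class 1"),
    ((5.0, 5.0), "class 2"),
    ((6.0, 4.5), "class 2"),
]

def nearest_neighbor(examples, x):
    """f(x) = label of the training example nearest to x (Euclidean distance)."""
    _, label = min(examples, key=lambda ex: math.dist(ex[0], x))
    return label

label_a = nearest_neighbor(train, (2.0, 1.5))
label_b = nearest_neighbor(train, (5.5, 5.0))
```

There is no fitting step: the "model" is the stored training set itself, and all the work happens at prediction time.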

Classifiers: Linear
  • Find a linear function to separate the classes: f(x) = sgn(w · x + b)
Slide credit: L. Lazebnik
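
One way to find such a w and b is the perceptron update rule. The slide does not say which algorithm is meant, so treat this as an illustrative sketch on hypothetical, linearly separable toy data; the convention sgn(0) = +1 is also an assumption.

```python
def sgn(v):
    """Sign function; treating 0 as +1 is a convention chosen for this sketch."""
    return 1 if v >= 0 else -1

def f(w, b, x):
    """Linear classifier f(x) = sgn(w . x + b)."""
    return sgn(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_perceptron(examples, epochs=100):
    """Perceptron rule: nudge w and b toward every misclassified example."""
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            if f(w, b, x) != y:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training example is on the correct side
            break
    return w, b

# Hypothetical 2-D training set with labels -1 / +1.
train = [((1.0, 1.0), -1), ((2.0, 1.0), -1), ((4.0, 5.0), 1), ((5.0, 4.0), 1)]
w, b = train_perceptron(train)
```

The perceptron only stops early when the classes are linearly separable; on other data the loop simply runs for the full number of epochs.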

Regression
  • Data is labelled with a real value (numeric) rather than a category label (factor).
  • Useful for predicting time series data, such as the price of a stock over time.
  • The decision being modelled is what value to predict for new, unseen data.
  • Learning a linear regression model means estimating the values of the coefficients used in the representation from the data that we have available.
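
As a sketch of what "estimating the coefficients" means, the following fits a line y = intercept + slope · x by ordinary least squares using the closed-form solution for one input variable. The data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations, roughly following y = 1 + 2x with some noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)

# Predict a real value for an input that was not in the data.
prediction = intercept + slope * 6.0
```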

Clustering (Data Mining)
  • Data is not labelled.
  • It can, however, be divided into groups based on similarity and other measures of natural structure in the data.
  • Market segmentation is one of the best-known examples of cluster analysis.
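
The slide does not name a clustering algorithm. As one common choice, here is a minimal k-means sketch on hypothetical 2-D points, with a naive deterministic initialization (the first k points) chosen only to keep the example repeatable.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: alternate between assigning points and moving centroids."""
    centroids = list(points[:k])  # naive init, an assumption of this sketch
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(c) / len(cluster) for c in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 1.5), (9.5, 8.0)]
centroids, clusters = kmeans(points, k=2)
```

No labels are used anywhere: the groups come only from distances between points, which is what distinguishes this from the classifiers above.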

Dimensionality Reduction
  • Most algorithms work on columns (treated as variables).
  • Datasets with thousands of variables make the algorithms run slower.
  • It is important to reduce the number of columns in the data set while losing as little information as possible.
  • Techniques include: Missing Values Ratio, Low Variance Filter, High Correlation Filter, PCA, Random Forests / Ensemble Trees, etc.
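
Two of the listed techniques are simple enough to sketch directly: the Low Variance Filter and the High Correlation Filter. The column names, values, and thresholds below are hypothetical, and real use would normally scale the columns before comparing variances.

```python
import math
import statistics

def low_variance_filter(columns, min_variance):
    """Drop columns whose variance is below min_variance (near-constant columns)."""
    return {name: vals for name, vals in columns.items()
            if statistics.pvariance(vals) >= min_variance}

def pearson(a, b):
    """Pearson correlation; assumes neither column is constant."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def high_correlation_filter(columns, max_abs_corr):
    """Keep a column only if it is not highly correlated with one already kept."""
    kept = {}
    for name, vals in columns.items():
        if all(abs(pearson(vals, other)) <= max_abs_corr for other in kept.values()):
            kept[name] = vals
    return kept

# Hypothetical data set: one constant column and two near-duplicate columns.
columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "constant": [1, 1, 1, 1, 1],
    "age": [30, 22, 41, 35, 28],
}

# Run the variance filter first so no constant column reaches pearson().
reduced = high_correlation_filter(low_variance_filter(columns, 0.01), 0.95)
```

Here four columns shrink to two with little information lost, since `constant` carries none and `height_in` repeats `height_cm`.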

Generalization
[Figure: training set (labels known) and test set (labels unknown)]
  • How well does a learned model generalize from the data it was trained on to a new test set?
Slide credit: L. Lazebnik

Generalization
  • Components of generalization error
    – Bias: how much does the average model over all training sets differ from the true model? This is error due to inaccurate assumptions/simplifications made by the model.
    – Variance: how much do models estimated from different training sets differ from each other?
  • Underfitting: the model is too “simple” to represent all the relevant class characteristics
    – High bias and low variance
    – High training error and high test error
  • Overfitting: the model is too “complex” and fits irrelevant characteristics (noise) in the data
    – Low bias and high variance
    – Low training error and high test error
Slide credit: L. Lazebnik
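
The overfitting symptom (low training error, high test error) can be reproduced on a tiny made-up data set. In this sketch the true rule is assumed to be "label 1 when x > 5", two training labels are deliberately flipped to act as noise, and a very flexible 1-nearest-neighbor model is compared with a simple threshold rule.

```python
# Hypothetical 1-D toy data; the labels at x = 3 and x = 8 are flipped on purpose.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
test = [(1.2, 0), (2.9, 0), (3.2, 0), (5.5, 1), (7.9, 1), (8.4, 1)]

def one_nn(x):
    """A very flexible model: copy the label of the nearest training point."""
    return min(train, key=lambda ex: abs(ex[0] - x))[1]

def threshold_rule(x):
    """A simpler model with a single assumed cut-off at x = 5."""
    return 1 if x > 5 else 0

def error_rate(model, examples):
    """Fraction of examples the model labels incorrectly."""
    return sum(model(x) != y for x, y in examples) / len(examples)

# (training error, test error) for each model.
results = {
    "1-NN": (error_rate(one_nn, train), error_rate(one_nn, test)),
    "threshold": (error_rate(threshold_rule, train), error_rate(threshold_rule, test)),
}
```

The 1-NN model memorizes the noisy labels, so it is perfect on the training set and wrong on most of the test set, while the simpler rule misses the two noisy training points and generalizes well.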

Bias-Variance Trade-off
  • Models with too few parameters are inaccurate because of a large bias (not enough flexibility).
  • Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).
Slide credit: D. Hoiem

Bias-Variance Trade-off
E(MSE) = noise² + bias² + variance
  • noise²: unavoidable error
  • bias²: error due to incorrect assumptions
  • variance: error due to variance of training samples
See the following for explanations of bias-variance (also Bishop’s “Neural Networks” book):
  • http://www.inf.ed.ac.uk/teaching/courses/mlsc/Notes/Lecture4/BiasVariance.pdf
Slide credit: D. Hoiem
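
The bias² and variance terms can be estimated by simulation: draw many training sets, fit a model on each, and look at the spread of its predictions at one test point. The sketch below assumes a sine-shaped true function, Gaussian noise, and two illustrative models (predict the mean label, and 1-nearest neighbor); none of these choices come from the slides.

```python
import math
import random
import statistics

random.seed(0)   # fixed seed so the sketch is repeatable
NOISE_SD = 0.3   # assumed standard deviation of the label noise

def true_f(x):
    """Assumed true function, used only to generate synthetic data."""
    return math.sin(2 * math.pi * x)

def sample_training_set(n=20):
    xs = [random.random() for _ in range(n)]
    return [(x, true_f(x) + random.gauss(0, NOISE_SD)) for x in xs]

def constant_model(train, x):
    """Inflexible model: always predict the mean training label."""
    return statistics.fmean([y for _, y in train])

def one_nn_model(train, x):
    """Flexible model: predict the label of the nearest training input."""
    return min(train, key=lambda ex: abs(ex[0] - x))[1]

def bias2_and_variance(model, x0, trials=500):
    """Estimate bias^2 and variance of the model's prediction at x0."""
    preds = [model(sample_training_set(), x0) for _ in range(trials)]
    bias2 = (statistics.fmean(preds) - true_f(x0)) ** 2
    return bias2, statistics.pvariance(preds)

x0 = 0.25
simple_bias2, simple_var = bias2_and_variance(constant_model, x0)
flexible_bias2, flexible_var = bias2_and_variance(one_nn_model, x0)
```

Under these assumptions the constant model shows the larger bias² and the 1-nearest-neighbor model the larger variance, which is the trade-off the formula describes.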

Bias-variance tradeoff
[Figure: training error and test error plotted against model complexity, from underfitting (high bias, low variance) at low complexity to overfitting (low bias, high variance) at high complexity]
Slide credit: D. Hoiem

Toolkit
  • R
  • Python
  • Stata
  • VBA and SQL
  • Git and GitHub

R
Advantages
  • Fast and free.
  • State of the art: statistical researchers provide their methods as R packages. SPSS and SAS are years behind R!
  • Highly customizable.
  • Active user community.
  • Excellent for simulation, programming, computer-intensive analyses, etc.
Disadvantages
  • Not user friendly at start.
  • Steep learning curve, minimal GUI.
  • Easy to make mistakes and not know.
  • Working with large datasets is limited by RAM.
Source: A very brief introduction to R, M. Keller

Python
  • Language with strong similarities to Perl and C, but with powerful typing and object-oriented features.
  • Commonly used for producing HTML content on websites. Great for text files.
  • Useful built-in types (lists, dictionaries).
  • Clean syntax, powerful extensions.
  • Ease of use; interpreter.
  • AI Processing: Statistical
    – Python has strong numeric processing capabilities: matrix operations, etc.
    – Suitable for probability and machine learning code.
Based on presentation from www.cis.upenn.edu/~cse391/cse391_2004/PythonIntro1.ppt

Stata
  • Typically used in the areas of economics and politics.
  • User-friendly environment.
  • Pretty easy to learn.
  • Ado files are available for extensions.
  • Impact Evaluation in Practice

VBA and SQL
  • Visual Basic for Applications and Structured Query Language are both extensively used in BA.
  • Ability to retrieve data stored in SQL databases.
  • Connections with R and Python are possible and available.
  • Easier to work with R and Python when SQL is mastered.
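
As one illustration of connecting Python to SQL data, the sketch below uses the `sqlite3` module from Python's standard library with an in-memory database. The table name, columns, and rows are hypothetical; a production database would use its own driver and connection details.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table of customer transactions.
conn.execute("CREATE TABLE transactions (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("ana", 20.0), ("ben", 35.5), ("ana", 12.5)],
)

# Let SQL do the aggregation, then pull the result into Python objects.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM transactions "
    "GROUP BY customer ORDER BY customer"
).fetchall()

conn.close()
```

Doing the grouping in SQL rather than in Python is the kind of step that becomes easier once SQL is mastered.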

Git-GitHub
  • Git – Version control system
  • GitHub – Repository site

Git-GitHub, Safa

References
  • http://cs.stackexchange.com/questions/2907/what-exactly-is-the-difference-between-supervised-and-unsupervised-learning
  • http://www.kdnuggets.com/2015/05/7-methods-data-dimensionality-reduction.html
  • http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/
  • https://discuss.analyticsvidhya.com/t/difference-between-supervised-and-unsupervised-learning/1196
  • https://www.quora.com/Which-is-better-for-data-analysis-R-or-Python
  • Git-GitHub, Safa
  • A very brief introduction to R, Matthew Keller & Steven Boker
Slide credit: D. Hoiem