CS 2750 Machine Learning Linear Models for Classification
- Slides: 47
CS 2750: Machine Learning Linear Models for Classification Prof. Adriana Kovashka University of Pittsburgh February 15, 2016
Plan for Today • • • Regression for classification Fisher’s linear discriminant Perceptron Logistic regression Multi-way classification Generative vs discriminative models
The effect of outliers Magenta = least squares, green = logistic regression Figures from Bishop
With three classes Left = least squares, right = logistic regression Figures from Bishop
Fisher’s linear discriminant Figures from Bishop
Plan for Today • • • Regression for classification Fisher’s linear discriminant Perceptron Logistic regression Multi-way classification Generative vs discriminative models
The perceptron algorithm • Rosenblatt (1962) • Prediction rule: where • Loss: (just using the Misclassified examples)
The perceptron algorithm • Loss: • Learning algorithm update rule: • Interpretation: – If sample is being misclassified, make the weight vector more like it
The perceptron algorithm Figures from Bishop
Plan for Today • • • Regression for classification Fisher’s linear discriminant Perceptron Logistic regression Multi-way classification Generative vs discriminative models
Maximum Likelihood Estimation (MLE) u We have a probabilistic model, M, of some phenomena, but we do not know its parameters, . u Each “execution” of M produces an observation, x[i] , according to the (unknown) distribution induced by M. u Goal: After observing x[1] , …, x[n], estimate the model parameters, , that generated the observed data. u This vector parameter can be used to predict future data. u MLE Principle: Choose parameters that maximize the likelihood of the data Adapted from Nir Friedman
Maximum Likelihood Estimation (MLE) u The likelihood of the observed data, given the model parameters , is the conditional probability that the model, M, with parameters , produces x[1] , …, x[n]. L( )=Pr(x[1] , …, x[n] | , M), u In MLE we seek the model parameters, , that maximize the likelihood. Adapted from Nir Friedman
Example: Binomial Experiment u When tossed, it can land in one of two positions: Head (H) or Tail (T) u. We denote by the (unknown) probability P(H). Estimation task: u Given a sequence of toss samples x[1], x[2], …, x[M] we want to estimate the probabilities P(H)= and P(T) = 1 - Adapted from Nir Friedman
The Likelihood Function u How good is a particular ? It depends on how likely it is to generate the observed data u The likelihood for the sequence H, T, T, H, H is u Taking derivative and equating it to 0, we get Adapted from Nir Friedman
Issues u Overconfidence u Better: Maximum-a-posteriori (MAP)
Plan for Today • • • Regression for classification Fisher’s linear discriminant Perceptron Logistic regression Multi-way classification Generative vs discriminative models
Multi-class problems • Instead of just two classes, we now have C classes • E. g. predict which movie genre a viewer likes best • Possible answers: action, drama, indie, thriller, etc. • Two approaches: – One-vs-all – One-vs-one
Multi-class problems • One-vs-all (a. k. a. one-vs-others) – Train K classifiers – In each, pos = data from class i, neg = data from classes other than i – The class with the most confident prediction wins – Example: • • • You have 4 classes, train 4 classifiers 1 vs others: score 3. 5 2 vs others: score 6. 2 3 vs others: score 1. 4 4 vs other: score 5. 5 Final prediction: class 2 – Issues?
Multi-class problems • One-vs-one (a. k. a. all-vs-all) – Train K(K-1)/2 binary classifiers (all pairs of classes) – They all vote for the label – Example: • • You have 4 classes, then train 6 classifiers 1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, 3 vs 4 Votes: 1, 1, 4, 2, 4, 4 Final prediction is class 4
Multi-class problems • What are some problems with this approach to doing multi-class? – There are “natively multi-class” methods Figures from Bishop
Plan for Today • • • Regression for classification Fisher’s linear discriminant Perceptron Logistic regression Multi-way classification Generative vs discriminative models
Generative models • Binary case:
Generative models • Mutli-class case:
Generative models • Why are these called generative? • Can use them to generate new samples x • Perhaps this is overkill?
- 2750-1500
- Asme 2750 furnaces
- Gerard schuster
- Geometric models in machine learning
- Stealing machine learning models via prediction apis
- Stealing machine learning models via prediction apis
- Andrew ng linear regression
- Linear regression with multiple features
- Linear regression with multiple variables machine learning
- Product classification machine learning
- Classification machine learning
- Concept learning task in machine learning
- Analytical learning in machine learning
- Pac learning model in machine learning
- Machine learning t mitchell
- Inductive and analytical learning in machine learning
- Inductive and analytical learning problem
- Instance based learning in machine learning
- Inductive learning machine learning
- First order rule learning in machine learning
- Eager learner vs lazy learner
- Cmu machine learning
- Difference between models and semi modals
- Cuadro comparativo e-learning y b-learning
- Thorsten brants
- Both the linear and interactive communication models
- Deliberation model of curriculum development
- Linear models and rates of change
- Communication models examples
- A unifying review of linear gaussian models
- Linear basis function models
- Subcripted
- Linear programming models: graphical and computer methods
- Curve fitting with linear models
- Linear functions as mathematical models
- Linear quadratic and exponential graphs
- Linear functions as mathematical models
- Curve fitting with linear models
- Fspos vägledning för kontinuitetshantering
- Novell typiska drag
- Tack för att ni lyssnade bild
- Vad står k.r.å.k.a.n för
- Shingelfrisyren
- En lathund för arbete med kontinuitetshantering
- Särskild löneskatt för pensionskostnader
- Personlig tidbok fylla i
- A gastrica
- Förklara densitet för barn