Regularized risk minimization
Usman Roshan
- Slides: 22
Supervised learning for two classes
• We are given n training samples $(x_i, y_i)$ for $i = 1, \dots, n$ drawn i.i.d. from a probability distribution $P(x, y)$.
• Each $x_i$ is a d-dimensional vector ($x_i \in \mathbb{R}^d$) and $y_i$ is +1 or -1.
• Our problem is to learn a function $f(x)$ for predicting the labels of test samples $x_i' \in \mathbb{R}^d$ for $i = 1, \dots, n'$, also drawn i.i.d. from $P(x, y)$.
Loss function
• Loss function: $c(x, y, f(x))$
• Maps to $[0, \infty)$
• Examples:
Test error
• We quantify the test error as the expected error on the test set (in other words, the average test error). In the case of two classes:
• We’d like to find the f that minimizes this, but doing so requires $P(y|x)$, which we don’t have access to.
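One way to write this average for two classes, as a sketch in the notation of the earlier slides (test samples $x_i'$, loss $c$). The name $R_{test}[f]$ is an assumption introduced here for illustration.

```latex
R_{\text{test}}[f] = \frac{1}{n'} \sum_{i=1}^{n'} \sum_{y \in \{+1, -1\}} c(x_i', y, f(x_i')) \, P(y \mid x_i')
```

Each test point contributes its loss under both possible labels, weighted by how likely that label is given the point, which is why $P(y|x)$ is needed.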
Expected risk
• Suppose we didn’t have test data (x’). Then we average the test error over all possible data points x.
• We want to find the f that minimizes this, but we don’t have all data points. We only have training data.
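Averaging over all possible data points can be sketched as an expectation under $P(x, y)$. The symbol $R[f]$ is an assumed name for the expected risk.

```latex
R[f] = \mathbb{E}_{(x, y) \sim P}\big[ c(x, y, f(x)) \big] = \int c(x, y, f(x)) \, dP(x, y)
```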
Empirical risk
• Since we only have training data, we can’t calculate the expected risk (we don’t even know $P(x, y)$).
• Solution: we approximate $P(x, y)$ with the empirical distribution $p_{emp}(x, y)$.
• The delta function $\delta_x(y) = 1$ if $x = y$ and 0 otherwise.
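A sketch of the empirical distribution in terms of the delta function, assuming the standard construction that places mass $1/n$ on each training sample:

```latex
p_{\text{emp}}(x, y) = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}(x) \, \delta_{y_i}(y)
```

Replacing $P(x, y)$ by $p_{emp}(x, y)$ turns an average over all data points into a finite average over the n training samples.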
Empirical risk
• We can now define the empirical risk as
• Once the loss function is defined and training data is given, we can find the f that minimizes this.
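A minimal sketch of the empirical risk, assuming the usual form $R_{emp}[f] = \frac{1}{n} \sum_{i=1}^{n} c(x_i, y_i, f(x_i))$. The 0/1 loss, the toy data, and the classifier `f` are hypothetical choices made only for illustration.

```python
import numpy as np

def zero_one_loss(x, y, fx):
    """0/1 loss: 1 if the predicted label differs from y, else 0."""
    return float(fx != y)

def empirical_risk(loss, f, X, y):
    """Average loss of predictor f over the n training samples."""
    return np.mean([loss(xi, yi, f(xi)) for xi, yi in zip(X, y)])

# Hypothetical toy data: four points in R^2 with labels in {+1, -1}.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5], [-2.0, 0.5]])
y = np.array([1, 1, -1, 1])

# Hypothetical classifier: predict the sign of the first coordinate.
f = lambda x: 1 if x[0] >= 0 else -1

risk = empirical_risk(zero_one_loss, f, X, y)
print(risk)
```

Any loss with the signature `c(x, y, f(x))` can be passed in place of `zero_one_loss`.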
Example of minimizing empirical risk (least squares)
• Suppose we are given n data points $(x_i, y_i)$ where each $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. We want to determine a linear function $f(x) = a^T x + b$ for predicting test points.
• Loss function: $c(x_i, y_i, f(x_i)) = (y_i - f(x_i))^2$
• What is the empirical risk?
Empirical risk for least squares
$\frac{1}{n} \sum_{i=1}^{n} (y_i - (a^T x_i + b))^2$
Now finding f has reduced to finding a and b. Since this function is convex in a and b, any point where the first derivatives are 0 is a global optimum, so we can find it by setting the first derivatives to 0.
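Setting the first derivatives to zero leads to a linear system (the normal equations). The sketch below solves it with NumPy. The noise-free synthetic data and the variable names are assumptions chosen so the recovered parameters are easy to check.

```python
import numpy as np

# Hypothetical data generated from y = 2*x1 - 1*x2 + 0.5 with no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0]) + 0.5

# Append a column of ones so that b is estimated together with a.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Zero gradient of the empirical risk gives the normal equations
# (X_aug^T X_aug) theta = X_aug^T y, solved here directly.
theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
a, b = theta[:-1], theta[-1]

# Empirical risk (mean squared error) at the solution.
risk = np.mean((y - (X @ a + b)) ** 2)
print(a, b, risk)
```

For ill-conditioned data, `np.linalg.lstsq(X_aug, y, rcond=None)` is a more numerically stable way to obtain the same minimizer.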
Maximum likelihood and empirical risk
• Maximizing the likelihood $P(D|M)$ is the same as maximizing $\log P(D|M)$, which is the same as minimizing $-\log P(D|M)$.
• Set the loss function to
• Now minimizing the empirical risk is the same as maximizing the likelihood.
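A sketch of why the two problems coincide, assuming the model specifies a conditional probability $P(y \mid x, f)$ and the samples are i.i.d. (the conditional form is an assumption, not taken from the slide):

```latex
\begin{align*}
c(x_i, y_i, f(x_i)) &= -\log P(y_i \mid x_i, f) \\
R_{\text{emp}}[f] &= \frac{1}{n} \sum_{i=1}^{n} -\log P(y_i \mid x_i, f)
                   = -\frac{1}{n} \log \prod_{i=1}^{n} P(y_i \mid x_i, f)
\end{align*}
```

The product on the right is the likelihood of the labels, so the f that minimizes the empirical risk is the one that maximizes the likelihood.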
Empirical risk
• We pose the empirical risk in terms of a loss function and then minimize it.
• Input: n training samples $x_i$, each of dimension d, along with labels $y_i$
• Output: a linear function $f(x) = w^T x + w_0$ that minimizes the empirical risk
Empirical risk examples
• Linear regression
• How about logistic regression?
Logistic regression
• Recall the logistic regression model:
• Let y=+1 be case and y=-1 be control.
• The sample likelihood of the training data is given by
Logistic regression
• We find our parameters $w$ and $w_0$ by maximizing the likelihood or, equivalently, minimizing the -log(likelihood).
• The -log of the likelihood is
Logistic regression loss function
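The chain from model to loss can be sketched as follows, assuming the common parameterization of logistic regression for labels in $\{+1, -1\}$ with $f(x) = w^T x + w_0$. The symbol $L(w, w_0)$ for the likelihood is a name assumed here.

```latex
\begin{align*}
P(y \mid x) &= \frac{1}{1 + e^{-y (w^T x + w_0)}}, \quad y \in \{+1, -1\} \\
L(w, w_0) &= \prod_{i=1}^{n} \frac{1}{1 + e^{-y_i (w^T x_i + w_0)}} \\
-\log L(w, w_0) &= \sum_{i=1}^{n} \log\left(1 + e^{-y_i (w^T x_i + w_0)}\right) \\
c(x, y, f(x)) &= \log\left(1 + e^{-y f(x)}\right)
\end{align*}
```

Up to the factor $1/n$, the -log likelihood is the empirical risk under this loss.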
SVM loss function
• Recall the SVM optimization problem:
• The loss function (second term) can be written as
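A sketch of the soft-margin SVM in its usual form. The trade-off constant $C$, the slack variables $\xi_i$, and the factor $\frac{1}{2}$ are assumptions about the scaling used on the slide.

```latex
\begin{align*}
\min_{w, w_0, \xi} \quad & \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y_i (w^T x_i + w_0) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
\end{align*}
```

At the optimum each slack takes its smallest feasible value, $\xi_i = \max(0, 1 - y_i (w^T x_i + w_0))$, so the second term is a sum of hinge losses $c(x, y, f(x)) = \max(0, 1 - y f(x))$.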
Different loss functions
• Linear regression
• Logistic regression
• SVM
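The three losses can be compared side by side as functions of the label $y$ and the score $f(x)$. This is a minimal sketch assuming the standard squared, logistic, and hinge forms. The function names and the sample scores are illustrative.

```python
import numpy as np

def squared_loss(y, fx):
    """Linear regression: (y - f(x))^2."""
    return (y - fx) ** 2

def logistic_loss(y, fx):
    """Logistic regression with y in {+1, -1}: log(1 + exp(-y f(x)))."""
    # logaddexp(0, z) computes log(1 + exp(z)) without overflow.
    return np.logaddexp(0.0, -y * fx)

def hinge_loss(y, fx):
    """SVM: max(0, 1 - y f(x))."""
    return np.maximum(0.0, 1.0 - y * fx)

# Evaluate each loss for a positive example (y = +1) at a few scores f(x).
scores = np.array([-2.0, 0.0, 1.0, 3.0])
for name, loss in [("squared", squared_loss),
                   ("logistic", logistic_loss),
                   ("hinge", hinge_loss)]:
    print(name, loss(1.0, scores))
```

Note that the squared loss also penalizes confident correct scores (such as $f(x) = 3$ for $y = +1$), while the hinge loss is exactly zero once the margin $y f(x)$ reaches 1.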
Regularized risk minimization
• Minimize
• Note the additional term added to the empirical risk.
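A minimal sketch of regularized risk minimization, assuming an objective of the form $R_{emp}[f] + \lambda \Omega(w)$ with logistic loss and $\Omega(w) = \frac{1}{2}\|w\|^2$. The plain gradient-descent solver, the step size, the value of `lam`, and the toy data are all assumptions for illustration, not the method used in the course software.

```python
import numpy as np

def regularized_risk(w, w0, X, y, lam):
    """Mean logistic loss plus (lam / 2) * ||w||^2 (bias left unpenalized)."""
    margins = y * (X @ w + w0)
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * (w @ w)

def fit(X, y, lam=0.1, lr=0.3, steps=500):
    """Plain gradient descent on the regularized risk."""
    n, d = X.shape
    w, w0 = np.zeros(d), 0.0
    for _ in range(steps):
        margins = y * (X @ w + w0)
        # Derivative of log(1 + exp(-m)) with respect to the score f(x).
        coeff = -y / (1.0 + np.exp(margins))
        w -= lr * (X.T @ coeff / n + lam * w)
        w0 -= lr * np.mean(coeff)
    return w, w0

# Hypothetical toy data: two Gaussian blobs labelled +1 and -1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.5, 1.0, size=(50, 2)),
               rng.normal(-1.5, 1.0, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

w, w0 = fit(X, y)
start = regularized_risk(np.zeros(2), 0.0, X, y, lam=0.1)
final = regularized_risk(w, w0, X, y, lam=0.1)
accuracy = np.mean(np.sign(X @ w + w0) == y)
print(start, final, accuracy)
```

Swapping the logistic loss for the hinge loss (with a subgradient in place of the gradient) gives a linear SVM under the same objective, which is what makes the two methods directly comparable.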
Representer theorem
• Plays a central role in statistical estimation
• Taken from Learning with Kernels by Schölkopf and Smola
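The core statement can be sketched as follows. This is a paraphrase from memory rather than a quotation of the book: $k$ is a kernel with reproducing kernel Hilbert space $\mathcal{H}$, and $\Omega$ is a strictly increasing function of the norm.

```latex
f^{*} \in \arg\min_{f \in \mathcal{H}} \;
  \frac{1}{n} \sum_{i=1}^{n} c(x_i, y_i, f(x_i)) + \Omega\big(\|f\|_{\mathcal{H}}\big)
\quad \Longrightarrow \quad
f^{*}(x) = \sum_{i=1}^{n} \alpha_i \, k(x_i, x)
```

In words, a minimizer of the regularized risk can be written as a weighted sum of kernel functions centered on the training points, so the search reduces to finding n coefficients $\alpha_i$.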
Other loss functions
• From “A Scalable Modular Convex Solver for Regularized Risk Minimization”, Teo et al., KDD 2007
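Regularizer
• L1 norm: $\|w\|_1 = \sum_{j=1}^{d} |w_j|$
• L1 gives a sparse solution (many entries will be zero)
• Least squares with L1 is known as the “lasso”; logistic loss with L1 is the analogous L1-regularized logistic regression
• L2 norm: $\|w\|_2 = \sqrt{\sum_{j=1}^{d} w_j^2}$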
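The sparsity effect of L1 can be seen on a small problem. The sketch below uses squared loss for simplicity and compares an L1 penalty, solved with proximal gradient descent (ISTA), against an L2 penalty in closed form. The solver choice, the value of `lam`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_l1(X, y, lam, steps=2000):
    """Proximal gradient (ISTA) for (1/2n)||y - Xw||^2 + lam * ||w||_1."""
    n, d = X.shape
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

def fit_l2(X, y, lam):
    """Closed-form minimizer of (1/2n)||y - Xw||^2 + (lam/2)||w||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

# Hypothetical data: only the first 2 of 10 features influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[:2] = [3.0, -2.0]
y = X @ w_true + 0.1 * rng.normal(size=100)

w_l1 = fit_l1(X, y, lam=0.5)
w_l2 = fit_l2(X, y, lam=0.5)
print("L1 zeros:", np.sum(w_l1 == 0.0))
print("L2 zeros:", np.sum(w_l2 == 0.0))
```

The soft-thresholding step sets small coefficients exactly to zero, whereas the L2 penalty only shrinks them.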
Regularized risk minimizer exercise
• Compare SVM to regularized logistic regression
• Software: http://users.cecs.anu.edu.au/~chteo/BMRM.html
• Version 2.1 executables for OSL machines available on course website