Linear Models for Regression
CSE 4309 – Machine Learning
Vassilis Athitsos
Computer Science and Engineering Department
University of Texas at Arlington
The Regression Problem
• We are given a set of training examples: pairs (x_n, t_n) of an input x_n and a target value t_n. The goal is to learn a function y(x) that predicts the target value for new inputs x.
A Regression Example
• Circles: training data.
• Red curve: one possible solution (a line).
• Green curve: another possible solution (a cubic polynomial).
Linear Models for Regression
Dot Product Version
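A sketch of the model in its dot product version, using the standard notation (M basis functions \phi_0, ..., \phi_{M-1}, with \phi_0(x) = 1 so that w_0 acts as a bias term):

  y(x, w) = \sum_{j=0}^{M-1} w_j \phi_j(x) = w^T \phi(x)

where \phi(x) is the vector of all basis function outputs for input x, and w is the weight vector that regression must estimate.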
Common Choices for Basis Functions
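As a hedged sketch of these common choices, the standard forms (following Bishop's Pattern Recognition and Machine Learning, whose notation these slides otherwise match; the exact parameterizations below are assumptions) are:

  Polynomial:  \phi_j(x) = x^j
  Gaussian:    \phi_j(x) = \exp(-(x - \mu_j)^2 / (2 s^2))
  Sigmoidal:   \phi_j(x) = \sigma((x - \mu_j) / s),  where \sigma(a) = 1 / (1 + e^{-a})

Here \mu_j places each Gaussian or sigmoidal basis function somewhere in the input space, and s controls its spatial scale.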
Global Versus Local Functions
• Polynomial basis functions are global: each one changes over the entire input range, so adjusting its weight affects the output everywhere.
• Gaussian basis functions are local: each one responds mainly to inputs near its center \mu_j, so adjusting its weight affects the output mostly in that region.
Linear Versus Nonlinear Functions
• A linear function y(x, w) produces an output that depends linearly on both x and w.
• Note that polynomial, Gaussian, and sigmoidal basis functions are nonlinear.
• If we use nonlinear basis functions, then the regression process produces a function y(x, w) that is:
– Linear in w.
– Nonlinear in x.
• It is important to remember: linear regression can be used to estimate nonlinear functions of x (see the sketch below).
– It is called linear regression because y is linear in w, NOT because y is linear in x.
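A minimal Matlab/Octave sketch of this point, using polynomial basis functions (all variable names here are illustrative, not from the slides):

  % y = Phi * w is linear in the weights w, but, because the basis
  % functions are nonlinear, it is a nonlinear (cubic) function of x.
  x = linspace(0, 1, 10)';       % 10 one-dimensional inputs (column vector)
  M = 4;                         % 4 basis functions: phi_j(x) = x^j, j = 0..3
  Phi = x .^ (0:M-1);            % N x M design matrix (implicit expansion;
                                 % needs Matlab R2016b+ or Octave)
  w = [0.5; -1.2; 3.0; -2.0];    % an illustrative weight vector
  y = Phi * w;                   % predictions for all inputs at once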
Solving Regression Problems
• There are different methods for solving regression problems.
• We will study two approaches:
– Least squares: find the weights w that minimize the squared error.
– Regularized least squares: find the weights w that minimize the squared error plus a penalty term, weighted by a hand-picked regularization parameter λ.
The Gaussian Noise Assumption
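A sketch of this assumption in the standard notation, where β denotes the precision (inverse variance) of the noise:

  t = y(x, w) + ε,   where ε ~ N(0, β^{-1})

Equivalently, p(t | x, w, β) = N(t | y(x, w), β^{-1}): each observed target value is the model's prediction plus zero-mean Gaussian noise.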
Finding the Most Likely Solution
Likelihood of the Training Data
Log Likelihood of the Training Data
Log Likelihood and Sum-of-Squares Error
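A sketch of the connection these slides build toward, assuming N independent training examples (x_1, t_1), ..., (x_N, t_N) and the Gaussian noise model above:

  \ln p(t | w, β) = (N/2) \ln β - (N/2) \ln(2π) - β E(w)

  where  E(w) = (1/2) \sum_{n=1}^{N} (t_n - w^T \phi(x_n))^2

Since w appears only in the last term, maximizing the log likelihood over w is equivalent to minimizing the sum-of-squares error E(w).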
Minimizing Sum-of-Squares Error
• The minimizing weights are found by setting the gradient of E(w) to zero and transposing both sides, which yields a linear system in w (see the sketch below).
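A sketch of the closed-form result this derivation arrives at, in the standard notation (Φ is the N x M design matrix with entries \Phi_{nj} = \phi_j(x_n), and t is the N x 1 vector of target values):

  w_{ML} = (\Phi^T \Phi)^{-1} \Phi^T t

The matrix (\Phi^T \Phi)^{-1} \Phi^T is the Moore–Penrose pseudoinverse of Φ, which is where the pinv workaround discussed later comes from.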
Solving for β
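As a sketch of the standard maximum-likelihood result, obtained by maximizing the log likelihood with respect to β after substituting w_{ML}:

  1 / β_{ML} = (1/N) \sum_{n=1}^{N} (t_n - w_{ML}^T \phi(x_n))^2

In words: the noise variance 1/β is estimated as the average squared error of the most likely solution on the training data.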
Least Squares Solution - Summary
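A minimal Matlab/Octave sketch of the whole recipe (the names x, t, phi, w are illustrative; x and t are assumed to hold the training inputs and targets as column vectors):

  % Fit a degree-7 polynomial by least squares.
  M = 8;                            % 8 basis functions: phi_j(x) = x^j, j = 0..7
  phi = x .^ (0:M-1);               % N x M design matrix
  w = pinv(phi' * phi) * phi' * t;  % normal equations via pseudoinverse
  % Equivalent, and typically more numerically stable:
  % w = phi \ t;
  beta_inv = mean((t - phi * w) .^ 2);  % ML estimate of noise variance 1/beta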
Numerical Issues
• Black circles: training data.
• Green curve: true function y(x) = sin(2πx).
• Red curve:
– Top figure: fitted degree-7 polynomial.
– Bottom figure: fitted degree-8 polynomial.
• Do you see anything strange?
Numerical Issues
• The degree-8 polynomial fits the data worse than the degree-7 polynomial.
• Is this possible?
Numerical Issues
• The degree-8 polynomial fits the data worse than the degree-7 polynomial.
• Mathematically, this is impossible.
• Degree-8 polynomials are a superset of degree-7 polynomials.
• Thus, the best-fitting degree-8 polynomial cannot fit the data worse than the best-fitting degree-7 polynomial.
Numerical Issues
• The culprit is numerical: as the polynomial degree grows, the matrix Φ^T Φ becomes nearly singular, and computing its inverse with finite-precision arithmetic gives inaccurate results.
Numerical Issues
• Work-around in Matlab:
– inv(phi' * phi) leads to the incorrect degree-8 polynomial fit.
– pinv(phi' * phi) leads to the correct degree-8 polynomial fit (almost identical to the degree-7 result).
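A minimal sketch of this comparison, reusing the slide's own Matlab expressions (x and t are again assumed to hold the training data):

  M = 9;                                  % 9 basis functions: degree-8 polynomial
  phi = x .^ (0:M-1);                     % phi' * phi is nearly singular here
  w_bad  = inv(phi' * phi) * phi' * t;    % plain inverse: large numerical error
  w_good = pinv(phi' * phi) * phi' * t;   % pseudoinverse: stable result
  % pinv is computed via the SVD and zeroes out tiny singular values,
  % which is why it stays accurate when phi' * phi is ill-conditioned.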
Sequential Learning
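A hedged sketch of sequential learning for this model, assuming the standard stochastic gradient descent update on one training example at a time (the exact update rule and learning-rate schedule on the original slides may differ):

  eta = 0.01;                     % learning rate (hand-picked, assumed)
  w = zeros(M, 1);                % start from zero weights
  for epoch = 1:100               % several passes over the training data
    for n = 1:size(phi, 1)
      phi_n = phi(n, :)';         % basis function outputs for example n
      err = t(n) - w' * phi_n;    % prediction error on this one example
      w = w + eta * err * phi_n;  % nudge w to reduce this example's error
    end
  end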
Sequential Learning - Intuition
• Each update moves w a small step in the direction that reduces the error on the current training example; over many examples, these small steps approximately minimize the overall sum-of-squares error.
Regularization
• Here we have a training set of 10 points (shown as blue circles).
• In green, we see the function that was used to generate these points (noise was added to that function).
– It is a sinusoidal function.
• In red, we see the best-fitting polynomials of degrees 1, 3, and 9.
• Interestingly, degree 3 matches the data well.
– It would not match as well if we included points from more periods of the sinusoidal wave.
Regularization
• The solution for degree 9 suffers severely from overfitting.
Regularization
• We note that overfitting leads to very large parameter magnitudes:

        Degree 1      Degree 3        Degree 9
  w0        0.82          0.31            0.35
  w1       -1.27          7.99          232.37
  w2                    -25.43        -5321.83
  w3                     17.37        48568.31
  w4                                -231639.30
  w5                                 640042.26
  w6                               -1061800.52
  w7                                1042400.18
  w8                                -557682.99
  w9                                 125201.43
Regularization

      No regularization     Small λ
  w0             0.35
  w1           232.37          4.74
  w2         -5321.83         -0.77
  w3         48568.31        -31.97
  w4       -231639.30         -3.89
  w5        640042.26         55.28
  w6      -1061800.52         41.32
  w7       1042400.18        -45.95
  w8       -557682.99        -91.53
  w9        125201.43         72.68

• A small λ solves the overfitting problem in this case.
Regularized Least Squares
Solving Regularized Least Squares
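A sketch of the regularized problem and its solution, assuming the standard quadratic (L2) penalty on the weights. The regularized error is

  E(w) = (1/2) \sum_{n=1}^{N} (t_n - w^T \phi(x_n))^2 + (λ/2) w^T w

and setting its gradient to zero gives the closed-form solution

  w = (λ I + \Phi^T \Phi)^{-1} \Phi^T t

In Matlab/Octave (the lambda value is illustrative):

  lambda = exp(-18);              % a small, hand-picked regularization parameter
  M = size(phi, 2);
  w = (lambda * eye(M) + phi' * phi) \ (phi' * t);
  % Adding lambda * eye(M) also makes the matrix well-conditioned, so the
  % numerical problems seen with inv(phi' * phi) do not arise here.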
Why Use Regularization?
• As the weight tables above show, regularization keeps the weights from growing to the huge magnitudes associated with overfitting, at the cost of introducing the hand-picked parameter λ.
Impact of Parameter λ
• As the previous figures show:
– Low values of λ lead to polynomials whose values fluctuate more and more rapidly.
• This can lead to increased overfitting.
– High values of λ lead to flatter and flatter polynomials that look more and more like straight lines.
• This can lead to increased underfitting (i.e., not fitting the data sufficiently).