Regression: Linear Regression and Trees
Jeff Howbert, Introduction to Machine Learning, Winter 2014
Characteristics of classification models

    model                  linear  parametric  global  stable
    decision tree          no      no          no      no
    logistic regression    yes     yes         yes     yes
    k-nearest neighbor     no      no          no      yes
    naïve Bayes            yes/no  yes         yes     yes
    discriminant analysis  yes/no  yes         yes     yes
[Five figure slides omitted; slides courtesy of Greg Shakhnarovich (CS 195-5, Brown Univ., 2006)]
Loss function
- Suppose target labels come from a set Y
  - Binary classification: Y = {0, 1}
  - Regression: Y = ℝ (the real numbers)
- A loss function L(ŷ, y) maps decisions to costs: it defines the penalty for predicting ŷ when the true value is y.
- Standard choice for classification: 0/1 loss (same as misclassification error): L(ŷ, y) = 1 if ŷ ≠ y, else 0.
- Standard choice for regression: squared loss: L(ŷ, y) = (ŷ − y)².
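To make the two standard losses concrete, here is a minimal sketch in Python/NumPy (the course demos use MATLAB; this translation and the names zero_one_loss and squared_loss are illustrative, not from the slides):

```python
import numpy as np

def zero_one_loss(y_pred, y_true):
    """0/1 loss: 1 where the predicted label differs from the true label."""
    return (y_pred != y_true).astype(float)

def squared_loss(y_pred, y_true):
    """Squared loss: penalty grows quadratically with the error."""
    return (y_pred - y_true) ** 2

# Classification: misclassification error is the mean 0/1 loss.
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])
print(zero_one_loss(y_pred, y_true).mean())   # 0.25

# Regression: SSL is the sum of squared losses.
y = np.array([1.0, 2.0, 3.0])
f = np.array([1.1, 1.9, 3.3])
print(squared_loss(f, y).sum())               # ~0.11
```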
Least squares linear fit to data
- The most popular estimation method is least squares:
  - Determine the linear coefficients w that minimize the sum of squared loss (SSL).
  - Use standard (multivariate) differential calculus:
    - differentiate SSL with respect to w
    - set each partial derivative to zero
    - solve for each wᵢ
- In one dimension: f(x) = w₀ + w₁x, so SSL = Σᵢ (yᵢ − w₀ − w₁xᵢ)².
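Setting the two partial derivatives to zero gives the familiar closed form w₁ = cov(x, y) / var(x) and w₀ = ȳ − w₁x̄; a minimal NumPy sketch (the variable names are mine, not the course's):

```python
import numpy as np

def least_squares_1d(x, y):
    """Closed-form least squares fit of y ≈ w0 + w1*x in one dimension."""
    x_bar, y_bar = x.mean(), y.mean()
    w1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    w0 = y_bar - w1 * x_bar
    return w0, w1

# Synthetic data from a known line, to check the recovered coefficients.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=50)
print(least_squares_1d(x, y))   # close to (2.0, 0.5)
```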
Least squares linear fit to data
- Multiple dimensions:
  - To simplify notation and derivation, add a new feature x₀ = 1 to the feature vector x, so the model is f(x) = wᵀx.
  - Calculate SSL and determine w: SSL = (y − Xw)ᵀ(y − Xw), and setting the gradient to zero gives the normal equations XᵀXw = Xᵀy, so w = (XᵀX)⁻¹Xᵀy.
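In matrix form the whole recipe is a few lines of NumPy; a minimal sketch (assuming the x₀ = 1 trick above, and using lstsq rather than an explicit matrix inverse for numerical stability):

```python
import numpy as np

def least_squares_fit(X, y):
    """Solve min ||Xb w - y||^2, where Xb prepends the constant feature x0 = 1."""
    Xb = np.column_stack([np.ones(len(X)), X])   # x0 = 1 column; w[0] is the intercept
    # Equivalent to w = inv(Xb.T @ Xb) @ Xb.T @ y, but better conditioned.
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

# Synthetic data with known weights, to check the fit.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5, 3.0])        # intercept plus three weights
y = np.column_stack([np.ones(100), X]) @ true_w + rng.normal(scale=0.1, size=100)
print(least_squares_fit(X, y))                  # close to true_w
```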
Least squares linear fit to data [figures]
Extending application of linear regression
- The inputs X for linear regression can be:
  - original quantitative inputs
  - transformations of quantitative inputs, e.g. log, exp, square root, square, etc.
  - polynomial transformations (example: y = w₀ + w₁x + w₂x² + w₃x³)
  - basis expansions
  - dummy coding of categorical inputs
  - interactions between variables (example: x₃ = x₁ · x₂)
- This allows use of linear regression techniques to fit much more complicated non-linear datasets (see the sketch below).
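A minimal sketch of the polynomial case in NumPy (the cubic and its coefficients are made up for illustration): expanding x into the features (1, x, x², x³) lets ordinary least squares fit a cubic curve.

```python
import numpy as np

# Noisy samples of a cubic chosen for illustration.
rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, size=80)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.2, size=80)

# Polynomial transformation: columns are x^0, x^1, x^2, x^3.
X = np.vander(x, N=4, increasing=True)

# Ordinary least squares on the expanded features fits the cubic.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)   # close to [1.0, -2.0, 0.0, 0.5]
```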
Example of fitting polynomial curve with linear model [figure]
Prostate cancer dataset
- 97 samples, partitioned into:
  - 67 training samples
  - 30 test samples
- Eight predictors (features):
  - 6 continuous (4 of them log transforms)
  - 1 binary
  - 1 ordinal
- Continuous outcome variable:
  - lpsa: log(prostate specific antigen level)
Correlations of predictors in prostate cancer dataset
- lcavol: log cancer volume
- lweight: log prostate weight
- age: age
- lbph: log amount of benign prostatic hypertrophy
- svi: seminal vesicle invasion
- lcp: log capsular penetration
- gleason: Gleason score
- pgg45: percent of Gleason scores 4 or 5
[correlation matrix: figure]
Fit of linear model to prostate cancer dataset [figure]
Regularization
- Complex models (lots of parameters) are often prone to overfitting.
- Overfitting can be reduced by imposing a constraint on the overall magnitude of the parameters.
- Two common types of regularization (shrinkage) in linear regression (sketched in code below):
  - L2 regularization (a.k.a. ridge regression). Find the w that minimizes
    Σᵢ (yᵢ − wᵀxᵢ)² + λ Σⱼ wⱼ²
    where λ is the regularization parameter: bigger λ imposes more constraint.
  - L1 regularization (a.k.a. the lasso). Find the w that minimizes
    Σᵢ (yᵢ − wᵀxᵢ)² + λ Σⱼ |wⱼ|
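A hedged sketch of both penalties with scikit-learn (the course itself uses MATLAB; sklearn's alpha parameter plays the role of λ, and note that sklearn's Lasso scales the data term by 1/(2n), so alpha values are not directly comparable between the two models):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Synthetic data in which only three of eight features matter.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 8))
true_w = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 0.5, 0.0, 0.0])
y = X @ true_w + rng.normal(scale=0.5, size=100)

# L2 (ridge): coefficients shrink toward zero but stay nonzero.
ridge = Ridge(alpha=10.0).fit(X, y)
print(np.round(ridge.coef_, 2))

# L1 (lasso): with enough penalty, irrelevant coefficients become exactly zero.
lasso = Lasso(alpha=0.1).fit(X, y)
print(np.round(lasso.coef_, 2))
```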
Example of L2 regularization
- L2 regularization shrinks coefficients towards (but not to) zero, and towards each other. [figure]
Example of L1 regularization
- L1 regularization shrinks coefficients to zero, at different rates; different values of λ give models with different subsets of features. [figure]
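To see the "different subsets for different λ" behavior concretely, a small sketch (scikit-learn again; the alpha values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Two of six features are irrelevant by construction.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 6))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.3, size=100)

# Stronger penalties zero out more coefficients, selecting smaller subsets.
for alpha in [0.01, 0.1, 0.5, 1.0]:
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    selected = np.flatnonzero(np.abs(coef) > 1e-8)
    print(f"alpha={alpha}: selected features {selected.tolist()}")
```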
Example of subset selection [figure]
Comparison of various selection and shrinkage methods [figure]
L1 regularization gives sparse models, L2 does not [figure]
Other types of regression
- In addition to linear regression, there are:
  - many types of non-linear regression:
    - trees
    - nearest neighbor
    - neural networks
    - support vector machines
  - locally linear regression
  - etc.
Regression trees
- Model very similar to classification trees
- Structure: binary splits on single attributes
- Prediction: mean value of the training samples in the leaf
- Induction:
  - greedy
  - loss function: sum of squared loss
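For a concrete end-to-end example, a sketch with scikit-learn's DecisionTreeRegressor as a stand-in for the course's MATLAB demo (the data and max_depth are arbitrary):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Noisy samples of a sine curve.
rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Binary splits on single attributes; each leaf predicts the mean of
# the training samples that fall into it, giving a piecewise-constant fit.
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.predict([[2.0], [8.0]]))   # roughly sin(2) and sin(8)
```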
Regression trees [example figure]
Regression tree loss function
- Assume:
  - the attribute and split threshold for the candidate split are selected
  - the candidate split partitions the samples at the parent node into child node sample sets C₁ and C₂
- The loss for the candidate split is then (sketched in code below):
  SSL = Σ_{i ∈ C₁} (yᵢ − ȳ₁)² + Σ_{i ∈ C₂} (yᵢ − ȳ₂)²
  where ȳₖ is the mean of the y values in Cₖ.
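A minimal NumPy sketch of scoring one candidate split by this criterion (the exhaustive search over attributes and thresholds that greedy induction adds is omitted):

```python
import numpy as np

def split_ssl(x, y, threshold):
    """SSL of splitting samples into C1 (x <= threshold) and C2 (x > threshold).

    Each child predicts the mean of its own y values, as a regression tree
    leaf would, and contributes its sum of squared deviations from that mean.
    """
    ssl = 0.0
    for child in (y[x <= threshold], y[x > threshold]):
        if len(child) > 0:
            ssl += np.sum((child - child.mean()) ** 2)
    return ssl

# Greedy induction keeps the attribute/threshold pair with the smallest SSL;
# here we just compare a good threshold against a bad one.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 0.9, 1.0, 5.0, 5.2, 4.8])
print(split_ssl(x, y, 3.5))   # clean split: small SSL (0.1)
print(split_ssl(x, y, 1.5))   # poor split: large SSL (~19.8)
```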
Characteristics of regression models

    model              linear  parametric  global  stable  continuous
    linear regression  yes     yes         yes     yes     yes
    regression tree    no      no          no      no      no
MATLAB interlude
- matlab_demo_08.m