Chapter 4 Model Adequacy Checking
(Linear Regression Analysis, 5th ed., Montgomery, Peck & Vining)


4.1 Introduction

• Assumptions
  1. The relationship between the response and the regressors is linear (at least approximately).
  2. The error term ε has zero mean.
  3. The error term ε has constant variance σ².
  4. The errors are uncorrelated.
  5. The errors are normally distributed (required for tests and intervals).

4.2 Residual Analysis

• Definition of a residual (= data - fit):
  ei = yi - ŷi,   i = 1, 2, …, n
• Approximate average variance of the residuals:
  Σ ei² / (n - p) = SSRes / (n - p) = MSRes

4.2.2 Methods for Scaling Residuals

• Scaling helps in identifying outliers or extreme values.
• Four methods:
  1. Standardized residuals
  2. Studentized residuals
  3. PRESS residuals
  4. R-student residuals

4.2.2 Methods for Scaling Residuals

1. Standardized Residuals
   di = ei / sqrt(MSRes),   i = 1, 2, …, n
   – The di have mean zero and variance approximately equal to 1.
   – A large value of di (di > 3, say) may indicate an outlier.
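A minimal sketch of this computation, assuming Python with NumPy (the slides themselves contain no code) and a small synthetic data set in place of a real example:

```python
# Sketch only: synthetic data and a straight-line OLS fit; the cutoff d_i > 3
# mirrors the rule of thumb on the slide.
import numpy as np

rng = np.random.default_rng(0)
n = 25
x = rng.uniform(0, 10, size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=n)

X = np.column_stack([np.ones(n), x])          # model matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
e = y - X @ beta                              # ordinary residuals e_i
p = X.shape[1]
ms_res = e @ e / (n - p)                      # residual mean square MS_Res

d = e / np.sqrt(ms_res)                       # standardized residuals d_i
print("possible outliers:", np.where(np.abs(d) > 3)[0])
```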

4.2.2 Methods for Scaling Residuals

2. Studentized Residuals
   – MSRes is only an approximation of the variance of the ith residual.
   – The exact variance is Var(ei) = σ²(1 - hii), where hii is the ith diagonal element of the hat matrix H = X(X'X)⁻¹X'.
   – Improve the scaling by dividing ei by the exact standard deviation.

4.2.2 Methods for Scaling Residuals

2. Studentized Residuals
   The studentized residuals are then
   ri = ei / sqrt(MSRes (1 - hii)),   i = 1, 2, …, n
   – The ri have mean zero and unit variance.
   – Studentized residuals are generally larger than the corresponding standardized residuals.
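A sketch of the studentized residuals under the same assumptions (synthetic data, NumPy); the leverages hii are taken from the diagonal of the hat matrix:

```python
# Sketch only: same synthetic setup as the previous block; h_ii comes from
# the hat matrix H = X (X'X)^{-1} X'.
import numpy as np

rng = np.random.default_rng(0)
n = 25
x = rng.uniform(0, 10, size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=n)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # leverages h_ii (diagonal of H)
e = y - X @ (XtX_inv @ X.T @ y)               # ordinary residuals
ms_res = e @ e / (n - X.shape[1])             # MS_Res

r = e / np.sqrt(ms_res * (1.0 - h))           # studentized residuals r_i
```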

4.2.2 Methods for Scaling Residuals

3. PRESS Residuals
   Examine the differences
   e(i) = yi - ŷ(i)
   – These are the differences between the actual response for the ith data point and the fitted value of the response for the ith data point, computed using all observations except the ith one.

4.2.2 Methods for Scaling Residuals

3. PRESS Residuals
   • Logic: if the ith point is unusual, it can "overly" influence the regression model.
     – If the ith point is used in fitting the model, the residual for that point will be small.
     – If the ith point is not used in fitting the model, the residual will better reflect how unusual that point is.

4.2.2 Methods for Scaling Residuals

3. PRESS Residuals
   • Prediction error:
     e(i) = yi - ŷ(i)
   • Calculated for each point, these are called PRESS residuals (they will be used later to compute the "prediction error sum of squares").
   • The PRESS residuals can be calculated from a single fit of the model using
     e(i) = ei / (1 - hii)
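A sketch of the PRESS residuals under the same assumptions; it relies on the identity e(i) = ei / (1 - hii), so the model is fit only once:

```python
# Sketch only: reuses the synthetic setup from the earlier blocks; no
# leave-one-out refits are needed thanks to the hat-matrix identity.
import numpy as np

rng = np.random.default_rng(0)
n = 25
x = rng.uniform(0, 10, size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=n)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # leverages h_ii
e = y - X @ (XtX_inv @ X.T @ y)               # ordinary residuals

press_resid = e / (1.0 - h)                   # PRESS residuals e_(i)
```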

4.2.2 Methods for Scaling Residuals

3. PRESS Residuals
   • The variance of the ith PRESS residual is Var[e(i)] = σ² / (1 - hii), so the standardized PRESS residuals are
     e(i) / sqrt(Var[e(i)]) = [ei / (1 - hii)] / sqrt(σ² / (1 - hii)) = ei / sqrt(σ²(1 - hii))
   • Note: these are the studentized residuals when MSRes is used as the estimate of σ².

4.2.2 Methods for Scaling Residuals

4. R-Student
   • MSRes is an "internal" estimate of the error variance.
   • Instead, use a variance estimate based on all observations except the ith one:
     S²(i) = [(n - p) MSRes - ei² / (1 - hii)] / (n - p - 1)

4.2.2 Methods for Scaling Residuals

4. R-Student
   • The R-student residual is
     ti = ei / sqrt(S²(i) (1 - hii)),   i = 1, 2, …, n
   • This is an externally studentized residual.
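A sketch of the R-student residuals under the same assumptions, combining the leave-one-out variance estimate S²(i) with the leverages hii:

```python
# Sketch only: same synthetic setup; S2_i is the leave-one-out variance
# estimate from the slide, computed without refitting the model.
import numpy as np

rng = np.random.default_rng(0)
n = 25
x = rng.uniform(0, 10, size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=n)

X = np.column_stack([np.ones(n), x])
p = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # leverages h_ii
e = y - X @ (XtX_inv @ X.T @ y)               # ordinary residuals
ms_res = e @ e / (n - p)

s2_i = ((n - p) * ms_res - e**2 / (1.0 - h)) / (n - p - 1)  # S^2_(i)
t = e / np.sqrt(s2_i * (1.0 - h))                           # R-student t_i
```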

4.2.3 Residual Plots

• Normal probability plot of the residuals
  – Checks the normality assumption.
• Residuals against fitted values ŷi
  – Checks for nonconstant variance.
  – Checks for nonlinearity.
  – Look for potential outliers.
• Do not plot residuals versus yi (why? the ei and the observed yi are correlated, whereas the ei and the fitted ŷi are not).
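A sketch of these two plots, assuming matplotlib and SciPy are available and using a synthetic fit rather than the textbook data:

```python
# Sketch only: normal probability plot and residuals-vs-fitted plot for a
# synthetic straight-line fit.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
n = 25
x = rng.uniform(0, 10, size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
e = y - y_hat

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
stats.probplot(e, dist="norm", plot=ax1)       # normal probability plot
ax1.set_title("Normal probability plot of residuals")
ax2.scatter(y_hat, e)                          # residuals vs fitted values
ax2.axhline(0.0, linestyle="--")
ax2.set_xlabel("Fitted value")
ax2.set_ylabel("Residual")
plt.tight_layout()
plt.show()
```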

4.2.3 Residual Plots

• Residuals against regressors in the model
  – Checks for nonconstant variance.
  – Look for nonlinearity.
• Residuals against regressors not in the model
  – If a pattern appears, adding that regressor might improve the model fit.
• Residuals against time order
  – Checks for correlated errors.
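Where the observations have a known time order, a quick time-sequence plot can be sketched as follows (again with synthetic data, assuming the rows are in collection order):

```python
# Sketch only: residuals plotted in time (collection) order; long runs of
# same-signed residuals suggest correlated errors.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 25
x = rng.uniform(0, 10, size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta

plt.plot(np.arange(1, n + 1), e, marker="o")
plt.axhline(0.0, linestyle="--")
plt.xlabel("Time order of observation")
plt.ylabel("Residual")
plt.show()
```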

Figure 4.3 Patterns for residual plots: (a) satisfactory; (b) funnel; (c) double bow; (d) nonlinear.

Example 4.4 The Delivery Time Data

Plot of Residuals in Time Sequence

4.2.4 Partial Regression and Partial Residual Plots

Partial Regression Plots
• Why are these used?
  – To determine whether the correct relationship between y and xi has been identified.
  – To determine the marginal contribution of a variable, given that all other variables are in the model.

4.2.4 Partial Regression and Partial Residual Plots

Partial Regression Plots
• Method: say we want to know the importance of, or relationship between, y and some regressor variable xi.
  1. Regress y against all regressors except xi and calculate the residuals.
  2. Regress xi against all other regressor variables and calculate the residuals.
  3. Plot these two sets of residuals against each other.
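A sketch of this construction, assuming two illustrative regressors x1 and x2 and treating x2 as the variable of interest xi:

```python
# Sketch only: added-variable (partial regression) plot for x2, with both
# sets of residuals obtained from ordinary OLS fits.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
n = 40
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 5, size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(scale=1.0, size=n)

def residuals(target, regressors):
    """Residuals from regressing `target` on an intercept plus `regressors`."""
    Z = np.column_stack([np.ones(len(target))] + list(regressors))
    beta, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return target - Z @ beta

e_y = residuals(y, [x1])    # y regressed on everything except x2
e_x2 = residuals(x2, [x1])  # x2 regressed on the other regressors

plt.scatter(e_x2, e_y)      # slope of this scatter estimates the coefficient of x2
plt.xlabel("Residuals of x2 given x1")
plt.ylabel("Residuals of y given x1")
plt.show()
```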

4.2.4 Partial Regression and Partial Residual Plots

Partial Regression Plots
• Interpretation
  – If the plot appears linear, then a linear relationship between y and xi seems reasonable.
  – If the plot is curvilinear, a term such as xi² or 1/xi may be needed instead.
  – If xi is a candidate variable and a horizontal "band" appears, then that variable adds no new information.

Example 4.5

4.2.4 Partial Regression and Partial Residual Plots

Partial Regression Plots - Comments
• Use with caution; they only suggest possible relationships.
• They do not generally detect interaction effects.
• If multicollinearity is present, partial regression plots can give incorrect information.
• The slope of the partial regression plot is the regression coefficient for the variable of interest!

4.2.5 Other Residual Plotting and Analysis Methods

• Plotting the regressors against each other can give information about the relationship between them:
  – may indicate correlation between the regressors;
  – may uncover remote points in the x-space.
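A brief sketch of such a plot, using two deliberately correlated synthetic regressors:

```python
# Sketch only: regressor-vs-regressor scatter; a strong linear pattern
# suggests correlated regressors, isolated points flag remote observations.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x1 = rng.uniform(0, 10, size=30)
x2 = 0.8 * x1 + rng.normal(scale=1.0, size=30)   # deliberately correlated with x1

plt.scatter(x1, x2)
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()
```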

Note the location of these two points in the x-space.

4.3 The PRESS Statistic

• PRESS residual:
  e(i) = yi - ŷ(i) = ei / (1 - hii)
• Prediction Error Sum of Squares (PRESS) statistic:
  PRESS = Σ e(i)² = Σ [ei / (1 - hii)]²
• A small value of the PRESS statistic is desired.
• See Table 4.1.

4.3 The PRESS Statistic

R² for Prediction Based on PRESS
  R²prediction = 1 - PRESS / SST, where SST is the total sum of squares.
• Interpretation:
  – We expect the model to explain about 100·R²prediction percent of the variability when predicting a new observation.
• PRESS is a valuable statistic for comparing models.
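A sketch that computes PRESS and the PRESS-based R² for the synthetic fit used in the earlier blocks:

```python
# Sketch only: PRESS and R^2_prediction for a synthetic straight-line fit.
import numpy as np

rng = np.random.default_rng(0)
n = 25
x = rng.uniform(0, 10, size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=n)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # leverages h_ii
e = y - X @ (XtX_inv @ X.T @ y)               # ordinary residuals

press = np.sum((e / (1.0 - h)) ** 2)          # prediction error sum of squares
ss_t = np.sum((y - y.mean()) ** 2)            # total sum of squares
r2_pred = 1.0 - press / ss_t
print(f"PRESS = {press:.2f}, R^2_prediction = {r2_pred:.3f}")
```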

4.4 Outliers

• An outlier is an observation that is considerably different from the others.
• There are formal tests for outliers.
• Points with large residuals may be outliers.
• Their impact can be assessed by removing the points and refitting.
• How should they be treated?

4.5 Lack of Fit of the Regression Model

A Formal Test for Lack of Fit
• Assumes
  – the normality, independence, and constant-variance assumptions have been met;
  – only the first-order (straight-line) form of the model is in doubt.
• Requires
  – replication of y for at least one level of x.

4.5 Lack of Fit of the Regression Model

A Formal Test for Lack of Fit
• With replication, we can obtain a "model-independent" estimate of σ².
• Say there are ni observations of the response at the ith level of the regressor xi, i = 1, 2, …, m.
• yij denotes the jth observation on the response at xi, j = 1, 2, …, ni.
• The total number of observations is n = Σ ni, summed over i = 1, …, m.

4.5 Lack of Fit of the Regression Model

A Formal Test for Lack of Fit
• Partition the residual sum of squares:
  SSRes = SSPE + SSLOF
  – SSPE: pure-error sum of squares, SSPE = Σi Σj (yij - ȳi)²
  – SSLOF: lack-of-fit sum of squares, SSLOF = Σi ni (ȳi - ŷi)²
• This follows from writing the (ij)th residual as
  yij - ŷi = (yij - ȳi) + (ȳi - ŷi),
  then squaring and summing over all observations (the cross-product term vanishes).

4.5 Lack of Fit of the Regression Model

A Formal Test for Lack of Fit
• If the assumption of constant variance is satisfied, then SSPE is a "model-independent" measure of pure error.
• If the regression function really is linear, then the fitted values ŷi will be very close to the level averages ȳi, and SSLOF will be quite small.

4.5 Lack of Fit of the Regression Model

A Formal Test for Lack of Fit
• Test statistic:
  F0 = [SSLOF / (m - 2)] / [SSPE / (n - m)] = MSLOF / MSPE
• If F0 > Fα, m-2, n-m, conclude that the regression function is not linear. Why?
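A sketch of the test for a small synthetic example with replicated observations, assuming SciPy for the F critical value:

```python
# Sketch only: lack-of-fit F test for simple linear regression with
# 4 replicates at each of m = 5 levels of x.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
levels = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = np.repeat(levels, 4)                        # replicated design
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=x.size)

n, m = x.size, levels.size
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

# Pure-error and lack-of-fit sums of squares.
ss_pe = sum(((y[x == lv] - y[x == lv].mean()) ** 2).sum() for lv in levels)
ss_res = ((y - y_hat) ** 2).sum()
ss_lof = ss_res - ss_pe

f0 = (ss_lof / (m - 2)) / (ss_pe / (n - m))     # MS_LOF / MS_PE
f_crit = stats.f.ppf(0.95, m - 2, n - m)        # F(0.05, m-2, n-m)
print(f"F0 = {f0:.2f}, critical value = {f_crit:.2f}")
```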

4.5 Lack of Fit of the Regression Model

A Formal Test for Lack of Fit
• If the test indicates lack of fit, abandon the model and try a different one.
• If the test indicates no lack of fit, then MSLOF and MSPE are combined to estimate σ².

Example 4.8

An Approximate Procedure Based on Estimating Error from Near-Neighbors

See Example 4.10, p. 162.