
  • Slides: 98

Statistics for Managers Using Microsoft Excel, 3rd Edition
Chapter 12: Multiple Regression
© 2002 Prentice-Hall, Inc.

Chapter Topics
• The multiple regression model
• Residual analysis
• Testing for the significance of the regression model
• Inferences on the population regression coefficients
• Testing portions of the multiple regression model

Chapter Topics (continued)
• The quadratic regression model
• Dummy variables
• Using transformations in regression models
• Collinearity
• Model building
• Pitfalls in multiple regression and ethical considerations

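The Multiple Regression Model
The relationship between 1 dependent and 2 or more independent variables is a linear function.
Population model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$
• $\beta_0$: population Y-intercept; $\beta_1, \dots, \beta_p$: population slopes; $\varepsilon_i$: random error
• $Y_i$: dependent (response) variable; $X_{1i}, \dots, X_{pi}$: independent (explanatory) variables
Sample model: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_p X_{pi} + e_i$
• $b_0$: Y-intercept for the sample; $b_1, \dots, b_p$: slopes for the sample; $e_i$: residual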

Population Multiple Regression Model
Bivariate model (two explanatory variables):
• Observed Y: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
• Response plane: $\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$
[Figure: response plane over the $X_1$ and $X_2$ axes with intercept $\beta_0$; the error $\varepsilon_i$ is the vertical distance from the observed $Y_i$ at $(X_{1i}, X_{2i})$ to the plane.]

Sample Multiple Regression Model
Bivariate model (two explanatory variables):
• Observed Y: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$
• Sample regression plane: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
[Figure: sample regression plane over the $X_1$ and $X_2$ axes with intercept $b_0$; the residual $e_i$ is the vertical distance from the observed $Y_i$ at $(X_{1i}, X_{2i})$ to the plane.]

Simple and Multiple Regression Compared
• Coefficients in a simple regression pick up the impact of that variable plus the impacts of other variables that are correlated with it and with the dependent variable.
• Coefficients in a multiple regression net out the impacts of the other variables in the equation.

Simple and Multiple Regression Compared: Example
[Equations: two simple regressions, one for each explanatory variable, and one multiple regression using both]

Multiple Linear Regression Equation
Too complicated by hand! Ouch!
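Excel and PHStat carry out this computation for you. As a rough sketch of the arithmetic involved, assuming ordinary least squares solved through the normal equations $(X'X)b = X'Y$, the pure-Python code below recovers $b_0, b_1, \dots, b_p$. The function names and the data are hypothetical, and this is not a description of how Excel computes its output; for real work a numerically robust library routine is preferable.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b]; copies the input so it is not modified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot into place for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfectly collinear X variables?)")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multiple_regression(xs, y):
    """Least-squares coefficients [b0, b1, ..., bp] from the normal equations.

    xs holds one list of p explanatory values per observation.
    """
    design = [[1.0] + [float(v) for v in row] for row in xs]  # leading 1 for b0
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical, noise-free data generated from Y = 1 + 2*X1 + 3*X2.
xs = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 5], [5, 2]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
coefs = fit_multiple_regression(xs, y)
print([round(c, 6) for c in coefs])  # [1.0, 2.0, 3.0]
```

Because the hypothetical data contain no error term, the fit recovers the generating coefficients exactly (up to floating-point rounding).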

Interpretation of Estimated Coefficients
• Slope ($b_j$)
  • The average value of Y is estimated to change by $b_j$ for each 1-unit increase in $X_j$, holding all other variables constant (ceteris paribus)
  • Example: if $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$)
• Y-intercept ($b_0$)
  • The estimated average value of Y when all $X_j = 0$

Multiple Regression Model: Example
Develop a model for estimating heating oil used for a single-family home in the month of January based on average temperature (°F) and amount of insulation in inches.

Sample Multiple Regression Equation: Example
[Excel output: $\hat{Y}_i = b_0 - 5.437 X_{1i} - 20.012 X_{2i}$, with $X_1$ = temperature and $X_2$ = insulation]
• For each degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant.
• For each one-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.

Multiple Regression in PHStat
• PHStat | Regression | Multiple Regression …
• Excel spreadsheet for the heating oil example

Venn Diagrams and Explanatory Power of Regression
[Figure: two overlapping circles, Oil and Temp. The overlap is the variation in Oil explained by Temp (equivalently, the variation in Temp used in explaining variation in Oil). The rest of the Oil circle is the variation in Oil explained by the error term; the rest of the Temp circle is the variation in Temp not used in explaining variation in Oil.]

Venn Diagrams and Explanatory Power of Regression (continued)
[Figure: overlapping Oil and Temp circles]

Venn Diagrams and Explanatory Power of Regression
[Figure: three overlapping circles, Oil, Temp and Insulation. Part of the Oil circle is variation explained by neither Temp nor Insulation. The overlapping variation in both Temp and Insulation is used in explaining the variation in Oil, but NOT in the estimation of either $b_1$ or $b_2$.]

Coefficient of Multiple Determination
• Proportion of total variation in Y explained by all X variables taken together
• $r^2_{Y.12\cdots p} = \dfrac{SSR}{SST} = \dfrac{\text{Explained variation}}{\text{Total variation}}$
• Never decreases when a new X variable is added to the model
  • This is a disadvantage when comparing models with different numbers of X variables

Venn Diagrams and Explanatory Power of Regression
[Figure: overlapping Oil, Temp and Insulation circles]

Adjusted Coefficient of Multiple Determination
• Proportion of variation in Y explained by all X variables, adjusted for the number of X variables used
• $r^2_{adj} = 1 - \left[(1 - r^2_{Y.12\cdots p})\dfrac{n - 1}{n - p - 1}\right]$
  • Penalizes excessive use of independent variables
  • Smaller than $r^2_{Y.12\cdots p}$
  • Useful in comparing among models

Coefficient of Multiple Determination
[Excel output: $r^2_{Y.12}$ and adjusted $r^2$ for the heating oil model]
Adjusted $r^2$:
• reflects the number of explanatory variables and the sample size
• is smaller than $r^2$

Interpretation of Coefficient of Multiple Determination
• $r^2_{Y.12} = 0.9656$: 96.56% of the total variation in heating oil can be explained by differences in temperature and amount of insulation
• $r^2_{adj} = 0.9599$: 95.99% of the total variation in heating oil can be explained by differences in temperature and amount of insulation, after adjusting for the number of explanatory variables and the sample size
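The adjusted figure can be checked from $r^2$ alone. A minimal sketch, assuming $n = 15$ homes (implied by the 2 and 12 degrees of freedom reported for the F test) and the formula $r^2_{adj} = 1 - (1 - r^2)(n - 1)/(n - p - 1)$; `adjusted_r2` is a hypothetical helper name.

```python
def adjusted_r2(r2, n, p):
    """Adjusted r^2 for n observations and p explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Heating oil example: r^2 = 0.9656, n = 15 homes, p = 2 (temperature, insulation).
adj = adjusted_r2(0.9656, 15, 2)
print(round(adj, 4))  # 0.9599
```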

Using the Model to Make Predictions
Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is six inches.
$\hat{Y}_i = b_0 - 5.437(30) - 20.012(6) \approx 278.97$
The predicted heating oil used is 278.97 gallons.
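The prediction is the fitted equation evaluated at $X_1 = 30$ and $X_2 = 6$. In the sketch below the intercept is an assumption: the slides do not quote $b_0$, so 562.15 is the value implied by the stated prediction of 278.97 gallons. `predict_oil` is a hypothetical name.

```python
# Slopes from the heating oil Excel output. The intercept is not quoted in the
# slides; 562.15 is the value implied by the stated prediction of 278.97 gallons.
B0, B1, B2 = 562.15, -5.437, -20.012


def predict_oil(temp_f, insulation_in):
    """Estimated average January heating oil use (gallons)."""
    return B0 + B1 * temp_f + B2 * insulation_in


print(round(predict_oil(30, 6), 2))  # 278.97
```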

Predictions in PHStat
• PHStat | Regression | Multiple Regression …
• Check the "Confidence and Prediction Interval Estimate" box
• Excel spreadsheet for the heating oil example

Residual Plots
• Residuals vs. $\hat{Y}_i$
  • May need to transform the Y variable
• Residuals vs. $X_{1i}$
  • May need to transform the $X_1$ variable
• Residuals vs. $X_{2i}$
  • May need to transform the $X_2$ variable
• Residuals vs. time
  • May have autocorrelation

Residual Plots: Example
[Figure: residual plots for the heating oil model; one suggests "maybe some nonlinear relationship", the other shows no discernible pattern.]

Influence Analysis
• Identifies observations that have an influential effect on the fitted model
• Potentially influential points become candidates for removal from the model
• Criteria used are
  • The hat matrix elements $h_i$
  • The Studentized deleted residuals $t_i^*$
  • Cook's distance statistic $D_i$
• All three criteria are complementary
  • An observation should be removed only when all three criteria give consistent results

The Hat Matrix Element $h_i$
• $h_i$ is the $i$-th diagonal element of the hat matrix $H = X(X'X)^{-1}X'$; it measures how far observation $i$'s X values lie from the center of the X data
• If $h_i > 2(p + 1)/n$, $X_i$ is an influential point
  • $X_i$ may be considered a candidate for removal from the model
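For illustration, the leverages $h_i$ can be computed directly from the definition $H = X(X'X)^{-1}X'$. The sketch below uses made-up data, hypothetical helper names, and plain Gauss-Jordan inversion; a useful sanity check is that the $h_i$ always sum to $p + 1$.

```python
def invert(a):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def leverages(xs):
    """Diagonal elements h_i of H = X (X'X)^-1 X', with a leading column of 1s in X."""
    design = [[1.0] + [float(v) for v in row] for row in xs]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    inv = invert(xtx)
    return [sum(d[i] * inv[i][j] * d[j] for i in range(k) for j in range(k))
            for d in design]


def leverage_cutoff(n, p):
    """Rule from the slide: h_i > 2(p + 1)/n flags a potentially influential point."""
    return 2 * (p + 1) / n


# Hypothetical data: the last observation has an unusually large X1 value.
xs = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 5], [5, 2], [6, 4], [20, 1]]
h = leverages(xs)
cut = leverage_cutoff(len(xs), p=2)
print([i + 1 for i, hi in enumerate(h) if hi > cut])  # observations to examine
print(round(sum(h), 6))  # the h_i always sum to p + 1 = 3
```

With $n = 15$ and $p = 2$, `leverage_cutoff(15, 2)` gives the 0.4 threshold used in the heating oil example.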

The Hat Matrix Element $h_i$: Heating Oil Example
• No $h_i > 0.4$, where $2(p + 1)/n = 2(3)/15 = 0.4$
• No observation appears to be a candidate for removal from the model

The Studentized Deleted Residuals $t_i^*$
• $t_i^* = \dfrac{e_{(i)}}{S_{(i)} / \sqrt{1 - h_i}}$
• $e_{(i)}$: difference between the observed $Y_i$ and the predicted $\hat{Y}_i$ based on a model that includes all observations except observation $i$
• $S_{(i)}$: standard error of the estimate for a model that includes all observations except observation $i$
• An observation is considered influential if $|t_i^*| > t_{n-p-2}$
  • $t_{n-p-2}$ is the critical value of a two-tail test at the 10% level of significance
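Refitting the model $n$ times is not necessary to get $t_i^*$. The sketch below assumes the standard least-squares identities $e_{(i)} = e_i/(1 - h_i)$ and $SSE_{(i)} = SSE - e_i^2/(1 - h_i)$, where $e_i$ is the ordinary residual from the full fit, and checks that a one-line shortcut agrees with the definition above. The inputs and function names are hypothetical.

```python
import math


def studentized_deleted_residual(e_i, h_i, sse, n, p):
    """t_i* from the full-model fit: e_i * sqrt((n - p - 2) / (SSE(1 - h_i) - e_i^2))."""
    return e_i * math.sqrt((n - p - 2) / (sse * (1 - h_i) - e_i ** 2))


def studentized_deleted_residual_by_deletion(e_i, h_i, sse, n, p):
    """Same quantity built from the definitions of e_(i) and S_(i)."""
    e_del = e_i / (1 - h_i)                   # e_(i): residual with observation i left out
    sse_del = sse - e_i ** 2 / (1 - h_i)      # SSE of the fit without observation i
    s_del = math.sqrt(sse_del / (n - p - 2))  # S_(i)
    return e_del / (s_del / math.sqrt(1 - h_i))


# Hypothetical values for one observation in a model with n = 15 and p = 2.
a = studentized_deleted_residual(2.0, 0.2, 50.0, 15, 2)
b = studentized_deleted_residual_by_deletion(2.0, 0.2, 50.0, 15, 2)
print(round(a, 4), round(b, 4))
```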

The Studentized Deleted Residuals $t_i^*$: Example
• $t_{10}^*$ and $t_{13}^*$ exceed the critical value, so observations 10 and 13 are influential points and potential candidates for removal from the model

Cook's Distance Statistic $D_i$
• $D_i = \dfrac{SR_i^2}{p + 1} \cdot \dfrac{h_i}{1 - h_i}$
• $SR_i = \dfrac{e_i}{S_{YX}\sqrt{1 - h_i}}$ is the Studentized residual
• If $D_i > F_{p+1,\,n-p-1}$, an observation is considered influential
  • $F_{p+1,\,n-p-1}$ is the critical value of the F distribution at a 50% level of significance
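Cook's $D_i$ likewise needs only quantities from the full fit. In this sketch $MSE$ plays the role of $S_{YX}^2$, the observation is hypothetical, and the 0.835 cutoff is the one quoted for the heating oil example ($p = 2$, $n = 15$).

```python
def cooks_distance(e_i, h_i, mse, p):
    """D_i = SR_i^2 / (p + 1) * h_i / (1 - h_i), where SR_i^2 = e_i^2 / (MSE (1 - h_i))."""
    sr_sq = e_i ** 2 / (mse * (1 - h_i))  # squared Studentized residual; MSE = S_YX^2
    return sr_sq / (p + 1) * h_i / (1 - h_i)


# Hypothetical observation: e_i = 2, h_i = 0.2, SSE = 50 with n = 15 and p = 2.
mse = 50.0 / (15 - 2 - 1)
d = cooks_distance(2.0, 0.2, mse, 2)
print(round(d, 4), d > 0.835)  # 0.1 False
```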

Cook's Distance Statistic $D_i$: Heating Oil Example
• No $D_i > 0.835$ (the critical value $F_{3,12}$ at the 50% level)
• No observation appears to be a candidate for removal from the model
• Using the three criteria, there is insufficient evidence for the removal of any observation from the model

Testing for Overall Significance
• Shows whether there is a linear relationship between all of the X variables together and Y
• Use the F test statistic
• Hypotheses:
  • $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$ (no linear relationship)
  • $H_1$: at least one $\beta_j \neq 0$ (at least one independent variable affects Y)
• The null hypothesis is a very strong statement
  • It is almost always rejected

Testing for Overall Significance (continued)
• Test statistic: $F = \dfrac{MSR}{MSE} = \dfrac{SSR/p}{SSE/(n - p - 1)}$
  • where F has $p$ numerator and $(n - p - 1)$ denominator degrees of freedom
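The F statistic can be computed from the ANOVA sums of squares or, equivalently, from $r^2$, because $SSR = r^2\,SST$ and $SSE = (1 - r^2)\,SST$. A minimal sketch with hypothetical helper names; plugging in the rounded $r^2 = 0.9656$ from the heating oil example gives about 168.4, close to the 168.47 that Excel computes from the unrounded sums of squares.

```python
def overall_f(ssr, sse, n, p):
    """F = MSR / MSE with p and n - p - 1 degrees of freedom."""
    msr = ssr / p
    mse = sse / (n - p - 1)
    return msr / mse


def overall_f_from_r2(r2, n, p):
    """Same F written with r^2 = SSR / SST, since SSE / SST = 1 - r^2."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


# Heating oil example: r^2 = 0.9656, n = 15, p = 2.
print(round(overall_f_from_r2(0.9656, 15, 2), 1))  # 168.4
```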

Test for Overall Significance
[Excel output: ANOVA table for the heating oil model, showing the regression df $p = 2$ (the number of explanatory variables), the total df $n - 1$, the F statistic and its p-value]

Test for Overall Significance: Example Solution
• $H_0: \beta_1 = \beta_2 = 0$
• $H_1$: at least one $\beta_j \neq 0$
• $\alpha = 0.05$, df = 2 and 12
• Critical value: 3.89 (reject $H_0$ if $F > 3.89$)
• Test statistic: $F = 168.47$ (Excel output)
• Decision: Reject $H_0$ at $\alpha = 0.05$
• Conclusion: There is evidence that at least one independent variable affects Y

Test for Significance: Individual Variables
• Shows whether there is a linear relationship between the variable $X_j$ and Y
• Use the t test statistic: $t = b_j / S_{b_j}$, with $n - p - 1$ degrees of freedom
• Hypotheses:
  • $H_0: \beta_j = 0$ (no linear relationship)
  • $H_1: \beta_j \neq 0$ (linear relationship between $X_j$ and Y)

t Test Statistic: Excel Output Example
[Excel output: coefficient table showing the t test statistic for $X_1$ (temperature) and for $X_2$ (insulation)]

t Test: Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.
• $H_0: \beta_1 = 0$
• $H_1: \beta_1 \neq 0$
• df = 12
• Critical values: $\pm 2.1788$ (0.025 in each tail)
• Test statistic: $t = -16.1699$
• Decision: Reject $H_0$ at $\alpha = 0.05$, since $-16.1699 < -2.1788$
• Conclusion: There is evidence of a significant effect of temperature on oil consumption.

Venn Diagrams and Estimation of Regression Model
[Figure: overlapping Oil, Temp and Insulation circles. Only the variation shared by Oil and Temp alone is used in the estimation of $b_1$; only the variation shared by Oil and Insulation alone is used in the estimation of $b_2$; the variation shared by all three is NOT used in the estimation of either $b_1$ or $b_2$.]

Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).
$b_1 \pm t_{n-p-1} S_{b_1}$: $-6.169 \le \beta_1 \le -4.704$
The estimated average consumption of oil is reduced by between 4.70 and 6.17 gallons for each 1°F increase in temperature, holding insulation constant.
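The interval is $b_1 \pm t_{n-p-1} S_{b_1}$. The slides do not list $S_{b_1}$, so the sketch below assumes it can be recovered as $b_1 / t$ from the reported t statistic; it reproduces the interval up to rounding.

```python
B1 = -5.437        # estimated slope for temperature (gallons per degree F)
T_STAT = -16.1699  # t test statistic for b1 from the Excel output
T_CRIT = 2.1788    # upper 0.025 critical value of t with n - p - 1 = 12 df

se_b1 = B1 / T_STAT  # S_b1, because t = b1 / S_b1
lower = B1 - T_CRIT * se_b1
upper = B1 + T_CRIT * se_b1
print(round(se_b1, 4), round(lower, 3), round(upper, 3))
```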

Contribution of a Single Independent Variable $X_k$
• Let $X_k$ be the independent variable of interest
• $SSR(X_k \mid \text{all others except } X_k) = SSR(\text{all}) - SSR(\text{all others except } X_k)$
• Measures the contribution of $X_k$ in explaining the total variation in Y (SST)

Contribution of a Single Independent Variable $X_k$
Example: $SSR(X_1 \mid X_2) = SSR(X_1 \text{ and } X_2) - SSR(X_2)$ measures the contribution of $X_1$ in explaining SST
• $SSR(X_1 \text{ and } X_2)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
• $SSR(X_2)$: from the ANOVA section of the regression of Y on $X_2$ alone

Coefficient of Partial Determination of $X_k$
• $r^2_{Yk.(\text{all others})} = \dfrac{SSR(X_k \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_k \mid \text{all others})}$
• Measures the proportion of variation in the dependent variable that is explained by $X_k$ while controlling for (holding constant) the other independent variables

Coefficient of Partial Determination for $X_k$ (continued)
Example: Two Independent Variable Model
$r^2_{Y1.2} = \dfrac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)}$
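Because $SST - SSR(\text{all}) = SSE = MSE\,(n - p - 1)$ and, for a single variable, $SSR(X_k \mid \text{all others}) = F \cdot MSE$ with $F = t^2$, the partial $r^2$ can also be written as $t^2/(t^2 + n - p - 1)$. The sketch below implements both forms under that derivation; the helper names are hypothetical, and with the heating oil $t = -16.1699$ the second form gives about 0.9561 for temperature.

```python
def partial_r2(ssr_k_given_others, ssr_all, sst):
    """r^2_{Yk.(all others)} from ANOVA sums of squares."""
    return ssr_k_given_others / (sst - ssr_all + ssr_k_given_others)


def partial_r2_from_t(t, df_error):
    """Equivalent form t^2 / (t^2 + (n - p - 1)), using F = t^2 for one variable."""
    return t ** 2 / (t ** 2 + df_error)


# Heating oil example: t = -16.1699 for temperature, n - p - 1 = 12.
print(round(partial_r2_from_t(-16.1699, 12), 4))  # 0.9561
```

As a consistency check with hypothetical sums of squares, `partial_r2(30, 90, 100)` and `partial_r2_from_t(6, 12)` both give 0.75.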

Venn Diagrams and Coefficient of Partial Determination for $X_1$
[Figure: $r^2_{Y1.2}$ expressed as a ratio of regions in the overlapping Oil, Temp and Insulation circles]

Coefficient of Partial Determination in PHStat
• PHStat | Regression | Multiple Regression …
• Check the "Coefficient of Partial Determination" box
• Excel spreadsheet for the heating oil example

Contribution of a Subset of Independent Variables
• Let $X_s$ be the subset of independent variables of interest
• $SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$
• Measures the contribution of the subset $X_s$ in explaining SST

Contribution of a Subset of Independent Variables: Example
Let $X_s$ be $X_1$ and $X_3$: $SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2)$
• $SSR(X_1, X_2 \text{ and } X_3)$: from the ANOVA section of the regression of Y on $X_1$, $X_2$ and $X_3$
• $SSR(X_2)$: from the ANOVA section of the regression of Y on $X_2$ alone

Testing Portions of Model
• Examines the contribution of a subset $X_s$ of explanatory variables to the relationship with Y
• Null hypothesis:
  • Variables in the subset do not significantly improve the model when all other variables are included
• Alternative hypothesis:
  • At least one variable in the subset is significant

Testing Portions of Model (continued)
• Always a one-tailed rejection region
• Requires comparison of two regressions
  • One regression includes everything
  • Another regression includes everything except the portion to be tested

Partial F Test for Contribution of a Subset of X Variables
• Hypotheses:
  • $H_0$: Variables $X_s$ do not significantly improve the model, given all other variables included
  • $H_1$: Variables $X_s$ significantly improve the model, given all other variables included
• Test statistic: $F = \dfrac{SSR(X_s \mid \text{all others}) / m}{MSE(\text{all})}$
  • with df = $m$ and $(n - p - 1)$
  • $m$ = number of variables in the subset $X_s$

Partial F Test for Contribution of a Single $X_j$
• Hypotheses:
  • $H_0$: Variable $X_j$ does not significantly improve the model, given all others included
  • $H_1$: Variable $X_j$ significantly improves the model, given all others included
• Test statistic: $F = \dfrac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$
  • with df = 1 and $(n - p - 1)$
  • $m = 1$ here

Testing Portions of Model: Example
Test at the $\alpha = 0.05$ level to determine whether the variable of average temperature significantly improves the model, given that insulation is included.

Testing Portions of Model: Example
• $H_0$: $X_1$ (temperature) does not improve the model with $X_2$ (insulation) included
• $H_1$: $X_1$ does improve the model
• $\alpha = 0.05$, df = 1 and 12, critical value = 4.75
• Test statistic: $F = \dfrac{SSR(X_1 \mid X_2)}{MSE(X_1 \text{ and } X_2)}$, using the ANOVA output for the model with $X_1$ and $X_2$ and for the model with $X_2$ alone; this equals $t^2 = (-16.1699)^2 \approx 261.47$
• Conclusion: Reject $H_0$; $X_1$ does improve the model
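A minimal sketch of the partial F statistic, with hypothetical sums of squares since the ANOVA values for the example are not reproduced here. For the single-variable case it uses the identity $F = t^2$ to recover the heating oil statistic from $t = -16.1699$ and compare it with the critical value 4.75.

```python
def partial_f(ssr_full, ssr_reduced, m, mse_full):
    """Partial F = [SSR(full) - SSR(reduced)] / m / MSE(full), df = m and n - p - 1."""
    return (ssr_full - ssr_reduced) / m / mse_full


# Hypothetical sums of squares: adding m = 1 variable raises SSR from 400 to 1000.
print(partial_f(1000.0, 400.0, 1, 20.0))  # 30.0

# Heating oil example: for a single variable, F equals t^2 for its slope.
f_temp = (-16.1699) ** 2
print(round(f_temp, 2), f_temp > 4.75)  # 261.47 True
```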

Testing Portions of Model in PHStat
• PHStat | Regression | Multiple Regression …
• Check the "Coefficient of Partial Determination" box
• Excel spreadsheet for the heating oil example

Do We Need to Do This for One Variable?
• The F test for the inclusion of a single variable after all other variables are included in the model is IDENTICAL to the t test of the slope for that variable ($F = t^2$)
• The only reason to do an F test is to test several variables together

The Quadratic Regression Model
• Relationship between one response variable and two or more explanatory variables is a quadratic polynomial function
• Useful when the scatter diagram indicates a nonlinear relationship
• Quadratic model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$
  • The second explanatory variable is the square of the first variable

Quadratic Regression Model (continued)
Quadratic models may be considered when the scatter diagram takes on one of the following shapes:
[Figure: Y versus $X_1$ curving upward when $\beta_2 > 0$, and curving downward when $\beta_2 < 0$.]
$\beta_2$ = the coefficient of the quadratic term
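In practice the quadratic model is fitted as an ordinary multiple regression whose second explanatory variable is $X_1^2$. The sketch below, with hypothetical coefficients and helper names, builds that extra column and shows how the sign of the quadratic coefficient governs the slope $b_1 + 2b_2X_1$ along the curve.

```python
def quadratic_row(x1):
    """Explanatory values [X1, X1^2] for one observation of the quadratic model."""
    return [x1, x1 ** 2]


def slope_at(b1, b2, x1):
    """Rate of change of Y-hat = b0 + b1*X1 + b2*X1^2 at X1, which is b1 + 2*b2*X1."""
    return b1 + 2 * b2 * x1


# Hypothetical coefficients with b2 < 0: the slope falls as X1 grows (curves downward).
b1, b2 = 4.0, -0.5
print([quadratic_row(x) for x in (1, 2, 3)])
print([slope_at(b1, b2, x) for x in (0, 2, 4, 6)])  # [4.0, 2.0, 0.0, -2.0]
```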

Testing for Significance: Quadratic Model
• Testing for overall relationship
  • Similar to the test for the linear model
  • F test statistic = $\dfrac{MSR}{MSE}$
• Testing the quadratic effect
  • Compare the quadratic model with the linear model
  • Hypotheses:
    • $H_0: \beta_2 = 0$ (no 2nd-order polynomial term)
    • $H_1: \beta_2 \neq 0$ (2nd-order polynomial term is needed)

Heating Oil Example
Determine whether a quadratic model is needed for estimating heating oil used for a single-family home in the month of January based on average temperature (°F) and amount of insulation in inches.

Heating Oil Example: Residual Analysis (continued)
[Figure: residual plots; one suggests "maybe some nonlinear relationship", the other shows no discernible pattern.]

Heating Oil Example: t Test for Quadratic Model (continued)
• Testing the quadratic effect
  • Compare the model that is quadratic in insulation, $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{2i}^2 + \varepsilon_i$, with the linear model
  • Hypotheses:
    • $H_0: \beta_3 = 0$ (no quadratic term in insulation)
    • $H_1: \beta_3 \neq 0$ (quadratic term is needed in insulation)

Example Solution
Is a quadratic term in insulation needed in the model of monthly consumption of heating oil? Test at $\alpha = 0.05$.
• $H_0: \beta_3 = 0$
• $H_1: \beta_3 \neq 0$
• df = 11
• Critical values: $\pm 2.2010$ (0.025 in each tail)
• Test statistic: $t = 1.6611$
• Decision: Do not reject $H_0$ at $\alpha = 0.05$, since $-2.2010 < 1.6611 < 2.2010$
• Conclusion: There is not sufficient evidence of the need to include a quadratic effect of insulation on oil consumption.

Example Solution in PHStat
• PHStat | Regression | Multiple Regression …
• Excel spreadsheet for the heating oil example

Dummy Variable Models
• Categorical explanatory variable (dummy variable) with two or more levels:
  • Yes or no, on or off, male or female
  • Coded as 0 or 1
• Only intercepts are different
  • Assumes equal slopes across categories
• The number of dummy variables needed is (number of levels - 1)
• Regression model has the same form: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$

Dummy-Variable Models (with 2 Levels)
Given: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
• Y = assessed value of house
• $X_1$ = square footage of house
• $X_2$ = desirability of neighborhood = 0 if undesirable, 1 if desirable
Desirable ($X_2 = 1$): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(1) = (b_0 + b_2) + b_1 X_{1i}$
Undesirable ($X_2 = 0$): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(0) = b_0 + b_1 X_{1i}$
Same slopes

Dummy-Variable Models (with 2 Levels) (continued)
[Figure: assessed value Y versus square footage $X_1$ as two parallel lines; the Desirable location line has intercept $b_0 + b_2$ and the Undesirable location line has intercept $b_0$. Intercepts different, same slopes.]

Interpretation of the Dummy Variable Coefficient (with 2 Levels)
Example: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$, with $b_2 = 6$
• Y: annual salary of college graduate, in thousands of dollars
• $X_1$: GPA
• $X_2$: 0 = female, 1 = male
On average, male college graduates are making an estimated six thousand dollars more than female college graduates with the same GPA.
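The dummy coefficient shifts only the intercept, so the estimated male/female gap is the same at every GPA. A sketch: $b_2 = 6$ follows from the statement above, while $b_0$ and $b_1$ are hypothetical placeholders because their estimates are not given.

```python
# b2 = 6 (thousand $) is the male-dummy coefficient implied by the slide;
# B0 and B1 are hypothetical placeholders because their estimates are not shown.
B0, B1, B2 = 20.0, 5.0, 6.0


def predicted_salary(gpa, male):
    """Estimated average salary in thousand $; male is the 0/1 dummy X2."""
    return B0 + B1 * gpa + B2 * male


for gpa in (2.5, 3.0, 3.5, 4.0):
    gap = predicted_salary(gpa, 1) - predicted_salary(gpa, 0)
    print(gpa, gap)  # gap is 6.0 at every GPA: same slope, intercepts differ by b2
```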

Dummy-Variable Models (with 3 Levels)
[Model: square footage plus two dummy variables coding three house styles (Condo, Ranch, Split-level)]

Interpretation of the Dummy Variable Coefficients (with 3 Levels)
• With the same square footage, a Split-level will have an estimated average assessed value 18.84 thousand dollars more than a Condo.
• With the same square footage, a Ranch will have an estimated average assessed value 23.53 thousand dollars more than a Condo.
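With three house styles, 3 - 1 = 2 dummy variables are needed, and the omitted style (Condo) becomes the baseline with both dummies equal to 0. The sketch below assumes a particular column order (Ranch, then Split-level) and hypothetical helper names; the shifts are the two estimates quoted above.

```python
LEVELS = ("Condo", "Ranch", "Split-level")  # first level is the baseline (all dummies 0)

# Estimated differences from Condo at the same square footage (thousand $).
EFFECTS = {"Ranch": 23.53, "Split-level": 18.84}


def encode_style(style):
    """Return the (number of levels - 1) = 2 dummy values for one house style."""
    if style not in LEVELS:
        raise ValueError(f"unknown style: {style!r}")
    return [1 if style == level else 0 for level in LEVELS[1:]]


def style_shift(style):
    """Estimated difference in average assessed value versus a Condo of equal size."""
    return sum(EFFECTS[level] * d for level, d in zip(LEVELS[1:], encode_style(style)))


for style in LEVELS:
    print(style, encode_style(style), style_shift(style))
```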

Interaction Regression Model
• Hypothesizes interaction between pairs of X variables
  • Response to one X variable varies at different levels of another X variable
• Contains two-way cross-product terms
  • $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Can be combined with other models
  • E.g., the dummy variable model

Effect of Interaction
• Given: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Without the interaction term, the effect of $X_1$ on Y is measured by $\beta_1$
• With the interaction term, the effect of $X_1$ on Y is measured by $\beta_1 + \beta_3 X_2$
  • The effect changes as $X_2$ increases

Interaction Example
$Y = 1 + 2X_1 + 3X_2 + 4X_1X_2$
• $X_2 = 1$: $Y = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
• $X_2 = 0$: $Y = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
[Figure: the two lines plotted for $X_1$ from 0 to 1.5]
The effect (slope) of $X_1$ on Y does depend on the value of $X_2$.
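The two lines in this example can be reproduced by evaluating the model at $X_2 = 0$ and $X_2 = 1$. A small sketch with hypothetical function names:

```python
def y(x1, x2):
    """Model from the slide: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_of_x1(x2):
    """Change in Y per one-unit increase in X1 at fixed X2 (beta1 + beta3*X2)."""
    return y(1, x2) - y(0, x2)


def intercept(x2):
    """Value of Y at X1 = 0 for a fixed X2."""
    return y(0, x2)


for x2 in (0, 1):
    print(f"X2 = {x2}: Y = {intercept(x2)} + {slope_of_x1(x2)}*X1")
# X2 = 0: Y = 1 + 2*X1
# X2 = 1: Y = 4 + 6*X1
```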

Interaction Regression Model Worksheet
• Multiply $X_1$ by $X_2$ to get $X_1X_2$
• Run the regression with Y, $X_1$, $X_2$, $X_1X_2$

Interpretation When There Are More Than Three Levels
• MALE = 0 if female and 1 if male
• MARRIED = 1 if married; 0 if not
• DIVORCED = 1 if divorced; 0 if not
• MALE • MARRIED = 1 if married male; 0 otherwise (= MALE times MARRIED)
• MALE • DIVORCED = 1 if divorced male; 0 otherwise (= MALE times DIVORCED)

Interpretation When There Are More Than Three Levels (continued)
[Estimated regression equation with MALE, MARRIED, DIVORCED, MALE • MARRIED and MALE • DIVORCED]

Interpreting Results
[Table: estimated averages for FEMALE and MALE by marital status (Single, Married, Divorced), with the male-female Difference]
• Main effects: MALE, MARRIED and DIVORCED
• Interaction effects: MALE • MARRIED and MALE • DIVORCED
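The table's cells follow from substituting the 0/1 values into the model. A sketch of that derivation, assuming the model is written with coefficients labeled $b_0$ through $b_5$ in the order below (the labels are hypothetical, and the estimated values are not reproduced here):

```latex
\hat{Y} = b_0 + b_1\,\text{MALE} + b_2\,\text{MARRIED} + b_3\,\text{DIVORCED}
        + b_4\,\text{MALE}\cdot\text{MARRIED} + b_5\,\text{MALE}\cdot\text{DIVORCED}

\begin{array}{lccc}
                & \text{FEMALE} & \text{MALE}           & \text{MALE} - \text{FEMALE} \\
\text{Single}   & b_0           & b_0 + b_1             & b_1 \\
\text{Married}  & b_0 + b_2     & b_0 + b_1 + b_2 + b_4 & b_1 + b_4 \\
\text{Divorced} & b_0 + b_3     & b_0 + b_1 + b_3 + b_5 & b_1 + b_5
\end{array}
```

The main effect $b_1$ enters every male cell; the interaction coefficients $b_4$ and $b_5$ are what allow the male-female difference to change with marital status.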

Evaluating Presence of Interaction
• Hypothesize interaction between pairs of independent variables
  • Contains 2-way product terms: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Hypotheses:
  • $H_0: \beta_3 = 0$ (no interaction between $X_1$ and $X_2$)
  • $H_1: \beta_3 \neq 0$ ($X_1$ interacts with $X_2$)

Using Transformations
• Requires data transformation
• Either or both independent and dependent variables may be transformed
• Can be based on theory, logic or scatter diagrams

Inherently Linear Models
• Non-linear models that can be expressed in linear form
  • Can be estimated by least squares in linear form
  • Require data transformation

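Transformed Multiplicative Model (Log-Log)
Original model: $Y_i = \beta_0 X_{1i}^{\beta_1} X_{2i}^{\beta_2} \varepsilon_i$
Transformed into: $\ln Y_i = \ln \beta_0 + \beta_1 \ln X_{1i} + \beta_2 \ln X_{2i} + \ln \varepsilon_i$
[Figure: shapes of Y versus $X_1$; similarly for $X_2$.]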

Square Root Transformation
$Y_i = \beta_0 + \beta_1 \sqrt{X_{1i}} + \beta_2 \sqrt{X_{2i}} + \varepsilon_i$
[Figure: Y versus $X_1$ for $\beta_1 > 0$ and for $\beta_1 < 0$; similarly for $X_2$.]
• Transforms a relationship of either shape above into one that appears linear
• Often used to overcome heteroscedasticity

Linear-Logarithmic Transformation
$Y_i = \beta_0 + \beta_1 \ln X_{1i} + \beta_2 \ln X_{2i} + \varepsilon_i$
[Figure: Y versus $X_1$ for $\beta_1 > 0$ and for $\beta_1 < 0$; similarly for $X_2$.]
Transformed from an original multiplicative model

Exponential Transformation (Log-Linear)
Original model: $Y_i = e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}}\,\varepsilon_i$
[Figure: Y versus $X_1$ for $\beta_1 > 0$ and for $\beta_1 < 0$.]
Transformed into: $\ln Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ln \varepsilon_i$

Interpretation of Coefficients
• The dependent variable is logged
  • The coefficient $b_j$ of the independent variable $X_j$ can be approximately interpreted as: a 1-unit change in $X_j$ leads to an estimated $(b_j \times 100)$ percent change in the average of Y
• The independent variable is logged
  • The coefficient $b_j$ of the independent variable $X_j$ can be approximately interpreted as: a 1 percent change in $X_j$ leads to an estimated $(b_j / 100)$-unit change in the average of Y

Interpretation of Coefficients (continued)
• Both dependent and independent variables are logged
  • The coefficient $b_j$ of the independent variable $X_j$ can be approximately interpreted as: a 1 percent change in $X_j$ leads to an estimated $b_j$ percent change in the average of Y
  • Therefore $b_j$ is the elasticity of Y with respect to a change in $X_j$
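A quick numerical check of the log-log transformation and the elasticity interpretation, using a hypothetical multiplicative model with no error term and made-up coefficients:

```python
import math

# Hypothetical multiplicative model Y = b0 * X1**b1 * X2**b2 (error term omitted).
b0, b1, b2 = 2.0, 0.5, -1.5


def y(x1, x2):
    return b0 * x1 ** b1 * x2 ** b2


# Taking logs gives a model that is linear in ln X1 and ln X2.
x1, x2 = 4.0, 9.0
lhs = math.log(y(x1, x2))
rhs = math.log(b0) + b1 * math.log(x1) + b2 * math.log(x2)
print(math.isclose(lhs, rhs))  # True

# Elasticity: a 1 percent increase in X1 changes Y by about b1 percent.
pct_change = (y(1.01 * x1, x2) / y(x1, x2) - 1) * 100
print(round(pct_change, 3))  # 0.499, close to b1 = 0.5
```

The exact change is $(1.01^{0.5} - 1) \times 100 \approx 0.499$ percent rather than exactly 0.5, which is why the interpretation is described as approximate.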

Interpretation of Coefficients (continued)
• If both Y and $X_j$ are measured in standardized form: $Y_i^* = \dfrac{Y_i - \bar{Y}}{S_Y}$ and $X_{ji}^* = \dfrac{X_{ji} - \bar{X}_j}{S_{X_j}}$
• The resulting coefficients $b_j^* = b_j \dfrac{S_{X_j}}{S_Y}$ are called standardized coefficients
• They indicate the estimated number of standard deviations by which the average of Y changes when $X_j$ changes by one standard deviation
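As a sketch, assuming sample standard deviations, the standardized coefficient can be computed as $b_j^* = b_j S_{X_j}/S_Y$. The data and helper names are hypothetical: points lying exactly on an increasing line give $b^* = 1$, and their standardized X and Y values coincide.

```python
import statistics


def standardized_coefficient(b_j, x_j, y):
    """b_j* = b_j * S_Xj / S_Y, using sample standard deviations."""
    return b_j * statistics.stdev(x_j) / statistics.stdev(y)


def standardize(values):
    """(value - mean) / sample standard deviation for each value."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data lying exactly on Y = 3 + 2X, so b = 2.
x = [1, 2, 3, 4, 5]
y = [3 + 2 * v for v in x]
print(round(standardized_coefficient(2, x, y), 6))  # 1.0
print([round(v, 3) for v in standardize(x)])
```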

Collinearity (Multicollinearity)
• High correlation between explanatory variables
• Coefficient of multiple determination measures the combined effect of the correlated explanatory variables
• No new information provided
• Leads to unstable coefficients (large standard errors)
  • Coefficient estimates depend on which explanatory variables are included

Venn Diagrams and Collinearity
[Figure: Oil, Temp and Insulation circles with a large overlap between Temp and Insulation. The large overlap reflects collinearity between Temp and Insulation; the overlapping variation of Temp and Insulation is used in explaining the variation in Oil, but NOT in estimating $b_1$ and $b_2$.]

Detect Collinearity (Variance Inflationary Factor)
• $VIF_j = \dfrac{1}{1 - R_j^2}$ is used to measure collinearity
  • $R_j^2$ is the coefficient of multiple determination of $X_j$ regressed on all the other explanatory variables
• If $VIF_j > 5$, $X_j$ is highly correlated with the other explanatory variables
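A minimal sketch of the VIF computation. In general $R_j^2$ comes from regressing $X_j$ on all the other X's; with only two explanatory variables it reduces to their squared correlation, which is why the two VIFs coincide. Helper names and data are hypothetical.

```python
import math


def pearson_r(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def vif_from_r2(r2_j):
    """VIF_j = 1 / (1 - R_j^2)."""
    return 1 / (1 - r2_j)


def vif_two_predictors(x1, x2):
    """With two X variables, R_j^2 is their squared correlation, so both VIFs are equal."""
    return vif_from_r2(pearson_r(x1, x2) ** 2)


print(round(vif_from_r2(0.81), 3))  # 5.263: R_j^2 = 0.81 already exceeds the VIF > 5 rule
print(vif_two_predictors([1, -1, 1, -1], [1, 1, -1, -1]))  # 1.0 for uncorrelated X's
```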

Detect Collinearity in PHStat
• PHStat | Regression | Multiple Regression …
• Check the "Variance Inflationary Factor (VIF)" box
• Excel spreadsheet for the heating oil example
  • Since there are only two explanatory variables, only one VIF is reported in the Excel spreadsheet
  • No VIF is > 5
  • There is no evidence of collinearity

Model Building
• Goal is to develop a good model with the fewest explanatory variables
  • Easier to interpret
  • Lower probability of collinearity
• Stepwise regression procedure
  • Provides limited evaluation of alternative models
• Best-subsets approach
  • Uses the $C_p$ statistic
  • Selects models with small $C_p$ near $p + 1$
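A sketch of the $C_p$ statistic, assuming the common (Mallows) form written in terms of $R^2$; the formula is not shown above, so treat this form and the helper name as assumptions. For the full model, $C_p$ always equals $T$, the number of parameters including the intercept, which makes a handy check. The $R^2$ values are hypothetical.

```python
def mallows_cp(r2_k, k, r2_full, n, t):
    """C_p = (1 - R_k^2)(n - T) / (1 - R_T^2) - (n - 2(k + 1)).

    r2_k:    R^2 of a candidate model with k explanatory variables
    r2_full: R^2 of the full model containing all candidate variables
    t:       number of parameters (including the intercept) in the full model
    """
    return (1 - r2_k) * (n - t) / (1 - r2_full) - (n - 2 * (k + 1))


# Hypothetical best-subsets comparison: n = 30, full model has 4 X's, so T = 5.
cp_full = mallows_cp(0.90, 4, 0.90, 30, 5)
cp_sub = mallows_cp(0.895, 3, 0.90, 30, 5)
print(round(cp_full, 6))  # 5.0: the full model always gives C_p = T
print(round(cp_sub, 6))   # 4.25, near k + 1 = 4 for this 3-variable model
```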

Model Building Flowchart
• Choose $X_1, X_2, \dots, X_p$
• Run the regression to find the VIFs
• Any VIF > 5?
  • Yes, more than one: remove the variable with the highest VIF and run the regression again
  • Yes, only one: remove this X
  • No: run best-subsets regression to obtain the "best" models in terms of $C_p$
• Do a complete analysis
• Add curvilinear terms and/or transform variables as indicated
• Perform predictions

Pitfalls and Ethical Considerations
To avoid pitfalls and address ethical considerations:
• Understand that interpretation of the estimated regression coefficients is performed holding all other independent variables constant
• Evaluate residual plots for each independent variable
• Evaluate interaction terms

Additional Pitfalls and Ethical Considerations (continued)
To avoid pitfalls and address ethical considerations:
• Obtain the VIF for each independent variable and remove variables that exhibit high collinearity with other independent variables before performing significance tests on each independent variable
• Examine several alternative models using best-subsets regression
• Use other methods when the assumptions necessary for least-squares regression have been seriously violated

Chapter Summary
• Developed the multiple regression model
• Discussed residual plots
• Addressed testing the significance of the multiple regression model
• Discussed inferences on population regression coefficients
• Addressed testing portions of the multiple regression model

Chapter Summary (continued)
• Described the quadratic regression model
• Addressed dummy variables
• Discussed using transformations in regression models
• Described collinearity
• Discussed model building
• Addressed pitfalls in multiple regression and ethical considerations