Multivariate Regression

Topics
1. The form of the equation
2. Assumptions
3. Axis of evil (collinearity, heteroscedasticity, and autocorrelation)
4. Model misspecification
   - Missing a critical variable
   - Including irrelevant variable(s)

The form of the equation

Yt = a1 + b2·X2 + b3·X3 + et

where:
- Yt = dependent variable
- a1 = intercept
- b2 = constant (partial regression coefficient)
- b3 = constant (partial regression coefficient)
- X2 = explanatory variable
- X3 = explanatory variable
- et = error term
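
A minimal sketch of fitting this equation by ordinary least squares with Python's statsmodels. The data are simulated and the names (X2, X3) are just the slide's symbols, not the course data set.

```python
# Fit Yt = a1 + b2*X2 + b3*X3 + et by OLS on simulated data (illustrative only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X2 = rng.normal(size=n)
X3 = rng.normal(size=n)
Y = 1.0 + 2.0 * X2 + 3.0 * X3 + rng.normal(size=n)   # "true" values: a1=1, b2=2, b3=3

X = sm.add_constant(np.column_stack([X2, X3]))        # adds the intercept a1
fit = sm.OLS(Y, X).fit()
print(fit.params)     # estimates of a1, b2, b3
print(fit.summary())  # t ratios, R2, confidence intervals
```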

Partial Correlation (Slope) Coefficients

b2 measures the change in the mean value of Y per unit change in X2, while holding the value of X3 constant (known in calculus as a partial derivative).

For the two-variable case Y = a + b·X, the slope is dY/dX = b.

Assumptions of MVR
- X2 and X3 are non-stochastic; that is, their values are fixed in repeated sampling.
- The error term e has a zero mean value (Σe/N = 0).
- Homoscedasticity: the variance of e is constant.
- No autocorrelation exists between the error term and the explanatory variables.
- No exact collinearity exists between X2 and X3.
- The error term e follows the normal distribution with mean zero and constant variance.
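
Two of the assumptions above (zero-mean errors, normality of the errors) are easy to check on a fitted model. A minimal sketch with statsmodels on simulated data rather than the course sample:

```python
# Residual checks: mean of the residuals and a Jarque-Bera normality test.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(1)
n = 100
X = sm.add_constant(rng.normal(size=(n, 2)))          # constant, X2, X3
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=n)

fit = sm.OLS(y, X).fit()
print(fit.resid.mean())             # ~0 whenever an intercept is included
print(jarque_bera(fit.resid)[:2])   # JB statistic and p-value for normality
```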

Venn Diagram: Correlation & Coefficients of Determination (R2)

[Two Venn diagrams of Y, X1, and X2]
- Left: no correlation exists between X1 and X2; each variable explains a separate portion of the variation of Y.
- Right: correlation exists between X1 and X2; there is a portion of the variation of Y that can be attributed to either one.

A special case: Perfect Collinearity

X2 is a perfect function of X1. Therefore, including X2 is irrelevant because it does not explain any variation in Y beyond what is already accounted for by X1. The model will not run.

Consequences of Collinearity

Multicollinearity is related to sample-specific issues:
- Large variances and standard errors of the OLS estimators
- Wider confidence intervals
- Insignificant t ratios
- A high R2 but few significant t ratios
- OLS estimators and their standard errors are very sensitive to small changes in the data; they tend to be unstable
- Wrong signs of regression coefficients
- Difficulty determining the contribution of explanatory variables to the R2

TESTING FOR MULTICOLLINEARITY

[Table slide: correlation matrix among DEPENDENT, TLA, BATHS, BEDROOM, and AGE]
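
A sketch of how such a correlation matrix, plus variance inflation factors (VIFs), can be computed with pandas and statsmodels. The column names follow the slide (TLA, BATHS, BEDROOM, AGE), but the data are simulated stand-ins, not the actual housing sample.

```python
# Correlation matrix and VIFs for the regressors; simulated housing-style data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 200
tla = rng.normal(1800, 400, n)                                  # total living area
df = pd.DataFrame({
    "TLA": tla,
    "BATHS": rng.integers(1, 4, n).astype(float),
    "BEDROOM": np.round(tla / 600 + rng.normal(0, 0.5, n)),     # deliberately tied to TLA
    "AGE": rng.integers(0, 60, n).astype(float),
})

print(df.corr())          # pairwise correlations among the regressors

Xc = sm.add_constant(df)
vif = pd.Series(
    [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])],
    index=df.columns,
)
print(vif)                # rule of thumb: VIF > 10 signals serious multicollinearity
```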

IS IT BAD IF WE HAVE MULTICOLLINEARITY?
- If the goal of the study is to use the model to predict or forecast the future mean value of the dependent variable, collinearity may not be a problem.
- If the goal of the study is not prediction but reliable estimation of the parameters, then collinearity is a serious problem.
- Solutions: dropping variables, acquiring more data or a new sample, rethinking the model, or transforming the form of the variables.

Heteroscedasticity

Heteroscedasticity: the variance of "e" is not constant, which violates the assumption of homoscedasticity, or equal variance.

Heteroscedasticity

[Figure slide: residual patterns illustrating heteroscedasticity]

What to do when the pattern is not clear?
- Run a regression where you regress the residuals (error term) on Y.

LET'S ESTIMATE HETEROSCEDASTICITY

Do a regression where the residuals become the dependent variable and home value the independent variable (a sketch follows below).
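
A minimal sketch of this auxiliary regression on simulated data. The slide regresses the raw residuals on home value; the version below uses the squared residuals so that the spread of the errors, not their sign, is what gets modeled. The variable names are illustrative.

```python
# Auxiliary regression for heteroscedasticity: squared residuals on home value.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
tla = rng.uniform(800, 3500, n)               # total living area (sq. ft.)
e = rng.normal(0, 0.03 * tla)                 # error spread grows with home size
value = 20_000 + 120 * tla + e                # home value

main = sm.OLS(value, sm.add_constant(tla)).fit()

aux = sm.OLS(main.resid ** 2, sm.add_constant(value)).fit()
print(aux.summary())     # a significant slope points to heteroscedasticity
```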

Consequences of Heteroscedasticity
1. OLS estimators are still linear.
2. OLS estimators are still unbiased.
3. But they no longer have minimum variance; they are no longer BLUE.
4. Therefore, we run the risk of drawing wrong conclusions when doing hypothesis testing (H0: b = 0).
5. Solutions: variable transformation, or develop a new model that takes nonlinearity into account (e.g., a logarithmic functional form).

Testing for Heteroscedasticity

Let's regress the predicted value (Ŷ) on the log of the squared residual (log e²) to see the pattern of heteroscedasticity.

[Figure: scatter plot of log e² against Ŷ]

The above pattern shows that our relationship is best described as a logarithmic function.
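
A formal counterpart to this visual check, not shown on the slides, is the Breusch-Pagan test that ships with statsmodels. A sketch on the same kind of simulated data:

```python
# Breusch-Pagan test for heteroscedasticity (a standard alternative to eyeballing plots).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
n = 200
tla = rng.uniform(800, 3500, n)
value = 20_000 + 120 * tla + rng.normal(0, 0.03 * tla)   # heteroscedastic errors

X = sm.add_constant(tla)
fit = sm.OLS(value, X).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
print(lm_pvalue)   # a small p-value rejects homoscedasticity
```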

Autocorrelation
- Time-series correlation: the best predictor of sales for the present Christmas season is the previous Christmas season.
- Spatial correlation: the best predictor of a home's value is the value of a home next door or in the same area or neighborhood.
- The best predictor of whether an incumbent politician will win an election is the previous election (ceteris paribus).

Autocorrelation
- Gujarati defines autocorrelation as "correlation between members of observations ordered in time [as in time-series data] or space [as in cross-sectional data]."
- The no-autocorrelation assumption is E(Ui·Uj) = 0: the expected product of two different error terms Ui and Uj is zero.
- Autocorrelation points to a model specification error, that is, the regression model is not specified correctly: a variable is missing or has the wrong functional form.

Types of Autocorrelation

The Durbin-Watson Test (d) of Autocorrelation

Values of d:
- d = 4: perfect negative autocorrelation
- d = 2: no autocorrelation
- d = 0: perfect positive autocorrelation
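
A sketch of computing d with statsmodels on simulated data whose errors follow a positively autocorrelated AR(1) process; nothing here is the course data set.

```python
# Durbin-Watson d statistic on residuals from a model with AR(1) errors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
n = 120
x = np.arange(n, dtype=float)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal()      # positively autocorrelated errors
y = 2.0 + 0.5 * x + e

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(fit.resid))   # near 2 = no autocorrelation; well below 2 = positive
```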

Let's do a "d" test

Here we solved the problems of collinearity, heteroscedasticity, and autocorrelation. It cannot get any better than this.

Model Misspecification
1. Omitted variable bias, or underfitting a model.
2. If the omitted variable is correlated with an included variable, then the estimated parameters are biased; that is, their expected values do not match the true values.
3. The estimated error variance is biased.
4. The confidence intervals and hypothesis-testing procedures are unreliable.
5. The R2 is also unreliable.

Let's run a model (a simulated sketch follows below):
LNVAL = a + b·LNTLA + b·LNBDR + b·LNAGE + e   (true model)
LNVAL = a + b·LNBDR + b·LNAGE + e             (underfitted)
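
A simulated sketch of the bias, using generic names (x2, x3) rather than the housing variables: the underfitted model drops a regressor that is correlated with the one kept, and the kept coefficient drifts away from its true value.

```python
# Omitted-variable bias: fit the true model and an underfitted model, compare b2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 500
x2 = rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(scale=0.6, size=n)     # omitted variable, correlated with x2
y = 1.0 + 2.0 * x2 + 1.5 * x3 + rng.normal(size=n)

true_fit = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()
under_fit = sm.OLS(y, sm.add_constant(x2)).fit()

print(true_fit.params[1])    # close to the true b2 = 2.0
print(under_fit.params[1])   # biased, roughly 2.0 + 1.5 * 0.8
```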

Model Misspecification
1. Irrelevant variable bias (including an unnecessary variable).
2. The unnecessary variable has no effect on Y (although R2 may increase).
3. The model still gives us unbiased and consistent estimates of the coefficients.
4. The major penalty is that the parameter estimates are less precise; the confidence intervals are therefore wider, increasing the risk of drawing invalid inferences during hypothesis testing (accepting H0: b = 0).

Let's run the following model (a simulated sketch follows below):
LNVALUE = a + b·LNTLA + b·LNBTH + b·LNBDR + b·LNAGE + e
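
A simulated sketch of the penalty for an irrelevant regressor, again with generic names: the coefficient on the relevant variable stays roughly unbiased, but its standard error widens once a collinear, irrelevant variable is added.

```python
# Irrelevant-variable bias: compare the standard error of b2 with and without the extra regressor.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 500
x2 = rng.normal(size=n)
x_irrelevant = 0.9 * x2 + rng.normal(scale=0.4, size=n)   # has no effect on y
y = 1.0 + 2.0 * x2 + rng.normal(size=n)

lean_fit = sm.OLS(y, sm.add_constant(x2)).fit()
fat_fit = sm.OLS(y, sm.add_constant(np.column_stack([x2, x_irrelevant]))).fit()

print(lean_fit.params[1], lean_fit.bse[1])   # b2 estimate and its standard error
print(fat_fit.params[1], fat_fit.bse[1])     # similar b2, noticeably larger standard error
```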