
OMITTED VARIABLE BIAS
Fundamentals of Program Evaluation
Jesse Lecy

A note on terms in this section:
"Full model": the "truth." Because all of the relevant variables are included, the slopes are correct, so we denote them with Greek letters ($\beta$).
"Naïve model": we are missing variables, so we do NOT know whether the slopes are correct. They represent our best guess and may contain bias. We denote them with Latin letters ($b$).
You might be used to thinking in terms of population statistics and sample statistics. In a regression you can have the entire population in your sample, but if you are missing variables, your slopes will still be wrong. To map the concepts: when I say "full model," think population statistic (the truth); when I say "naïve model," think sample statistic (the best guess).

The main question to ask yourself:
Full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$
Naïve model: $Y = b_0 + b_1 X_1 + e$
Does omitting a variable introduce bias into our estimate of program impact? If we have an omitted variable, will our estimate of the program impact ($b_1$) accurately represent the true program impact ($\beta_1$)?
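A minimal NumPy sketch of this question, using a made-up data-generating process (all coefficients and the helper names `ols` and `simulate` are hypothetical, chosen only for illustration). Because we generate the data ourselves, we know the true $\beta_1 = 2$. The naïve slope settles near $2 + 0.5 \times 3 = 3.5$ however large the sample gets, which echoes the point that having the whole population does not fix a missing variable.

```python
import numpy as np

def ols(y, *regressors):
    """OLS with an intercept; returns [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

def simulate(n, rng):
    """Hypothetical full model: Y = 1 + 2*X1 + 3*X2 + e, with X2 = 0.5*X1 + u."""
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)   # X2 shares covariance with X1
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)
    return y, x1, x2

rng = np.random.default_rng(42)
results = {}
for n in (1_000, 1_000_000):
    y, x1, x2 = simulate(n, rng)
    beta_1 = ols(y, x1, x2)[1]   # full model: slope on X1
    b_1 = ols(y, x1)[1]          # naive model: X2 omitted
    results[n] = (beta_1, b_1)
    print(f"n={n:>9,}  full beta_1={beta_1:.3f}  naive b_1={b_1:.3f}")
```

Growing the sample a thousandfold tightens both estimates, but it tightens the naïve one around the wrong value.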

Note! We will ALWAYS have omitted variables in observational studies, because either we can't measure the variable we care about or it is not available in the data we have. The real question is not whether an omitted variable exists, but how much it will affect our estimates.

What we know so far: We think about control variables as variables that remove variance from our model so we can focus on the policy variable.
[Figure: Venn diagrams of Test Score with Class Size and Teacher Quality, and with Class Size and Socio-Economic Status; regions labeled A, B, C and a, b, c.]


What we know so far:
[Figure: Venn diagram of Test Score, Class Size, and Teacher Quality; regions labeled A, a, B, C.]
When we add a control that is uncorrelated with the policy variable, it explains additional variance in $Y$ but does not affect the policy slope.
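A small simulation sketch of the uncorrelated-control case, assuming NumPy and entirely hypothetical data (the names `class_size`, `tq`, `test_score`, the helper `fit`, and all coefficients are made up). Teacher quality is residualized on class size so the two have exactly zero sample covariance; the class-size slope is then the same with or without the control, while the residual standard error shrinks because the control soaks up variance in $Y$.

```python
import numpy as np

def fit(y, *regressors):
    """OLS with an intercept; returns (coefficients, residual standard error)."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    sigma = np.sqrt(resid @ resid / (len(y) - X.shape[1]))
    return coefs, sigma

rng = np.random.default_rng(0)
n = 500
class_size = rng.normal(25, 5, size=n)
raw_tq = rng.normal(0, 1, size=n)
# Residualize teacher quality on class size: zero sample covariance by construction
tq = raw_tq - np.polyval(np.polyfit(class_size, raw_tq, 1), class_size)
test_score = 70 - 0.8 * class_size + 4 * tq + rng.normal(0, 5, size=n)

naive_coefs, naive_sigma = fit(test_score, class_size)    # control omitted
full_coefs, full_sigma = fit(test_score, class_size, tq)  # control included
print(f"class-size slope: {naive_coefs[1]:.4f} without TQ, {full_coefs[1]:.4f} with TQ")
print(f"residual SE:      {naive_sigma:.3f} without TQ, {full_sigma:.3f} with TQ")
```

In real data the sample correlation is rarely exactly zero, so the slope would move slightly; the residualization step just makes the textbook case exact.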

What we know so far:
[Figure: Venn diagram of Test Score, Class Size, and Socio-Economic Status; regions labeled A, a, B, b, C, c.]
When we add a control variable that IS correlated with the policy variable, it affects both the slope and the standard error.
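A companion sketch for the correlated-control case, again with hypothetical simulated data (`ses`, `class_size`, `test_score`, the helper `ols_with_se`, and every coefficient are invented for illustration). SES is built to be negatively related to class size, so adding it moves the class-size slope (from roughly $-1.5$ to roughly $-0.5$ under these made-up numbers) and also changes its conventional standard error. Whether the standard error grows or shrinks depends on how much variance the control explains versus how much it overlaps with the policy variable.

```python
import numpy as np

def ols_with_se(y, *regressors):
    """OLS with an intercept; returns (coefficients, homoskedastic standard errors)."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coefs, se

rng = np.random.default_rng(1)
n = 1_000
ses = rng.normal(size=n)
class_size = 25 - 3 * ses + rng.normal(0, 3, size=n)   # correlated with SES
test_score = 60 - 0.5 * class_size + 6 * ses + rng.normal(0, 5, size=n)

b, se_b = ols_with_se(test_score, class_size)             # SES omitted
beta, se_beta = ols_with_se(test_score, class_size, ses)  # SES included
print(f"without SES: slope {b[1]:.3f} (SE {se_b[1]:.3f})")
print(f"with SES:    slope {beta[1]:.3f} (SE {se_beta[1]:.3f})")
```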

Omitted variable bias: All we are doing with omitted variable bias is asking what happens when we leave the control variable out of the model.

Case #1
[Figure: Venn diagram of $Y$, $X_1$, and $X_2$ (the omitted variable).]
Because the omitted variable $X_2$ is uncorrelated with the policy variable $X_1$, leaving it out does not change the slope $b_1$. There is no bias.

Case #2
[Figure: Venn diagram of $Y$, $X_1$, and $X_2$ (the omitted variable).]
In this case, omitting $X_2$ from the model will change the slope $b_1$ because $X_1$ and $X_2$ share covariance. Our naïve estimate WILL be biased.

How do omitted variables impact regression results?
[Table: regression results with SES & TQ omitted, SES omitted, TQ omitted, and the full model.]
Bias is the difference between the "truth" (Model 5 in this case) and what we would get if we ran a naïve regression (Model 1 here). Note that the bias can be quite large: we overestimate the impact of our program by 51%!

Calculating omitted variable bias: The definition of bias is the difference between our best guess of the slope and the true slope:
$\text{bias} = b_1 - \beta_1$
Note that this is not very useful in practice, because if you knew the true slope $\beta_1$ you would not need to calculate bias!

Where bias comes from:
[Path diagram: $X_1 \rightarrow Y$ (direct effect); $X_1 \rightarrow X_2 \rightarrow Y$ (indirect effect).]
$b_1$ = direct effect + indirect effect
$\beta_1$ = direct effect
$\text{bias} = b_1 - \beta_1$ = indirect effect

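The math:
Full regression: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$
Auxiliary regression: $X_2 = \alpha_0 + \alpha_1 X_1 + u$
[Path diagram for $X_1$, $X_2$, $Y$: $X_1 \rightarrow Y$ with slope $\beta_1$ (direct effect); $X_1 \rightarrow X_2$ with slope $\alpha_1$ and $X_2 \rightarrow Y$ with slope $\beta_2$ (indirect effect).]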

The math:
$b_1 = \underbrace{\beta_1}_{\text{true slope}} + \underbrace{\alpha_1 \beta_2}_{\text{bias}}$
If we run a naïve model that excludes $X_2$, the slope $b_1$ will include both the direct and the indirect effects.

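Note: To run the auxiliary regression, think about the effect of $X_1$ working through $X_2$, so make sure $X_2$ is on the left-hand side of the auxiliary regression: $X_2 = \alpha_0 + \alpha_1 X_1 + u$.
[Path diagram: $X_1 \rightarrow X_2$ ($\alpha_1$), $X_2 \rightarrow Y$ ($\beta_2$), $X_1 \rightarrow Y$ ($\beta_1$).]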

Omitted variable bias derived (you don't need to know this for the test)
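The slide's derivation did not survive extraction. The following is a standard substitution sketch in this section's notation (full model with $\beta$, auxiliary regression with $\alpha$); it may not follow the slide's own steps. Substitute the auxiliary regression for $X_2$ in the full model and collect the terms on $X_1$:

```latex
\begin{aligned}
Y   &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e && \text{(full model)} \\
X_2 &= \alpha_0 + \alpha_1 X_1 + u             && \text{(auxiliary regression)} \\
Y   &= \beta_0 + \beta_1 X_1 + \beta_2 (\alpha_0 + \alpha_1 X_1 + u) + e \\
    &= (\beta_0 + \beta_2 \alpha_0) + (\beta_1 + \alpha_1 \beta_2) X_1 + (\beta_2 u + e) \\
\Rightarrow \quad b_1 &= \beta_1 + \alpha_1 \beta_2,
  \qquad \text{bias} = b_1 - \beta_1 = \alpha_1 \beta_2
\end{aligned}
```

With OLS estimates this holds exactly in any sample, because both residuals $u$ and $e$ are uncorrelated with $X_1$ by construction, so the composite error $\beta_2 u + e$ does not shift the slope on $X_1$.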

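Example of calculations:
[Output for the naïve regression, the full regression, and the auxiliary regression, with the path diagram $X_1 \rightarrow X_2$ ($\alpha_1$), $X_2 \rightarrow Y$ ($\beta_2$), $X_1 \rightarrow Y$ ($\beta_1$).]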
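Since the slide's numbers were lost, here is a sketch that runs the same three regressions on simulated data and checks the bias formula numerically. The helper `ovb_decomposition` and the data-generating coefficients are hypothetical; the check itself ($b_1 = \beta_1 + \alpha_1 \beta_2$) is an exact OLS identity, so it holds for any data set, not just this one.

```python
import numpy as np

def ols(y, *regressors):
    """OLS with an intercept; returns [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

def ovb_decomposition(y, x1, x2):
    """Slopes from the naive, full, and auxiliary regressions."""
    b_1 = ols(y, x1)[1]                  # naive:     Y  = b0 + b1*X1
    _, beta_1, beta_2 = ols(y, x1, x2)   # full:      Y  = beta0 + beta1*X1 + beta2*X2
    alpha_1 = ols(x2, x1)[1]             # auxiliary: X2 = alpha0 + alpha1*X1 (X2 on the left)
    return b_1, beta_1, alpha_1, beta_2

rng = np.random.default_rng(7)
x1 = rng.normal(size=200)
x2 = -0.8 * x1 + rng.normal(size=200)   # hypothetical negative X1 -> X2 link
y = 0.5 + 1.5 * x1 + 2.0 * x2 + rng.normal(size=200)

b_1, beta_1, alpha_1, beta_2 = ovb_decomposition(y, x1, x2)
bias = alpha_1 * beta_2
print(f"naive b_1               = {b_1:.4f}")
print(f"beta_1 + alpha_1*beta_2 = {beta_1:.4f} + ({alpha_1:.4f})({beta_2:.4f}) = {beta_1 + bias:.4f}")
```

Here $\alpha_1 < 0$ and $\beta_2 > 0$, so the bias is negative and the naïve slope understates the true effect.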

The take-away:
(1) Bias is the product of two slopes: $X_1 \rightarrow X_2$ ($\alpha_1$) and $X_2 \rightarrow Y$ ($\beta_2$).
(2) The naïve slope is the actual slope plus bias.
(3) The sign of a slope is always determined by the sign of the covariance, i.e., the correlation. The sign of the bias is therefore the product of the signs of $\alpha_1$ and $\beta_2$.

Why does this matter?
[Figure: the naïve slope compared with the true slope under Case 1 and Case 2.]
Case 1: the naïve slope is too large.
Case 2: the naïve slope is too small.
If the naïve slope is too large, it can make the program effect look significant when it's not.
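Because bias $= \alpha_1 \beta_2$, you can often predict which case you are in from the signs alone, before running anything. A tiny sketch of that reasoning; the function name `bias_direction` is hypothetical, and "too large" here means numerically larger than $\beta_1$ (for a negative true effect, that means closer to zero or even positive).

```python
def bias_direction(alpha_1: float, beta_2: float) -> str:
    """Direction of omitted variable bias in the naive slope b_1 = beta_1 + alpha_1 * beta_2."""
    bias = alpha_1 * beta_2
    if bias > 0:
        return "naive slope too large (overestimates beta_1)"
    if bias < 0:
        return "naive slope too small (underestimates beta_1)"
    return "no bias"

# All four sign combinations of the two paths X1 -> X2 and X2 -> Y
for a, b in [(0.5, 3.0), (0.5, -3.0), (-0.5, 3.0), (-0.5, -3.0)]:
    print(f"alpha_1={a:+}, beta_2={b:+}: {bias_direction(a, b)}")
```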

WHEN DOES O.V.B. OCCUR?

Case 1: Omitted variable correlated with policy
[Figure: Venn diagram of $Y$, $X_1$, and $X_2$; regions A, B, C.]
In this case, the omitted variable $X_2$ is correlated with the policy variable $X_1$. There is shared covariance, represented by region B. This is the region that is discarded as part of the full-model regression procedure. The naïve slope, $b_1$, and the full-model slope, $\beta_1$, will now differ because the full model excludes region B while the naïve model attributes it to $X_1$. The naïve model will be biased as a result of omitting $X_2$.

Case 1: Omitted variable correlated with policy
[Figure: Venn diagram of $Y$, $X_1$, and $X_2$ (regions A, B, C) and the corresponding path diagram.]

Case 2: Omitted variable uncorrelated with policy
[Figure: Venn diagram of $Y$, $X_1$, and $X_2$; regions A, C.]
In this case, the omitted variable $X_2$ is uncorrelated with the policy variable $X_1$, so there is no overlap between them in the Venn diagram. Since the naïve slope, $b_1$, and the full-model slope, $\beta_1$, are the same, omitting $X_2$ introduces no bias.

Case 2: Omitted variable uncorrelated with policy
[Figure: Venn diagram of $Y$, $X_1$, and $X_2$ (regions A, C) and the corresponding path diagram.]

EXAMPLE: OVB

We want to predict an individual's GPA in graduate school as a function of their GRE score and how they do on a standardized math test (MAT).
True model: $GPA = \beta_0 + \beta_1 \, GRE + \beta_2 \, MAT + e$
What happens when we omit MAT?

Fancy descriptive statistics
[Table: descriptive statistics.]

Calculations:
[Output for the full model, the naïve model (omit MAT), and the auxiliary regression.]
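The slide's descriptive statistics and output did not survive extraction, so this sketch uses made-up correlations and standard deviations (NOT the slide's values). With two predictors, all three regressions can be computed from descriptive statistics alone, using the standard two-predictor OLS formulas; the helper name `slopes_from_descriptives` is hypothetical. The naïve slope again comes out equal to $\beta_1 + \alpha_1 \beta_2$.

```python
def slopes_from_descriptives(r_y1, r_y2, r_12, s_y, s_1, s_2):
    """OLS slopes from correlations (r) and standard deviations (s).

    Index 1 = policy variable (GRE), 2 = omitted variable (MAT), y = outcome (GPA).
    """
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom * (s_y / s_1)   # full model, slope on X1
    beta_2 = (r_y2 - r_y1 * r_12) / denom * (s_y / s_2)   # full model, slope on X2
    b_1 = r_y1 * (s_y / s_1)                              # naive model: Y on X1 only
    alpha_1 = r_12 * (s_2 / s_1)                          # auxiliary: X2 on X1
    return beta_1, beta_2, b_1, alpha_1

# Hypothetical descriptive statistics for illustration only
beta_1, beta_2, b_1, alpha_1 = slopes_from_descriptives(
    r_y1=0.45, r_y2=0.40, r_12=0.50, s_y=0.40, s_1=80.0, s_2=12.0)

print(f"full beta_1 = {beta_1:.5f}, naive b_1 = {b_1:.5f}")
print(f"bias = alpha_1 * beta_2 = {alpha_1 * beta_2:.5f}")
print(f"bias as a share of beta_1 = {100 * alpha_1 * beta_2 / beta_1:.0f}%")
```

The last line mirrors the "we overestimate the impact by X%" style of statement used earlier in the section.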

Why we care: Results should be:
• Unbiased (accurate)
• Efficient (precise)
[Figure: four sampling distributions of $b_1$: unbiased and efficient; unbiased and inefficient; biased and efficient; biased and inefficient.]
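"Unbiased" and "efficient" are properties of an estimator's sampling distribution, so one way to see them is a small Monte Carlo sketch: draw many hypothetical samples, estimate $b_1$ each time, and look at the mean (bias) and spread (efficiency). The data-generating process and the helper `slope_on_x1` are made up for illustration. In this particular setup the naïve model is both biased and less efficient, because the omitted $X_2$ also inflates the residual variance.

```python
import numpy as np

def slope_on_x1(y, *regressors):
    """OLS with an intercept; returns the slope on the first regressor."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs[1]

rng = np.random.default_rng(2024)
true_beta_1, reps, n = 2.0, 2_000, 100
naive, full = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)   # omitted variable correlated with X1
    y = 1.0 + true_beta_1 * x1 + 3.0 * x2 + rng.normal(size=n)
    naive.append(slope_on_x1(y, x1))      # X2 omitted
    full.append(slope_on_x1(y, x1, x2))   # X2 included

for label, est in (("naive", np.array(naive)), ("full", np.array(full))):
    print(f"{label:>5}: bias = {est.mean() - true_beta_1:+.3f}, SD = {est.std(ddof=1):.3f}")
```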

Which scenario is each model?
[Figure: Venn diagrams for Model 1 (Test Score, CS), Model 2 (Test Score, TQ, CS), and Model 4 (Test Score, CS, SES).]

Answers:
Model 1 (Test Score, CS): biased and inefficient
Model 2 (Test Score, TQ, CS): biased and efficient
Model 4 (Test Score, SES, CS): unbiased and inefficient
Model 5 (Test Score, SES, CS, TQ): unbiased and efficient (not pictured)

PARTITIONED REGRESSION VS. OMITTED VARIABLE BIAS

Full model
[Figure: Venn diagram of $Y$, $X_1$, and $X_2$.]

Partitioned regression
[Diagram: regress $Y$ on $X_2$ to get residuals $e_1$; regress $X_1$ on $X_2$ to get residuals $e_2$.]

Partitioned regression
[Figure: Venn diagram of the residuals $e_1$ and $e_2$; regions A, B, C.]

Example 1: Perceived Social Support is the "policy variable."
Dependent Variable: Subjective Well Being
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.)

                            B        Std. Error  Beta    t        Sig.
(Constant)                  -1.507   2.633               -.572    .567
Perceived Social Support    .310     .032        .460    9.645    .000
Network Diversity           -.215    .210        -.055   -1.025   .306
# People in Social Network  .023     .014        .087    1.654    .099

Auxiliary Regression #1: Regress the DV on all IVs except the policy variable. Save the residuals.
Dependent Variable: Subjective Well Being

                            B        Std. Error  Beta    t        Sig.
(Constant)                  21.630   1.207               17.915   .000
Network Diversity           .224     .228        .057    .979     .328
# People in Social Network  .039     .016        .148    2.540    .011


Auxiliary Regression #2: Regress the policy variable on all of the other IVs. Save the residuals.
Dependent Variable: Perceived Social Support

                            B        Std. Error  Beta    t        Sig.
(Constant)                  74.720   1.720               43.438   .000
Network Diversity           1.417    .325        .244    4.356    .000
# People in Social Network  .052     .022        .133    2.364    .019

Partitioned regression: Regress the residuals on each other.
Dependent Variable: Unstandardized Residual (from Auxiliary Regression #1)

                            B          Std. Error  Beta    t        Sig.
(Constant)                  2.369E-15  .253                .000     1.000
Unstandardized Residual     .310       .032        .441    9.670

We have recovered the original slope! The residual-on-residual slope (.310) matches the Perceived Social Support slope in the full model.
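The partitioned-regression recipe can be sketched in a few lines of NumPy. The data below are simulated stand-ins that only borrow the example's variable names (`wellbeing`, `support`, `diversity`, `network_size`); none of the coefficients come from the slide's data. The sketch fits the full model, runs the two auxiliary regressions, regresses residuals on residuals, and checks that the slope matches (this equivalence is known as the Frisch–Waugh–Lovell theorem).

```python
import numpy as np

def ols(y, *regressors):
    """OLS with an intercept; returns (coefficients, residuals)."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs, y - X @ coefs

rng = np.random.default_rng(3)
n = 300
# Hypothetical simulated stand-ins for the example's variables
diversity = rng.normal(size=n)
network_size = rng.poisson(20, size=n).astype(float)
support = 2.0 + 1.4 * diversity + 0.05 * network_size + rng.normal(size=n)  # policy variable
wellbeing = -1.5 + 0.3 * support - 0.2 * diversity + 0.02 * network_size + rng.normal(size=n)

full_coefs, _ = ols(wellbeing, support, diversity, network_size)  # full model
_, e1 = ols(wellbeing, diversity, network_size)   # auxiliary #1: DV on the other IVs
_, e2 = ols(support, diversity, network_size)     # auxiliary #2: policy variable on the other IVs
partitioned_coefs, _ = ols(e1, e2)                 # residuals on residuals

print(f"full-model slope:  {full_coefs[1]:.6f}")
print(f"partitioned slope: {partitioned_coefs[1]:.6f}")
print(f"partitioned intercept: {partitioned_coefs[0]:.2e}")
```

The intercept of the residual-on-residual regression comes out at essentially zero (floating-point noise), in the same way as the 2.369E-15 constant in the output.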