Advanced Quantitative Techniques Lab 6
• Continuation of linear regression topics: initial error checking (first 30 mins)
• Assistance finding data, identifying variables for the final project, and projecting the time needed to clean the data
Error term: lots to look at
• The error term in a regression is essential, not just the last thing in the equation.
• It tells us how good our estimate is.
• It tells us whether we can use OLS (ordinary least squares linear regression) or whether something else would fit better.
• OLS picks the line with the lowest sum of squared distances between actual and predicted values. The distances are squared because we don't want overestimates to cancel out underestimates; both deviations are important.
Bonus: 'error' vs 'residual'
• Very similar, but technically:
– Residual is the difference between the observed value and the expected/estimated value (e.g. a sample mean) = observed Y – estimated Y
– Error is the difference between the observed value and the real value (e.g. a known population mean) = observed Y – 'real' Y
• Typically, we'll use residuals (we infrequently have the population mean). But lots of people mix up the terms.
Ordinary Least Squares (OLS) Regression Assumptions
3 of the 4 big assumptions are about the error term!
1. Linearity - the relationships between the predictors and the outcome variable should be linear
2. Normality - the errors should be normally distributed
3. Constant error variance (homoscedasticity) - the error variance should be constant
4. Independence - the errors associated with one observation are not correlated with the errors of any other observation
Major issues to check for:
1. Influence - individual observations that exert undue influence on the coefficients
2. Collinearity - predictors that are highly collinear, i.e., linearly related, can cause problems in estimating the regression coefficients
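As a rough sketch, several of the checks listed above map onto standard Stata post-estimation commands. This assumes the lab's "lbw data" matches Stata's example dataset loaded with webuse lbw, and the choice of predictors is only illustrative. Run it as a do-file so the // comments are accepted.

```stata
* Assumption: the lab's lbw data is Stata's example dataset "lbw"
webuse lbw, clear

* Illustrative model only; swap in your own predictors
regress bwt age lwt smoke

rvfplot, yline(0)   // residuals vs fitted values: look for curvature or a funnel shape
estat hettest       // Breusch-Pagan / Cook-Weisberg test for constant error variance
estat vif           // variance inflation factors, a common collinearity check
```

None of these replaces looking at the plots; they are a starting point for spotting which assumption may need a closer look.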
Bonus: Math behind OLS error term
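A short sketch of the math for a single predictor, using notation assumed here for illustration (the slide's own notation is not available): y_i is the observed outcome, x_i the predictor, \hat{y}_i the predicted value, and e_i the residual.

```latex
% Model and residual (residual = observed minus predicted)
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i, \qquad
e_i = y_i - \hat{y}_i

% OLS chooses the coefficients that minimize the sum of squared residuals
(\hat{\beta}_0, \hat{\beta}_1)
  = \arg\min_{\beta_0,\, \beta_1} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

% Setting both partial derivatives to zero gives the closed-form solution
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

Because each e_i is squared, a residual of +200 and one of -200 both add to the total instead of cancelling out.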
Our residuals/errors should be random and unpredictable
• Random: good!
• Some pattern => something else is going on that we didn't catch in the model
• regress bwt age
• predict res, residual
• list res in 1/10
• rvpplot age
Diagnostics 1: Checking Normality of Residuals
• predict res, resid
• kdensity res, normal
• pnorm res
• qnorm res
Diagnostics 1: Unusual and influential data
Open lbw data
• gen black=.
• replace black=1 if race==2
• replace black=0 if race==1|race==3
• sum bwt age lwt smoke ht ui ftv black
Influential Observation
• graph matrix bwt age lwt
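The scatterplot matrix is a visual first pass. One possible follow-up, sketched here under the same assumption that the lab data matches webuse lbw, is to compute Cook's distance and leverage after the regression. The variable names cooksd_hyp and lev_hyp are made up for this example, and the 4/n cutoff is a common rule of thumb, not a value from the lab.

```stata
* Assumption: the lab's lbw data is Stata's example dataset "lbw"
webuse lbw, clear
regress bwt age lwt

* Hypothetical variable names, chosen for this sketch only
predict cooksd_hyp, cooksd     // Cook's distance: overall influence of each observation
predict lev_hyp, leverage      // leverage: how unusual the predictor values are

lvr2plot                       // leverage vs normalized residual squared

* Rule-of-thumb cutoff of 4/n (an assumption, not part of the lab)
list id bwt age lwt cooksd_hyp if cooksd_hyp > 4/e(N) & !missing(cooksd_hyp)
```

Observations flagged this way are worth inspecting, not automatically dropping; re-running the regression with and without them shows how much the coefficients actually move.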