Regression II: Multiple Regression and Diagnostics (Class 23)


How Much Do Teacher Reprimands Lead to Bullying, Controlling for Family Stress?

Multiple Regression Example
Is aggression by bullies predicted by teacher reprimands, controlling for family stress? This is a model with two predictors:

Y = b0 + b1X1 + b2X2 + ε

Y = aggression
b0 = intercept
b1 = slope for family stress (X1)
b2 = slope for teacher reprimands (X2)
ε = error

This multiple regression model shows:
1. The effect of the total model (reprimands and family stress)
2. The effect of family stress
3. The effect of reprimands after accounting for family stress

NOTE: Could also test:
1. The effect of family stress controlling for reprimands (switch the IV order, above)
2. The effect of (family stress + …
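To make the model concrete, here is a minimal sketch in Python (statsmodels), under stated assumptions: the variable names (family_stress, reprimands, aggression) and the data are invented for illustration; the actual analysis in this deck is run in SPSS.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only: aggression is driven by
# family stress, teacher reprimands, and random error (epsilon).
rng = np.random.default_rng(23)
n = 200
df = pd.DataFrame({"family_stress": rng.normal(size=n),
                   "reprimands": rng.normal(size=n)})
df["aggression"] = (0.5 + 0.7 * df["family_stress"]
                    + 0.3 * df["reprimands"]
                    + rng.normal(scale=0.5, size=n))

# Y = b0 + b1*X1 (family stress) + b2*X2 (reprimands) + error
model = smf.ols("aggression ~ family_stress + reprimands", data=df).fit()
print(model.params)  # Intercept (b0), family_stress (b1), reprimands (b2)
```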

Multiple Regression (MR)

Y = b0 + b1X1 + b2X2 + b3X3 + … + bkXk + ε

Multiple regression can incorporate any number of predictors in a model. With two predictors, a "regression plane" rather than a regression line is created. It becomes increasingly difficult to visualize a regression model with three or more predictors. MR operates on the same principles as simple regression. Multiple R = the correlation between observed Y and predicted Y (Ŷ).

Two Predictor Models Create a "Regression Plane"
[3D plot: Aggression as a function of Reprimands and Family Stress]

Elements of Multiple Regression
Total Sum of Squares (SST) = the deviation of each score from the DV mean, squared, then summed.
Residual Sum of Squares (SSR) = each residual from the total model (not a simple line), squared, then summed.
Model Sum of Squares (SSM) = SST − SSR = the amount the total model explains above and beyond the simple mean.
R² = SSM / SST = the proportion of variance explained by the total model.
Adjusted R² = R², adjusted for the number of predictors.
NOTE: The main difference between these values in multiple regression is that residuals come from the total model rather than from a single regression line.
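These quantities are easy to verify by hand. A sketch with synthetic data (numpy only; the coefficients are made up) that also confirms multiple R equals the correlation between observed and predicted Y:

```python
import numpy as np

rng = np.random.default_rng(23)
n = 200
stress, repr_ = rng.normal(size=(2, n))
y = 0.5 + 0.7 * stress + 0.3 * repr_ + rng.normal(scale=0.5, size=n)

# Fit by least squares: design matrix columns = [1, stress, reprimands]
X = np.column_stack([np.ones(n), stress, repr_])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

sst = np.sum((y - y.mean()) ** 2)   # total SS: deviations from the DV mean
ssr = np.sum((y - y_hat) ** 2)      # residual SS: deviations from the model
ssm = sst - ssr                     # model SS: explained beyond the mean
r2 = ssm / sst                      # proportion of variance explained
k = 2                               # number of predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Multiple R equals the correlation between observed and predicted Y
assert np.isclose(np.sqrt(r2), np.corrcoef(y, y_hat)[0, 1])
print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}")
```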

Methods of Regression
Hierarchical:
1. Predictors are selected based on theory or past work.
2. Predictors are entered into the analysis in order of importance, or by established influence.
3. New predictors are entered last, so that their unique contribution can be determined.
Forced Entry: All predictors are forced into the model simultaneously. Use this when there is no prediction about predictor primacy.
Stepwise: The program automatically searches for the strongest predictor, then the second strongest, and so on. Predictor 1 is best at explaining the entire model, accounting for, say, 40%; Predictor 2 is best at explaining the remaining 60%, etc. This is a controversial method.
In general, hierarchical is the most common and most recommended method. A sketch of hierarchical entry follows.
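A minimal sketch of hierarchical entry, assuming the same hypothetical bullying variables as above: fit Step 1 with the established predictor, Step 2 adding the new predictor last, then compare R² and test the change with an F test on the nested models.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(23)
n = 200
df = pd.DataFrame({"family_stress": rng.normal(size=n),
                   "reprimands": rng.normal(size=n)})
df["aggression"] = (0.5 + 0.7 * df["family_stress"]
                    + 0.3 * df["reprimands"]
                    + rng.normal(scale=0.5, size=n))

# Step 1: the established predictor; Step 2: the new predictor entered last
m1 = smf.ols("aggression ~ family_stress", data=df).fit()
m2 = smf.ols("aggression ~ family_stress + reprimands", data=df).fit()
print(f"R2 Step 1 = {m1.rsquared:.3f}, R2 Step 2 = {m2.rsquared:.3f}")
print(anova_lm(m1, m2))  # F test of the R-squared change
```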

Sample Size in Regression
Green's Rule of Thumb:
Overall model: 50 + 8k (k = number of predictors)
Specific predictor (i.e., a specific b): 104 + k
Unsure which? Use the one requiring the larger n.
Determining sample size based on the expected effect: see the Miles & Shevlin (2001) table (next slide).
Power Analysis: The required n is determined by:
1. Effect size: e.g., the proportion of variance explained
2. Alpha level: e.g., p < .05
3. Power: the odds of observing the effect when it is actually there.
G*Power: http://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-…
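Green's two rules are trivial to encode; a small helper (the function name is mine):

```python
def green_min_n(k: int) -> dict:
    """Green's rules of thumb for minimum regression sample size."""
    overall = 50 + 8 * k      # for testing the overall model
    predictor = 104 + k       # for testing a specific b
    # Unsure which applies? Use the one requiring the larger n.
    return {"overall": overall, "predictor": predictor,
            "either": max(overall, predictor)}

# 2 predictors -> {'overall': 66, 'predictor': 106, 'either': 106}
print(green_min_n(2))
```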

Miles and Shevlin Regression Sample Size Graph. In Field, A. (2009). Discovering Statistics Using SPSS, 3rd ed., p. 223.

Multiple Regression in SPSS

REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT aggression
  /METHOD=ENTER family.stress
  /METHOD=ENTER reprimands.

"OUTS" refers to variables excluded in, e.g., Model 1 but appearing in, e.g., Model 2. "NOORIGIN" means "do show the constant in the output." "CRITERIA" relates to stepwise regression only; it sets which IVs are kept in at each step (PIN = probability to enter, POUT = probability to remove).

SPSS Multiple Regression Output: Descriptives
What are the IVs? Stress and reprimands.
What is the DV? Aggression.

SPSS Regression Output: Model Effects
R = the correlation of the model with the outcome; the power of the regression model, i.e., how much the total model correlates with the DV.
R² = the amount of variance explained by the model.
Adjusted R² = R² adjusted for the number of predictors; always ≤ R².
R² change = the amount explained by each new model.
Sig. F Change = does the new model explain a significant amount of added variance?
ANOVA sig. = the significance of the TOTAL model.
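For reference, both adjustments can be computed directly from R², n, and k; a sketch using the standard formulas (the n = 100 below is hypothetical, chosen only to illustrate):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    # Penalizes R2 for the number of predictors; always <= R2
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_change(r2_old: float, r2_new: float, n: int, k_old: int, k_new: int) -> float:
    # Tests whether the new model explains a significant amount of added variance;
    # compare against F(k_new - k_old, n - k_new - 1)
    num = (r2_new - r2_old) / (k_new - k_old)
    den = (1 - r2_new) / (n - k_new - 1)
    return num / den

print(adjusted_r2(0.72, 100, 1))           # e.g., Step 1 of the bullying example
print(f_change(0.72, 0.83, 100, 1, 2))     # F for the .11 R-squared change
```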

SPSS Regression Output: Predictor Effects
Constant refers to what? The intercept; the value of the DV when the model = 0.
B refers to what? The slope; the influence of a specific IV on the DV.
Std. Error refers to what? The variance around the specific IV slope.
Beta refers to what? The standardization of B.
t refers to what? B / Std. Error.
Sig. refers to what? The significance of the effect of the IV on the DV, i.e., the significance of t.
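The relationships among these columns are simple arithmetic; a sketch with made-up values (the SDs below were chosen so that beta roughly matches the .85 in the next slide's table):

```python
# Hypothetical values for one predictor
b, se_b = 0.74, 0.11     # unstandardized slope and its std. error
sd_x, sd_y = 1.2, 1.05   # SDs of the predictor and the DV (invented)

beta = b * sd_x / sd_y   # standardized slope: B rescaled into SD units
t = b / se_b             # t statistic: B divided by its std. error
print(f"beta = {beta:.2f}, t = {t:.2f}")
```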

Reporting Hierarchical Multiple Regression

Table 1: Effects of Family Stress and Teacher Reprimands on Bullying

                 B      SE B     β
Step 1
  Constant     -0.54    0.42
  Fam. Stress   0.74    0.11    .85*
Step 2
  Constant      0.71    0.34
  Fam. Stress   0.57    0.10    .67*
  Reprimands    0.33    0.10    .38*

Note: R² = .72 for Step 1, ΔR² = .11 for Step 2 (p = .004); …

Requirements and Assumptions (These Apply to Both Simple and Multiple Regression)
Variable types: Predictors must be quantitative or categorical (two values only, i.e., dichotomous); outcomes must be interval.
Non-zero variance: Predictors vary in value.
No perfect multicollinearity: No perfect 1:1 (linear) relationship between two or more predictors.
Predictors uncorrelated with external variables: No hidden "third variable" confounds.
Homoscedasticity: The variance at each level of the predictor is constant.
Linearity: The changes in outcome due to each predictor are best described by a straight line.

Requirements and Assumptions (These Apply to Both Simple and Multiple Regression)
Linearity: The changes in outcome due to each predictor are best described by a straight line.
[Example scatterplot: Disclosure and Intimacy]

Requirements and Assumptions (Continued)
Independent Errors: Residuals for Sub 1 ≠ residuals for Sub 2. For example, Sub 2 sees Sub 1 screaming as Sub 1 leaves the experiment; Sub 1 might influence Sub 2. If each new subject is affected by the preceding subject, this influence will reduce the independence of errors, i.e., create autocorrelation. Autocorrelation is bias due to temporal adjacency.
Assess with the Durbin-Watson test. Values range from 0 to 4; 2 is ideal. Values closer to 0 indicate positive autocorrelation; values closer to 4 indicate negative autocorrelation.
[Diagram: adjacent subjects (Sub 1: funny movie, Sub 2: funny, Sub 3: sad, Sub 4: sad, Sub 5: funny, …) with correlations r(s1,s2), r(s2,s3), … between successive subjects' residuals]
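The Durbin-Watson statistic itself is just a ratio of adjacent-residual differences to the residual sum of squares; a minimal sketch (statsmodels also ships this as statsmodels.stats.stattools.durbin_watson):

```python
import numpy as np

def durbin_watson(resid: np.ndarray) -> float:
    # Sum of squared differences of adjacent residuals over the residual SS.
    # Ranges 0-4; 2 = no autocorrelation, near 0 = positive, near 4 = negative.
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(23)
print(durbin_watson(rng.normal(size=200)))  # independent errors -> close to 2
```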

Durbin-Watson Test of Autocorrelation

DATASET ACTIVATE DataSet1.
REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT crytotl
  /METHOD=ENTER age upset
  /RESIDUALS DURBIN.

Regression Assumes Errors Are Normally, Independently, and Identically Distributed at Every Level of the Predictor (X)
[Figure: error distributions at X1, X2, X3]
Independence of DV: All outcome values are independent of one another, i.e., each response comes from a separate subject who is uninfluenced by other subjects. E.g., Joe and Joelle are a competitive dyad; Joe loses every time Joelle succeeds. The DV is therefore not independent.

Multicollinearity
In multiple regression, the statistic assumes that each new predictor is in fact a unique measure. If two predictors, A and B, are very highly correlated, then a model testing the added effect of Predictors A and B might, in effect, be testing Predictor A twice. If so, the slopes of the two variables are not orthogonal (going in different directions) but instead run parallel to each other (i.e., they are co-linear).
[Figure: non-orthogonal vs. orthogonal slopes]

Mac Collinearity: A Multicollinearity Saga
Suffering negative publicity regarding the health risks of fast food, the fast food industry hires the research firm of Fryes, Berger, and Shayque (FBS) to show that there is no intrinsic harm in fast food. FBS surveys a random sample and asks:
a. To what degree are you a meat eater? (carnivore)
b. How often do you purchase fast food? (fast.food)
c. What is your health status? (health)
FBS conducts a multiple regression, entering fast.food in Step 1 and carnivore in Step 2.

FBS Fast Food and Carnivore Analysis
"See!" the FBS researchers rejoice. "Fast food negatively predicts health in Model 1, BUT the effect of fast food on health goes away in Model 2, when being a carnivore is considered."

Not So Fast, Fast Food Flacks
Collinearity diagnostics:
1. Correlation table
2. Collinearity statistics: VIF (should be < 10) and/or Tolerance (should be > .20)
A hand computation of VIF and Tolerance is sketched below.
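As referenced above, VIF and Tolerance can be computed by hand: regress each predictor on the remaining predictors and apply VIF = 1 / (1 − R²). A sketch with two deliberately collinear, invented predictors:

```python
import numpy as np

rng = np.random.default_rng(23)
n = 500
carnivore = rng.normal(size=n)
fast_food = 0.9 * carnivore + 0.1 * rng.normal(size=n)  # nearly collinear

def vif(target: np.ndarray, others: np.ndarray) -> float:
    # R2 from regressing one predictor on the remaining predictor(s)
    X = np.column_stack([np.ones(len(target)), others])
    b, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ b
    r2 = 1 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

v = vif(fast_food, carnivore)
print(f"VIF = {v:.1f}, Tolerance = {1/v:.3f}")  # VIF > 10 or Tol < .20 = trouble
```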

Homoscedasticity and Heteroscedasticity

Assessing Homoscedasticity
Select: Plots. Enter ZRESID for Y and ZPRED for X.
Ideal outcome: an equal distribution across the chart.
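Outside SPSS, the same ZRESID-by-ZPRED plot can be approximated in a few lines; a sketch with synthetic data (matplotlib assumed to be available):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(23)
n = 200
x = rng.normal(size=n)
y = 2 + 0.8 * x + rng.normal(scale=0.5, size=n)

# Fit, then standardize predicted values and residuals (ZPRED, ZRESID)
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b
resid = y - y_hat
zpred = (y_hat - y_hat.mean()) / y_hat.std()
zresid = (resid - resid.mean()) / resid.std()

plt.scatter(zpred, zresid, s=10)
plt.axhline(0, linestyle="--")
plt.xlabel("ZPRED (standardized predicted value)")
plt.ylabel("ZRESID (standardized residual)")
plt.title("Ideal: an even band with no funnel shape")
plt.show()
```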

Extreme Cases
Cases that deviate greatly from the expected outcome (> ±2.5 standardized residuals) can warp the regression. First, identify outliers using the Casewise Diagnostics option. Then correct them using the outlier-correction options, which are:
1. Check for data entry error
2. Transform the data
3. Recode as the next highest/lowest value plus/minus 1
4. Delete the outlier

Casewise Diagnostics Printout in SPSS
[Screenshot: output with a possible problem case flagged]

Casewise Diagnostics for Problem Cases Only
In the Statistics option, select Casewise Diagnostics. Select "outliers outside" and type in how many standard deviations you regard as critical. The default is 3, but it can be changed to another value (e.g., 2.5).
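The same screen is easy to replicate by hand: standardize the residuals and flag cases beyond the chosen cutoff. A minimal sketch (the helper name is mine):

```python
import numpy as np

def flag_outliers(resid: np.ndarray, cutoff: float = 3.0) -> np.ndarray:
    """Indices of cases whose standardized residual exceeds +/- cutoff."""
    z = (resid - resid.mean()) / resid.std()
    return np.flatnonzero(np.abs(z) > cutoff)

rng = np.random.default_rng(23)
resid = rng.normal(size=200)
resid[17] = 6.0  # plant one extreme case
# Case 17 is flagged; chance cases near +/-2.5 may also appear
print(flag_outliers(resid, cutoff=2.5))
```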

What If Assumption(s) Are Violated?
What is the problem with violating assumptions? You can't generalize from the test sample to the wider population.
Overall, not much can be done if assumptions are substantially violated (i.e., extreme heteroscedasticity, extreme autocorrelation, severe non-linearity). Some options:
1. Heteroscedasticity: Transform the raw data (square root, etc.) …

A Word About Regression Assumptions and Diagnostics
Are these conditions complicated to understand? Somewhat.
Are they laborious to check and correct? Somewhat.
Do most researchers understand, monitor, and address these conditions? No.
Even journal reviewers are often unschooled in diagnostics, or don't take the time to check them. Journal space discourages authors from discussing diagnostics. Some have called for more attention to this inattention, but not much action has followed.