Regression II: Multiple Regression and Diagnostics (Class 23)
How Much Do Teacher Reprimands Lead to Bullying, Controlling for Family Stress?
Multiple Regression Example: Is aggression by bullies predicted by teacher reprimands, controlling for family stress? This is a model with 2 predictors, family stress (b1) and reprimands (b2):

Y = b0 + b1X1 + b2X2 + ε

Y = aggression; b0 = intercept; b1 = family stress; b2 = reprimands; ε = error.

This multiple regression model shows: 1. Effect of the total model (reprimands and family stress); 2. Effect of family stress; 3. Effect of reprimands after accounting for family stress.

NOTE: Could also test: 1. Effect of family stress controlling for reprimands (switch IV order, above); 2. Effect of (family stress + …
Multiple Regression (MR)

Y = b0 + b1X1 + b2X2 + b3X3 + … + bkXk + ε

Multiple regression (MR) can incorporate any number of predictors in a model. With 2 predictors, a "regression plane" rather than a regression line is created. It is increasingly difficult to visualize a regression model with 3 or more predictors. MR operates on the same principles as simple regression. Multiple R = the correlation between observed Y and predicted Y (Ŷ).
Two Predictor Models Create a "Regression Plane"

[Figure: 3-D regression plane with axes Aggression (outcome), Reprimands, and Family Stress (predictors)]
Elements of Multiple Regression

Total Sum of Squares (SST) = deviation of each score from the DV mean; square these deviations, then sum them.
Residual Sum of Squares (SSR) = each residual from the total model (not a simple line), squared, then summed.
Model Sum of Squares (SSM) = SST − SSR = the amount the total model explains above and beyond the simple mean.
R² = SSM / SST = proportion of variance explained by the total model.
Adjusted R² = R², but adjusted for the number of predictors.

NOTE: The main difference between these values in multiple regression (vs. simple regression) is that residuals come from the total model rather than a single regression line.
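The quantities above can be made concrete in code. The following is an illustrative Python sketch, not from the slides: it fits a two-predictor model by the normal equations and computes SST, SSR, SSM, R², and adjusted R². The function names and the family-stress/reprimands/aggression numbers are made up for demonstration.

```python
# Sketch: two-predictor regression Y = b0 + b1*X1 + b2*X2 and its
# sums-of-squares decomposition. Data below are hypothetical.

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_two_predictor(x1, x2, y):
    n = len(y)
    # Normal equations (X'X) b = X'y for the design matrix [1, x1, x2]
    X = [[1.0, a, c] for a, c in zip(x1, x2)]
    XtX = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(3)]
           for r in range(3)]
    Xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(3)]
    b0, b1, b2 = solve3(XtX, Xty)
    yhat = [b0 + b1 * a + b2 * c for a, c in zip(x1, x2)]
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)             # deviations from DV mean
    ssr = sum((v - h) ** 2 for v, h in zip(y, yhat))  # residuals from full model
    ssm = sst - ssr                                   # explained by the model
    r2 = ssm / sst
    k = 2  # number of predictors
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return (b0, b1, b2), sst, ssr, ssm, r2, adj_r2

# Hypothetical family-stress, reprimands, and aggression scores:
stress = [1, 2, 3, 4, 5, 6, 7, 8]
reprimands = [2, 1, 4, 3, 6, 5, 8, 7]
aggression = [1.2, 1.9, 3.1, 3.8, 5.2, 5.9, 7.1, 7.8]

coefs, sst, ssr, ssm, r2, adj_r2 = fit_two_predictor(stress, reprimands, aggression)
```

Note that SST = SSM + SSR by construction, and adjusted R² is never larger than R².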
Methods of Regression

Hierarchical: 1. Predictors are selected based on theory or past work. 2. Predictors are entered into the analysis in order of importance, or by established influence. 3. New predictors are entered last, so that their unique contribution can be determined.

Forced Entry: All predictors are forced into the model simultaneously. Use when there is no prediction about predictor primacy.

Stepwise: The program automatically searches for the strongest predictor, then the second strongest, etc. Predictor 1 is best at explaining the entire model, accounting for, say, 40%. Predictor 2 is best at explaining the remaining 60%, etc. A controversial method.

In general, hierarchical is the most common and most recommended method.
Sample Size in Regression

Green's rule of thumb: Overall model: 50 + 8k (k = # predictors). Specific predictor (i.e., a specific b): 104 + k. Unsure which? Use the one requiring the larger n.

Determining sample size based on expected effect: Miles & Shevlin (2001) table (next slide).

Power analysis: Required n is determined by: 1. Effect size: proportion of variance explained (e.g., R²). 2. Alpha level: e.g., p < .05. 3. Power: the odds of observing the effect when it is actually there.

G*Power: http://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-
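Green's rule of thumb is simple arithmetic; a small helper (a sketch, with a hypothetical function name) makes the "use the larger n when unsure" advice explicit:

```python
def green_sample_size(k, test="both"):
    """Green's rule of thumb for regression sample size.

    Overall model: n >= 50 + 8k; specific predictor: n >= 104 + k.
    test="both" returns the larger of the two, per the slide's advice.
    """
    overall = 50 + 8 * k
    specific = 104 + k
    if test == "overall":
        return overall
    if test == "specific":
        return specific
    return max(overall, specific)
```

For the two-predictor bullying model, this gives 50 + 8(2) = 66 for the overall model and 104 + 2 = 106 for a specific predictor, so 106 is the safer target. The specific-predictor rule dominates until k reaches 8, after which the overall-model rule requires more cases.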
Miles and Shevlin Regression Sample Size Graph. In Field, A. (2009). Discovering Statistics Using SPSS, 3rd ed., p. 223.
Multiple Regression in SPSS

REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT aggression
  /METHOD=ENTER family.stress
  /METHOD=ENTER reprimands.

"OUTS" refers to variables excluded in, e.g., Model 1 but appearing in, e.g., Model 2. "NOORIGIN" means "do show the constant in the output report". "CRITERIA" relates to stepwise regression only; it refers to which IVs are kept in at each step.
SPSS Multiple Regression Output: Descriptives

What are the IVs? Family stress and reprimands. What is the DV? Aggression.
SPSS Regression Output: Model Effects

R = correlation of the model with the outcome; the power of the regression model, i.e., how much the total model correlates with the DV.
R² = amount of variance explained by the model.
Adjusted R² = adjusts for the # of predictors; always ≤ R².
R² change = amount explained by each new model.
Sig. F Change = does the new model explain a significant amount of added variance?
ANOVA sig. = significance of the TOTAL model.
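The Sig. F Change entry is based on an F statistic computed from the change in R². A minimal sketch (the function name is hypothetical; the R² values echo Table 1 later in this deck, .72 rising to .83, and n = 30 is assumed purely for illustration):

```python
def f_change(r2_old, r2_new, n, k_old, k_new):
    """F statistic for the change in R^2 when predictors are added.

    F = (delta R^2 / added predictors) / ((1 - R^2_new) / (n - k_new - 1)),
    with df1 = k_new - k_old and df2 = n - k_new - 1.
    """
    df1 = k_new - k_old
    df2 = n - k_new - 1
    return ((r2_new - r2_old) / df1) / ((1 - r2_new) / df2)

# Step 1 (family stress only) to Step 2 (adding reprimands), assumed n = 30:
f_step2 = f_change(0.72, 0.83, 30, 1, 2)  # F(1, 27) for the added predictor
```

SPSS reports this F along with its p value as "Sig. F Change" in the Model Summary table.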
SPSS Regression Output: Predictor Effects

Constant refers to what? The intercept; the value of the DV when the model = 0.
B refers to what? The slope; the influence of a specific IV on the DV.
Std. Error refers to what? The variance around the specific IV slope.
Beta refers to what? The standardization of B.
t refers to what? B / Std. Error.
Sig. refers to what? The significance of the effect of the IV on the DV.
Reporting Hierarchical Multiple Regression

Table 1: Effects of Family Stress and Teacher Reprimands on Bullying

                 B      SE B     β
  Step 1
    Constant   -0.54    0.42
    Fam. Stress 0.74    0.11    .85*
  Step 2
    Constant    0.71    0.34
    Fam. Stress 0.57    0.10    .67*
    Reprimands  0.33    0.10    .38*

Note: R² = .72 for Step 1; ΔR² = .11 for Step 2 (p = .004).
Requirements and Assumptions (These Apply to Both Simple and Multiple Regression)

Variable types: Predictors must be quantitative or categorical (2 values only, i.e., dichotomous); outcomes must be interval.
Non-zero variance: Predictors have variation in value.
No perfect multicollinearity: No perfect 1:1 (linear) relationship between 2 or more predictors.
Predictors uncorrelated with external variables: No hidden "third variable" confounds.
Homoscedasticity: Variance at each level of the predictor is constant.
Linearity: The changes in the outcome due to each predictor are described best by a straight line.
Requirements and Assumptions (Continued)

Linearity: The changes in the outcome due to each predictor are described best by a straight line.

[Figure: example scatterplot of Disclosure and Intimacy]
Requirements and Assumptions (Continued)

Independent errors: Residuals for Sub. 1 ≠ residuals for Sub. 2. For example, Sub. 2 sees Sub. 1 screaming as Sub. 1 leaves the experiment; Sub. 1 might influence Sub. 2. If each new subject is affected by the preceding subject, this influence will reduce the independence of errors, i.e., create autocorrelation. Autocorrelation is bias due to temporal adjacency.

Assess: Durbin-Watson test. Values range from 0 to 4; "2" is ideal. Values closer to 0 indicate positive autocorrelation; values closer to 4 indicate negative autocorrelation.

[Diagram: Sub. 1 (funny movie) -> Sub. 2 (funny movie) -> Sub. 3 (sad movie) -> Sub. 4 (sad movie) -> Sub. 5 (funny movie), with correlations r(s1, s2), r(s2, s3), ... between adjacent subjects' residuals]
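The Durbin-Watson statistic itself is easy to compute from the residual series. A minimal Python sketch (the function name and residual values are made up for illustration):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by the sum of squared residuals. Roughly 2 means no
    autocorrelation; values toward 0 mean positive autocorrelation, values
    toward 4 mean negative autocorrelation.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Runs of same-signed residuals (adjacent subjects alike) push DW below 2:
dw_positive = durbin_watson([1, 1, 1, -1, -1, -1])  # well below 2
# Alternating residuals push DW above 2 (negative autocorrelation):
dw_negative = durbin_watson([1, -1, 1, -1, 1, -1])  # well above 2
```

SPSS produces the same statistic via the /RESIDUALS DURBIN subcommand shown on the next slide.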
Durbin-Watson Test of Autocorrelation

DATASET ACTIVATE DataSet1.
REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT crytotl
  /METHOD=ENTER age upset
  /RESIDUALS DURBIN.
Regression Assumes Errors Are Normally, Independently, and Identically Distributed at Every Level of the Predictor (X)

[Figure: error distributions at X1, X2, X3]

Independence of DV: All outcome values are independent from one another, i.e., each response comes from a separate subject who is uninfluenced by other subjects. E.g., Joe and Joelle are a competitive dyad; Joe loses every time Joelle succeeds. The DV is not independent.
Multicollinearity

In multiple regression, the statistics assume that each new predictor is in fact a unique measure. If two predictors, A and B, are very highly correlated, then a model testing the added effect of Predictors A and B might, in effect, be testing Predictor A twice. If so, the slopes of the two variables are not orthogonal (they do not go in different directions) but instead run parallel to each other (i.e., they are collinear).

[Figure: non-orthogonal vs. orthogonal predictor slopes]
Mac Collinearity: A Multicollinearity Saga

Suffering negative publicity regarding the health risks of fast food, the fast food industry hires the research firm of Fryes, Berger, and Shayque (FBS) to show that there is no intrinsic harm in fast food. FBS surveys a random sample and asks: a. To what degree are you a meat eater? (carnivore) b. How often do you purchase fast food? (fast.food) c. What is your health status? (health) FBS conducts a multiple regression, entering fast.food in Step 1 and carnivore in Step 2.
FBS Fast Food and Carnivore Analysis

"See!" the FBS researchers rejoice. "Fast food negatively predicts health in Model 1, BUT the effect of fast food on health goes away in Model 2, when being a carnivore is considered."
Not So Fast, Fast Food Flacks

Collinearity diagnostics: 1. Correlation table. 2. Collinearity statistics: VIF (should be < 10) and/or Tolerance (should be more than .20).
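In the two-predictor case, tolerance and VIF reduce to simple functions of the correlation between the predictors: tolerance = 1 − r² and VIF = 1 / tolerance. A Python sketch with made-up data (the function name is hypothetical):

```python
def vif_two_predictors(x1, x2):
    """Tolerance and VIF for a two-predictor model.

    Regressing one predictor on the other yields R^2 = r^2, so
    tolerance = 1 - r^2 and VIF = 1 / tolerance.
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    v1 = sum((a - m1) ** 2 for a in x1)
    v2 = sum((b - m2) ** 2 for b in x2)
    r = cov / (v1 * v2) ** 0.5
    tolerance = 1 - r ** 2
    return 1 / tolerance, tolerance

# Modestly correlated predictors: VIF stays far below the cutoff of 10.
vif_ok, tol_ok = vif_two_predictors([1, 2, 3, 4], [2, 1, 4, 3])
# Near-collinear predictors (like fast.food and carnivore): VIF blows up.
vif_bad, tol_bad = vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5])
```

This is why the FBS analysis is suspect: if fast.food and carnivore are nearly collinear, the second predictor is largely retesting the first, and a huge VIF (with tolerance below .20) would flag it.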
Homoscedasticity and Heteroscedasticity
Assessing Homoscedasticity

Select: Plots. Enter: ZRESID for Y and ZPRED for X. Ideal outcome: equal distribution across the chart.
Extreme Cases

Cases that deviate greatly from the expected outcome (> ±2.5) can warp the regression. First, identify outliers using the Casewise Diagnostics option. Then correct outliers per the outlier-correction options, which are: 1. Check for data-entry error. 2. Transform the data. 3. Recode as the next highest/lowest value plus/minus 1. 4. Delete the outlier.
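Casewise diagnostics amount to standardizing the residuals and flagging cases beyond the chosen cutoff. A minimal Python sketch (the function and the residual values are hypothetical, not SPSS output):

```python
def flag_outliers(residuals, cutoff=2.5):
    """Return indices of cases whose standardized residual exceeds +/- cutoff.

    Residuals are converted to z-scores (sample SD, n - 1 denominator),
    mimicking what SPSS's Casewise Diagnostics option reports.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    sd = (sum((e - mean) ** 2 for e in residuals) / (n - 1)) ** 0.5
    return [i for i, e in enumerate(residuals) if abs((e - mean) / sd) > cutoff]

# 19 well-behaved cases plus one extreme case: only the last is flagged.
suspects = flag_outliers([0.0] * 19 + [10.0])  # -> [19]
```

Each flagged index would then be inspected and handled via the four correction options above (check entry, transform, recode, or delete).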
Casewise Diagnostics Print-out in SPSS Possible problem case
Casewise Diagnostics for Problem Cases Only

In the "Statistics" option, select Casewise Diagnostics. Select "outliers outside" and type in how many standard deviations you regard as critical. Default = 3, but this can be changed to another value (e.g., 2.5).
What If Assumption(s) Are Violated?

What is the problem with violating assumptions? You can't generalize from the test sample to the wider population. Overall, not much can be done if assumptions are substantially violated (i.e., extreme heteroscedasticity, extreme autocorrelation, severe non-linearity). Some options: 1. Heteroscedasticity: Transform the raw data (square root, etc.)
A Word About Regression Assumptions and Diagnostics

Are these conditions complicated to understand? Somewhat. Are they laborious to check and correct? Somewhat. Do most researchers understand, monitor, and address these conditions? No. Even journal reviewers are often unschooled in diagnostics, or don't take the time to check them. Journal space discourages authors from discussing diagnostics. Some have called for more attention to this inattention, but there has not been much action.