Chapter 17: Basic Multivariate Techniques
Winston Jackson and Norine Verberg, Methods: Doing Social Research, 4e
© 2007 Pearson Education Canada

Testing Three-Variable Causal Models
- Demonstrating causality is more difficult in nonexperimental research than in experimental research.
- To establish causality you must show that:
  1. The variables are associated
  2. There is a plausible causal sequence
  3. The variables are not spuriously connected
- Identical analyses can be used to test different three-variable models.

Testing Three-Variable Causal Models (cont’d)
- Begin with a standard contingency table; three-variable models elaborate upon bivariate models.

TABLE 17.1 Percentage of Senior High-School Students with Plans for Further Education by Socioeconomic Status (SES)

Type of plan    Low SES background    High SES background    Total
                N       %             N       %              N       %
Some plans      144     73.1          176     88.9           320     81.0
No plans         53     26.9           22     11.1            75     19.0
Total           197    100.0          198    100.0           395    100.0

X² = 16.021, df = 1. Significant at the .001 level.

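The chi-square value reported under Table 17.1 can be checked from the cell counts. The following is a minimal sketch, assuming the ordinary Pearson chi-square without a continuity correction (the slide does not say which variant was used); the function name is illustrative only.

```python
def pearson_chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Cell counts from Table 17.1: rows are "some plans" / "no plans",
# columns are low SES / high SES.
observed = [[144, 176], [53, 22]]
chi2, df = pearson_chi_square(observed)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

With these counts the statistic comes out at roughly 16.01 on 1 degree of freedom, slightly below the 16.021 printed with the table; the small gap may reflect rounding or a transcription difference in the source.
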
Testing for Intervening Variables: The Intervening Variable Model
- X → I → Y
- A number of possible intervening variables can be proposed for a given X/Y relationship.

The Rationale
- If the model is correct, X can influence Y only through the intervening variable (I).
- If we hold the I variable constant, X cannot influence Y.

Jackson’s Rule of Thirds
- Compare the original difference with the difference that remains after controlling for the intervening variable.
- To do so, decide where the cut-points between the thirds fall.
- Table 17.3, Applying Jackson’s Rule of Thirds, shows how to calculate and interpret the results (see next slide).

Jackson’s Rule of Thirds: Using Crosstabs to Test for an Intervening Variable
Original difference:
                 Low SES    High SES    Difference
% with plans     73.1       88.9        15.8

Jackson’s Rule of Thirds: Calculating and Interpreting Thirds
- Original difference = 15.8 (88.9 - 73.1)
- One third = 15.8 / 3 = 5.3
Interpretation of results:
- If the controlled difference is greater than 21.1 (15.8 + 5.3): increased
- If it is between 10.5 and 21.1 (15.8 ± 5.3): stayed the same
- If it is less than 10.5 (15.8 - 5.3): decreased
- If the result differs between the two control categories: mixed

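The decision rule above can be sketched in a few lines. This is an illustrative sketch only: the function name is hypothetical, it assumes the original difference is positive (high SES minus low SES), and it uses the unrounded third (5.27) rather than the rounded 5.3 shown on the slide, so values that sit exactly on a cut-point may be classified differently.

```python
def rule_of_thirds(original_diff, controlled_diffs):
    """Classify what happens to a difference once a control variable is applied.

    original_diff: difference in the uncontrolled table (assumed positive)
    controlled_diffs: one difference per category of the control variable
    """
    third = original_diff / 3
    lower = original_diff - third   # below this: decreased
    upper = original_diff + third   # above this: increased

    def classify(diff):
        if diff > upper:
            return "increased"
        if diff < lower:
            return "decreased"
        return "same"

    labels = {classify(d) for d in controlled_diffs}
    # If the control categories disagree, the result is "mixed"
    return labels.pop() if len(labels) == 1 else "mixed"


original = 88.9 - 73.1  # 15.8, from Table 17.1
print(rule_of_thirds(original, [1.0, -2.0]))    # both differences fall below the lower cut-point
print(rule_of_thirds(original, [18.0, 15.0]))   # both differences fall inside the middle band
print(rule_of_thirds(original, [2.0, 22.0]))    # the two control categories disagree
```

The same function works for differences between means, since the rule depends only on the size of the original difference.
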
Outcome 1 (from Table 17.4, p. 446)
- Original difference:
                   Low SES    High SES    Difference
  % with plans     73.1       88.9        15.8
- Crosstabs, Outcome 1 (% with plans):
                          Low SES    High SES    Difference
  Best friend high SES    92.0       93.0         1.0
  Best friend low SES     71.0       69.0        -2.0
- Interpretation: Relationship decreased/disappeared: this outcome is consistent with an intervening variable model.

Outcome 2 (from Table 17.4, p. 446)
- Original difference:
                   Low SES    High SES    Difference
  % with plans     73.1       88.9        15.8
- Crosstabs, Outcome 2 (% with plans):
                          Low SES    High SES    Difference
  Best friend high SES    74.0       92.0        18.0
  Best friend low SES     71.0       86.0        15.0
- Interpretation: Relationship stays the same: reject the intervening variable model.

Outcome 3 (from Table 17.4, p. 446)
- Original difference:
                   Low SES    High SES    Difference
  % with plans     73.1       88.9        15.8
- Crosstabs, Outcome 3 (% with plans):
                          Low SES    High SES    Difference
  Best friend high SES    85.0       92.0        7.0
  Best friend low SES     68.0       76.0        8.0
- Interpretation: Relationship decreased/disappeared: this outcome supports an intervening variable model.

Outcome 4 (from Table 17.4, p. 446)
- Original difference:
                   Low SES    High SES    Difference
  % with plans     73.1       88.9        15.8
- Crosstabs, Outcome 4 (% with plans):
                          Low SES    High SES    Difference
  Best friend high SES    74.0       96.0        22.0
  Best friend low SES     61.0       82.0        21.0
- Interpretation: Relationship strengthened: reject the intervening variable model.

Outcome 5 (from Table 17.4, p. 446)
- Original difference:
                   Low SES    High SES    Difference
  % with plans     73.1       88.9        15.8
- Crosstabs, Outcome 5 (% with plans):
                          Low SES    High SES    Difference
  Best friend high SES    90.0       92.0         2.0
  Best friend low SES     60.0       82.0        22.0
- Interpretation: Results are mixed: reject the intervening variable model.

Using Means to Test for an Intervening Variable Model
Original difference (from bottom of Table 17.5, p. 448):
                 Low SES    High SES    Difference
% with plans     2.47       3.87        1.40
Calculating thirds: 1.40 / 3 = .47
- Increased if greater than 1.87 (1.40 + .47)
- Stayed the same if between .93 and 1.87 (1.40 ± .47)
- Decreased if less than .93 (1.40 - .47)
- Mixed if the result differs between the two control categories

Outcome 1 (from Table 17.5, p. 448)
- Original difference:
                   Low SES    High SES    Difference
  % with plans     2.47       3.87        1.40
- Means, Outcome 1 (% plans):
                 Low SES    High SES    Difference
  No support     1.49       3.40        1.91
  Support        3.30       5.26        1.96
- Interpretation: Relationship intensified (shown by increases): reject the intervening variable model; financial support likely has an independent influence on the dependent variable.

Outcome 2 (from Table 17.5, p. 448)
- Original difference:
                   Low SES    High SES    Difference
  % with plans     2.47       3.87        1.40
- Means, Outcome 2 (% plans):
                 Low SES    High SES    Difference
  No support     2.23       3.66        1.43
  Support        2.55       3.96        1.43
- Interpretation: Relationship stayed the same: reject the intervening variable model.

Outcome 3 (from Table 17.5, p. 448)
- Original difference:
                   Low SES    High SES    Difference
  % with plans     2.47       3.87        1.40
- Means, Outcome 3 (% plans):
                 Low SES    High SES    Difference
  No support     1.56       2.27        .71
  Support        3.42       4.10        .68
- Interpretation: Relationship decreased/disappeared: evidence supports the intervening variable model.

Outcome 4 (from Table 17.5, p. 448)
- Original difference:
                   Low SES    High SES    Difference
  % with plans     2.47       3.87        1.40
- Means, Outcome 4 (% plans):
                 Low SES    High SES    Difference
  No support     2.35       2.48        0.13
  Support        3.77       3.92        0.15
- Interpretation: Relationship decreased/disappeared: this outcome supports an intervening variable model.

Outcome 5 (from Table 17.5, p. 448)
- Original difference:
                   Low SES    High SES    Difference
  % with plans     2.47       3.87        1.40
- Means, Outcome 5 (% plans):
                 Low SES    High SES    Difference
  No support     1.64       3.13        1.49
  Support        3.89       4.01        0.12
- Interpretation: Results are mixed: reject the intervening variable model.

Testing for Sources of Spuriousness: The Source of Spuriousness Model
- The researcher proposes that, although there is a statistically significant relationship between X and Y, the relationship may not be causal: it may exist only because some third variable is influencing both.

Source of Spuriousness: Rationale
- If X and Y are spuriously associated, they vary together because a third variable, the source of spuriousness (S/S), is influencing both X and Y.
- If we control for the S/S, there should no longer be any association between X and Y.
- Use the same test and steps: test the original X/Y relationship and, if it is significant, apply Jackson’s Rule of Thirds. If the original difference disappears, the source of spuriousness model is supported.

Source of Spuriousness: Dilemma
- The results of the source of spuriousness test and the intervening variable test are not empirically distinguishable.
- Suppose two researchers propose different causal models to explain the X/Y relation: one proposes an S/S model, the other an intervening variable model.
- If the original difference disappears, each finds support for his or her own causal model.
- This stresses the importance of a priori theorizing: interpretation must be guided by theory.

Using Means to Test for a Source of Spuriousness Model
- Table 17.8, on page 452, shows five different outcomes to illustrate using Jackson’s Rule of Thirds to test for a source of spuriousness.
- The source of spuriousness identified is rural versus urban background. The researcher is suggesting that both SES and plans for postsecondary education are explained by this third variable (urban/rural residence).
- The next five slides show the different outcomes.

Using Means to Test for a Source of Spuriousness Model (cont’d)
Original difference (from bottom of Table 17.5, p. 448):
                 Low SES    High SES    Difference
% with plans     2.47       3.87        1.40
Calculating thirds: 1.40 / 3 = .47
- Increased if greater than 1.87 (1.40 + .47)
- Stayed the same if between .93 and 1.87 (1.40 ± .47)
- Decreased if less than .93 (1.40 - .47)
- Mixed if the result differs between the two control categories

Outcome 1 (from Table 17.8, p. 452)
- Original difference:
                   Low SES    High SES    Difference
  % with plans     2.47       3.87        1.40
- Means, Outcome 1 (% plans):
                       Low SES    High SES    Difference
  Rural background     0.87       2.98        2.11
  Urban background     3.33       5.26        1.93
- Interpretation: Relationship intensified (shown by increases): reject the source of spuriousness model; rural/urban background likely has an independent influence on the dependent variable.

Outcome 2 (from Table 17.8, p. 452)
- Original difference:
                   Low SES    High SES    Difference
  % with plans     2.47       3.87        1.40
- Means, Outcome 2 (% plans):
                       Low SES    High SES    Difference
  Rural background     2.27       3.67        1.40
  Urban background     2.58       3.96        1.38
- Interpretation: Differences remain the same after controlling for urban/rural background: reject the source of spuriousness model.

Outcome 3 (from Table 17.8, p. 452)
- Original difference:
                   Low SES    High SES    Difference
  % with plans     2.47       3.87        1.40
- Means, Outcome 3 (% plans):
                       Low SES    High SES    Difference
  Rural background     2.30       2.91        0.61
  Urban background     3.71       4.23        0.61
- Interpretation: Difference decreased: we find support for the source of spuriousness model.

Outcome 4 (from Table 17.8, p. 452)
- Original difference:
                   Low SES    High SES    Difference
  % with plans     2.47       3.87        1.40
- Means, Outcome 4 (% plans):
                       Low SES    High SES    Difference
  Rural background     1.73       1.91        0.18
  Urban background     3.74       3.97        0.23
- Interpretation: Original difference reduced to less than one-third of its original value: supports the source of spuriousness model.

Outcome 5 (from Table 17.8, p. 452)
- Original difference:
                   Low SES    High SES    Difference
  % with plans     2.47       3.87        1.40
- Means, Outcome 5 (% plans):
                       Low SES    High SES    Difference
  Rural background     2.16       2.27        0.11
  Urban background     2.74       4.04        1.30
- Interpretation: Results are mixed. The difference disappears among rural students but is only slightly reduced among urban students. We reject the source of spuriousness model.

Multiple Regression
- Multiple regression is used:
  - to examine the impact of several variables on a dependent variable
  - when the dependent variable and, preferably, most of the independent variables are ratio level

Multiple Regression (cont’d)
- Multiple regression is a powerful tool because it allows the researcher to:
  - estimate the relative importance of each of the independent variables in predicting variation in a dependent variable
  - identify a linear equation describing the relationship between the independent and dependent variables

The Linear Regression Equation
- Elements in the equation tell us the relative importance of each factor in predicting the dependent variable.
- Recall, from Chapter 8, the regression formula for two variables: $Y = a + bX$
- Multiple regression extends the equation to: $Y = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$

The Linear Equation (cont’d)
- $Y = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$
  - Y is the dependent variable
  - a is the constant, the point where the regression line crosses the Y axis
  - each b is the weighting (unstandardized coefficient) for one of the independent variables
  - each X is the value of the corresponding independent variable

The Linear Equation (cont’d)
- $Y = a + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
- The β values are known as beta weights.
- A beta weight simply represents a standardized version of a b coefficient.
- Think of βs as Z-score versions of the b coefficients. Recall that Z scores standardize variables.

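To make the "Z-score version" idea concrete, a b coefficient can be rescaled into a beta weight by multiplying it by the ratio of the standard deviations, β = b × (s_X / s_Y). The sketch below uses hypothetical data and illustrative function names; in the two-variable case shown, the resulting beta equals the Pearson correlation.

```python
from statistics import mean, stdev


def slope(x, y):
    """Least-squares b for the two-variable equation Y = a + bX."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def beta_weight(b, x, y):
    """Standardize an unstandardized coefficient: beta = b * (s_x / s_y)."""
    return b * stdev(x) / stdev(y)


# Hypothetical data, for illustration only
years_of_education = [8, 10, 12, 14, 16]
income_thousands = [20, 26, 30, 41, 43]

b = slope(years_of_education, income_thousands)
beta = beta_weight(b, years_of_education, income_thousands)
print(f"b = {b:.2f}, beta = {beta:.3f}")
```

Because beta is expressed in standard-deviation units, betas for different predictors can be compared with one another even when the predictors are measured on different scales.
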
The Linear Equation (cont’d)
- To compute the relative importance of variables once we have the betas, we can use the following formula:
  % variance explained by each variable = $\frac{\beta_1}{\sum \beta\text{s}} \times R^2 \times 100$

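A short sketch of this calculation follows. It reads the formula as each variable's beta divided by the sum of all the betas, multiplied by R² and by 100. Two things here are assumptions rather than statements from the slide: the betas are summed as absolute values (so that negative betas do not cancel positive ones), and all variable names and numbers are hypothetical.

```python
def variance_shares(betas, r_squared):
    """Apportion R^2 (as a percentage) among predictors in proportion to |beta|."""
    total = sum(abs(b) for b in betas.values())
    return {
        name: abs(b) / total * r_squared * 100
        for name, b in betas.items()
    }


# Hypothetical beta weights and R^2, for illustration only
betas = {"ses": 0.40, "grades": 0.20, "family_size": -0.20}
shares = variance_shares(betas, r_squared=0.50)

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of the variance")
```

By construction the shares add up to R² × 100, so the calculation describes how the explained variance is divided among the predictors, not how much each would explain on its own.
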
Multiple Regression (cont’d)
- SPSS will produce both b and β values. The a value (called the constant) will also be printed.
- $R^2$: This value will also be reported; it tells you how much of the variance in the dependent variable is explained by the equation.

Using Non-Ratio Level Variables
- Ordinal variables may be included in their raw form (un-recoded), but the equation will underestimate the relative importance of nonratio variables.
- Nominal variables may be included by transforming them into “dummy variables”.
- Dummy variables are recoded to “presence/absence” variables.

Creating Dummy Variables
- Create new variables to replace the nominal variable so that you have one fewer variable than there are categories in the original variable.
- For example, if you have a four-category religion variable (Christian, Jewish, Muslim, Other), recode it into three new variables, each coded presence (1) / absence (0).
- This is shown in Table 17.9, p. 462.

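The recoding can be sketched as follows. This is an illustration under stated assumptions: the function name and the generated column names are hypothetical, and treating "Other" as the omitted (reference) category is a choice made here, since Table 17.9 is not reproduced in these slides.

```python
def make_dummies(values, categories, reference):
    """Return one 0/1 list per category, leaving out the reference category."""
    kept = [c for c in categories if c != reference]
    return {
        f"is_{c.lower()}": [1 if v == c else 0 for v in values]
        for c in kept
    }


# Hypothetical cases on the four-category religion variable
religion = ["Christian", "Jewish", "Muslim", "Other", "Christian"]
categories = ["Christian", "Jewish", "Muslim", "Other"]

dummies = make_dummies(religion, categories, reference="Other")
for name, column in dummies.items():
    print(name, column)
```

A case in the omitted category scores 0 on all three dummy variables, which is why only three variables are needed to represent four categories.
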
Tips for Regression Analysis
1. Ensure that variables are theoretically independent of one another
2. Watch out for highly correlated independent variables (multicollinearity)
   - Either convert these into an index (if that makes sense) or simply select one of them
3. Try to achieve ratio-level measurement
4. Use raw data: do not use recoded forms of ordinal or ratio variables

Tips (cont’d)
5. Use the Backward solution so that the least important variables drop out first
6. Interpret weightings with care
7. Monitor the number of cases
   - When missing values are a concern, try:
     - Repeating the analysis keeping problem variables
     - “Pairwise” treatment of missing values
     - A “means” solution, where missing values are set to the mean for the variable

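As an illustration of the “means” solution listed above, the sketch below replaces missing values with the mean of the observed values for that variable. The data and the function name are hypothetical, and missing values are assumed to be represented as None.

```python
from statistics import mean


def fill_with_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    variable_mean = mean(observed)
    return [variable_mean if v is None else v for v in values]


# Hypothetical grade 11 averages with two missing cases
grade_11_average = [72, None, 85, 90, None, 65]
filled = fill_with_mean(grade_11_average)
print(filled)
```

This keeps every case in the analysis, at the cost of reducing the variable's variance somewhat, because the substituted cases all sit exactly at the mean.
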
Tips (cont’d)
8. Deal with interactions among independent variables
   - If two variables have little impact on the dependent variable independently, but you expect the interaction to explain variation in Y, create an interaction variable

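One common way to build an interaction variable is to multiply the two independent variables case by case and enter the product as an additional predictor. The slide does not say how the interaction variable should be constructed, so the product form is an assumption here, and the data and names are hypothetical.

```python
def interaction_term(x1, x2):
    """Case-by-case product of two independent variables."""
    if len(x1) != len(x2):
        raise ValueError("both variables must have one value per case")
    return [a * b for a, b in zip(x1, x2)]


# Hypothetical cases: parental encouragement (0/1) and grade average
encouragement = [0, 1, 1, 0, 1]
grade_average = [70, 65, 88, 92, 75]

encouragement_x_grades = interaction_term(encouragement, grade_average)
print(encouragement_x_grades)
```

The new variable is then included in the regression alongside, or in place of, the two original variables, depending on what the theory calls for.
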
Discriminant Function Analysis
- Similar to regression analysis, but used in cases where the dependent variable is either:
  - measured at the nominal level, or
  - not normally distributed
- Discriminant function analysis attempts to predict the category of the Y variable into which each case falls by using the combined information from the X variables.
  - e.g., predict whether someone will participate in post-secondary education, based on information on grade 11 average, SES, family size, etc.

Comparison with Multiple Regression
- Similar: can look at the impact of several X variables
- Results in the calculation of discriminant coefficients, similar to a regression equation:
  $D = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_k X_k$
  - $B_0$ = the constant
  - $B_1$ = the coefficient for the 1st variable
- To compute the “discriminant score”: multiply each coefficient by the observed value of its variable and add the products to the constant (see Table 17.11, p. 466).

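The discriminant score calculation can be sketched directly from the equation D = B0 + B1X1 + ... + BkXk. All coefficients and observed values below are hypothetical (Table 17.11 is not reproduced here), and the function name is illustrative.

```python
def discriminant_score(constant, coefficients, observed):
    """D = B0 + B1*X1 + ... + Bk*Xk for a single case."""
    return constant + sum(
        coefficients[name] * observed[name] for name in coefficients
    )


# Hypothetical unstandardized coefficients, for illustration only
constant = -5.0
coefficients = {"grade_11_average": 0.06, "ses": 0.50, "family_size": -0.20}

# Hypothetical observed values for one student
student = {"grade_11_average": 80, "ses": 2, "family_size": 3}

score = discriminant_score(constant, coefficients, student)
print(f"discriminant score = {score:.2f}")
```

Each case's score is then used to assign it to one of the categories of the dependent variable.
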
Discriminant Analysis (cont’d)
- Discriminant analysis assumes ratio-level independent variables (similar to regression) and, like regression, dummy variables may be included.
- Both standardized and unstandardized coefficients are provided on the output.
- If you want to calculate relative contributions, use the standardized version.

Discriminant Analysis (cont’d)
- When discriminant analysis is run, you will get a report on the percentage of cases that can be correctly classified by using the information on the independent variables.
- The analysis relies on Lambda.
- This statistic measures the proportionate reduction in error that results with knowledge of the independent variables.

Table 17.12 Discriminant Analysis, Sample Presentation

                                             Predicted group membership
Actual group           Number of cases       1          2
Participate (1)        261                   221         40
Not participate (2)     83                    11         72
Total                  344                   232        112

Percentage of “grouped” cases correctly classified: 293 out of 344 cases = 85.2%.

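The percentage correctly classified can be recomputed from the cells of Table 17.12: the correct predictions are the cases where the predicted group matches the actual group. The sketch below uses the counts from the table; the dictionary layout and key names are illustrative.

```python
# Counts from Table 17.12: actual group -> predicted group -> number of cases
classification = {
    1: {1: 221, 2: 40},   # actually participate
    2: {1: 11, 2: 72},    # actually do not participate
}

# Correct classifications lie on the diagonal (predicted == actual)
correct = sum(classification[group][group] for group in classification)
total = sum(sum(row.values()) for row in classification.values())
percent_correct = 100 * correct / total

print(f"{correct} out of {total} cases = {percent_correct:.1f}%")
```
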