INTERACTION

WHEN INTERACTION OCCURS
• An interaction occurs when an independent variable has a different effect on the outcome depending on the values of another independent variable.
• The effect of one covariate on the response is different at different levels of the second covariate.

Interaction
• Interaction is a three-variable concept. One of these is the response variable (Y) and the other two are explanatory variables (X1 and X2).
• There is an interaction between X1 and X2 if the impact of an increase in X2 on Y depends on the level of X1.
• To incorporate interaction in a multiple regression model, we add the product X1·X2 as an additional explanatory variable. There is evidence of an interaction if the coefficient on X1·X2 is significant (t-test has p-value < .05).
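• As a minimal illustrative sketch (made-up data, not from these slides), this is how the product term is added and its t-test is read in R:

# Hypothetical data: test for an X1-by-X2 interaction
set.seed(1)
x1 <- rnorm(100)                       # first explanatory variable
x2 <- rnorm(100)                       # second explanatory variable
y  <- 1 + 0.5*x1 + 0.8*x2 + 0.4*x1*x2 + rnorm(100)   # response with a true interaction

fit <- lm(y ~ x1 * x2)                 # x1 * x2 expands to x1 + x2 + x1:x2
summary(fit)$coefficients["x1:x2", ]   # estimate, SE, t value and p-value of the product term
# A p-value below .05 on the x1:x2 row is taken as evidence of an interaction.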

EXAMPLE OF INTERACTION*
• Suppose that there is a cholesterol-lowering drug that is tested through a clinical trial. Suppose we are expecting a linear dose-response over a given range of drug dose, so that the picture looks like this:
* http://www.medicine.mcgill.ca/epidemiology/Joseph/courses/EPIB-621/interaction.pdf

• This is a standard simple linear model. Now, however, suppose that we expect men to respond at an overall higher level compared to women. There are various ways that this can happen. For example, if the difference in response between women and men is constant throughout the range, we would expect a graph like this:

• However, if men have a steeper dose-response curve compared to women, we would expect a picture like this:

• On the other hand, if men have a less steep dose-response curve compared to women, we would expect a picture like this:

• Of these four graphs,
§ the first indicates no difference between men and women,
§ the second illustrates that there is a difference, but since it is constant, there is no interaction term,
§ the third and fourth represent the situation with an interaction: the effect of the drug depends on whether it is given to men or to women.
• Adding interaction terms to a regression model can greatly expand understanding of the relationships among the variables in the model and allows more hypotheses to be tested.

• In terms of regression equations:
NO SEX EFFECT:
Y = β0 + β1·dose + ε
• where Y represents the outcome (amount of cholesterol lowering), β1 represents the effect of the drug, and ε is the random error term.
SEX HAS AN EFFECT, BUT NO INTERACTION:
Y = β0 + β1·dose + β2·sex + ε
• There are effects of both dose and sex, but interpretation is still straightforward: since the effect of dose does not depend on which sex is being discussed (it is the same in males and females), β1 still represents the amount by which cholesterol changes for each unit change in dose of the drug. Similarly, β2 represents the effect of sex, which is "additive" to the effect of dose, because to get the effect of both together for any dose, we simply add the two individual effects.

THE THIRD MODEL, WITH AN INTERACTION TERM:
Y = β0 + β1·dose + β2·sex + β3·(dose × sex) + ε
• Things get a bit more complicated when there is an interaction term. There is no longer any unique effect of dose, because it depends upon whether you are talking about the effect of dose in males or in females. Similarly, the difference between males and females depends on the dose.
• Consider first the effect of dose: the question "what is the effect of dose?" is not answerable until one knows which sex is being considered. The effect of dose is β1 for females (if they are coded as 0, and males coded as 1, as was the case here). This is because the interaction term becomes 0 if sex is coded as 0, so the interaction term "disappears".

THE THIRD MODEL WITH AN INTERACTION TERM:
• On the other hand, if sex is coded as 1 (males), the effect of dose is now equal to β1 + β3. This means, in practice, that for every one-unit increase in dose, cholesterol changes by the amount β1 + β3 in males (compared to just β1 in females).
• In these models we have considered a continuous variable combined with a dichotomous (dummy or indicator) variable. We can also consider interactions between two dummy variables, and between two continuous variables. The principles remain the same, although some technical details change.
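• A small sketch (simulated data, with sex coded 0 = female, 1 = male as above, and hypothetical variable names) showing how the two dose slopes fall out of the fitted coefficients:

set.seed(2)
dose <- runif(80, 0, 10)
sex  <- rbinom(80, 1, 0.5)
chol <- 2 + 1.5*dose + 3*sex + 0.8*dose*sex + rnorm(80, sd = 2)   # hypothetical cholesterol lowering

fit <- lm(chol ~ dose * sex)
b   <- coef(fit)
b["dose"]                  # slope of dose for females (sex = 0): beta1
b["dose"] + b["dose:sex"]  # slope of dose for males  (sex = 1): beta1 + beta3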

Interactions between two continuous independent variables
• Consider the above example, but with age and dose as independent variables. Notice that this means we have two continuous variables, rather than one continuous and one dichotomous variable.
• In the absence of an interaction term, we simply have the model
Y = β0 + β1·dose + β2·age + ε
where Y is the amount of cholesterol lowering (dependent variable). With no interaction, interpretation of each effect is straightforward, as we just have a standard multiple linear regression model. The effect on cholesterol lowering would be β1 for each unit of dose increase, and β2 for each unit of age increase (i.e., per year, if that is the unit of age).

• Even though age will be treated as a continuous variable here, suppose for an instant that it was coded as dichotomous, simply representing "old" and "young" subjects. Now we would be back to the case already discussed above in detail, and the graph would look something like this: We see that the effect of dose on cholesterol lowering starts higher in younger compared to older subjects, but becomes lower as dose is increased.

• What if we now add a middle category of "middle-aged" persons? The graph may now look something like this:

• And if even more categories of age were added, we might get something like this:

• Now imagine adding finer and finer age categories, slowly transforming the age variable from discrete (categorical) into a continuous variable. At the limit where age becomes continuous, we would have an infinite number of different slopes for the effect of dose, one slope for each of the infinitely many possible age values. This is what we have when we have a model with two continuous variables that interact with each other.
• The model we would then have would look like this:
Y = β0 + β1·dose + β2·age + β3·(dose × age) + ε
• For any fixed value of age, say age0, notice that the effect of dose is given by β1 + β3·age0. This means that the effect of dose changes depending on the age of the subject, so that there is really no "unique" effect of dose; it is different for each possible age value.
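• A brief sketch (simulated data, hypothetical variable names) of reading off the dose slope at a fixed age, β1 + β3·age0, from a continuous-by-continuous interaction:

set.seed(3)
dose <- runif(100, 0, 10)
age  <- runif(100, 20, 70)
chol <- 1 + 2*dose + 0.1*age + 0.05*dose*age + rnorm(100, sd = 2)   # hypothetical response

fit  <- lm(chol ~ dose * age)
b    <- coef(fit)
age0 <- 40                          # any fixed age of interest
b["dose"] + b["dose:age"] * age0    # estimated effect of one unit of dose at age 40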

Interactions between two continuous independent variables
• In summary: when there is an interaction term, the effect of one variable that forms the interaction depends on the level of the other variable in the interaction.

Interactions between two dichotomous variables
• Suppose there are two medications, A and B, and each is given to both males and females. If the medication may operate differently in males and females, the equation with an interaction term can be written as (suppose the coding is Med A = 0, Med B = 1, Male = 0, Female = 1):
Y = β0 + β1·med + β2·sex + β3·(med × sex) + ε

• Here, however, there are only four possibilities, as given in the table below:

  Med A, Male:    β0
  Med B, Male:    β0 + β1
  Med A, Female:  β0 + β2
  Med B, Female:  β0 + β1 + β2 + β3

• Without an interaction term, the mean value for Females on Med B would have been β0 + β1 + β2. This implies a simple additive model, as we add the effect of being female to the effect of being on Med B. However, with the interaction term as detailed above, the mean value for Females on Med B is β0 + β1 + β2 + β3, implying that over and above the additive effect, there is an interaction effect of size β3.
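• A short sketch (simulated data, same 0/1 coding as above) showing how the four cell means are built from the fitted coefficients:

set.seed(4)
med    <- rbinom(200, 1, 0.5)   # 0 = Med A, 1 = Med B
female <- rbinom(200, 1, 0.5)   # 0 = male,  1 = female
y      <- 5 + 2*med + 1*female + 1.5*med*female + rnorm(200)

fit <- lm(y ~ med * female)
b   <- coef(fit)
b["(Intercept)"]                                             # males on Med A:   beta0
b["(Intercept)"] + b["med"]                                  # males on Med B:   beta0 + beta1
b["(Intercept)"] + b["female"]                               # females on Med A: beta0 + beta2
b["(Intercept)"] + b["med"] + b["female"] + b["med:female"]  # females on Med B: beta0 + beta1 + beta2 + beta3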

Centering control variables in an interaction regression model
• If you want results that are a little more meaningful and easier to interpret, one approach is to center the continuous variables first (i.e., subtract the mean from each case), and then compute the interaction term and estimate the model. (Only center continuous variables, though; you don't want to center categorical dummy variables like gender.)
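• A minimal sketch of this workflow on made-up variables x and z (the simulated example on the following slides works through the same idea in more detail):

set.seed(5)
x <- rnorm(50, mean = 10)   # continuous predictor
z <- rnorm(50, mean = 30)   # second continuous predictor
y <- 1 + 0.3*x + 0.2*z + 0.1*x*z + rnorm(50)

x_c <- x - mean(x)          # subtract the mean from each continuous variable
z_c <- z - mean(z)
summary(lm(y ~ x_c * z_c))  # then build the interaction from the centered variables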

SIMULATED EXAMPLE

set.seed(999)
b0 <- 1.4     # intercept
b1 <- 0.2     # continuous slope
b2 <- 1.7     # factor level 1 coefficient
b1.2 <- 0.5   # interaction between b1 and b2
sigma <- 2.0  # residual standard deviation
N <- 25       # number of data points
x1 <- runif(N, 0, 20)                  # continuous predictor data
x2 <- rbinom(N, size = 1, prob = 0.4)  # binary predictor data
# generate response data:
y <- rnorm(N, mean = b0 + b1 * x1 + b2 * x2 + x1 * x2 * b1.2, sd = sigma)
dat <- data.frame(x1, x2, y)

Let's look at the data we created:

> head(dat)
         x1 x2         y
1  7.781428  1  9.831531
2 11.661214  0  1.518768
3  2.655639  1  1.893314
4 17.052625  1 11.928647
5 15.734935  0  4.293629
6  6.642697  0  2.386845

library(ggplot2)
ggplot(dat, aes(x1, y, colour = as.factor(x2))) + geom_point()

• Now, we'll fit a model with and without an interaction and look at the coefficients:

> m <- lm(y ~ x1 * x2, data = dat)
> m_no_inter <- lm(y ~ x1 + x2, data = dat)
> round(coef(m), 2)
(Intercept)          x1          x2       x1:x2
       1.72        0.19        1.27        0.34
> round(coef(m_no_inter), 2)
(Intercept)          x1          x2
       0.61        0.30        4.33

• Notice how the main effects (everything except the interaction) change dramatically when the interaction is removed. This is because, when the interaction is included, the main effects apply when the other predictor equals 0; that is, when x1 = 0 or when the binary predictor x2 is at its reference level of 0. But when the interaction is excluded, the main effects apply when the other predictors are at their average values. So, if we center the predictors, the main effects will represent the same thing in both cases:

dat$x2_cent <- dat$x2 - mean(dat$x2)
dat$x1_cent <- dat$x1 - mean(dat$x1)
m_center <- lm(y ~ x1_cent * x2_cent, data = dat)
m_center_no_inter <- lm(y ~ x1_cent + x2_cent, data = dat)

> round(coef(m_center), 2)
    (Intercept)         x1_cent         x2_cent x1_cent:x2_cent
           5.07            0.31            4.48            0.34
> round(coef(m_center_no_inter), 2)
(Intercept)     x1_cent     x2_cent
       4.96        0.30        4.33

Notice that the intercept, x1, and x2 coefficient estimates are now similar regardless of whether the interaction is included. Because we have centered the predictors, the predictors equal zero at their means, so the main effects estimate approximately the same thing whether or not the interaction is included. In other words, adding the interaction adds more predictive information but doesn't change the meaning of the main effects.


Example with real data
• Consider the data set below, which contains data about various body measurements, as well as body fat. The goal is to check whether the independent variables Skinfold Thickness (ST), Thigh Circumference (TC), and/or Midarm Circumference (MC) predict the dependent variable Body Fat (BF), and if so, whether there is any evidence of interactions among these variables.

> head(fat)
       st   tc   mc   bf
[1,] 19.5 43.1 29.1 11.9
[2,] 24.7 49.8 28.2 22.8
[3,] 30.7 51.9 37.0 18.7
[4,] 29.8 54.3 31.1 20.1
[5,] 19.1 42.2 30.9 12.9
[6,] 25.6 53.9 23.7 21.7

We will follow these steps in analyzing these data:
1. Enter the data, and create new variables for all interactions, including the three two-by-two interaction terms, as well as the single interaction term with all three variables.
2. Look at descriptive statistics for all data.
3. Look at scatter plots for each variable.
4. Calculate a correlation matrix for all variables.
5. Calculate a simple linear regression for each variable.
6. Calculate a multiple linear regression for all variables, without interactions.
7. Add in various interactions, to see what happens.
8. Draw overall conclusions based on the totality of evidence from all models.

# Create new variables for all interactions, including three two-by-two
# interaction terms, as well as the single interaction term with all
# three variables.
st_tc <- st*tc
st_mc <- st*mc
tc_mc <- tc*mc
st_tc_mc <- st*tc*mc

# Create a data frame with all data:
fat <- data.frame(st, tc, mc, st_tc, st_mc, tc_mc, st_tc_mc, bf)

> fat
    st   tc   mc   st_tc   st_mc   tc_mc st_tc_mc   bf
1 19.5 43.1 29.1  840.45  567.45 1254.21 24457.10 11.9
2 24.7 49.8 28.2 1230.06  696.54 1404.36 34687.69 22.8
3 30.7 51.9 37.0 1593.33 1135.90 1920.30 58953.21 18.7
4 29.8 54.3 31.1 1618.14  926.78 1688.73 50324.15 20.1
5 19.1 42.2 30.9  806.02  590.19 1303.98 24906.02 12.9

> # Look at descriptive statistics for all data.
> summary(fat)

            Min.   1st Qu.   Median     Mean   3rd Qu.     Max.
st         14.60     21.50    25.55    25.30     29.90    31.40
tc         42.20     47.77    52.00    51.17     54.62    58.60
mc         21.30     24.75    27.90    27.62     30.02    37.00
st_tc      623.4    1038.3   1372.0   1317.9    1608.1   1836.9
st_mc      311.0     584.5    694.8    706.9     861.9   1135.9
tc_mc      909.5    1274.1   1403.4   1414.9    1607.1   1920.3
st_tc_mc   13279     25415    35015    36830     48423    58953
bf         11.70     17.05    21.20    20.20     24.27    27.20

# Look at scatter plots for each variable.
pairs(fat)

> # Calculate a correlation matrix for all variables.
> cor(fat)
                st        tc        mc     st_tc     st_mc     tc_mc  st_tc_mc        bf
st       1.0000000 0.9238425 0.4577772 0.9887843 0.9003214 0.8907135 0.9649137 0.8432654
tc       0.9238425 1.0000000 0.0846675 0.9663436 0.6719665 0.6536065 0.8062687 0.8780896
mc       0.4577772 0.0846675 1.0000000 0.3323920 0.7877028 0.8064087 0.6453482 0.1424440
st_tc    0.9887843 0.9663436 0.3323920 1.0000000 0.8344518 0.8218605 0.9277172 0.8697087
st_mc    0.9003214 0.6719665 0.7877028 0.8344518 1.0000000 0.9983585 0.9778029 0.6339052
tc_mc    0.8907135 0.6536065 0.8064087 0.8218605 0.9983585 1.0000000 0.9710983 0.6237307
st_tc_mc 0.9649137 0.8062687 0.6453482 0.9277172 0.9778029 0.9710983 1.0000000 0.7418017
bf       0.8432654 0.8780896 0.1424440 0.8697087 0.6339052 0.6237307 0.7418017 1.0000000

• Looking at the scatter plots and correlation matrix, we see trouble. Many of the correlations between the independent variables are very high, which will cause severe confounding and/or near collinearity. The problem is particularly acute among the interaction variables we created.
• A trick that sometimes helps: subtract the mean from each independent variable, and use these so-called "centered" variables to create the interaction variables. This will not change the correlations among the non-interaction terms, but may reduce the correlations involving the interaction terms.

# Create the centered independent variables:
st.c <- st - mean(st)
tc.c <- tc - mean(tc)
mc.c <- mc - mean(mc)

# Now create the centered interaction terms:
st_tc.c <- st.c*tc.c
st_mc.c <- st.c*mc.c
tc_mc.c <- tc.c*mc.c
st_tc_mc.c <- st.c*tc.c*mc.c

# Create a new data frame with this new set of independent variables
fat.c <- data.frame(st.c, tc.c, mc.c, st_tc.c, st_mc.c, tc_mc.c, st_tc_mc.c, bf)

> head(fat.c)
    st.c  tc.c  mc.c  st_tc.c  st_mc.c  tc_mc.c st_tc_mc.c   bf
1 -5.805 -8.07  1.48 46.84635  -8.5914 -11.9436  69.332598 11.9
2 -0.605 -1.37  0.58  0.82885  -0.3509  -0.7946   0.480733 22.8
3  5.395  0.73  9.38  3.93835  50.6051   6.8474  36.941723 18.7
4  4.495  3.13  3.48 14.06935  15.6426  10.8924  48.961338 20.1
5 -6.205 -8.97  3.28 55.65885 -20.3524 -29.4216 182.561028 12.9
6  0.295  2.73 -3.92  0.80535  -1.1564 -10.7016  -3.156972 21.7

> # Look at the new correlation matrix
> cor(fat.c)
                 st.c       tc.c       mc.c    st_tc.c    st_mc.c    tc_mc.c st_tc_mc.c         bf
st.c        1.0000000  0.9238425  0.4577772 -0.4770137 -0.1734155 -0.2215706  0.4241959  0.8432654
tc.c        0.9238425  1.0000000  0.0846675 -0.4297883 -0.1725368 -0.1436553  0.2054264  0.8780896
mc.c        0.4577772  0.0846675  1.0000000 -0.2158921 -0.0304068 -0.2353658  0.6221249  0.1424440
st_tc.c    -0.4770137 -0.4297883 -0.2158921  1.0000000  0.2328290  0.2919073 -0.4975292 -0.3923247
st_mc.c    -0.1734155 -0.1725368 -0.0304068  0.2328290  1.0000000  0.8905095 -0.6721502 -0.2511331
tc_mc.c    -0.2215706 -0.1436553 -0.2353658  0.2919073  0.8905095  1.0000000 -0.7398958 -0.1657072
st_tc_mc.c  0.4241959  0.2054264  0.6221249 -0.4975292 -0.6721502 -0.7398958  1.0000000  0.2435352
bf          0.8432654  0.8780896  0.1424440 -0.3923247 -0.2511331 -0.1657072  0.2435352  1.0000000

> # Calculate a simple linear regression for each variable (not the interactions).
> regression1.out <- lm(bf ~ st.c)
> regression2.out <- lm(bf ~ tc.c)
> regression3.out <- lm(bf ~ mc.c)

> summary(regression1.out)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  20.1950     0.6305  32.029  < 2e-16 ***
st.c          0.8572     0.1288   6.656 3.02e-06 ***

> summary(regression2.out)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  20.1950     0.5613  35.979  < 2e-16 ***
tc.c          0.8565     0.1100   7.786  3.6e-07 ***

> summary(regression3.out)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  20.1950     1.1611  17.393 1.06e-12 ***
mc.c          0.1994     0.3266   0.611    0.549

# Two of the three variables seem to have a strong effect,
# but the effect of mc.c is inconclusive (NOT NEGATIVE!!)

# Calculate a multiple linear regression for all variables,
# without interactions.
regression4.out <- lm(bf ~ st.c + tc.c + mc.c)
summary(regression4.out)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  20.1950     0.5545  36.418   <2e-16 ***
st.c          4.3341     3.0155   1.437    0.170
tc.c         -2.8568     2.5820  -1.106    0.285
mc.c         -2.1861     1.5955  -1.370    0.190

• Compared to the univariate results, we see many changes, because of the high confounding between st.c and tc.c. Since they provide such similar information, we will drop tc.c.
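• As an optional side check (a sketch, not part of the slides' original analysis), variance inflation factors from the car package, which these slides load later for plotting, quantify this collinearity:

library(car)            # provides vif()
vif(regression4.out)    # VIFs for st.c, tc.c, mc.c; large values (e.g. > 10) flag severe collinearity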

regression5.out <- lm(bf ~ st.c + mc.c)
summary(regression5.out)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  20.1950     0.5582  36.180  < 2e-16 ***
st.c          1.0006     0.1282   7.803 5.12e-07 ***
mc.c         -0.4314     0.1766  -2.443   0.0258 *

# Much better result; note how much narrower the CIs are. Both
# variables have at least a small effect, likely of clinical interest.

# Add in the interaction between st.c and mc.c
regression6.out <- lm(bf ~ st.c + mc.c + st_mc.c)
summary(regression6.out)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.37496    0.60663  33.587 2.89e-16 ***
st.c         0.98153    0.13171   7.452 1.37e-06 ***
mc.c        -0.42338    0.17875  -2.369   0.0308 *
st_mc.c     -0.02259    0.02803  -0.806   0.4321

There is no strong evidence of an interaction.
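• An equivalent way to test the interaction term here (a sketch using the models fitted above) is an F-test comparing the two nested models:

anova(regression5.out, regression6.out)   # F-test for adding st_mc.c; a large p-value indicates no evidence of interaction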

PLOTTING INTERACTION EFFECT*

#install.packages("car")       # An extremely useful/in-depth regression package
#install.packages("stargazer") # Produces easy-to-read regression results (similar to what you get in SPSS)
#install.packages("effects")   # We will use this to create our interactions
#install.packages("ggplot2")   # Our incredibly powerful and versatile graphing package

* http://ademos.people.uic.edu/Chapter13.html

PLOTTING INTERACTION EFFECT
• Let's create a dataset.

library(car)  # Even though we already installed "car", we have to tell R to load this package for us to use

# You can choose whatever # you want for the seed; this is for randomization of your data
set.seed(150)

# Let's make our data set: it will have 250 participants (n), perhaps college students!
n <- 250

# Distribution of work ethic (X) from 1-5 (1 = poor work ethic, 5 = great work ethic)
X <- rnorm(n, 2.75, .75)

# We want a normal distribution of IQ (Z)
# I fixed the mean of IQ to 15 so that the regression equation works realistically
Z <- rnorm(n, 15)

# We then create Y using a regression equation (adding a bit of random noise)
Y <- .7*X + .3*Z + 2.5*X*Z + rnorm(n, sd = 5)

# This code is here so that Y (GPA) is capped at 4.0 (the logical max for GPA)
Y = (Y - min(Y)) / (max(Y) - min(Y))*4

# Finally, we put our data together with the data.frame() function
GPA.Data <- data.frame(GPA=Y, Work.Ethic=X, IQ=Z)

• Center your independent variables
• Why? To avoid problems of multicollinearity! When a model has multicollinearity, it doesn't know which term to give the variance to.
• When we center our independent variables, the value 0 of each centered variable represents its mean.
• When you interact X * Z, you are adding a new predictor (XZ) that strongly correlates with X and Z.
• If you center your variables first, the interaction term is far less correlated with X and Z.
• Exceptions: don't center physical data or variables with a true, meaningful 0.

GPA.Data$IQ.C <- scale(GPA.Data$IQ, center = TRUE, scale = FALSE)[,]
GPA.Data$Work.Ethic.C <- scale(GPA.Data$Work.Ethic, center = TRUE, scale = FALSE)[,]

Run the model:

GPA.Model.1 <- lm(GPA ~ IQ.C + Work.Ethic.C, GPA.Data)
GPA.Model.2 <- lm(GPA ~ IQ.C * Work.Ethic.C, GPA.Data)

library(stargazer)
stargazer(GPA.Model.1, GPA.Model.2, type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE, single.row=FALSE,
          notes.append = FALSE, header=FALSE)

Dependent variable: GPA

                          Main Effects            Interaction
                              (1)                     (2)
Constant                    2.054***                2.054***
                            (0.008)                 (0.002)
IQ.C                        0.041***                0.040***
                            (0.001)                 (0.0001)
Work.Ethic.C                0.199***                0.202***
                            (0.012)                 (0.002)
IQ.C:Work.Ethic.C                                   0.014***
                                                    (0.0002)
Observations                  250                     250
R2                           0.959                   0.998
Adjusted R2                  0.959                   0.998
Residual Std. Error     0.134 (df = 247)        0.026 (df = 246)
F Statistic      2,888.028*** (df = 2; 247)  51,713.400*** (df = 3; 246)
Note: *p<0.1; **p<0.05; ***p<0.01

Plotting simple slopes: Hand Picking
• Hand picking is useful if you have specific predictions in your data set.
• If you are working with IQ, a drug, or age, the numbers are relevant and useful to pick! For our example, let's go with -15, 0, 15 for our centered IQ (1 SD above and below the mean).
• c() will give you the exact values, and seq() will give you a range from a to b, increasing by c.

library(effects)

# Run the interaction
Inter.Hand.Pick <- effect('IQ.C*Work.Ethic.C', GPA.Model.2,
                          xlevels=list(IQ.C = c(-15, 0, 15),
                                       Work.Ethic.C = c(-1.1, 0, 1.1)),
                          se=TRUE, confidence.level=.95, typical=mean)

# Put data in data frame
Inter.Hand.Pick <- as.data.frame(Inter.Hand.Pick)

# Check out what the "head" (first 6 rows) of your data looks like
head(Inter.Hand.Pick)

  IQ.C Work.Ethic.C      fit          se    lower    upper
1  -15         -1.1 1.464610 0.004670705 1.455410 1.473809
2    0         -1.1 1.831723 0.003040883 1.825734 1.837713
3   15         -1.1 2.198836 0.004717661 2.189544 2.208129
4  -15          0.0 1.460340 0.002350381 1.455711 1.464970
5    0          0.0 2.054450 0.001663308 2.051174 2.057726
6   15          0.0 2.648560 0.002348376 2.643934 2.653185

# Create a factor of the IQ variable used in the interaction
# (three levels need three labels)
Inter.Hand.Pick$IQ <- factor(Inter.Hand.Pick$IQ.C,
                             levels=c(-15, 0, 15),
                             labels=c("1 SD Below Population Mean",
                                      "Population Mean",
                                      "1 SD Above Population Mean"))

# Create a factor of the Work Ethic variable used in the interaction
Inter.Hand.Pick$Work.Ethic <- factor(Inter.Hand.Pick$Work.Ethic.C,
                                     levels=c(-1.1, 0, 1.1),
                                     labels=c("Poor Worker", "Average Worker", "Hard Worker"))

library(ggplot2)
Plot.Hand.Pick <- ggplot(data=Inter.Hand.Pick, aes(x=Work.Ethic, y=fit, group=IQ)) +
  geom_line(size=2, aes(color=IQ)) +
  ylim(0, 4) +
  ylab("GPA") +
  xlab("Work Ethic") +
  ggtitle("Hand Picked Plot")
Plot.Hand.Pick

We can visualize the fact that for smart people (1 SD above the population mean, which is not determined by our data set), as their work ethic increases, so does their GPA. A similar pattern is seen for people with average IQs, though the effect is not nearly as strong. For people 1 SD below the population mean on IQ, as their work ethic increases, it appears as though their GPA actually decreases.