Multiple Regression

  • Slides: 51

The test you choose depends on level of measurement:
• Dichotomous independent variable; interval-ratio or dichotomous dependent variable: Independent Samples t-test
• Nominal or dichotomous independent variable; nominal or dichotomous dependent variable: Cross Tabs
• Nominal or dichotomous independent variable; interval-ratio or dichotomous dependent variable: ANOVA
• Interval-ratio or dichotomous independent variable; interval-ratio dependent variable: Bivariate Regression/Correlation
• Two or more interval-ratio or dichotomous independent variables; interval-ratio dependent variable: Multiple Regression

Multiple regression is very popular among sociologists.
• Most social phenomena have more than one cause.
• It is very difficult to manipulate just one social variable through experimentation.
• Sociologists must attempt to model complex social realities to explain them.

Multiple regression allows us to:
• Use several variables at once to explain the variation in a continuous dependent variable.
• Isolate the unique effect of one variable on the continuous dependent variable while taking into consideration that other variables are affecting it too.
• Write a mathematical equation that tells us the overall effects of several variables together and the unique effects of each on a continuous dependent variable.
• Control for other variables to demonstrate whether bivariate relationships are spurious.

For example, a sociologist may be interested in how education and family income are related to the number of children in a family.
Independent variables: Education, Family Income
Dependent variable: Number of Children

For example:
• Null Hypothesis: There is no relationship between education of respondents and the number of children in families. $H_0: b_1 = 0$
• Null Hypothesis: There is no relationship between family income and the number of children in families. $H_0: b_2 = 0$
Independent variables: Education, Family Income
Dependent variable: Number of Children

• Bivariate regression is based on fitting a line as close as possible to the plotted coordinates of your data on a two-dimensional graph.
• Trivariate regression is based on fitting a plane as close as possible to the plotted coordinates of your data on a three-dimensional graph.
[Data table: cases 1 to 25, with Children (Y), Education (X1), and Income (X2, where 1 = $10K)]

Plotted coordinates (cases 1 to 10) for Education, Income, and Number of Children
[Figure: three-dimensional scatterplot with axes Y (Children), X1 (Education), and X2 (Income)]

What multiple regression does is fit a plane to these coordinates.
[Figure: fitted plane in the three-dimensional plot of Y, X1, and X2 for cases 1 to 10]

Mathematically, that plane is:
$\hat{Y} = a + b_1X_1 + b_2X_2$
• a = the y-intercept, the predicted value of Y when all the X's equal zero
• b = the coefficient, or slope, for each variable
For our problem, SPSS says the equation is:
$\hat{Y} = 11.8 - .36X_1 - .40X_2$
Expected # of Children = 11.8 - .36*Educ - .40*Income

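Conducting a Test of Significance for the Slopes of the Regression Shape
By placing the sampling distribution for the slopes over a guess about the population's slopes ($H_0$), we can find out whether our sample could have been drawn from a population where the slopes are equal to our guess.
1. Two-tailed significance test for α-level = .05
2. Critical t = ±1.96 (the large-sample value; with a small sample, use the t distribution with n - k - 1 degrees of freedom, where k is the number of independent variables)
3. To find out whether there is a significant slope in the population:
   $H_0: \beta_1 = 0;\ \beta_2 = 0$
   $H_a: \beta_1 \neq 0;\ \beta_2 \neq 0$
4. Collect data
5. Calculate t (z) for each slope:
   $t = \frac{b - \beta}{s.e.}, \qquad s.e. = \frac{\sqrt{\sum (Y - \hat{Y})^2 / (n - k - 1)}}{\sqrt{\sum (X - \bar{X})^2 \, (1 - R_j^2)}}$
   Here β is the slope under $H_0$ (0 in this case) and $R_j^2$ is the R² from regressing that X on the other independent variables. With one independent variable (k = 1, $R_j^2 = 0$), this reduces to the bivariate formula with n - 2.
6. Make a decision about the null hypotheses
7. Find P-values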
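
Steps 5 through 7 can be sketched in a few lines of Python. This is a minimal sketch. It assumes a large sample, so the normal approximation behind the critical value of ±1.96 applies. The function name slope_test is hypothetical, and so is the standard error of 0.12 in the usage line, because the SPSS output with the actual standard errors is not reproduced here.

```python
import math


def slope_test(b, se, beta=0.0, alpha=0.05):
    """Two-tailed test of H0: slope == beta, using the large-sample z approximation.

    Returns the test statistic, its two-tailed p-value, and whether H0 is rejected.
    """
    t = (b - beta) / se
    # For a standard normal Z, P(|Z| >= |t|) equals erfc(|t| / sqrt(2))
    p_value = math.erfc(abs(t) / math.sqrt(2))
    # With alpha = .05, p_value < alpha is equivalent to |t| > 1.96
    return t, p_value, p_value < alpha


# Hypothetical standard error for the education slope (b = -.36)
t, p, reject = slope_test(-0.36, 0.12)
print(f"t = {t:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Comparing the p-value to α gives the same decision as comparing |t| to 1.96. With small samples, the p-value should come from the t distribution instead.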

$\hat{Y} = 11.8 - .36X_1 - .40X_2$
[SPSS output: significance tests, with t-scores and P-values for each slope]

$R^2 = \frac{TSS - SSE}{TSS}$
• TSS = $\sum (Y - \bar{Y})^2$, the sum of the squared distances from the mean to each case's value on Y
• SSE = $\sum (Y - \hat{Y})^2$, the sum of the squared distances from the shape to each case's value on Y
• R² is interpreted the same way in multiple regression: it is the joint explanatory value of all of your variables (or "your model").
• You can request an R² change test from SPSS to see whether adding new variables improves the fit of your model.
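
The R² formula translates directly into code. This is a minimal sketch. The function name r_squared and the four observed/predicted pairs are hypothetical, chosen so the arithmetic is easy to check by hand: TSS = 20 and SSE = 2, so R² = 18/20 = .9.

```python
def r_squared(y, y_hat):
    """R^2 = (TSS - SSE) / TSS for observed values y and predicted values y_hat."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)              # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared errors
    return (tss - sse) / tss


# Hypothetical observed and predicted values
observed = [2, 4, 6, 8]
predicted = [3, 4, 5, 8]
print(r_squared(observed, predicted))
```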

57% of the variation in number of children is explained by education and income!
$\hat{Y} = 11.8 - .36X_1 - .40X_2$

$R^2 = \frac{\sum (Y - \bar{Y})^2 - \sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$, for $\hat{Y} = 11.8 - .36X_1 - .40X_2$
161.518 ÷ 261.76 = .573

So what does our equation tell us?
$\hat{Y} = 11.8 - .36X_1 - .40X_2$
Expected # of Children = 11.8 - .36*Educ - .40*Income
Try "plugging in" some values for your variables.

Education   Income   Expected children
0           0        11.8
10          0        8.2
10          10       4.2
20          10       0.6
20          11       0.2

Holding education at 1:
Education   Income   Expected children
1           0        11.44
1           1        11.04
1           5        9.44
1           10       7.44
1           15       5.44

Holding income at 1:
Education   Income   Expected children
0           1        11.40
1           1        11.04
5           1        9.60
10          1        7.80
15          1        6.00
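
The plug-in tables can be reproduced with a small function, which is a quick way to check the arithmetic. This is a sketch. The function name is hypothetical, and the coefficients are the SPSS estimates from the education and income equation.

```python
def expected_children(educ, income):
    """Predicted number of children: Y-hat = 11.8 - .36*Educ - .40*Income."""
    return 11.8 - 0.36 * educ - 0.40 * income


# (Education, Income) pairs from the three plug-in tables
pairs = [(0, 0), (10, 0), (10, 10), (20, 10), (20, 11),
         (1, 0), (1, 1), (1, 5), (1, 10), (1, 15),
         (0, 1), (5, 1), (10, 1), (15, 1)]
for educ, income in pairs:
    print(f"Education {educ:>2}, Income {income:>2}: {expected_children(educ, income):.2f}")
```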

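If graphed, holding one variable constant produces a two-dimensional graph for the other variable.
[Figure: two line graphs of Y. As X1 = Education goes from 0 to 15 (income held at 1), Y falls from 11.40 to 6.00, b = -.36. As X2 = Income goes from 0 to 15 (education held at 1), Y falls from 11.44 to 5.44, b = -.4.]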

An interesting effect of controlling for other variables is "Simpson's Paradox": the direction of the relationship between two variables can change when you control for another variable.
Education (+) Crime Rate: $\hat{Y} = -51.3 + 1.5X$

"Simpson's Paradox"
• Without controls: Education (+) Crime Rate, $\hat{Y} = -51.3 + 1.5X_1$
• Urbanization is related to both education (+) and crime rate (+).
• Regression controlling for urbanization: Education (-) Crime Rate and Urbanization (+) Crime Rate, $\hat{Y} = 58.9 - .6X_1 + .7X_2$

[Figure: Crime rate plotted against Education, showing the original regression line and, looking at each level of urbanization (Rural, Small town, Suburban, City), new lines]
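
The sign reversal can be reproduced with made-up numbers. The data in this sketch are entirely hypothetical; they are not the crime data behind the slides. Urbanization pushes up both education and crime, while within each level of urbanization more education means less crime. The two-predictor slopes use the standard closed-form least-squares formulas built from sums of cross-products. The helper names mean and cross are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))


# Hypothetical data: urbanization 0 = rural, 1 = small town, 2 = suburban, 3 = city.
# Education rises with urbanization; within a level, more education means LESS crime,
# but crime also rises with urbanization.
urban, educ, crime = [], [], []
for u in range(4):
    for d in (-1, 0, 1):
        e = 10 + 2 * u + d
        urban.append(u)
        educ.append(e)
        crime.append(40 - 1.0 * e + 8 * u)

# Bivariate slope of crime on education, ignoring urbanization
b_bivariate = cross(crime, educ) / cross(educ, educ)

# Slopes from regressing crime on education and urbanization together
s11, s22, s12 = cross(educ, educ), cross(urban, urban), cross(educ, urban)
sy1, sy2 = cross(crime, educ), cross(crime, urban)
den = s11 * s22 - s12 ** 2
b_educ = (sy1 * s22 - sy2 * s12) / den
b_urban = (sy2 * s11 - sy1 * s12) / den

print(f"bivariate education slope: {b_bivariate:+.2f}")
print(f"controlling for urbanization: education {b_educ:+.2f}, urbanization {b_urban:+.2f}")
```

Ignoring urbanization, the education slope is positive. Once urbanization is in the model, the education slope is negative.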

Now… More Variables!
• The social world is very complex.
• What happens when you have even more variables?
• For example, a sociologist may be interested in the effects of Education, Income, Sex, and Gender Attitudes on Number of Children in a family.
Independent variables: Education, Family Income, Sex, Gender Attitudes
Dependent variable: Number of Children

Null Hypotheses:
1. There will be no relationship between education of respondents and the number of children in families. $H_0: b_1 = 0$; $H_a: b_1 \neq 0$
2. There will be no relationship between family income and the number of children in families. $H_0: b_2 = 0$; $H_a: b_2 \neq 0$
3. There will be no relationship between sex and number of children. $H_0: b_3 = 0$; $H_a: b_3 \neq 0$
4. There will be no relationship between gender attitudes and number of children. $H_0: b_4 = 0$; $H_a: b_4 \neq 0$
Independent variables: Education, Family Income, Sex, Gender Attitudes
Dependent variable: Number of Children

• Bivariate regression is based on fitting a line as close as possible to the plotted coordinates of your data on a two-dimensional graph.
• Trivariate regression is based on fitting a plane as close as possible to the plotted coordinates of your data on a three-dimensional graph.
• Regression with more than two independent variables is based on fitting a shape to your constellation of data on a multidimensional graph.

• The shape will be placed so that it minimizes the sum of squared errors, that is, the sum of the squared vertical distances from the shape to every data point.
• The shape is no longer a line, but if you hold all other variables constant, it is linear for each independent variable.
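
The slides leave the fitting to SPSS. To show what "minimizes the sum of squared errors" means computationally, this pure-Python sketch solves the least-squares normal equations for any number of independent variables. The function names fit_ols and sse, the six data rows, and the added errors are all hypothetical. The rows are generated from the two-variable education and income equation, not taken from the slides' data.

```python
def fit_ols(X, y):
    """Least-squares coefficients [a, b1, ..., bk] that minimize the sum of squared errors.

    X is a list of rows, each holding k predictor values; an intercept column is added.
    Solves the normal equations (X'X) beta = X'y by Gauss-Jordan elimination.
    """
    rows = [[1.0] + list(r) for r in X]            # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting: use the row with the largest entry in this column
        pivot = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("X'X is singular: the predictors are perfectly collinear")
        for i in range(p):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * c for a, c in zip(A[i], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def sse(X, y, beta):
    """Sum of squared errors between y and the shape defined by beta."""
    return sum((yi - beta[0] - sum(b * x for b, x in zip(beta[1:], row))) ** 2
               for row, yi in zip(X, y))


# Hypothetical data: (education, income) rows, with Y built from the plane
# Y = 11.8 - .36*X1 - .40*X2 plus small made-up errors
X = [(8, 2), (10, 5), (12, 3), (14, 8), (16, 6), (18, 10)]
errors = [0.3, -0.2, 0.1, -0.4, 0.25, -0.05]
y = [11.8 - 0.36 * x1 - 0.40 * x2 + e for (x1, x2), e in zip(X, errors)]

beta = fit_ols(X, y)
print("a, b1, b2 =", [round(b, 3) for b in beta])
print("SSE at the fitted plane:", round(sse(X, y, beta), 4))
```

Nudging any fitted coefficient in either direction raises the SSE, which is what makes this the least-squares shape. Statistical software typically uses numerically sturdier decompositions than the plain normal equations shown here.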

Imagining a graph with four dimensions!
[Figure: three three-dimensional plots, each with axes Y, X1, and X2]

For our problem, our equation could be:
$\hat{Y} = 7.5 - .30X_1 - .40X_2 + 0.5X_3 + 0.25X_4$
E(Children) = 7.5 - .30*Educ - .40*Income + 0.5*Sex + 0.25*Gender Att.

So what does our equation tell us?
Education   Income   Sex   Gender Att.   Children
10          5        0     0             2.5
10          5        0     5             3.75
10          10       0     5             1.75
10          5        1     0             3.0
10          5        1     5             4.25
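
The same plug-in arithmetic works with four independent variables. This is a minimal sketch. The function name is hypothetical, and the coefficients come from the illustrative equation our problem "could" have, not from an SPSS estimate.

```python
def expected_children(educ, income, sex, gender_att):
    """E(Children) = 7.5 - .30*Educ - .40*Income + .5*Sex + .25*GenderAtt."""
    return 7.5 - 0.30 * educ - 0.40 * income + 0.5 * sex + 0.25 * gender_att


# (Education, Income, Sex, Gender Attitudes) for each row of the table
cases = [(10, 5, 0, 0), (10, 5, 0, 5), (10, 10, 0, 5), (10, 5, 1, 0), (10, 5, 1, 5)]
for case in cases:
    print(case, round(expected_children(*case), 2))
```

Setting every variable but one to zero gives the two-dimensional lines in the graphs that follow.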

Each variable, holding the other variables constant, has a linear, two-dimensional graph of its relationship with the dependent variable. Here we hold every other variable constant at "zero."
[Figure: two line graphs of Y. As X1 = Education goes from 0 to 10, Y falls from 7.5 to 4.5, b = -.3. As X2 = Income goes from 0 to 10, Y falls from 7.5 to 3.5, b = -.4.]
$\hat{Y} = 7.5 - .30X_1 - .40X_2 + 0.5X_3 + 0.25X_4$

[Figure: two line graphs of Y, with every other variable held at zero. As X3 = Sex goes from 0 to 1, Y rises from 7.5 to 8, b = .5. As X4 = Gender Attitudes goes from 0 to 5, Y rises from 7.5 to 8.75, b = .25.]
$\hat{Y} = 7.5 - .30X_1 - .40X_2 + 0.5X_3 + 0.25X_4$

Okay, we're almost through with regression!

Dummy Variables
• What are dummy variables?!
• They are simply dichotomous variables that are entered into regression. They have 0/1 coding, where 0 = absence of something and 1 = presence of something. E.g., Female (0 = M; 1 = F) or Southern (0 = Non-Southern; 1 = Southern).

Dummy variables are especially nice because they allow us to use nominal variables in regression. A nominal variable has no rank or order, which makes its numerical codes meaningless in regression. "But YOU said we CAN'T do that!"

The way you use nominal variables in regression is by converting them to a series of dummy variables.
Nominal variable: Race (1 = White; 2 = Black; 3 = Other)
Recode into different dummy variables:
1. White: 0 = Not White; 1 = White
2. Black: 0 = Not Black; 1 = Black
3. Other: 0 = Not Other; 1 = Other

The way you use nominal variables in regression is by converting them to a series of dummy variables.
Nominal variable: Religion (1 = Catholic; 2 = Protestant; 3 = Jewish; 4 = Muslim; 5 = Other Religions)
Recode into different dummy variables:
1. Catholic: 0 = Not Catholic; 1 = Catholic
2. Protestant: 0 = Not Prot.; 1 = Protestant
3. Jewish: 0 = Not Jewish; 1 = Jewish
4. Muslim: 0 = Not Muslim; 1 = Muslim
5. Other Religions: 0 = Not Other; 1 = Other Relig.
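
The recoding step can be sketched as a function that turns a list of numeric codes into one 0/1 variable per category, using the religion coding from the slide. The function name to_dummies and the seven respondents are hypothetical. In SPSS itself this recoding is done with Recode into Different Variables.

```python
def to_dummies(codes, labels):
    """Recode a nominal variable into one 0/1 dummy variable per category.

    codes:  list of integer codes, one per respondent
    labels: dict mapping each code to its category name
    Returns a dict mapping each category name to a list of 0/1 values.
    """
    return {name: [1 if c == code else 0 for c in codes]
            for code, name in labels.items()}


religion_labels = {1: "Catholic", 2: "Protestant", 3: "Jewish",
                   4: "Muslim", 5: "Other Religions"}
religion = [1, 2, 2, 3, 5, 4, 1]  # hypothetical respondents' codes
dummies = to_dummies(religion, religion_labels)
for name, values in dummies.items():
    print(f"{name:<16}{values}")
```

Each respondent has a 1 on exactly one dummy.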

When you need to use a nominal variable in regression (like race), just convert it to a series of dummy variables.
• When you enter the variables into your model, you MUST LEAVE OUT ONE OF THE DUMMIES.
Leave out one: White
Enter the rest into regression: Black, Other

The reason you MUST LEAVE OUT ONE OF THE DUMMIES is that regression is mathematically impossible without an excluded group.
• If all were in, holding one of them constant would prohibit variation in all the rest. Every case scores 1 on exactly one dummy, so the full set always sums to 1 and is perfectly collinear with the intercept.
Leave out one: Catholic
Enter the rest into regression: Protestant, Jewish, Muslim, Other Religion
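
The "leave one out" rule can be checked numerically. In this sketch the cases and function names are hypothetical. With an intercept plus all three race dummies, the cross-product matrix X'X has determinant 0, so the least-squares equations have no unique solution. Dropping the White dummy makes the determinant nonzero. Exact fractions keep floating-point noise out of the determinant.

```python
from fractions import Fraction


def det(M):
    """Determinant by Gaussian elimination with exact fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # no usable pivot: the matrix is singular
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result            # a row swap flips the sign
        result *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return result


def xtx(rows):
    """Cross-product matrix X'X for a design matrix given as a list of rows."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]


race = ["White", "Black", "Other", "White", "Black", "White"]  # hypothetical cases
groups = ["White", "Black", "Other"]
# Intercept column plus ALL three dummies: the dummies add up to the intercept column
full = [[1] + [1 if g == r else 0 for g in groups] for r in race]
# Intercept plus Black and Other only (White is the excluded group)
reduced = [[1] + [1 if g == r else 0 for g in groups[1:]] for r in race]

print(det(xtx(full)))     # 0: X'X cannot be inverted, so there are no unique slopes
print(det(xtx(reduced)))  # nonzero: the slopes can be estimated
```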

The regression equations for dummies will look the same. For race, with 3 dummies (two of them entered), predicting self-esteem:
$\hat{Y} = a + b_1X_1 + b_2X_2$
• a = the y-intercept, which in this case is the predicted value of self-esteem for the excluded group, white.
• $b_1$ = the slope for variable $X_1$, black: the difference in predicted self-esteem between blacks and whites.
• $b_2$ = the slope for variable $X_2$, other: the difference in predicted self-esteem between others and whites.

If our equation for race, with 3 dummies, predicting self-esteem were:
$\hat{Y} = 28 + 5X_1 - 2X_2$
• 28 = the y-intercept, which in this case is the predicted value of self-esteem for the excluded group, white.
• 5 = the slope for variable $X_1$, black
• -2 = the slope for variable $X_2$, other
When cases' values are $X_1 = 0$ and $X_2 = 0$, they are white; when $X_1 = 1$ and $X_2 = 0$, they are black; when $X_1 = 0$ and $X_2 = 1$, they are other. Plugging in values for the dummies tells you each group's self-esteem average:
White = 28, Black = 33, Other = 26
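
Plugging the dummy codes into the equation takes a one-line function. This is a minimal sketch. The function name is hypothetical, and the coefficients are the slide's illustrative equation.

```python
def self_esteem(black, other):
    """Y-hat = 28 + 5*X1 - 2*X2, with X1 = Black and X2 = Other (White is excluded)."""
    return 28 + 5 * black - 2 * other


dummy_values = {"White": (0, 0), "Black": (1, 0), "Other": (0, 1)}
group_means = {group: self_esteem(*xs) for group, xs in dummy_values.items()}
print(group_means)
```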

Dummy variables can be entered into multiple regression along with other dichotomous and continuous variables.
• For example, you could regress self-esteem on sex, race, and education:
$\hat{Y} = a + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4$
$X_1$ = Female, $X_2$ = Black, $X_3$ = Other, $X_4$ = Education
• How would you interpret this?
$\hat{Y} = 30 - 4X_1 + 5X_2 - 2X_3 + 0.3X_4$

How would you interpret this?
$\hat{Y} = 30 - 4X_1 + 5X_2 - 2X_3 + 0.3X_4$, where $X_1$ = Female, $X_2$ = Black, $X_3$ = Other, $X_4$ = Education
1. Women's self-esteem is 4 points lower than men's.
2. Blacks' self-esteem is 5 points higher than whites'.
3. Others' self-esteem is 2 points lower than whites' and consequently 7 points lower than blacks'.
4. Each year of education raises self-esteem by 0.3 units.

Plugging in some select values, we'd get self-esteem for select groups:
• White males with 10 years of education = 33
• Black males with 10 years of education = 38
• Other females with 10 years of education = 27
• Other females with 16 years of education = 28.8
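
The select-group values come from plugging 0/1 dummy codes and years of education into the equation. This sketch reproduces them. The function name is hypothetical, and the coefficients are the slide's illustrative equation.

```python
def self_esteem(female, black, other, educ):
    """Y-hat = 30 - 4*Female + 5*Black - 2*Other + 0.3*Education.

    White males are the reference group (Female = 0, Black = 0, Other = 0).
    """
    return 30 - 4 * female + 5 * black - 2 * other + 0.3 * educ


groups = {
    "White males, 10 years": (0, 0, 0, 10),
    "Black males, 10 years": (0, 1, 0, 10),
    "Other females, 10 years": (1, 0, 1, 10),
    "Other females, 16 years": (1, 0, 1, 16),
}
for label, values in groups.items():
    print(f"{label}: {self_esteem(*values):.1f}")
```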

The same regression rules apply. Each slope represents the linear relationship between an independent variable and the dependent variable while holding all other variables constant. Make sure you get into the habit of saying the slope is the effect of an independent variable on the dependent variable "while holding everything else constant."

Standardized Coefficients
• Sometimes you want to know whether one variable has a larger impact on your dependent variable than another.
• If your variables have different units of measure, it is hard to compare their effects.
• For example, if wages go up one thousand dollars for each year of education, is that a greater effect than if wages go up five hundred dollars for each year increase in age?

• So which is better for increasing wages, education or aging?
• One thing you can do is "standardize" your slopes so that you can compare the standard deviation increase in your dependent variable for each standard deviation increase in your independent variables.
• You might find that wages go up 0.3 standard deviations for each standard deviation increase in education, but 0.4 standard deviations for each standard deviation increase in age.

• Recall that standardizing regression coefficients is accomplished by the formula: $b^* = b\,\frac{S_x}{S_y}$
• In our example of education and income predicting number of children, the two have very comparable effects. Each lowers the number of children by .4 standard deviations for each standard deviation increase, controlling for the other.
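
The formula can be checked against the idea it expresses: a standardized slope is the slope you would get after converting both variables to z-scores. For simplicity, this sketch uses a single independent variable and hypothetical wage data. The same b(Sx/Sy) conversion applies to each slope in a multiple regression. The function names are hypothetical.

```python
import statistics


def slope(x, y):
    """Bivariate least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def standardized(b, sx, sy):
    """Standardized coefficient: b * (Sx / Sy)."""
    return b * sx / sy


def z_scores(values):
    """Convert values to standard deviation units."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data: years of education and wages in thousands of dollars
educ = [10, 12, 12, 14, 16, 16, 18, 20]
wages = [25, 31, 28, 36, 40, 45, 47, 55]

b = slope(educ, wages)
b_std = standardized(b, statistics.stdev(educ), statistics.stdev(wages))
print(f"b = {b:.3f} thousand dollars per year; standardized = {b_std:.3f}")
# Same answer as regressing the z-scores of wages on the z-scores of education
print(f"slope of z-scores = {slope(z_scores(educ), z_scores(wages)):.3f}")
```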

• One last note of caution: it does not make sense to standardize slopes for dichotomous variables. It makes no sense to refer to standard deviation increases in sex or in race, because these variables take only the values 0 and 1.

Give yourself a hand… You now understand more statistics than 99% of the population! You are well qualified to understand most sociological research papers.