STAT 250 Dr Kari Lock Morgan Multiple Regression

STAT 250
Dr. Kari Lock Morgan
Multiple Regression
SECTIONS 10.1, 10.3 (?)
• Multiple explanatory variables (10.1, 10.3)
Statistics: Unlocking the Power of Data Lock5

More than 2 variables!
• Today we’ll finally learn a way to handle more than 2 variables!

Multiple Regression
• Multiple regression extends simple linear regression to include multiple explanatory variables:
  y = β0 + β1 x1 + β2 x2 + ... + βk xk + ε
• Each x is a different explanatory variable
• k is the number of explanatory variables
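The model form above can be tried out on made-up numbers. The following is a minimal sketch, assuming NumPy is available; the data are synthetic (not the course data), and the "true" coefficients are arbitrary values chosen only so the fit has something to recover.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): n cases, k = 3 explanatory variables.
n, k = 200, 3
X = rng.normal(size=(n, k))
true_beta = np.array([2.0, 0.5, -1.0, 3.0])  # [b0, b1, b2, b3], chosen arbitrarily
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=n)

# Add a column of ones so the first fitted coefficient is the intercept b0.
design = np.column_stack([np.ones(n), X])

# Least-squares estimates of b0, b1, ..., bk.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(coef, 2))
```

With little noise and 200 cases, the fitted coefficients should land close to the arbitrary values used to generate the data.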

Predicting Body Fat Percentage
• The percentage of a person’s weight that is made up of body fat is often used as an indicator of health and fitness
• Accurate measures of percent body fat are difficult to implement
• For example, you can immerse the body in water to estimate density, then apply a formula
• Another option: build a model predicting % body fat based on easy-to-obtain measurements

Body Fat Data
• Measurements were collected on 100 men
• Response variable: percent body fat
• Explanatory variables:
  • Age (in years)
  • Weight (in pounds)
  • Height (in inches)
  • Neck circumference (in cm)
  • Chest circumference (in cm)
  • Abdomen circumference (in cm)
  • Ankle circumference (in cm)
  • Biceps circumference (in cm)
  • Wrist circumference (in cm)
A sample taken from data provided by Johnson R., "Fitting Percentage of Body Fat to Simple Body Measurements," Journal of Statistics Education, 1996.

Predicting Percent Body Fat
• We’ll start with just three explanatory variables and fit the model:
  Bodyfat = 49.6 + 0.1653 Age + 0.2264 Weight - 1.117 Height
• What can we do with this?
  • Make predictions
  • Interpret coefficients
  • Inference
  • Interpret R²
  • ... and much more!

Making Predictions
• Bodyfat = 49.6 + 0.1653 Age + 0.2264 Weight - 1.117 Height
• If you are male, you can use this to predict your percent body fat!
• Age: years, weight: pounds, height: inches
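As a quick illustration, the fitted equation can be wrapped in a small function. This is only a sketch: the function name `predict_bodyfat` and the example person (30 years old, 180 pounds, 70 inches) are hypothetical, while the coefficients are the ones from the slide.

```python
def predict_bodyfat(age_years: float, weight_lb: float, height_in: float) -> float:
    """Predicted percent body fat from the three-variable model on the slide."""
    return 49.6 + 0.1653 * age_years + 0.2264 * weight_lb - 1.117 * height_in


# Hypothetical man: 30 years old, 180 pounds, 70 inches tall.
example = predict_bodyfat(30, 180, 70)
print(round(example, 1))  # 49.6 + 4.959 + 40.752 - 78.19, about 17.1
```

Note the units: age in years, weight in pounds, height in inches. The model was fit on a sample of men, so predictions for anyone else are extrapolation.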

[Figure: Percent Body Fat]

Interpreting Coefficients
• Bodyfat = 49.6 + 0.1653 Age + 0.2264 Weight - 1.117 Height
• Intercept: a man who is 0 years old, weighs 0 lbs, and is 0 inches tall would have a predicted 49.6% body fat
• Slope: Keeping weight and height constant, predicted percent body fat increases by 0.1653 for every additional year
• Keeping age and height constant, predicted percent body fat increases by 0.2264 for every additional pound
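The "keeping the other variables constant" reading of a slope can be checked numerically: change one input by one unit, hold the rest fixed, and compare predictions. A minimal sketch, where `predict_bodyfat` and the baseline man (40 years, 170 pounds, 68 inches) are hypothetical and the coefficients come from the slide:

```python
def predict_bodyfat(age_years: float, weight_lb: float, height_in: float) -> float:
    """Predicted percent body fat from the three-variable model on the slide."""
    return 49.6 + 0.1653 * age_years + 0.2264 * weight_lb - 1.117 * height_in


# Hypothetical baseline: 40 years old, 170 pounds, 68 inches tall.
base = predict_bodyfat(40, 170, 68)

# Change one variable by one unit while holding the other two fixed.
age_effect = predict_bodyfat(41, 170, 68) - base     # one additional year
weight_effect = predict_bodyfat(40, 171, 68) - base  # one additional pound

# The intercept is the prediction when every explanatory variable is 0.
intercept = predict_bodyfat(0, 0, 0)

print(round(age_effect, 4), round(weight_effect, 4), intercept)
```

The differences match the Age and Weight coefficients whatever baseline is chosen, because the model is linear in each variable.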

Interpreting Coefficients
Bodyfat = 49.6 + 0.1653 Age + 0.2264 Weight - 1.117 Height
Which of the following is a correct interpretation?
a) Keeping age and weight constant, height decreases by 1.117 for every additional percent of body fat
b) Keeping age and weight constant, percent body fat decreases by 1.117 for every additional inch
c) Predicted body fat decreases by 1.117 for every additional inch

Inference
• Are our explanatory variables significant predictors?
• All of the p-values corresponding to the explanatory variables are very small
• Age, weight, and height are all significant predictors of percent body fat (given the other variables in the model)

Explaining Variability
How much of the variability in percent body fat is explained by this model? Which of the following would tell us this?
a) p-value
b) correlation
c) slope coefficients
d) R²
e) confidence interval

Explaining Variability
• About 55% of the variability in percent body fat is explained by age, weight, and height
• Can we do better?
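For reference, R² can be computed as 1 - SSE/SST, which for a least-squares fit with an intercept is the proportion of variability in the response explained by the model. The sketch below uses made-up observed and predicted values (not the body fat data) purely to show the arithmetic; the function name `r_squared` is hypothetical.

```python
def r_squared(observed, predicted):
    """Return 1 - SSE/SST for paired observed and predicted values."""
    mean_y = sum(observed) / len(observed)
    sst = sum((y - mean_y) ** 2 for y in observed)               # total variability
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # unexplained variability
    return 1 - sse / sst


# Made-up values for illustration only.
observed = [12.0, 18.0, 25.0, 30.0]
predicted = [14.0, 17.0, 24.0, 30.0]

r2 = r_squared(observed, predicted)
print(round(r2, 3))  # SST = 186.75, SSE = 6, so about 0.968
```

A value near 0.55, as reported on the slide, would mean the unexplained variability is a little under half of the total.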

Comparing with BMI
• BMI is used more commonly than percent body fat because it is easy to calculate
• Currently, our predicted percent body fat is not using much more information than BMI (just age as an extra predictor)
• What’s wrong with body mass index (BMI) as an indicator of health and fitness?
• How might we improve our model to fix this problem?

New Model
• Bodyfat = -55.9 + 0.0067 Age - 0.1724 Weight + 0.099 Height + 1.066 Abdomen
• Anything look odd about this equation?
• Model without Abdomen: Bodyfat = 49.6 + 0.1653 Age + 0.2264 Weight - 1.117 Height

Significance
Which explanatory variable(s) are significant?
a) All of them: age, weight, height, abdomen
b) Weight and height
c) Weight, height, abdomen
d) Weight and abdomen
e) Abdomen only

Multiple Regression
• The coefficient for each explanatory variable is the predicted change in y for a one-unit change in x, given the other explanatory variables in the model!
• The p-value for each coefficient indicates whether it is a significant predictor of y, given the other explanatory variables in the model!
• If explanatory variables are associated with each other, coefficients and p-values will change depending on what else is included in the model
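The last point can be seen in a small simulation. This is a sketch on synthetic data, assuming NumPy; the variable names `weight` and `abdomen` are borrowed from the example only as labels, the numbers are invented, and `fit` is a hypothetical helper. The two predictors are built to be strongly associated, and the response depends positively on `abdomen` and slightly negatively on `weight`.

```python
import numpy as np


def fit(y, *columns):
    """Least-squares coefficients [intercept, slope_1, slope_2, ...]."""
    design = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


rng = np.random.default_rng(1)
n = 2000

# Synthetic, standardized stand-ins: weight is strongly associated with abdomen.
abdomen = rng.normal(size=n)
weight = abdomen + rng.normal(scale=0.5, size=n)
y = 1.0 * abdomen - 0.2 * weight + rng.normal(scale=0.3, size=n)

coef_alone = fit(y, weight)          # weight is the only explanatory variable
coef_both = fit(y, weight, abdomen)  # weight, given abdomen

print("weight slope alone:        ", round(coef_alone[1], 2))
print("weight slope given abdomen:", round(coef_both[1], 2))
```

On its own, `weight` picks up the effect of the `abdomen` variable it is associated with, so its slope is positive; once `abdomen` is in the model, the `weight` slope turns negative. That is the same kind of sign change seen between the two body fat models.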

[Figure: Full Model]

Which explanatory variable(s) are significant?
a) All of them
b) Weight and abdomen
c) Neck only
d) Abdomen and wrist

Insignificant Terms
• What should we do with the insignificant variables?
• Keep them in the model? Take them out?
• Deciding which variables to keep in the model (variable selection) is an entire subfield of statistics, and beyond the scope of this class
• Want to learn more about it? Take STAT 462!

Electricity and Life Expectancy
• Cases: countries of the world
• Response variable: life expectancy
• Explanatory variable: electricity use (kWh per capita)
• Is a country’s electricity use helpful in predicting life expectancy?

Electricity and Life Expectancy
Is a country’s electricity use helpful in predicting life expectancy?
(a) Yes
(b) No

Electricity and Life Expectancy
If we increased electricity use in a country, would life expectancy increase?
(a) Yes
(b) No
(c) Impossible to tell

Confounding Variables
• Wealth is an obvious confounding variable that could explain the relationship between electricity use and life expectancy
• Multiple regression is a powerful tool that allows us to account for confounding variables
• We can see whether an explanatory variable is still significant, even after including potential confounding variables in the model
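To make the idea concrete, here is a simulated sketch (assuming NumPy) in which a confounder drives both the explanatory variable and the response. The data are invented and are not the country data from the slides: `wealth` affects both `electricity` and `life`, while `electricity` has no direct effect on `life` by construction. The helper `t_statistics` is hypothetical; it computes each coefficient divided by its standard error.

```python
import numpy as np


def t_statistics(y, *columns):
    """t-statistic (coefficient / standard error) for the intercept and each slope."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]            # n minus number of coefficients
    sigma2 = resid @ resid / dof         # estimated error variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef / se


rng = np.random.default_rng(2)
n = 500

# Simulated, standardized stand-ins: wealth drives both other variables.
wealth = rng.normal(size=n)
electricity = wealth + rng.normal(size=n)
life = 2.0 * wealth + rng.normal(size=n)  # no direct electricity effect

t_simple = t_statistics(life, electricity)
t_multi = t_statistics(life, electricity, wealth)

print("electricity t, alone:       ", round(t_simple[1], 1))
print("electricity t, given wealth:", round(t_multi[1], 1))
```

Alone, `electricity` looks like a very strong predictor. Given `wealth`, its t-statistic should be small, because the association ran entirely through the confounder in this simulation. Real data need not behave this way.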

Electricity and Life Expectancy
Is a country’s electricity use helpful in predicting life expectancy, even after including GDP in the model?
(a) Yes
(b) No

Cell Phones and Life Expectancy
• Cases: countries of the world
• Response variable: life expectancy
• Explanatory variable: number of mobile cellular subscriptions per 100 people
• Is a country’s cell phone subscription rate helpful in predicting life expectancy?

Cell Phones and Life Expectancy
Is a country’s number of cell phone subscriptions per capita helpful in predicting life expectancy?
(a) Yes
(b) No

Cell Phones and Life Expectancy
If we gave everyone in a country a cell phone and a cell phone subscription, would life expectancy in that country increase?
(a) Yes
(b) No
(c) Impossible to tell

Cell Phones and Life Expectancy
Is a country’s cell phone subscription rate helpful in predicting life expectancy, even after including GDP in the model?
(a) Yes
(b) No

Cell Phones and Life Expectancy
• This says that wealth alone cannot explain the association between cell phone subscriptions and life expectancy
• This suggests that either cell phones actually do something to increase life expectancy (causal) OR there is another confounding variable besides the wealth of the country

Confounding Variables
• Multiple regression is one potential way to account for confounding variables
• It is the approach most commonly used in practice across a wide variety of fields, but it is quite sensitive to the conditions for the linear model (particularly linearity)
• You can only “rule out” confounding variables that you have data on, so it is still very hard to make true causal conclusions without a randomized experiment

To Do
• Read 10.1
• Do SRTEs (by Sunday, 5/3)
• Do HW 10.1 (due Friday, 5/1)
• Do online assessments (due Friday, 5/1)