Chapter 5 Regression 5/19/2021 Chapter 5 1

Regression
• Like correlation, regression addresses the relationship between a quantitative explanatory variable (X) and a quantitative response variable (Y)
• The objective of regression is to describe the best-fitting line through the data
• As with correlation, start by looking at the data with a scatterplot

Same data as last week

Country        Per Capita GDP (X)    Life Expectancy (Y)
Austria        21.4                  77.48
Belgium        23.2                  77.53
Finland        20.0                  77.32
France         22.7                  78.63
Germany        20.8                  77.17
Ireland        18.6                  76.39
Italy          21.5                  78.51
Netherlands    22.0                  78.15
Switzerland    23.8                  78.99
UK             21.2                  77.37

Inspect scatterplot for linearity
[Figure: scatterplot of the data]

The Regression Line
The regression line predicts values of Y with this equation (the “regression model”):
ŷ = a + b∙X
where:
ŷ ≡ predicted value of Y at a given X
a ≡ intercept
b ≡ slope
a and b are called regression coefficients

Calculation of slope & intercept
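
For reference, the standard least-squares formulas for the slope and intercept can be written as below. The notation is an assumption about what the slide used: r is the correlation coefficient, s_x and s_y are the sample standard deviations, and x̄ and ȳ are the sample means.

```latex
b = r \, \frac{s_y}{s_x}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b \, \bar{x}
```

The two forms of b are algebraically equivalent. The first is convenient when r, s_x, and s_y are already known from a correlation analysis.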

Example: calculation of regression coefficients
Last week we calculated: [summary statistics, shown on the slide]
Therefore: ŷ = a + b∙X = 68.716 + 0.420∙X
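
As a check on the hand calculation, the coefficients can be recomputed from the data table. This is a minimal sketch using only the Python standard library. The names `gdp`, `life_exp`, and `fit_line` are illustrative, and the result should agree with the slide's values to within rounding.

```python
from statistics import mean

# Data from the "Same data as last week" table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life_exp = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]


def fit_line(x, y):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares of X about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b


a, b = fit_line(gdp, life_exp)
print(f"y-hat = {a:.3f} + {b:.3f} * X")
```

Small differences in the last decimal place of the intercept are expected, because the hand calculation rounds the slope before computing a.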

Regression Coefficients by Calculator
This course supports the TI-30X IIS. Other calculators are acceptable but are not supported by the instructor.
BEWARE! The TI-30X IIS labels the slope as a and the intercept as b, the reverse of the notation used in this course.

Interpretation of Slope b
• The slope is the predicted change in Y per unit increase in X.
• Example: ŷ = 68.7 + 0.42∙X
• The slope = 0.42
• Each unit increase in X (GDP) is associated with a 0.42 increase in Y (life expectancy)

Interpretation: Intercept a
• The intercept is where the line would pass through the Y-axis (when X = 0).
• Example: ŷ = 68.7 + 0.42∙X
• The intercept = 68.7.
• We do NOT normally interpret the intercept, since X = 0 usually lies outside the range of the data

Regression Line for Prediction
• Use the regression equation to predict Y given X
• Example: ŷ = 68.716 + 0.420∙X
• What is the predicted life expectancy in a country with a GDP of 20.0?
ŷ = a + b∙X = 68.716 + (0.420)(20.0) = 77.12
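
The prediction step is just the regression equation evaluated at the chosen X. A minimal sketch follows, using the coefficients from the worked example. The function name `predict` and the constants `A` and `B` are illustrative.

```python
A = 68.716  # intercept from the worked example
B = 0.420   # slope from the worked example


def predict(x, a=A, b=B):
    """Predicted Y (life expectancy) at a given X (per capita GDP)."""
    return a + b * x


print(round(predict(20.0), 2))  # 77.12
```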

Coefficient of Determination
Denoted r² (the square of r)
Interpretation: fraction of the variation in Y “explained” by X
Illustration: Our example showed r = 0.809. Therefore, r² ≈ 0.66 (from the unrounded r; squaring 0.809 itself gives 0.65).
Interpretation: 66% of the variation in Y (life expectancy) is mathematically “explained” by X (GDP)
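
To see where r and r² come from, both can be computed directly from the data table. This is a sketch under the assumption that the table values are the ones used in the example. The function name `correlation` is illustrative.

```python
from math import sqrt
from statistics import mean

# Data from the "Same data as last week" table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life_exp = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]


def correlation(x, y):
    """Pearson correlation coefficient r between x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


r = correlation(gdp, life_exp)
r_squared = r ** 2  # coefficient of determination
print(f"r = {r:.3f}, r^2 = {r_squared:.2f}")
```

Squaring the unrounded r, as done here, avoids the small discrepancy that appears when a rounded r is squared by hand.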

Cautions about regression
1. Linear relationships only (see prior lecture)
2. Influenced by outliers
3. Should not be extrapolated beyond the observed range of X
4. Association does not equal causation! (Beware of lurking variables.)

Outliers and Influential Points
• An outlier is an observation that lies far from the regression line
• Outliers in the Y direction have large residuals
• Outliers in the X direction are often influential
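
A residual is the observed Y minus the Y predicted by the line, so residuals offer a quick way to look for outliers in the Y direction. The sketch below computes them for the GDP and life expectancy data from earlier in the chapter, not for the Gesell data. All variable names are illustrative.

```python
from statistics import mean

countries = ["Austria", "Belgium", "Finland", "France", "Germany",
             "Ireland", "Italy", "Netherlands", "Switzerland", "UK"]
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life_exp = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

# Fit the least-squares line y-hat = a + b*x.
x_bar, y_bar = mean(gdp), mean(life_exp)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gdp, life_exp))
sxx = sum((x - x_bar) ** 2 for x in gdp)
b = sxy / sxx
a = y_bar - b * x_bar

# Residual = observed Y minus predicted Y.
residuals = {c: y - (a + b * x) for c, x, y in zip(countries, gdp, life_exp)}

# List countries from largest to smallest absolute residual.
for country, res in sorted(residuals.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{country:<12} {res:+.2f}")
```

Residuals from a least-squares fit sum to zero, so a large positive or negative value stands out against the rest. Note that a point far out in the X direction can pull the line toward itself and so have a small residual despite being influential.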

Example: Influential Outlier
[Figure: Gesell Adaptive Score and “First Word”, showing the line for all data and the line after removing child 18]

Extrapolation
• Extrapolation is the use of the regression equation for predictions outside the observed range of the explanatory variable X
• Do NOT extrapolate!
• See next slide

Example: extrapolation (Sarah’s height)
• Figure: Sarah’s height from age 36 to 60 months (3 to 5 years)
• Regression model: ŷ = 72 + 0.4∙X
• To predict Sarah’s height at 42 months: ŷ = 72 + 0.4(42) = 88.8 cm ≈ 35” (~ 3’)

Example: Extrapolation
• Do NOT use the regression model to predict Sarah’s height at age 360 months (30 years)!
• ŷ = 72 + 0.4∙X = 72 + 0.4(360) = 216 cm, more than 7’ tall (clearly ridiculous)
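
One way to make the "do not extrapolate" rule concrete is to have the prediction function refuse any X outside the observed range. The sketch below assumes the 36 to 60 month range stated for the figure. The names `X_MIN`, `X_MAX`, and `predict_height_cm` are illustrative.

```python
X_MIN, X_MAX = 36, 60  # observed age range in months (3 to 5 years)


def predict_height_cm(age_months):
    """Predict height from y-hat = 72 + 0.4*X, refusing to extrapolate."""
    if not X_MIN <= age_months <= X_MAX:
        raise ValueError(
            f"age {age_months} months is outside the observed range "
            f"{X_MIN} to {X_MAX} months; refusing to extrapolate"
        )
    return 72 + 0.4 * age_months


print(f"{predict_height_cm(42):.1f} cm")  # within range: 88.8 cm

try:
    predict_height_cm(360)  # 30 years: far outside the data
except ValueError as err:
    print(err)
```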

Association does not imply causation
Even strong correlations may be non-causal
See pp. 144 – 145 for examples!

Association does not imply causation
Criteria to establish causation (pp. 144 – 146):
• Strength of relationship
• Experimentation
• Consistency
• Dose-response
• Temporality
• Plausibility