Least-Squares Regression Line


Least-Squares Regression Line

Definition: The least-squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible.

Different regression lines produce different residuals. The regression line we want is the one that minimizes the sum of the squared residuals.


Least-Squares Regression Line

Definition: Equation of the least-squares regression line. We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means x̄ and ȳ, the standard deviations s_x and s_y of the two variables, and their correlation r. The least-squares regression line is the line ŷ = a + bx with slope b = r(s_y/s_x) and y intercept a = ȳ − bx̄.

We can use technology to find the equation of the least-squares regression line. We can also write it in terms of the means and standard deviations of the two variables and their correlation, as above.
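As a quick sketch (my addition, not part of the slides), the two formulas above can be computed directly from summary statistics; the helper name `lsrl_from_summary` is hypothetical, and the check uses the used-Hondas summary statistics from the example that follows.

```python
# Sketch: least-squares slope and intercept from summary statistics.
# Formulas from the slide: b = r*(s_y/s_x), a = y_bar - b*x_bar

def lsrl_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (intercept a, slope b) of the LSRL y-hat = a + b*x."""
    b = r * (s_y / s_x)      # slope
    a = y_bar - b * x_bar    # y intercept
    return a, b

# Used-Hondas summary statistics from the example below:
# miles (thousands): mean 50.5, sd 19.3; price: mean $14,425, sd $1,899; r = -0.874
a, b = lsrl_from_summary(50.5, 19.3, 14425, 1899, -0.874)
print(round(b, 1), round(a))   # slope about -86 dollars per thousand miles
```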


Least-Squares Regression Line

Example: The numbers of miles (in thousands) for the 11 used Hondas have a mean of 50.5 and a standard deviation of 19.3. The asking prices have a mean of $14,425 and a standard deviation of $1,899. The correlation for these variables is r = −0.874. Find the equation of the least-squares regression line, and explain what change in price we would expect for each additional 19.3 thousand miles.

To find the slope: b = r(s_y/s_x) = −0.874(1899/19.3) ≈ −86.
To find the intercept: ȳ = a + bx̄, so 14425 = a − 86(50.5) and a = 18,768.
The least-squares regression line is ŷ = 18,768 − 86x.

For each additional 19.3 thousand miles, we expect the cost to change by (−86)(19.3) ≈ −1,660. So, the CR-V will cost about $1,660 fewer dollars for each additional 19.3 thousand miles.

Note: Due to roundoff error, these values for the slope and y intercept are slightly different from the values on page 166.


Least-Squares Regression Line

Example: Find the least-squares regression equation for the following data using your calculator.

Body Weight (lb): 120, 187, 109, 103, 131, 165, 158, 116
Backpack Weight (lb): 26, 30, 26, 24, 29, 35, 31, 28

You should get ŷ = 16.3 + 0.0908x.


Residual Plots

One of the first principles of data analysis is to look for an overall pattern and for striking departures from the pattern. A regression line describes the overall pattern of a linear relationship between two variables. We see departures from this pattern by looking at the residuals.

Definition: A residual plot is a scatterplot of the residuals against the explanatory variable. Residual plots help us assess how well a regression line fits the data.


Interpreting Residual Plots

A residual plot magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns. If a linear model is appropriate:
1) The residual plot should show no obvious patterns (a pattern in the residuals means a linear model is not appropriate).
2) The residuals should be relatively small in size.

Definition: If we use a least-squares regression line to predict the values of a response variable y from an explanatory variable x, the standard deviation of the residuals s is given by s = √(Σ residuals² / (n − 2)).


Residual Plots

Here are the scatterplot and the residual plot for the used Hondas data (plots omitted).

For the used Hondas data, the standard deviation of the residuals is s = $972. So, when we use number of miles to estimate the asking price, we will be off by an average of $972.


The Role of r² in Regression

The standard deviation of the residuals gives us a numerical estimate of the average size of our prediction errors. There is another numerical quantity that tells us how well the least-squares regression line predicts values of the response y.

Definition: The coefficient of determination r² is the fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x. We can calculate r² using the formula r² = 1 − SSE/SST, where SSE = Σ residuals² (the sum of squared residuals from the regression line) and SST = Σ(y − ȳ)² (the total sum of squares about the mean of y).


The Role of r² in Regression

r² tells us how much better the LSRL does at predicting values of y than simply guessing the mean ȳ for each value in the dataset. Consider the example on page 179. If we needed to predict a backpack weight for a new hiker but didn't know the hiker's body weight, we could use the average backpack weight as our prediction.

If we use the mean backpack weight as our prediction, the sum of the squared residuals is SST = 83.87. If we use the LSRL to make our predictions, the sum of the squared residuals is SSE = 30.90.

SSE/SST = 30.90/83.87 = 0.368, so r² = 1 − SSE/SST = 1 − 0.368 = 0.632.

63.2% of the variation in backpack weight is accounted for by the linear model relating pack weight to body weight; the remaining 36.8% of the variation in pack weight is unaccounted for by the least-squares regression line.