A student wonders if tall women tend to

A student wonders if tall women tend to date taller men than short women do. She measures herself, her dormitory roommate, and the women in the adjoining rooms. Then she measures the next man each woman dates. Draw & discuss the scatterplot and calculate the correlation coefficient.

Women (x)   Men (y)
66          72
64          68
66          70
65          68
70          71
65          65
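As a rough check on the hand calculation, the correlation coefficient for the six pairs on this slide can be computed with a short Python sketch. The function name `pearson_r` is an assumption made for illustration; the formula is the usual sum of cross-products divided by the square root of the product of the sums of squares.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient for two equal-length lists (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

women = [66, 64, 66, 65, 70, 65]  # x values from the slide
men = [72, 68, 70, 68, 71, 65]    # y values from the slide

r = pearson_r(women, men)
print(round(r, 3))  # 0.565
```

A value near 0.57 suggests a moderate positive association, which should agree with what the scatterplot shows.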

Scatterplot, find r & describe. SAT-math SAT-verbal 680 450 440 780 570 550 610 730 530 700 640 740 500 720 570 600 530 800

• Create scatterplot
• Find the correlation
• Describe the association

Fat (g)   Sodium
19        920
31        1500
34        1310
35        860
39        1180
39        940
43        1260

Linear Regression

Guess the correlation coefficient
• http://istics.net/stat/Correlations/

Can we make a Line of Best Fit

Regression Line
• This is a line that describes how a response variable (y) changes as an explanatory variable (x) changes.
• It’s used to predict the value of y for a given value of x.
• Unlike correlation, regression requires that we have an explanatory variable.

Let’s try some!
• http://illuminations.nctm.org/ActivityDetail.aspx?ID=146

Regression Line

The following data show the number of miles driven and advertised price for 11 used Honda CR-Vs from the 2002-2006 model years (prices found at www.carmax.com). The scatterplot below shows a strong, negative linear association between number of miles and advertised cost. The correlation is -0.874. The line on the plot is the regression line for predicting advertised price based on number of miles.

Thousand Miles Driven   Cost (dollars)
22                      17998
29                      16450
35                      14998
39                      13998
45                      14599
49                      14988
55                      13599
56                      14599
69                      11998
70                      14450
86                      10998

The regression line is shown below. Use it to answer the following.
Slope:
Y-intercept:

Predict the price for a Honda with 50,000 miles.
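The plotted regression line itself is not reproduced in this text, so the sketch below recomputes a least-squares line directly from the 11 data pairs and uses it for the 50,000-mile prediction. The helper name `least_squares_line` is hypothetical, and the numbers it prints may differ slightly from any rounded equation shown on the original slide.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Data from the slide: miles in thousands, advertised price in dollars
miles = [22, 29, 35, 39, 45, 49, 55, 56, 69, 70, 86]
price = [17998, 16450, 14998, 13998, 14599, 14988,
         13599, 14599, 11998, 14450, 10998]

slope, intercept = least_squares_line(miles, price)
predicted_50 = intercept + slope * 50  # 50 means 50,000 miles

print(round(slope, 2))      # about -86.18 dollars per thousand miles
print(round(intercept))     # about 18773 dollars
print(round(predicted_50))  # about 14464 dollars
```

Because 50 thousand miles lies inside the range of the data (22 to 86), this prediction is an interpolation rather than an extrapolation.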

Extrapolation
• This refers to using a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line.
• Such predictions are usually not very accurate.

• Slope:
• Y-int:
• Predict weight after 16 wk:
• Predict weight at 2 years:

Residual
• Residual = Observed - Predicted

The equation of the least-squares regression line for the sprint time and long-jump distance data is predicted long-jump distance = 304.56 – 27.3 (sprint time). Find and interpret the residual for the student who had a sprint time of 8.09 seconds.
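A minimal sketch of the residual calculation follows, using residual = observed - predicted. The student's observed long-jump distance is not given in this text, so the code only evaluates the prediction and leaves the observed value as an argument; both function names are assumptions for illustration.

```python
def predicted_long_jump(sprint_time):
    """Prediction from the line quoted on the slide."""
    return 304.56 - 27.3 * sprint_time

def residual(observed, sprint_time):
    """Observed minus predicted; positive means the student jumped farther than predicted."""
    return observed - predicted_long_jump(sprint_time)

# Predicted distance for a sprint time of 8.09 seconds
print(round(predicted_long_jump(8.09), 2))  # 83.7
```

Once the observed distance for that student is known, `residual(observed, 8.09)` gives how far the actual jump falls above or below the line.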

Regression
• Let’s see how a regression line is calculated.

Fat vs Calories in Burgers

Fat (g)   Calories
19        410
31        580
34        590
35        570
39        640
39        680
43        660

Let’s standardize the variables

Fat   Cal   z_x      z_y
19    410   -1.959   -2
31    580   -0.42    -0.1
34    590   -0.036   0
35    570   0.09     -0.2
39    640   0.6      0.56
39    680   0.6      1
43    660   1.12     0.78

The line must contain the point $(\bar{x}, \bar{y})$, which for z-scores is (0, 0), so it must pass through the origin.
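The z-scores on this slide can be reproduced with a short sketch, assuming the sample standard deviation (dividing by n - 1) is the one intended. The helper name `z_scores` is hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # divides by n - 1
    return [(v - m) / s for v in values]

fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]

# Print each burger with its standardized fat and calorie values
for f, c, zf, zc in zip(fat, calories, z_scores(fat), z_scores(calories)):
    print(f, c, round(zf, 3), round(zc, 3))
```

The first row comes out near -1.959 and -2.004, in line with the rounded values in the table.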

Let’s clarify a little. (Just watch & listen.)
The equation for a line that passes through the origin can be written with just a slope & no intercept: y = mx. But we’re using z-scores, so our equation should reflect this, and thus it’s $\hat{z}_y = m z_x$.
Many lines with different slopes pass through the origin. Which one fits our data best? That is, which slope determines the line that minimizes the sum of the squared residuals?

Line of Best Fit – Least Squares Regression Line
It’s the line for which the sum of the squared residuals is smallest.
We want to find the mean squared residual.
Residual = Observed - Predicted
Focus on the vertical deviations from the line.

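Let’s find it. (Just watch & soak it in.)
Mean squared residual $= \frac{\sum (z_y - m z_x)^2}{n-1} = \frac{\sum z_y^2}{n-1} - 2m\,\frac{\sum z_x z_y}{n-1} + m^2\,\frac{\sum z_x^2}{n-1}$
• St. Dev of z-scores is 1, so variance is 1 also: $\frac{\sum z_y^2}{n-1} = \frac{\sum z_x^2}{n-1} = 1$.
• $\frac{\sum z_x z_y}{n-1}$: this is r!
So the mean squared residual is $1 - 2mr + m^2$.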

Continue…
Since the mean squared residual $1 - 2mr + m^2$ is a parabola in m, it reaches its minimum at $m = -\frac{b}{2a}$. This gives us $m = -\frac{-2r}{2(1)} = r$. Hence the slope of the best-fit line for z-scores is the correlation coefficient r.
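The claim that the best slope for z-scores equals r can also be checked numerically. The sketch below, which assumes the burger data and sample standard deviations, tries a grid of candidate slopes and keeps the one with the smallest mean squared residual; the grid search is only an illustration, not how the slope is normally computed.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]
zx = z_scores(fat)
zy = z_scores(calories)
n = len(zx)

def mean_squared_residual(m):
    """Mean squared residual of the line z_y = m * z_x through the origin."""
    return sum((y - m * x) ** 2 for x, y in zip(zx, zy)) / (n - 1)

# Correlation as the average product of z-scores
r = sum(x * y for x, y in zip(zx, zy)) / (n - 1)

# Candidate slopes from -1.000 to 1.000 in steps of 0.001
candidates = [i / 1000 for i in range(-1000, 1001)]
best_m = min(candidates, key=mean_squared_residual)

print(round(r, 3), best_m)  # the two values agree to within the grid step
```

The minimized value is also $1 - r^2$, which follows from substituting m = r into $1 - 2mr + m^2$.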

Slope – rise over run
A slope of r for z-scores means that for every increase of 1 standard deviation in $z_x$, there is an increase of r standard deviations in $\hat{z}_y$. “Over 1 and up r.”
Translate back to x & y values: “over one standard deviation in x, up r standard deviations in y.”
Slope of the regression line is: $b = \frac{r s_y}{s_x}$

Why is correlation “r”?
• Because it was calculated from the regression of y on x after standardizing the variables, just as we have done here. Thus r was used to stand for (standardized) regression.

The number of miles (in thousands) for the 11 used Hondas has a mean of 50.5 and a standard deviation of 19.3. The asking prices had a mean of $14,425 and a standard deviation of $1,899. The correlation for these variables is r = -0.874. Find the equation of the least-squares regression line and explain what change in price we would expect for each additional 19.3 thousand miles.
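One way to work this exercise is to build the line from the summary statistics alone: slope = r times (SD of price) divided by (SD of miles), with the intercept chosen so the line passes through the point of means. The variable names below are assumptions for illustration; the numbers are the ones quoted in the exercise.

```python
# Summary statistics quoted in the exercise
mean_miles, sd_miles = 50.5, 19.3     # thousands of miles
mean_price, sd_price = 14425, 1899    # dollars
r = -0.874

slope = r * sd_price / sd_miles             # dollars per thousand miles
intercept = mean_price - slope * mean_miles  # line passes through the means

# One standard deviation of miles (19.3 thousand) changes the
# predicted price by r standard deviations of price.
change_per_sd = slope * sd_miles

print(round(slope, 2))       # about -86.0
print(round(intercept))      # about 18768
print(round(change_per_sd))  # about -1660
```

So each additional 19.3 thousand miles goes with a predicted drop of roughly $1,660 in asking price, which is 0.874 of one standard deviation of price.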

So let’s write the equation!
Slope:
Explain the slope:

Fat (g)   Calories
19        410
31        580
34        590
35        570
39        640
39        680
43        660

Now for the final part – the equation!
Y-intercept: Remember – it has to pass through the point $(\bar{x}, \bar{y})$.
Solve for y-intercept:

Now it can be used to predict.
• How many calories do I expect to find in a hamburger that has 25 grams of fat?
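Putting the steps from the preceding slides together for the burger data gives the sketch below: the slope is r times the ratio of standard deviations, the intercept comes from the point of means, and the line is then evaluated at 25 grams of fat. Variable names are assumptions for illustration, and sample standard deviations are assumed.

```python
from statistics import mean, stdev

fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]
n = len(fat)

mean_fat, mean_cal = mean(fat), mean(calories)
sd_fat, sd_cal = stdev(fat), stdev(calories)

# Correlation from deviations and sample standard deviations
r = sum((x - mean_fat) * (y - mean_cal)
        for x, y in zip(fat, calories)) / ((n - 1) * sd_fat * sd_cal)

slope = r * sd_cal / sd_fat               # calories per gram of fat
intercept = mean_cal - slope * mean_fat   # line passes through the means
predicted_25 = intercept + slope * 25

print(round(slope, 2))         # about 11.06
print(round(intercept, 1))     # about 211.0
print(round(predicted_25, 1))  # about 487.3
```

Under these assumptions, a burger with 25 grams of fat is predicted to have roughly 487 calories.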

That’s…all…Folks!