LSRLs: Interpreting r vs. r²

- Slides: 9

LSRLs: Interpreting r vs. r²
• r, “the correlation coefficient,” tells you the strength and direction of the linear relationship between two variables x and y (for example, height vs. weight: r = ___). It always satisfies -1 ≤ r ≤ 1.
• r measures only linear relationships; it will not work for non-linear (curved) relationships.
• r has no units (r = .30, not ".30 pounds").
• r is not resistant to outliers! Consider the effect of outliers when looking at r, and report r both with and without the outliers.
• r is the same regardless of which variable is explanatory and which is response.
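The bullets above can be checked numerically. The sketch below is a minimal illustration that assumes the standard z-score definition r = (1/(n-1)) Σ z_x·z_y. The data values and the helper name `correlation` are made up for this example and do not come from the packet. It shows that swapping x and y leaves r unchanged, and that a single outlier can change r drastically.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation r = (1/(n-1)) * sum of z_x * z_y."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)   # sample standard deviations
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical toy data, made up for illustration (not from the packet).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_xy = correlation(x, y)
r_yx = correlation(y, x)                 # swap explanatory and response
print(round(r_xy, 4), round(r_yx, 4))    # 0.7746 0.7746

# Add a single outlier at (10, 0) and recompute.
r_outlier = correlation(x + [10], y + [0])
print(round(r_outlier, 4))               # sign flips to negative
```

With these toy values, the single added point turns a fairly strong positive r into a negative one. This is why the slide asks you to report r both with and without outliers.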

Understanding what is expected with LSRLs
Note: When finding the LSRL, the placement of the explanatory and response variables DOES matter!
y_hat = ___x + ___ (the prediction equation, i.e., the equation of the line of best fit)
The LSRL is found by minimizing the sum of the squares of the residuals. *Extra credit for the manual calculation from the packet.
1. Find the LSRL using your calculator: STAT -> CALC -> 8 (LinReg(a+bx)) or 4 (LinReg(ax+b)). For the packet examples, #8 gives y_hat = a + bx and #4 gives y_hat = ax + b. The letters swap roles, so in #4 a is the slope.
2. Find the LSRL from computer output.
3. Find the LSRL using b = r(s_y/s_x). Here you are not given data; you are given the statistics s_y, s_x, x_bar, y_bar, and r. First find the slope b. The LSRL always passes through (x_bar, y_bar), so substitute into y_bar = a + b·x_bar and solve for a = y_bar - b·x_bar. Substitute a and b into y_hat = a + bx and you are done.
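Methods 1 and 3 should give the same line. This is a minimal sketch under stated assumptions. For raw data it uses the standard least-squares formulas b = Sxy/Sxx and a = y_bar - b·x_bar. For summary statistics it uses b = r(s_y/s_x) followed by a = y_bar - b·x_bar. The function names `lsrl_from_data` and `lsrl_from_summary` are hypothetical, and the toy data are made up. The calculator menus cannot be reproduced in code.

```python
from math import sqrt
from statistics import mean, stdev

def lsrl_from_data(x, y):
    """Least squares: slope b = Sxy/Sxx, intercept a = y_bar - b*x_bar."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def lsrl_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Method 3: b = r*s_y/s_x, then solve y_bar = a + b*x_bar for a."""
    b = r * s_y / s_x
    return y_bar - b * x_bar, b

# Hypothetical toy data (not from the packet).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a1, b1 = lsrl_from_data(x, y)
print(f"y_hat = {a1:.2f} + {b1:.2f}x")   # y_hat = 2.20 + 0.60x

# The same line, rebuilt from summary statistics only.
x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
r = sxy / sqrt(sum((xi - x_bar) ** 2 for xi in x) *
               sum((yi - y_bar) ** 2 for yi in y))
a2, b2 = lsrl_from_summary(x_bar, y_bar, stdev(x), stdev(y), r)
print(f"y_hat = {a2:.2f} + {b2:.2f}x")   # y_hat = 2.20 + 0.60x
```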

Simple understanding: r vs. r²
• r, the correlation coefficient: strength and direction; only about the relationship between x and y. r is related to the slope of the LSRL: b = r(s_y/s_x).
• r², the coefficient of determination: how strong (accurate) is our LSRL?

r:    1   .80   .50   .20   0   -1
r²:   1   .64   .25   .04   0    1
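The two rows of the number line are related by squaring. The short sketch below reproduces them. The only input is the r values shown on the slide.

```python
# r values from the slide's number line.
r_values = [1, 0.80, 0.50, 0.20, 0, -1]

for r in r_values:
    r_squared = r ** 2      # coefficient of determination
    print(f"r = {r:5.2f}   r^2 = {r_squared:.2f}")
```

Squaring drops the sign, so r = 1 and r = -1 both give r² = 1. For 0 < |r| < 1, r² is smaller than |r|; for example, .50 becomes .25.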

Examining LSRLs: r vs. r²
• Students' height vs. weight
• y_hat = 4.915x - 157.613, i.e., predicted weight = 4.915(height) - 157.613
• r = ___    r² = ___
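The prediction equation on this slide can be used directly. The sketch below assumes height is in inches and weight in pounds; the slide does not state units. The helper name `predicted_weight` and the input height of 70 are hypothetical. Note that r and r² cannot be computed from the equation alone, because they require the class data.

```python
def predicted_weight(height):
    """Prediction equation from the slide: y_hat = 4.915x - 157.613."""
    return 4.915 * height - 157.613

# Hypothetical input; units assumed to be inches -> pounds.
print(round(predicted_weight(70), 3))   # 186.437

# Slope: each extra unit of height raises the predicted weight by 4.915.
print(round(predicted_weight(71) - predicted_weight(70), 3))   # 4.915
```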

To answer the question in your packet: which is the better prediction equation (which would be more accurate in making a prediction)?
• The one with the higher r² value!
• The higher the r² value, the greater the percentage of the variation in y that is explained by the LSRL of y on x.
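The comparison rule can be sketched as follows. The example computes r² for two candidate explanatory variables that predict the same y, then picks the larger value. It uses the standard identity r² = (Sxy)² / (Sxx·Syy). The data, the labels "A" and "B", and the helper `r_squared` are all hypothetical.

```python
from statistics import mean

def r_squared(x, y):
    """r^2 for the LSRL of y on x, computed as (Sxy)^2 / (Sxx * Syy)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical: the same response y predicted from two different x variables.
y = [2, 4, 5, 4, 5]
candidates = {"A": [1, 2, 3, 4, 5], "B": [3, 1, 4, 1, 5]}

scores = {name: r_squared(x, y) for name, x in candidates.items()}
best = max(scores, key=scores.get)
print({name: round(v, 3) for name, v in scores.items()}, "better:", best)
```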

Theory behind r²
• It tells us how much better a line with a slope is at predicting y than the horizontal line y = y_bar.
• It compares the vertical deviations of the data from the sloped line (the residuals) with their vertical deviations from the horizontal line y = y_bar, and tells how much of this variation the sloped line accounts for.
• This math and theory can be found in the book.
• You don't have to know the theory for the AP Test or my test.
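The comparison described above is usually written as r² = (SST - SSE)/SST = 1 - SSE/SST. Here SST = Σ(y - y_bar)² measures squared deviations from the horizontal line, and SSE = Σ(y - y_hat)² measures squared residuals from the LSRL. This optional sketch uses made-up toy data; the result matches r² = 0.7746² ≈ 0.6 for the same data.

```python
from statistics import mean

# Hypothetical toy data (not from the packet).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# Squared vertical deviations from the horizontal line y = y_bar ...
sst = sum((yi - y_bar) ** 2 for yi in y)
# ... and from the sloped LSRL y_hat = a + b*x (the residuals).
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r2 = 1 - sse / sst      # fraction of the variation in y explained by the LSRL
print(round(sst, 2), round(sse, 2), round(r2, 2))   # 6 2.4 0.6
```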

What You Should Know: Summary of r²
• r² tells us how accurate our LSRL is at making predictions.
• Do you think the x value in each observation tells you something about y? How much is it actually telling you?
• When r² = 1, we say “100% of the variation in y (here, weight) is explained by the LSRL.”
• When r² = .64, we say “64% of the variation in y is explained by the LSRL.”
• r² tells us the fraction of the variation in y that is explained by the LSRL of y on x.
• MUST USE THIS SPECIFIC LANGUAGE TO INTERPRET r² ON THE AP TEST AND MY TEST!!!
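The required sentence follows a fixed template, so it can be generated mechanically. The helper `interpret_r_squared` below is hypothetical. It formats r² as a percentage and fills in the response and explanatory variable names.

```python
def interpret_r_squared(r2, y_name, x_name):
    """Build the required interpretation sentence for r^2."""
    return (f"{r2:.0%} of the variation in {y_name} is explained "
            f"by the LSRL of {y_name} on {x_name}.")

print(interpret_r_squared(0.64, "weight", "height"))
# 64% of the variation in weight is explained by the LSRL of weight on height.
```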

What is a residual?
• A residual is the vertical deviation of an observation from the LSRL: residual = y - y_hat (observed minus predicted).
• The residual values are stored in your calculator each time you run a linear regression (LinReg(a+bx)).
• You can find them in the RESID list: 2nd -> STAT -> RESID.
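The sketch below computes residuals as y - y_hat, the same quantity the calculator stores in RESID. It uses hypothetical toy data and the least-squares line for that data. A known property of an LSRL with an intercept is also checked: its residuals sum to zero, up to rounding error.

```python
from statistics import mean

# Hypothetical toy data (not from the packet).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar                      # LSRL: y_hat = a + b*x

# residual = observed y - predicted y_hat (like the calculator's RESID list)
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print([round(e, 2) for e in resid])        # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(resid)) < 1e-9)              # True: LSRL residuals sum to zero
```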

What do the residuals tell us?
• The residuals tell us whether a line is the best model for the data. A non-linear function, such as an exponential or power function, might fit the data better and help us predict better.
• How to create a residual plot:
• Plot x, the explanatory variable (L1), on the horizontal axis against the residuals (RESID) on the vertical axis.
• If the plot shows a pattern (rather than random scatter), then a line is not the best fit.
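To see what a "pattern" looks like, the sketch below fits an LSRL to clearly curved, made-up data (y = x²) and prints the residuals instead of drawing the plot, which avoids a plotting library. The residuals are positive at both ends and negative in the middle, a U shape rather than random scatter. This happens even though r² is high, which is why you should check the residual plot and not just r².

```python
from statistics import mean

# Hypothetical curved data: y = x^2 (clearly non-linear).
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]

x_bar, y_bar = mean(x), mean(y)
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar                      # LSRL: y_hat = -5 + 6x here

resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(resid)        # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]  (U-shaped)

# r^2 is still high, so r^2 alone would not reveal the curve.
sst = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - sum(e ** 2 for e in resid) / sst
print(round(r2, 3))  # 0.923
```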