
Chapter 7: Scatterplots, Association, and Correlation
Stats: Modeling the World, Second Edition
Raymond Dahlman IV

Scatterplots
Scatterplots are an effective way of displaying data: they let us see whether there is an association between two quantitative variables. When looking at a scatterplot, we look for the direction, form, and strength of the relationship, and for any unusual features such as outliers.

Features of Scatterplots: Direction
The direction of the data tells us whether the association between the variables is positive (y tends to increase as x increases) or negative (y tends to decrease as x increases).

Features of Scatterplots: Form
The form, or overall shape of the data, tells us how the two variables are related: either linearly or by a non-linear function.

Features of Scatterplots: Strength
The strength of the association is how closely the data points cluster around the overall trend, such as the line of best fit. The less scatter there is around that trend, the stronger the association.

Determining Variables
When presented with two variables, it is necessary to assign each one a role. The variable that you believe determines the other is the independent, or explanatory, variable. The other is the dependent, or response, variable. When graphing, the explanatory variable goes on the x-axis and the response variable goes on the y-axis.

Correlation


Correlation Conditions
Because the correlation coefficient is only useful for linear trends, the data must meet these conditions:
  • The Quantitative Variables Condition: Both variables must be quantitative, measured in appropriate units.
  • The Straight Enough Condition: The data must have a linear trend. If they do not, there are methods to straighten the data.
  • The Outlier Condition: Be aware of outliers that could distort the correlation coefficient.
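To make the conditions above concrete, here is a minimal Python sketch of one common way to compute the correlation coefficient: standardize both variables and average the products of their z-scores over n - 1. The function names `z_scores` and `correlation` and the data values are hypothetical and are not taken from the slides. The last lines illustrate the Outlier Condition by adding a single unusual point.

```python
import math

def z_scores(values):
    """Standardize values using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def correlation(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical illustration data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)

# Same data plus one hypothetical outlier far from the trend.
r_with_outlier = correlation(x + [20], y + [0])

print("r without outlier:", round(r, 3))
print("r with outlier:   ", round(r_with_outlier, 3))
```

In this sketch a single added point is enough to flip the sign of r, which is why it is worth checking the scatterplot before trusting the number. The code assumes neither variable is constant, since a standard deviation of zero would cause a division by zero.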

Correlation Properties


Straightening Scatterplots
When a scatterplot shows a bent form that consistently increases or decreases, we can often straighten the form of the plot by re-expressing one or both variables, enabling us to apply the correlation coefficient.
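As a rough illustration of re-expression, the sketch below takes hypothetical data in which y doubles at each step (a steadily increasing but bent form) and replaces y with its base-10 logarithm. The data and the helper name `correlation` are assumptions made for this example. The logarithm is only one possible re-expression, and other curved forms may call for a different one, such as a square root or reciprocal.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient computed from sums of squared deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical curved data: y doubles with each step in x.
x = list(range(1, 10))
y = [3 * 2 ** k for k in x]

# Re-express y on a log scale to straighten the form.
log_y = [math.log10(v) for v in y]

r_raw = correlation(x, y)      # computed on the bent form
r_log = correlation(x, log_y)  # computed after straightening

print("r on raw data:      ", round(r_raw, 3))
print("r on log10(y) data: ", round(r_log, 3))
```

The value computed on the raw data understates how tightly y follows x, because the relationship is not linear. After the log re-expression the points fall on a straight line, so the correlation coefficient describes the pattern well.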

Problem #33
People who responded to a July 2004 Discovery Channel poll named the 10 best roller coasters in the United States. The table below shows the length of the initial drop (in feet) and the duration of the ride (in seconds). What do these data indicate about the height of a roller coaster and the length of the ride you can expect?

Make a Scatterplot
[Figure: scatterplot titled "Roller Coasters"]

Drop (ft) Duration (s) 171 133 74.75 51.33 -0.88299 1.123809 0.419048 0.321088 -0.40136 0.575283 -1.01678 -0.84286 -1.13719 0.038964 -0.545490 0.915644 2.084551 -0.253263 -1.324761 0.136372 -0.837717 0.526008 -0.837717

Problem #35
Is there any pattern to the locations of the planets in our solar system? The table shows the average distance of each of the nine planets from the sun.
a) Make a scatterplot and describe the association. (Remember: direction, form, and strength!)
b) Why would you not want to talk about the correlation between planet position and distance from the sun?
c) Make a scatterplot showing the logarithm of distance vs. position. What is better about this scatterplot?

[Figure: scatterplot titled "Planetary distances from the Sun"]
A: The relationship between position and distance is non-linear, with a positive association. There is very little scatter in the data.
B: The relationship is not linear, and correlation only describes linear association.

[Figure: scatterplot titled "Planetary distances from the Sun", with the log of distance plotted against position]
C: The relationship between position and the log of the distance appears to be roughly linear.