Correlation

Bivariate distribution: a distribution that shows the relation between two variables.

[Figure: visual acuity plotted against area of primary visual cortex, for the left and right hemispheres.] This graph is called a scatter plot or scatter diagram.
How do we quantify the strength of the relationship between the two variables in a bivariate distribution?
Example: two measures were made for each subject, stress level and eating difficulties.

Stress (X):                17   8   8  20  14   7  21  22  19  30
Eating difficulties (Y):    9  13   7  18  11   1   5  15  26  28

[Figure: scatter plot of eating difficulties against stress.]
The most common way to quantify the relation between the two variables in a bivariate distribution is the Pearson correlation coefficient, labeled r. r is always between -1 and 1. The z-score formula is the most intuitive formula:

$$ r = \frac{\sum z_X z_Y}{n}, \qquad z_X = \frac{X - m_X}{s_X}, \quad z_Y = \frac{Y - m_Y}{s_Y} $$

where the standard deviations s_X and s_Y are computed with n in the denominator.

Example: use the z-score formula to calculate r.

 X    Y     zX      zY     zX·zY
17    9    0.06   -0.52   -0.03
 8   13   -1.23   -0.04    0.04
 8    7   -1.23   -0.76    0.93
20   18    0.48    0.57    0.27
14   11   -0.37   -0.28    0.10
 7    1   -1.37   -1.48    2.03
21    5    0.63   -1.00   -0.63
22   15    0.77    0.21    0.16
19   26    0.34    1.53    0.52
30   28    1.91    1.77    3.39

mX = 16.60, sX = 7.02, mY = 13.30, sY = 8.28

r = (sum of zX·zY) / n = 6.78 / 10 ≈ 0.68
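A minimal Python sketch of the z-score formula applied to the ten (stress, eating difficulties) pairs above. The helper names `z_scores` and `pearson_r_z` are illustrative, and the sketch assumes, as the table's z-scores imply, that each standard deviation divides by n rather than n - 1.

```python
import math

# Stress (X) and eating-difficulty (Y) scores from the example table
stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]


def z_scores(values):
    """Convert raw scores to z-scores, using an SD that divides by n."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r_z(xs, ys):
    """Pearson r as the average product of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


r = pearson_r_z(stress, eating)
print(f"r = {r:.2f}")  # r = 0.68
```

If you compute the standard deviations with n - 1 instead, divide the sum of products by n - 1 as well; r comes out the same.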
How does each data point contribute to the correlation value?

[Figure: scatter plot of eating difficulties against stress, divided into four quadrants by lines at the means mX and mY; r = 0.68.]

Points in the upper right or lower left quadrants (where zX and zY have the same sign) add to the correlation value. Points in the upper left or lower right quadrants (opposite signs) subtract from it. For example, the point (30, 28) contributes +3.39, while the point (21, 5) contributes -0.63.
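To see the quadrant rule numerically, the illustrative Python sketch below (same data, same divide-by-n SD convention as the table) labels each point's quadrant relative to the means and prints its z-score product. Points whose z-scores share a sign land in the upper right or lower left and contribute positively.

```python
import math

stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]


def z_scores(values):
    """Convert raw scores to z-scores, using an SD that divides by n."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


zx, zy = z_scores(stress), z_scores(eating)
contributions = []
for x, y, a, b in zip(stress, eating, zx, zy):
    # Above/below the Y mean -> upper/lower; right/left of the X mean -> right/left
    quadrant = ("upper" if b > 0 else "lower") + (" right" if a > 0 else " left")
    contributions.append((x, y, quadrant, a * b))
    print(f"({x:2d}, {y:2d})  {quadrant:<11}  {a * b:+.2f}")

# The correlation is the average of these per-point contributions
r = sum(c[3] for c in contributions) / len(contributions)
print(f"r = {r:.2f}")
```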
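Fun fact about the Pearson correlation statistic: adding a constant to the raw scores, or multiplying them by a positive constant, does not change the z-scores, so the Pearson correlation doesn't change either. (Multiplying by a negative constant flips the sign of the z-scores, and therefore the sign of r.)

[Figure: the same scatter plot after multiplying y by 2 and adding 100.]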
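A quick check of this property in Python, using an illustrative deviation-score helper `pearson_r` on the example data. The slide's transformation (multiply Y by 2, add 100) leaves r unchanged; a negative multiplier is included to show the sign flip.

```python
import math

stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]


def pearson_r(xs, ys):
    """Pearson r from deviation scores: SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


original = pearson_r(stress, eating)
rescaled = pearson_r(stress, [2 * y + 100 for y in eating])  # multiply by 2, add 100
flipped = pearson_r(stress, [-2 * y + 100 for y in eating])  # negative multiplier

print(f"{original:.2f} {rescaled:.2f} {flipped:.2f}")  # 0.68 0.68 -0.68
```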
Similarly, the correlation stays the same no matter how you stretch your axes.

[Figure: three plots of eating difficulties against stress drawn with different axis scales; each shows r = 0.68.]

As a rule, you should plot your axes with an equal scale.
Guess that correlation!
[Figures: eight scatter plots of y against x. Answers: r = -0.56, 0.94, 0.08, -1.00, -0.08, 0.49, -0.92, and -0.77.]
r measures only the linear relation between two variables.

[Figure: scatter plot of y against x in which the points follow a nonlinear pattern; r = 0.01.]
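A small hypothetical illustration (not the slide's data): here y is completely determined by x through y = x², yet because the x values are symmetric around 0, the linear correlation is exactly zero. The helper `pearson_r` is an illustrative deviation-score implementation.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from deviation scores: SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # perfect, but curved, dependence of y on x

r = pearson_r(xs, ys)
print(f"r = {r:.2f}")  # r = 0.00
```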
Guess that correlation!

[Figures: two more scatter plots of y against x, with r = 0.00 and r = 0.91.]
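Z-score formula for calculating r (intuitive, but not very practical):

$$ r = \frac{\sum z_X z_Y}{n} $$

Substituting the formula for z, $z_X = (X - m_X)/s_X$ and $z_Y = (Y - m_Y)/s_Y$, gives the deviation-score formula for calculating r (somewhat intuitive, somewhat more practical):

$$ r = \frac{\sum (X - m_X)(Y - m_Y)}{n \, s_X s_Y} $$

Because $s_X = \sqrt{SS_X / n}$ and $s_Y = \sqrt{SS_Y / n}$, where $SS_X = \sum (X - m_X)^2$ and $SS_Y = \sum (Y - m_Y)^2$, this becomes the computational formula for calculating r (less intuitive, more practical):

$$ r = \frac{\sum (X - m_X)(Y - m_Y)}{\sqrt{SS_X \, SS_Y}} $$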
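A little algebra shows that the sum of products and the sums of squares can be written in terms of raw scores:

$$ \sum (X - m_X)(Y - m_Y) = \sum XY - \frac{(\sum X)(\sum Y)}{n}, \qquad SS_X = \sum (X - m_X)^2 = \sum X^2 - \frac{(\sum X)^2}{n} $$

and likewise for $SS_Y$. Substituting these gives the computational raw-score formula for calculating r (least intuitive, most practical):

$$ r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{n}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{n}\right)}} $$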
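To confirm that the z-score, deviation-score, computational, and raw-score formulas are algebraically the same, the sketch below codes each one directly (the function names are illustrative) and applies them to the stress / eating-difficulties data from the z-score example.

```python
import math

stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]


def mean_sd(values):
    """Mean and SD (dividing by n), as in the z-score table."""
    n = len(values)
    m = sum(values) / n
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / n)


def r_z_score(xs, ys):
    (mx, sx), (my, sy) = mean_sd(xs), mean_sd(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / len(xs)


def r_deviation(xs, ys):
    (mx, sx), (my, sy) = mean_sd(xs), mean_sd(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sp / (len(xs) * sx * sy)


def r_computational(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


def r_raw_score(xs, ys):
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sp = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    return sp / math.sqrt(ss_x * ss_y)


formulas = [r_z_score, r_deviation, r_computational, r_raw_score]
results = {f.__name__: f(stress, eating) for f in formulas}
for name, value in results.items():
    print(f"{name:<16} {value:.4f}")
```

The raw-score form is convenient by hand, but it subtracts large, nearly equal quantities, so in floating-point code with large values it tends to lose more precision than the deviation-score form.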
Using the computational raw-score formula (n = 10):

  X      Y      X²      Y²      XY
 17      9     289      81     153
  8     13      64     169     104
  8      7      64      49      56
 20     18     400     324     360
 14     11     196     121     154
  7      2      49       4      14
 21      5     441      25     105
 22     15     484     225     330
 19     26     361     676     494
 30     28     900     784     840
Totals:
166    134    3248    2458    2610

SS_X = 3248 - 166²/10 = 492.4
SS_Y = 2458 - 134²/10 = 662.4
ΣXY - (ΣX)(ΣY)/n = 2610 - (166)(134)/10 = 385.6
r = 385.6 / √(492.4 × 662.4) = 0.675
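The same arithmetic in Python, using the X and Y values exactly as listed in the table above, so the sums, SS values, and r can be checked against the table's totals. Variable names are illustrative.

```python
import math

# X and Y values as listed in the raw-score table
x = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
y = [9, 13, 7, 18, 11, 2, 5, 15, 26, 28]
n = len(x)

sum_x, sum_y = sum(x), sum(y)                # 166, 134
sum_x2 = sum(v * v for v in x)               # 3248
sum_y2 = sum(v * v for v in y)               # 2458
sum_xy = sum(a * b for a, b in zip(x, y))    # 2610

ss_x = sum_x2 - sum_x ** 2 / n               # 492.4
ss_y = sum_y2 - sum_y ** 2 / n               # 662.4
sp = sum_xy - sum_x * sum_y / n              # 385.6
r = sp / math.sqrt(ss_x * ss_y)

print(f"SSx = {ss_x:.1f}, SSy = {ss_y:.1f}, r = {r:.3f}")  # SSx = 492.4, SSy = 662.4, r = 0.675
```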
A second measure of correlation, called the Spearman rank-order coefficient, is appropriate for ordinal scores. It is calculated by:

$$ r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} $$

where D is the difference between each pair of ranks and n is the number of pairs. It is most often used when:
a) at least one variable is measured on an ordinal scale, or
b) one of the distributions is very skewed or has outliers.
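A minimal Python sketch of the rank-difference formula. The helper names and the example scores are hypothetical, and the sketch assumes there are no tied scores; with ties, the usual approach is to assign average ranks and compute the Pearson r of the ranks, since the shortcut formula is then only approximate.

```python
def ranks(values):
    """Rank scores from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rs(xs, ys):
    """Spearman r_s = 1 - 6 * sum(D^2) / (n(n^2 - 1)), for untied data."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical scores: ranks are [2, 3, 1, 4] and [2, 4, 1, 3], so sum(D^2) = 2
print(spearman_rs([3.1, 4.7, 2.2, 9.0], [10, 30, 5, 20]))  # 0.8
```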
Example: Is there a correlation between your preference for Otter Pops® flavors and mine?

Fact (according to Wikipedia, anyway): In 1995, National Pax had planned to replace the "Sir Isaac Lime" flavor with "Scarlett O'Cherry," until a group of Orange County, California fourth-graders created a petition in opposition and picketed the company's headquarters in early 1996. The crusade also included an e-mail campaign, in which a Stanford professor reportedly accused the company of "Otter-cide." After meeting with the children, company executives relented and retained the Sir Isaac Lime flavor.
Example: Suppose two wine experts were asked to rank-order their preference for eight wines (n = 8). How can we measure the similarity of their rankings?

Rank X   Rank Y    D    D²
  1        2      -1    1
  2        1       1    1
  3        5      -2    4
  4        3       1    1
  5        4       1    1
  6        7      -1    1
  7        8      -1    1
  8        6       2    4
                 ΣD² = 14

r_s = 1 - 6(14) / (8(8² - 1)) = 1 - 84/504 ≈ 0.83
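The wine example in Python, as a sketch that applies the rank-difference formula directly to the two experts' ranks from the table (variable names are illustrative).

```python
expert_x = [1, 2, 3, 4, 5, 6, 7, 8]  # ranks given by the first expert
expert_y = [2, 1, 5, 3, 4, 7, 8, 6]  # ranks the second expert gave the same wines

n = len(expert_x)
d = [a - b for a, b in zip(expert_x, expert_y)]  # [-1, 1, -2, 1, 1, -1, -1, 2]
sum_d2 = sum(v ** 2 for v in d)                  # 14
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(round(rs, 3))  # 0.833
```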
Pearson correlation is much more sensitive to outlying values than the Spearman coefficient.

[Figure from: http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient]
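A hypothetical demonstration (made-up data, illustrative helper names): replacing the largest Y value with an extreme one leaves every rank unchanged, so Spearman's r_s does not move, while Pearson's r drops from about 0.95 to about 0.59.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from deviation scores: SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


def ranks(values):
    """Rank scores from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rs(xs, ys):
    """Spearman r_s from rank differences, for untied data."""
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 1, 4, 3, 6, 5, 8, 7, 9, 10]
ys_outlier = ys[:-1] + [100]  # hypothetical outlier: last Y value becomes 100

for label, y in (("clean", ys), ("outlier", ys_outlier)):
    print(f"{label:<8} Pearson {pearson_r(xs, y):.2f}  Spearman {spearman_rs(xs, y):.2f}")
```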
Only the rank order matters for the Spearman coefficient.

[Figure: Y always increases with X, but not along a straight line; Pearson r = 0.92, Spearman r_s = 1.00.]
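A hypothetical example of the same effect: y = x³ always increases with x, so the ranks agree perfectly and r_s = 1, but the points lie on a curve, so Pearson's r falls short of 1 (about 0.93 here). Helper names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from deviation scores: SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


def ranks(values):
    """Rank scores from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rs(xs, ys):
    """Spearman r_s from rank differences, for untied data."""
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x ** 3 for x in xs]  # strictly increasing, but curved

print(f"Pearson r = {pearson_r(xs, ys):.2f}, Spearman r_s = {spearman_rs(xs, ys):.2f}")
```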