Correlation A bit about Pearson’s r
Questions
• What does it mean when a correlation is positive? Negative?
• What is the purpose of the Fisher r to z transformation?
• What is range restriction? Range enhancement? What do they do to r?
• Give an example in which data properly analyzed by ANOVA cannot be used to infer causality.
• Why do we care about the sampling distribution of the correlation coefficient?
• What is the effect of reliability on r?
Basic Ideas
• Nominal vs. continuous IV
• Degree (direction) & closeness (magnitude) of linear relations
  – Sign (+ or -) for direction
  – Absolute value for magnitude
• Pearson product-moment correlation coefficient
Illustrations: positive, negative, and zero correlations
Always Plot Your Data!
Simple Formulas
Use either N throughout or else N − 1 throughout (in the SDs and in the denominator); the result is the same as long as you are consistent. Pearson's r is the average cross product of z scores — the product of (standardized) moments about the means.
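In symbols, r = Σ zX·zY / N (or Σ zX·zY / (N − 1) with N − 1 used in the SDs as well). A minimal R check of that definition, with illustrative simulated data, using the N − 1 convention throughout (scale() and sd() both use N − 1):

  # Pearson's r as the average cross product of z scores (N - 1 throughout).
  set.seed(3)
  x <- rnorm(50); y <- x + rnorm(50); N <- length(x)
  sum(scale(x) * scale(y)) / (N - 1)   # average cross product of z scores
  cor(x, y)                            # same value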
Graphic Representation
1. Conversion from raw scores to z scores.
2. Points & quadrants; positive & negative cross products.
3. Correlation is the average of the cross products. The sign & magnitude of r depend on where the points fall.
4. The product is at its maximum (average = 1) when the points fall on the line where z_X = z_Y.
Descriptive Statistics

                      N   Minimum   Maximum      Mean   Std. Deviation
Ht                   10     60.00     78.00   69.0000          6.05530
Wt                   10    110.00    200.00  155.0000         30.27650
Valid N (listwise)   10

r = 1.0
r = 1. Leave X alone and add error to Y: r = .99.
r = .99. Add more error: r = .91.
With 2 variables, the correlation is the z-score slope.
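A quick R check of this claim (simulated data; the names and seed are illustrative): the slope from regressing standardized Y on standardized X equals r.

  # With two variables, the z-score slope is the correlation.
  set.seed(1)
  x <- rnorm(100)
  y <- 0.5 * x + rnorm(100)
  cor(x, y)                            # Pearson's r
  coef(lm(scale(y) ~ scale(x)))[2]     # z-score slope; matches r exactly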
Review
• What does it mean when a correlation is positive? Negative?
Sampling Distribution of r
The statistic is r; the parameter is ρ (rho). In general, r is slightly biased. The sampling variance is approximately (1 − ρ²)² / (N − 1), so the sampling variance depends both on N and on ρ.
Fisher's r to z Transformation

r:  .10  .20  .30  .40  .50  .60  .70  .80   .90
z:  .10  .20  .31  .42  .55  .69  .87  1.10  1.47

The sampling distribution of z is normal as N increases. The transformation pulls out the short tail to give a better (more nearly normal) distribution. The sampling variance of z, 1/(N − 3), does not depend on ρ. The r-to-z function is the inverse hyperbolic tangent (atanh).
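In symbols, z = ½ ln((1 + r)/(1 − r)) = atanh(r), with standard error 1/√(N − 3). A quick R check against the table above:

  # Fisher's r to z by hand and via atanh(); r = .30 gives z of about .31.
  r <- 0.30
  0.5 * log((1 + r) / (1 - r))   # 0.3095...
  atanh(r)                        # the same function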
Hypothesis test 1: Is ρ = 0?
t = r √(N − 2) / √(1 − r²), compared to t with (N − 2) df for significance. Say r = .25, N = 100: t = .25 √98 / √(1 − .0625) ≈ 2.56 > t(.05, 98) = 1.984, so p < .05.
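A sketch of this test in R, working directly from r and N (with raw data, cor.test() gives the same t):

  # Testing H0: rho = 0 with the t statistic above.
  r <- 0.25; N <- 100
  t_stat <- r * sqrt(N - 2) / sqrt(1 - r^2)   # about 2.56
  t_stat > qt(0.975, df = N - 2)              # TRUE -> reject at alpha = .05
  2 * pt(-abs(t_stat), df = N - 2)            # two-tailed p, roughly .01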
Hypothesis test 2: One-sample z test, where r is the sample value and ρ is the hypothesized population value:
z = (z_r − z_ρ) √(N − 3), where z_r and z_ρ are the Fisher z transforms of r and ρ.
Say N = 200, r = .54, and ρ = .30: z = (.604 − .310) √197 ≈ 4.13. Compare to the unit normal, e.g., 4.13 > 1.96, so it is significant. Our sample was not drawn from a population in which rho is .30.
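The same example in R (a sketch using the Fisher z formula above):

  # One-sample z test of H0: rho = .30.
  r <- 0.54; rho0 <- 0.30; N <- 200
  z_stat <- (atanh(r) - atanh(rho0)) * sqrt(N - 3)   # about 4.13
  abs(z_stat) > qnorm(0.975)                          # TRUE -> reject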
Hypothesis test 3: Testing the equality of correlations from 2 INDEPENDENT samples:
z = (z1 − z2) / √( 1/(N1 − 3) + 1/(N2 − 3) ).
Say N1 = 150, r1 = .63, N2 = 175, r2 = .70: z = −1.18, n.s.
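A sketch in R; carrying full precision the statistic comes out near −1.1 rather than the slide's rounded −1.18, and either way it is not significant:

  # Two independent correlations: Fisher-z both, divide by the pooled SE.
  r1 <- 0.63; n1 <- 150; r2 <- 0.70; n2 <- 175
  z_stat <- (atanh(r1) - atanh(r2)) / sqrt(1/(n1 - 3) + 1/(n2 - 3))
  z_stat                              # roughly -1.1
  abs(z_stat) > qnorm(0.975)          # FALSE -> retain H0 of equal rho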
Hypothesis test 4: Testing the equality of any number (k) of independent correlations.
Q = Σ (n_i − 3)(z_i − zbar)², where zbar = Σ (n_i − 3) z_i / Σ (n_i − 3) and z_i is the Fisher z for study i. Compare Q to chi-square with k − 1 df.

Study    r     n     z    (n−3)z   zbar   (z−zbar)²   (n−3)(z−zbar)²
  1     .2    200   .20    39.94    .41      .0441           8.69
  2     .5    150   .55    80.75    .41      .0196           2.88
  3     .6     75   .69    49.91    .41      .0784           5.64
 sum          425          170.60                           17.21 = Q

Chi-square at .05 with 2 df = 5.99, so reject: not all the rho are equal.
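The same computation in R (a sketch; small differences from the table come from rounding z to two decimals there):

  # Chi-square test that k independent correlations are equal.
  r <- c(.2, .5, .6); n <- c(200, 150, 75)
  z <- atanh(r)
  w <- n - 3
  zbar <- sum(w * z) / sum(w)
  Q <- sum(w * (z - zbar)^2)              # about 17.1 (17.21 with the slide's rounding)
  Q > qchisq(0.95, df = length(r) - 1)    # TRUE -> not all rho are equal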
Hypothesis test 5: Dependent correlations (Hotelling-Williams test).
Say N = 101, r12 = .4, r13 = .6, r23 = .3; compare to t(.05, 98) = 1.98. See my notes for the formula.
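A sketch of one standard form of Williams' t for comparing two dependent correlations that share a variable (r12 vs. r13); the instructor's notes may write the formula somewhat differently, but this is the usual textbook version, with df = N − 3:

  # Williams' t for dependent correlations sharing variable 1 (r12 vs r13).
  N <- 101; r12 <- .4; r13 <- .6; r23 <- .3
  detR <- 1 - r12^2 - r13^2 - r23^2 + 2 * r12 * r13 * r23  # det of the 3x3 R matrix
  rbar <- (r12 + r13) / 2
  t_stat <- (r12 - r13) *
    sqrt(((N - 1) * (1 + r23)) /
         (2 * ((N - 1) / (N - 3)) * detR + rbar^2 * (1 - r23)^3))
  t_stat                                  # about -2.1
  abs(t_stat) > qt(0.975, df = N - 3)     # TRUE -> the two correlations differ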
Review
• What is the purpose of the Fisher r to z transformation?
• Test the hypothesis that ρ1 = ρ2, given r1 = .50, N1 = 103, r2 = .60, N2 = 128, and independent samples.
• Why do we care about the sampling distribution of the correlation coefficient?
Range Restriction
Range Enhancement
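A small simulation (purely illustrative numbers) of what selecting cases on X does to r: keeping only a narrow slice of X shrinks the correlation (restriction), while keeping only the extremes inflates it (enhancement):

  # Range restriction vs. enhancement: select cases on X and recompute r.
  set.seed(42)
  x <- rnorm(10000)
  y <- 0.5 * x + rnorm(10000)            # population r is about .45
  cor(x, y)                               # full range
  restricted <- abs(x) < 0.5              # keep only the middle of X
  cor(x[restricted], y[restricted])       # r shrinks (range restriction)
  enhanced <- abs(x) > 1.5                # keep only the extremes of X
  cor(x[enhanced], y[enhanced])           # r grows (range enhancement)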
Reliability sets the ceiling for validity. Measurement error attenuates correlations. If the correlation between true scores is .7 and the reliabilities of X and Y are both .8, the observed correlation is .7 × √(.8 × .8) = .7 × .8 = .56.

Disattenuated correlation: if our observed correlation is .56 and the reliabilities of both X and Y are .8, our estimate of the correlation between true scores is .56 / √(.8 × .8) = .56 / .8 = .70.
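The same arithmetic in R, using the classical attenuation and disattenuation formulas with the slide's numbers:

  # Attenuation and disattenuation with reliabilities of .8 for X and Y.
  r_true <- 0.7; rel_x <- 0.8; rel_y <- 0.8
  r_obs <- r_true * sqrt(rel_x * rel_y)   # .56, the attenuated (observed) r
  r_obs / sqrt(rel_x * rel_y)             # .70, the disattenuated estimate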
Add Error to Y Only
The correlation decreases. The distribution of X does not change. The distribution of Y becomes wider (increased variance). The slope of Y on X remains constant (the effect of SDy on b and on r cancels out). This is not true for error in X.
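A simulation sketch (illustrative values) showing that error added to Y lowers r but leaves the raw-score slope of Y on X essentially unchanged, whereas error in X attenuates the slope as well:

  # Error in Y only: r drops, slope of Y on X stays near its true value.
  set.seed(7)
  x <- rnorm(5000)
  y <- 2 * x + rnorm(5000)
  y_noisy <- y + rnorm(5000, sd = 2)                 # extra measurement error in Y
  c(cor(x, y), cor(x, y_noisy))                      # correlation decreases
  c(coef(lm(y ~ x))[2], coef(lm(y_noisy ~ x))[2])    # both slopes near 2
  x_noisy <- x + rnorm(5000, sd = 2)                 # error in X instead
  coef(lm(y ~ x_noisy))[2]                           # slope attenuated, well below 2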
Review
• What is range restriction? Range enhancement? What do they do to r?
• What is the effect of reliability on r?
SAS Power Estimation

proc power;
  onecorr dist = fisherz
    corr = 0.35
    nullcorr = 0.2
    sides = 1
    ntotal = 100
    power = .;
run;

Computed power: actual alpha = .05, power = .486.

proc power;
  onecorr
    corr = 0.35
    nullcorr = 0
    sides = 2
    ntotal = .
    power = .8;
run;

Computed N total: alpha = .05, actual power = .801, N total = 61.
Power for Correlations

Rho   N required (against null: rho = 0)
.10   782
.15   346
.20   193
.25   123
.30    84
.35    61

Sample sizes required for powerful conventional significance tests for typical values of the correlation coefficient in psychology. Power = .8, two tails, alpha = .05.
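A sketch of where numbers like these come from, using the Fisher z normal approximation; exact routines such as PROC POWER can differ by a case or two:

  # Approximate N for power = .8, two-tailed alpha = .05, H0: rho = 0,
  # via Fisher z: N = ((z_alpha/2 + z_beta) / atanh(rho))^2 + 3.
  rho <- c(.10, .15, .20, .25, .30, .35)
  N <- ceiling(((qnorm(.975) + qnorm(.80)) / atanh(rho))^2 + 3)
  data.frame(rho, N)   # within a case or two of the tabled values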
Programs
• Review the 'corrs' Excel program from the website
  – Download the Excel file
  – Show examples of tests for correlations
• Review the R program for computing correlations
Exercises
• Download Spector's data
• Compute univariate statistics & the correlation matrix for 5 variables:
  – Age, Autonomy, Work hours, Interpersonal conflict, Job satisfaction
• Problems:
  – Which pairs are significant? (use the per-comparison or nominal alpha)
  – Is the absolute value of the correlation between conflict and job satisfaction significantly different from .5?
  – Is the correlation between age and conflict different from the correlation between age and job satisfaction?