Correlation
• Simple correlation: between two variables
• Multiple and partial correlations: between one variable and a set of other variables
  – Partial: between X and Y, with Z controlled for
  – Multiple: x1 on x2 and x3
• Canonical correlation: between two sets of variables, each containing more than one variable
• Simple and multiple correlations are special cases of canonical correlation.
Xuhua Xia
Slide 1
Review of correlation
X: 1 1 1 2 2 2 3 3 4 4 4 5 5 6 6
Z: 4 5 6 3 4 5 6 5 3 4 5 6 3 4 5 2 3 4 5 1 2 3 4
Y: 14.0000 17.9087 16.3255 14.4441 15.2952 19.1587 16.0299 17.0000 14.7556 17.6823 20.5301 21.6408 15.0903 18.1603 22.2471 14.4450 16.5554 21.0047 22.0000 19.0000 18.1863 21.0000
Compute the Pearson correlation coefficients between X and Z, X and Y, and Z and Y.
Compute the partial correlation coefficient between X and Y, controlling for Z (i.e., the correlation coefficient between X and Y when Z is held constant), by using the equation in the previous slide.
Run R to verify your calculation:
install.packages("ggm")
library(ggm)
md <- read.table("XYZ.txt", header=TRUE)
cor(md)        # pairwise Pearson correlations
s <- var(md)   # covariance matrix
parcor(s)      # partial correlations, each pair given the remaining variable
install.packages("psych")
library(psych)
smc(s)         # squared multiple correlation of each variable with the others
Slide 2
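The hand calculation asked for above can be sketched in base R. The first-order partial correlation is r_XY.Z = (r_XY - r_XZ * r_YZ) / sqrt((1 - r_XZ^2) * (1 - r_YZ^2)). The data below are simulated and purely hypothetical (they are not the XYZ.txt values), and the helper name partial_cor is an assumption made for illustration only.

```r
# Hypothetical data: Z influences both X and Y (NOT the XYZ.txt values).
set.seed(1)
n <- 50
Z <- rnorm(n)
X <- 0.6 * Z + rnorm(n)
Y <- 0.5 * Z + 0.4 * X + rnorm(n)

# First-order partial correlation from the three pairwise correlations.
# partial_cor is a hypothetical helper, not a function from ggm or psych.
partial_cor <- function(x, y, z) {
  r_xy <- cor(x, y)
  r_xz <- cor(x, z)
  r_yz <- cor(y, z)
  (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
}
r_formula <- partial_cor(X, Y, Z)

# Cross-check: correlate the residuals of X ~ Z with those of Y ~ Z.
r_resid <- cor(resid(lm(X ~ Z)), resid(lm(Y ~ Z)))

print(c(formula = r_formula, residuals = r_resid))
```

The two numbers should agree up to floating-point error, which is what "holding Z constant" means in the linear case.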
Data for canonical correlation
# First three variables: physical
# Last three variables: exercise
# Middle-aged men
weight waist pulse chins situps jumps
   191    36    50     5    162    60
   193    38    58    12    101   101
   189    35    46    13    145    58
   211    38    56     8    151    38
   176    31    74    15    200    40
   169    34    50    17    120    38
   154    34    64    14    215   105
   193    36    46     6    170    31
   176    37    54     4    160    25
   156    33    54    15    215    73
   189    37    52     2    130    60
   162    35    62    12    145    37
   182    36    56     4    141    42
   167    34    60     6    155    40
   154    30    56    17    251   250
   166    33    52    13    210   115
   247    46    50     1     50    50
   202    37    62    12    120   120
   157    32    52    11    230    80
   138    33    68     2    150    43
Slide 3
Many Possible Correlations
• With multiple dependent variables (DVs, say A, B, C) and independent variables (IVs, say a, b, c, d, e), there could be many correlation patterns:
  – Variable A in the DV set could be correlated with variables a, b, c in the IV set
  – Variable B in the DV set could be correlated with variables c, d in the IV set
  – Variable C in the DV set could be correlated with variables a, c, e in the IV set
• With this plethora of possible correlated relationships, what is the best way of summarizing them?
Slide 4
Dealing with Two Sets of Variables
• The simple correlation approach:
  – For N DVs and M IVs, calculate the simple correlation coefficient between each of the N DVs and each of the M IVs, yielding a total of N*M correlation coefficients
• The multiple correlation approach:
  – For N DVs and M IVs, calculate multiple or partial correlation coefficients between each of the N DVs and the set of M IVs, yielding a total of N correlation coefficients
• The canonical correlation
• Note: All of these deal with linear correlations
Slide 5
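The coefficient counts in the first two approaches can be illustrated with a minimal base-R sketch. The data are simulated and hypothetical (N = 2 DVs, M = 3 IVs); the variable names DV, IV, r_simple and r_multiple are assumptions made for this example.

```r
# Hypothetical data: N = 2 DVs and M = 3 IVs measured on 30 cases.
set.seed(2)
IV <- matrix(rnorm(30 * 3), ncol = 3,
             dimnames = list(NULL, c("a", "b", "c")))
DV <- cbind(A = IV[, "a"] + rnorm(30),
            B = IV[, "b"] - IV[, "c"] + rnorm(30))

# Simple correlation approach: an N x M matrix of pairwise coefficients.
r_simple <- cor(DV, IV)

# Multiple correlation approach: one multiple R per DV, obtained by
# regressing that DV on the whole IV set.
r_multiple <- apply(DV, 2, function(y) sqrt(summary(lm(y ~ IV))$r.squared))

print(dim(r_simple))   # N rows, M columns
print(r_multiple)      # N coefficients
```

For each DV the multiple R is at least as large as the largest absolute simple correlation in its row, since the regression can always fall back on the single best IV.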
Correlation matrix
md <- read.table("Cancor.txt", header=T)
attach(md)
R <- cor(md)
R
         weight    waist     pulse    chins   situps     jumps
weight  1.00000  0.86958 -0.365762 -0.38969 -0.70557 -0.226296
waist   0.86958  1.00000 -0.333131 -0.58893 -0.83610 -0.344578
pulse  -0.36576 -0.33313  1.000000  0.15065  0.15723  0.034933
chins  -0.38969 -0.58893  0.150648  1.00000  0.50058  0.495760
situps -0.70557 -0.83610  0.157234  0.50058  1.00000  0.461611
jumps  -0.22630 -0.34458  0.034933  0.49576  0.46161  1.000000
Slide 6
Multiple correlations
fit <- lm(weight~chins+situps+jumps); summary(fit)
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 237.81551   14.97422  15.882 3.23e-11 ***
chins        -0.49462    0.99778  -0.496  0.62683
situps       -0.37270    0.10717  -3.478  0.00311 **
jumps         0.07798    0.10038   0.777  0.44861
Multiple R-squared: 0.5178, Adjusted R-squared: 0.4274
fit <- lm(waist~chins+situps+jumps); summary(fit)
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 44.904132   1.470390  30.539  1.3e-15 ***
chins       -0.178739   0.097977  -1.824 0.086838 .
situps      -0.053678   0.010524  -5.101 0.000107 ***
jumps        0.009669   0.009857   0.981 0.341223
Multiple R-squared: 0.7527, Adjusted R-squared: 0.7063
fit <- lm(pulse~chins+situps+jumps); summary(fit)
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 52.09212    6.17858   8.431 2.79e-07 ***
chins        0.17474    0.41170   0.424    0.677
situps       0.02021    0.04422   0.457    0.654
jumps       -0.01279    0.04142  -0.309    0.762
Multiple R-squared: 0.03736, Adjusted R-squared: -0.1431
Slide 7
Multiple correlation
fit <- lm(chins~weight+waist+pulse); summary(fit)
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 47.179551  16.226537   2.908   0.0103 *
weight       0.106933   0.084510   1.265   0.2239
waist       -1.602230   0.608407  -2.633   0.0181 *
pulse       -0.006223   0.151557  -0.041   0.9678
Multiple R-squared: 0.4084, Adjusted R-squared: 0.2974
fit <- lm(situps~weight+waist+pulse); summary(fit)
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 656.41462  102.44989   6.407 8.68e-06 ***
weight        0.09125    0.53357   0.171  0.86636
waist       -13.10675    3.84132  -3.412  0.00357 **
pulse        -0.88500    0.95689  -0.925  0.36877
Multiple R-squared: 0.7161, Adjusted R-squared: 0.6629
fit <- lm(jumps~weight+waist+pulse); summary(fit)
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 318.4270   189.2686   1.682    0.112
weight        0.5820     0.9857   0.590    0.563
waist        -9.2426     7.0966  -1.302    0.211
pulse        -0.4683     1.7678  -0.265    0.794
Multiple R-squared: 0.1445, Adjusted R-squared: -0.01585
Slide 8
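The "Multiple R-squared" values reported by summary(fit) can also be obtained from correlations alone, via R^2 = r_yx' R_xx^(-1) r_yx. The sketch below is an illustration rather than part of the original slides: it re-enters the relevant entries of the correlation matrix on Slide 6 by hand and recomputes the value for weight regressed on the three exercise variables. The names r_yx, R_xx and R2 are assumptions made for this example.

```r
# Correlations copied from the correlation matrix on Slide 6.
vars <- c("weight", "chins", "situps", "jumps")
R <- matrix(c( 1.00000, -0.38969, -0.70557, -0.22630,
              -0.38969,  1.00000,  0.50058,  0.49576,
              -0.70557,  0.50058,  1.00000,  0.46161,
              -0.22630,  0.49576,  0.46161,  1.00000),
            nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

r_yx <- R["weight", -1]   # weight vs. each exercise variable
R_xx <- R[-1, -1]         # correlations among the exercise variables

# Squared multiple correlation: R^2 = r_yx' R_xx^{-1} r_yx
R2 <- drop(t(r_yx) %*% solve(R_xx, r_yx))
print(round(c(R2 = R2, R = sqrt(R2)), 4))
```

Up to rounding in the printed correlations, R2 should match the Multiple R-squared of 0.5178 shown for lm(weight~chins+situps+jumps); its square root is the multiple correlation itself.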
Canonical correlation (cc)
install.packages("ggplot2")
install.packages("GGally")
install.packages("CCA")
install.packages("CCP")
require(ggplot2)
require(GGally)
require(CCA)
require(CCP)
phys <- md[, 1:3]       # physical variables: weight, waist, pulse
exer <- md[, 4:6]       # exercise variables: chins, situps, jumps
matcor(phys, exer)      # correlations within and between the two sets
cc1 <- cc(phys, exer)   # canonical correlation analysis
cc1
http://www.ats.ucla.edu/stat/r/dae/canonical.htm
cc output
[1] 0.87857805 0.26499182 0.06266112
(canonical correlations)
$xcoef
               [,1]        [,2]        [,3]
weight  0.007691932  0.08206036 -0.01089895
waist  -0.352367502 -0.46672576  0.12741976
pulse  -0.016888712  0.04500996  0.14113157
$ycoef
               [,1]        [,2]         [,3]
chins   0.063996632  0.19132168  0.116137756
situps  0.017876736 -0.01743903  0.001201433
jumps  -0.002949483  0.00494516 -0.022700322
• $xcoef and $ycoef: raw canonical coefficient matrices, U and V
• phys*U: raw canonical variates for phys
• exer*V: raw canonical variates for exer
• $scores$xscores: standardized canonical variates
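As a hedged cross-check that does not need the CCA package, canonical correlations can be computed directly from a correlation matrix: they are the singular values of the cross-correlation block after each set has been whitened. The sketch below re-enters the matrix from Slide 6 by hand; the whitening via Cholesky factors is one standard route and is not necessarily how cc() works internally. The names R_xx, R_yy, R_xy, K and rho are assumptions made for this example.

```r
# Correlation matrix copied from Slide 6 (values as printed there).
vars <- c("weight", "waist", "pulse", "chins", "situps", "jumps")
R <- matrix(c(
   1.000000,  0.869580, -0.365762, -0.389690, -0.705570, -0.226296,
   0.869580,  1.000000, -0.333131, -0.588930, -0.836100, -0.344578,
  -0.365762, -0.333131,  1.000000,  0.150648,  0.157234,  0.034933,
  -0.389690, -0.588930,  0.150648,  1.000000,  0.500580,  0.495760,
  -0.705570, -0.836100,  0.157234,  0.500580,  1.000000,  0.461611,
  -0.226296, -0.344578,  0.034933,  0.495760,  0.461611,  1.000000),
  nrow = 6, byrow = TRUE, dimnames = list(vars, vars))

phys <- vars[1:3]
exer <- vars[4:6]
R_xx <- R[phys, phys]   # within the physical set
R_yy <- R[exer, exer]   # within the exercise set
R_xy <- R[phys, exer]   # between the two sets

# Whiten each block with its Cholesky factor (R_xx = t(U_x) %*% U_x);
# the singular values of the whitened cross block are the canonical
# correlations, returned in decreasing order.
U_x <- chol(R_xx)
U_y <- chol(R_yy)
K <- solve(t(U_x)) %*% R_xy %*% solve(U_y)
rho <- svd(K)$d
print(round(rho, 4))
```

Because the input is rounded, the result should be close to, but not necessarily identical with, the three canonical correlations printed at the top of the cc output. The first value can never be smaller than the largest absolute correlation between a phys variable and an exer variable.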
standardized canonical variates
$scores$xscores
             [,1]        [,2]        [,3]
 [1,] -0.06587452  0.39294336 -0.90048466
 [2,] -0.89033536 -0.01630780  0.46160952
 [3,]  0.33866397  0.51550858 -1.57063280
 [4,] -0.71810315  1.37075870 -0.01683463
 [5,]  1.17525491  2.57590579  2.01305832
 [6,]  0.46963797 -0.47893295 -0.91554740
 [7,]  0.11781701 -1.07969890  1.22377873
 [8,]  0.01706419  0.37702424 -1.48680882
 [9,] -0.60117586 -1.12464792 -0.04505445
[10,]  0.65445550 -0.89895199 -0.33675460
[11,] -0.46740331 -0.14788320 -0.46900387
[12,] -0.13923760 -0.97996173  0.98174380
[13,] -0.23643419 -0.07554011  0.04439525
[14,]  0.28536698 -0.19295410  0.51756617
[15,]  1.66239672  0.42712450 -0.41495287
[16,]  0.76515225 -0.16836835 -0.72800719
[17,] -3.15880133  0.32106568 -0.23662794
[18,] -0.53629531  1.36900100  0.80062552
[19,]  1.04829236 -0.44018579 -0.75733645
[20,]  0.27955875 -1.74589901  1.83526836
$scores$yscores
             [,1]        [,2]        [,3]
 [1,] -0.23742244 -0.91888370 -0.28185833
 [2,] -1.00085572  1.68690015 -0.47289464
 [3,] -0.02345494  0.89826285  0.67222000
 [4,] -0.17718803 -0.26188291  0.55274626
 [5,]  1.14084951  0.23274696  1.37918010
 [6,] -0.15539717  2.00062200  1.56074166
 [7,]  1.15328755  0.10127530 -0.19445711
 [8,]  0.05512308 -1.01048386  0.50220023
 [9,] -0.23394065 -1.24840794  0.39411232
[10,]  1.31166763  0.13435186  0.64809096
[11,] -1.00146790 -0.93479995 -0.66871744
[12,] -0.02551244  0.60309281  1.03278901
[13,] -0.62373985 -0.83299874 -0.01462037
[14,] -0.23957331 -0.70439205  0.27987584
[15,]  1.56116497  0.76448365 -3.09433899
[16,]  0.97041241  0.04660035 -0.54360525
[17,] -2.46610861  0.21954878 -0.65396658
[18,] -0.71723790  1.44951672 -0.88137354
[19,]  1.30318577 -0.85790412  0.04265917
[20,] -0.59379197 -1.36764817 -0.25878331
Canonical structure: Correlations
$scores$corr.X.xscores
             [,1]       [,2]       [,3]
weight -0.8028458 0.53345479 -0.2662041
waist  -0.9871691 0.07372001 -0.1416419
pulse   0.2061478 0.10981908  0.9723389
(correlations between phys variables and CVs_U)
$scores$corr.Y.xscores
            [,1]        [,2]         [,3]
chins  0.6101751  0.18985890  0.004125743
situps 0.8442193 -0.05748754 -0.010784582
jumps  0.3638095  0.09727830 -0.052192182
(correlations between exer variables and CVs_U)
$scores$corr.X.yscores
             [,1]       [,2]         [,3]
weight -0.7053627 0.14136116 -0.016680651
waist  -0.8673051 0.01953520 -0.008875444
pulse   0.1811170 0.02910116  0.060927845
(correlations between phys variables and CVs_V)
$scores$corr.Y.yscores
            [,1]       [,2]        [,3]
chins  0.6945030  0.7164708  0.06584216
situps 0.9608928 -0.2169408 -0.17210961
jumps  0.4140890  0.3670993 -0.83292764
(correlations between exer variables and CVs_V)
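The four matrices above are not independent: each cross-loading (a variable correlated with the other set's canonical variate) equals the corresponding within-set loading multiplied by that pair's canonical correlation. A minimal sketch using the printed values, re-entered by hand; the object names are assumptions made for this example:

```r
# Canonical correlations and phys loadings, copied from the cc output.
rho <- c(0.87857805, 0.26499182, 0.06266112)
corr_X_xscores <- matrix(c(-0.8028458, 0.53345479, -0.2662041,
                           -0.9871691, 0.07372001, -0.1416419,
                            0.2061478, 0.10981908,  0.9723389),
                         nrow = 3, byrow = TRUE,
                         dimnames = list(c("weight", "waist", "pulse"), NULL))

# Scale column k of the loadings by the k-th canonical correlation.
corr_X_yscores <- sweep(corr_X_xscores, 2, rho, `*`)
print(round(corr_X_yscores, 7))
```

The result should reproduce $scores$corr.X.yscores up to rounding; the same scaling links corr.Y.yscores to corr.Y.xscores.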
Significance: p.asym in CCP
v.Cancor <- cc1$cor
# p.asym(rho, N, p, q, tstat = "Wilks|Hotelling|Pillai|Roy")
res <- p.asym(v.Cancor, length(md$weight), 3, 3, tstat = "Wilks")
Wilks' Lambda, using F-approximation (Rao's F):
              stat    approx df1      df2     p.value
1 to 3:  0.2112505 3.4003788   9 34.22293 0.004421278
2 to 3:  0.9261286 0.2933756   4 30.00000 0.879945478
3 to 3:  0.9960736 0.0630703   1 16.00000 0.804904236
• 1 to 3: At least one cancor significant?
• 2 to 3: Significant relationship after excluding cancor 1?
• 3 to 3: Significant relationship after excluding cancor 1 and 2?
plt.asym(res, rhostart=1)
plt.asym(res, rhostart=2)
plt.asym(res, rhostart=3)
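The stat column above can be reproduced by hand: Wilks' lambda for the test of canonical correlations k through 3 is the product of (1 - rho_i^2) over i = k, ..., 3. A short base-R sketch using the canonical correlations from the cc output; the object name wilks is an assumption made for this example:

```r
# Canonical correlations copied from the cc output.
rho <- c(0.87857805, 0.26499182, 0.06266112)

# Wilks' lambda for "correlations k..3 are all zero", for k = 1, 2, 3.
wilks <- sapply(seq_along(rho),
                function(k) prod(1 - rho[k:length(rho)]^2))
names(wilks) <- c("1 to 3", "2 to 3", "3 to 3")
print(round(wilks, 7))
```

A value near 1 means the remaining canonical correlations explain almost nothing, which is why only the first row of the p.asym output is significant.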
Ecology data: Assignment
# 24 sites; for each site, record coverage of four species and concentration of four chemicals
21.09 21.90 9.19 9.18 20.96 21.52 7.46 7.41
14.69 14.85 14.06 14.07 14.80 14.63 13.71 13.69
2.11 2.17 3.13 3.06 3.17 2.43 2.10 1.96
9.58 9.47 8.14 8.06 9.54 9.71 9.36 9.43
10.02 10.71 9.02 9.06 11.16 10.59 10.91 11.10
14.65 14.32 15.10 15.15 14.59 14.61 13.55
24.42 24.12 6.00 6.12 24.36 24.50 4.34
22.20 22.10 4.14 4.04 23.37 22.74 4.90 5.06
8.34 8.88 9.16 9.06 8.75 8.19 7.58
10.49 10.12 11.08 11.13 10.09 10.73 9.55 9.56
25.72 25.91 1.12 1.16 25.94 26.01 1.98 1.99
4.16 4.44 3.05 3.09 3.97 4.89 4.53
12.07 12.31 11.09 11.15 12.68 12.89 12.62 12.78
19.13 19.36 11.13 11.05 18.69 19.05 9.01 9.16
5.80 5.15 4.11 4.18 6.07 6.33 5.10 4.96
1.27 1.15 2.10 2.17 1.27 1.80 0.73 0.75
22.15 22.52 8.01 8.04 22.08 22.53 7.43 7.31
26.53 26.27 0.14 0.11 26.33 26.88 0.55 0.57
17.25 17.68 11.12 11.18 17.39 17.76 9.51 9.55
7.94 7.46 6.13 6.03 7.53 7.67 7.51 7.47
4.12 4.45 3.08 3.14 5.21 4.65 3.92 4.00
17.59 17.53 11.19 11.04 16.97 16.70 12.30 12.26
15.41 15.16 13.12 13.03 15.79 16.01 12.00 11.83
12.90 12.93 11.12 12.80 12.04 11.52
19.14 19.11 7.16 7.14 19.88 19.84 8.86 8.90
25.11 25.50 3.13 3.20 25.28 25.44 4.26 4.23
Slide 14