Multiple Regression
EPP 245 Statistical Analysis of Laboratory Data
Cystic Fibrosis Data

Lung function data for cystic fibrosis patients (7-23 years old)

age      a numeric vector. Age in years.
sex      a numeric vector code. 0: male, 1: female.
height   a numeric vector. Height (cm).
weight   a numeric vector. Weight (kg).
bmp      a numeric vector. Body mass (% of normal).
fev1     a numeric vector. Forced expiratory volume.
rv       a numeric vector. Residual volume.
frc      a numeric vector. Functional residual capacity.
tlc      a numeric vector. Total lung capacity.
pemax    a numeric vector. Maximum expiratory pressure.

October 25, 2007  EPP 245 Statistical Analysis of Laboratory Data
Some Stata Commands

. insheet using "cystfibr.csv"
(11 vars, 25 obs)
. graph matrix age sex height weight bmp fev1 rv frc tlc pemax
. graph export cystfibr-scm.wmf
. regress pemax age sex height weight bmp fev1 rv frc tlc
. rvfplot
. graph export cystfibr-rvf.wmf
      Source |       SS       df       MS           Number of obs =      25
-------------+------------------------------           F(  9,    15) =    2.93
       Model |  17101.3907     9  1900.15452           Prob > F      =  0.0320
    Residual |  9731.24928    15  648.749952           R-squared     =  0.6373
-------------+------------------------------           Adj R-squared =  0.4197
       Total |    26832.64    24  1118.02667           Root MSE      =  25.471

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -2.54196   4.801699    -0.53   0.604    -12.77654    7.692618
         sex |  -3.736782   15.45982    -0.24   0.812    -36.68861    29.21505
      height |  -.4462549   .9033548    -0.49   0.628     -2.37171      1.4792
      weight |   2.992816   2.007957     1.49   0.157    -1.287044    7.272675
         bmp |  -1.744944   1.155237    -1.51   0.152    -4.207274    .7173865
        fev1 |   1.080697   1.080947     1.00   0.333    -1.223288    3.384682
          rv |    .196972   .1962136     1.00   0.331    -.2212474    .6151915
         frc |  -.3084314   .4923899    -0.63   0.540    -1.357936    .7410729
         tlc |   .1886017   .4997351     0.38   0.711    -.8765585    1.253762
       _cons |   176.0582   225.8911     0.78   0.448    -305.4174    657.5338
------------------------------------------------------------------------------
T-test of additional value of variable: in the coefficient table of the full model, each t statistic (with its P>|t|) tests whether that variable adds to the prediction of pemax when all the other variables are already in the model.
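As a check on how the t and confidence-interval columns follow from the coefficient and standard-error columns, here is a small Python sketch (the slides themselves use Stata) that recomputes them for three rows of the full-model output. The critical value 2.13145 is the 97.5th percentile of the t distribution with 15 degrees of freedom, taken from a table rather than computed, and the name `t_and_ci` is only illustrative.

```python
# Coefficient and standard error copied from the full-model Stata output.
FULL_MODEL_ROWS = {
    "sex": (-3.736782, 15.45982),
    "weight": (2.992816, 2.007957),
    "bmp": (-1.744944, 1.155237),
}

# Assumption: table value of the 97.5th percentile of t with 15 df
# (25 observations minus 9 predictors minus the intercept).
T_CRIT_15 = 2.13145


def t_and_ci(coef, se, t_crit=T_CRIT_15):
    """Return the t ratio and the 95% confidence limits for one coefficient."""
    t_ratio = coef / se
    half_width = t_crit * se
    return t_ratio, coef - half_width, coef + half_width


for name, (coef, se) in FULL_MODEL_ROWS.items():
    t_ratio, lower, upper = t_and_ci(coef, se)
    print(f"{name:>8}  t = {t_ratio:6.2f}  CI = [{lower:.4f}, {upper:.4f}]")
```

A coefficient whose interval contains zero, as all three of these do, has a two-sided p-value above 0.05 in the same row.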
Test of whole model: in the header of the full-model output, F(9, 15) = 2.93 with Prob > F = 0.0320 tests whether the nine predictors taken together have any relationship with pemax.
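The header statistics of the full model can be recomputed from its ANOVA table. The Python sketch below (the slides use Stata; the function name `summary_stats` is an illustrative choice) takes the printed sums of squares and degrees of freedom and derives F, R-squared, adjusted R-squared and root MSE. It does not compute Prob > F, which would need the F distribution.

```python
import math

# Sums of squares and degrees of freedom copied from the full-model output.
SS_MODEL, DF_MODEL = 17101.3907, 9
SS_RESID, DF_RESID = 9731.24928, 15


def summary_stats(ss_model, df_model, ss_resid, df_resid):
    """Derive the regression header statistics from the ANOVA table."""
    ss_total = ss_model + ss_resid
    df_total = df_model + df_resid
    ms_model = ss_model / df_model      # mean square for the model
    ms_resid = ss_resid / df_resid      # mean square for the residual
    ms_total = ss_total / df_total
    return {
        "F": ms_model / ms_resid,
        "R-squared": ss_model / ss_total,
        "Adj R-squared": 1.0 - ms_resid / ms_total,
        "Root MSE": math.sqrt(ms_resid),
    }


for label, value in summary_stats(SS_MODEL, DF_MODEL, SS_RESID, DF_RESID).items():
    print(f"{label:>14} = {value:.4f}")
```

Adjusted R-squared compares mean squares rather than sums of squares, which is why it penalizes each extra predictor while plain R-squared never decreases when one is added.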
Least significant variable: in the full model, sex has the largest p-value (t = -0.24, P>|t| = 0.812), so it is the first candidate for removal.
. regress pemax age height weight bmp fev1 rv frc tlc

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  8,    16) =    3.49
       Model |  17063.4886     8  2132.93607           Prob > F      =  0.0159
    Residual |  9769.15144    16  610.571965           R-squared     =  0.6359
-------------+------------------------------           Adj R-squared =  0.4539
       Total |    26832.64    24  1118.02667           Root MSE      =   24.71

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -2.114515   4.330841    -0.49   0.632    -11.29549    7.066459
      height |   -.394836    .851725    -0.46   0.649    -2.200412     1.41074
      weight |   2.834909   1.841995     1.54   0.143    -1.069947    6.739765
         bmp |  -1.741637   1.120651    -1.55   0.140    -4.117312     .634038
        fev1 |    1.26509   .7429407     1.70   0.108    -.3098737    2.840054
          rv |   .1779046   .1742911     1.02   0.323    -.1915759    .5473852
         frc |  -.2483218   .4122804    -0.60   0.555    -1.122317    .6256736
         tlc |   .2084044   .4782484     0.44   0.669    -.8054369    1.222246
       _cons |   153.0385   198.7149     0.77   0.452    -268.2183    574.2953
------------------------------------------------------------------------------

Least significant variable: tlc (P>|t| = 0.669)
. regress pemax age height weight bmp fev1 rv frc

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  7,    17) =    4.16
       Model |  16947.5458     7  2421.07798           Prob > F      =  0.0077
    Residual |  9885.09416    17  581.476127           R-squared     =  0.6316
-------------+------------------------------           Adj R-squared =  0.4799
       Total |    26832.64    24  1118.02667           Root MSE      =  24.114

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -2.663193   4.043832    -0.66   0.519    -11.19493    5.868546
      height |  -.4895733   .8036502    -0.61   0.550    -2.185127    1.205981
      weight |   3.155659   1.647815     1.92   0.072    -.3209274    6.632245
         bmp |  -1.962543   .9753332    -2.01   0.060    -4.020316    .0952305
        fev1 |   1.247861   .7239953     1.72   0.103    -.2796361    2.775357
          rv |   .1595988   .1650733     0.97   0.347    -.1886753    .5078729
         frc |  -.1764595    .368749    -0.48   0.638    -.9544518    .6015328
       _cons |   198.2942   165.3311     1.20   0.247    -150.5238    547.1123
------------------------------------------------------------------------------

Least significant variable: frc (P>|t| = 0.638)
. regress pemax age height weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  6,    18) =    5.04
       Model |  16814.3899     6  2802.39832           Prob > F      =  0.0034
    Residual |  10018.2501    18  556.569447           R-squared     =  0.6266
-------------+------------------------------           Adj R-squared =  0.5022
       Total |    26832.64    24  1118.02667           Root MSE      =  23.592

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -1.819342   3.560301    -0.51   0.616    -9.299258    5.660573
      height |  -.4101508   .7693006    -0.53   0.600    -2.026391     1.20609
      weight |   2.874434   1.506126     1.91   0.072    -.2898203    6.038688
         bmp |  -1.949083   .9538193    -2.04   0.056    -3.952983    .0548169
        fev1 |   1.411959   .6238279     2.26   0.036     .1013452    2.722573
          rv |   .0955779   .0946057     1.01   0.326    -.1031813    .2943371
       _cons |   166.9049   148.4762     1.12   0.276    -145.0321    478.8418
------------------------------------------------------------------------------

Least significant variable: age (P>|t| = 0.616)
. regress pemax height weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  5,    19) =    6.23
       Model |  16669.0534     5  3333.81068           Prob > F      =  0.0014
    Residual |  10163.5866    19   534.92561           R-squared     =  0.6212
-------------+------------------------------           Adj R-squared =  0.5215
       Total |    26832.64    24  1118.02667           Root MSE      =  23.128

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |  -.4485274   .7505918    -0.60   0.557    -2.019534    1.122479
      weight |   2.338692   1.060094     2.21   0.040     .1198889    4.557495
         bmp |  -1.641001   .7246036    -2.26   0.035    -3.157614   -.1243885
        fev1 |   1.471767   .6007182     2.45   0.024     .2144491    2.729084
          rv |    .110117   .0884543     1.24   0.228      -.07502     .295254
       _cons |   137.0958   133.8559     1.02   0.319    -143.0677    417.2594
------------------------------------------------------------------------------

Least significant variable: height (P>|t| = 0.557)
. regress pemax weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  4,    20) =    7.96
       Model |  16478.0401     4  4119.51002           Prob > F      =  0.0005
    Residual |  10354.5999    20  517.729996           R-squared     =  0.6141
-------------+------------------------------           Adj R-squared =  0.5369
       Total |    26832.64    24  1118.02667           Root MSE      =  22.754

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   1.748914   .3806332     4.59   0.000     .9549274    2.542901
         bmp |  -1.377243   .5653421    -2.44   0.024    -2.556526   -.1979604
        fev1 |   1.547698   .5776112     2.68   0.014     .3428223    2.752574
          rv |   .1257152   .0831456     1.51   0.146    -.0477234    .2991538
       _cons |    63.9467   53.27673     1.20   0.244    -47.18661      175.08
------------------------------------------------------------------------------

Least significant variable: rv (P>|t| = 0.146)
. regress pemax weight bmp fev1

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =    9.28
       Model |  15294.4519     3  5098.15064           Prob > F      =  0.0004
    Residual |  11538.1881    21  549.437528           R-squared     =  0.5700
-------------+------------------------------           Adj R-squared =  0.5086
       Total |    26832.64    24  1118.02667           Root MSE      =   23.44

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   1.536475   .3644235     4.22   0.000     .7786149    2.294335
         bmp |  -1.465406   .5792906    -2.53   0.019    -2.670106    -.260705
        fev1 |   1.108629   .5143694     2.16   0.043     .0389396    2.178319
       _cons |   126.3336   34.71986     3.64   0.002     54.12965    198.5375
------------------------------------------------------------------------------
. stepwise, pr(.05): regress pemax age sex height weight bmp fev1 rv frc tlc
                      begin with full model
p = 0.8123 >= 0.0500  removing sex
p = 0.6688 >= 0.0500  removing tlc
p = 0.6384 >= 0.0500  removing frc
p = 0.6156 >= 0.0500  removing age
p = 0.5572 >= 0.0500  removing height
p = 0.1462 >= 0.0500  removing rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =    9.28
       Model |  15294.4519     3  5098.15064           Prob > F      =  0.0004
    Residual |  11538.1881    21  549.437528           R-squared     =  0.5700
-------------+------------------------------           Adj R-squared =  0.5086
       Total |    26832.64    24  1118.02667           Root MSE      =   23.44

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fev1 |   1.108629   .5143694     2.16   0.043     .0389396    2.178319
      weight |   1.536475   .3644235     4.22   0.000     .7786149    2.294335
         bmp |  -1.465406   .5792906    -2.53   0.019    -2.670106    -.260705
       _cons |   126.3336   34.71986     3.64   0.002     54.12965    198.5375
------------------------------------------------------------------------------
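The backward-elimination rule that the stepwise log follows (drop the variable with the largest p-value while that p-value is at least the pr threshold) can be sketched in Python; the slides themselves use Stata. This is only an illustration of the loop, not Stata's implementation: instead of refitting regressions, the hypothetical `printed_fit` function looks up the P>|t| values printed on the preceding slides, and `backward_eliminate` is an illustrative name.

```python
# P>|t| values copied from the printed regression tables, keyed by the
# set of predictors in each fitted model.
PRINTED_PVALUES = {
    frozenset("age sex height weight bmp fev1 rv frc tlc".split()): {
        "age": 0.604, "sex": 0.812, "height": 0.628, "weight": 0.157,
        "bmp": 0.152, "fev1": 0.333, "rv": 0.331, "frc": 0.540, "tlc": 0.711},
    frozenset("age height weight bmp fev1 rv frc tlc".split()): {
        "age": 0.632, "height": 0.649, "weight": 0.143, "bmp": 0.140,
        "fev1": 0.108, "rv": 0.323, "frc": 0.555, "tlc": 0.669},
    frozenset("age height weight bmp fev1 rv frc".split()): {
        "age": 0.519, "height": 0.550, "weight": 0.072, "bmp": 0.060,
        "fev1": 0.103, "rv": 0.347, "frc": 0.638},
    frozenset("age height weight bmp fev1 rv".split()): {
        "age": 0.616, "height": 0.600, "weight": 0.072, "bmp": 0.056,
        "fev1": 0.036, "rv": 0.326},
    frozenset("height weight bmp fev1 rv".split()): {
        "height": 0.557, "weight": 0.040, "bmp": 0.035, "fev1": 0.024,
        "rv": 0.228},
    frozenset("weight bmp fev1 rv".split()): {
        "weight": 0.000, "bmp": 0.024, "fev1": 0.014, "rv": 0.146},
    frozenset("weight bmp fev1".split()): {
        "weight": 0.000, "bmp": 0.019, "fev1": 0.043},
}


def printed_fit(variables):
    """Stand-in for a regression fit: return the printed p-values."""
    return PRINTED_PVALUES[frozenset(variables)]


def backward_eliminate(fit_pvalues, variables, pr=0.05):
    """Drop the least significant variable until every p-value is below pr."""
    current = list(variables)
    removed = []
    while current:
        pvalues = fit_pvalues(current)
        worst = max(current, key=lambda name: pvalues[name])
        if pvalues[worst] < pr:
            break                      # every remaining variable passes
        current.remove(worst)
        removed.append(worst)
    return current, removed


kept, dropped = backward_eliminate(
    printed_fit, "age sex height weight bmp fev1 rv frc tlc".split())
print("removed in order:", dropped)
print("final model:", kept)
```

With real data, `printed_fit` would be replaced by a function that refits the regression on the current variables and returns their p-values.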
. stepwise, pr(.1) pe(.05): regress pemax age sex height weight bmp fev1 rv frc tlc
                      begin with full model
p = 0.8123 >= 0.1000  removing sex
p = 0.6688 >= 0.1000  removing tlc
p = 0.6384 >= 0.1000  removing frc
p = 0.6156 >= 0.1000  removing age
p = 0.5572 >= 0.1000  removing height
p = 0.1462 >= 0.1000  removing rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =    9.28
       Model |  15294.4519     3  5098.15064           Prob > F      =  0.0004
    Residual |  11538.1881    21  549.437528           R-squared     =  0.5700
-------------+------------------------------           Adj R-squared =  0.5086
       Total |    26832.64    24  1118.02667           Root MSE      =   23.44

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fev1 |   1.108629   .5143694     2.16   0.043     .0389396    2.178319
      weight |   1.536475   .3644235     4.22   0.000     .7786149    2.294335
         bmp |  -1.465406   .5792906    -2.53   0.019    -2.670106    -.260705
       _cons |   126.3336   34.71986     3.64   0.002     54.12965    198.5375
------------------------------------------------------------------------------
Cautionary Notes

• The significance levels are not necessarily believable after variable selection, because the same data are used both to choose the variables and to test them.
• The original full-model F-statistic is significant, indicating that there is some relationship between pemax and the predictors: F(9, 15) = 2.93, p = 0.0320.
• After variable selection, F(3, 21) = 9.28, p = 0.0004, which is biased: it overstates the evidence because it ignores the selection step.
* Simulate 25 observations of nine predictors and a response,
* all independent standard normal, so y is unrelated to every x
set obs 25
generate x1 = invnormal(uniform())
generate x2 = invnormal(uniform())
generate x3 = invnormal(uniform())
generate x4 = invnormal(uniform())
generate x5 = invnormal(uniform())
generate x6 = invnormal(uniform())
generate x7 = invnormal(uniform())
generate x8 = invnormal(uniform())
generate x9 = invnormal(uniform())
generate y = invnormal(uniform())
* Fit the full model, then run backward stepwise selection at the 0.10 level
regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
. regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  9,    15) =    0.91
       Model |  12.3235639     9  1.36928488           Prob > F      =  0.5397
    Residual |  22.5105993    15  1.50070662           R-squared     =  0.3538
-------------+------------------------------           Adj R-squared = -0.0340
       Total |  34.8341632    24  1.45142347           Root MSE      =   1.225

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.0441858   .2998066    -0.15   0.885    -.6832085     .594837
          x2 |  -.9078136   .4347798    -2.09   0.054    -1.834525    .0188976
          x3 |   .2076754   .3789522     0.55   0.592    -.6000421    1.015393
          x4 |  -.0056383   .3319125    -0.02   0.987    -.7130931    .7018166
          x5 |   -.330546   .3854497    -0.86   0.405    -1.152113    .4910207
          x6 |   .0202964   .3470704     0.06   0.954    -.7194666    .7600594
          x7 |   -.073401   .3135234    -0.23   0.818    -.7416603    .5948583
          x8 |  -.0552909   .3026913    -0.18   0.858    -.7004621    .5898803
          x9 |  -.3190092   .3137931    -1.02   0.325    -.9878434     .349825
       _cons |  -.2490392   .3078424    -0.81   0.431    -.9051898    .4071113
------------------------------------------------------------------------------
. stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
                      begin with full model
p = 0.9867 >= 0.1000  removing x4
p = 0.9545 >= 0.1000  removing x6
p = 0.8456 >= 0.1000  removing x1
p = 0.8165 >= 0.1000  removing x7
p = 0.7506 >= 0.1000  removing x8
p = 0.5023 >= 0.1000  removing x3
p = 0.2866 >= 0.1000  removing x5
p = 0.2081 >= 0.1000  removing x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  1,    23) =    7.23
       Model |  8.33379862     1  8.33379862           Prob > F      =  0.0131
    Residual |  26.5003646    23  1.15218977           R-squared     =  0.2392
-------------+------------------------------           Adj R-squared =  0.2062
       Total |  34.8341632    24  1.45142347           Root MSE      =  1.0734

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.6644002   .2470417    -2.69   0.013    -1.175445   -.1533555
       _cons |  -.1523124    .214703    -0.71   0.485    -.5964594    .2918346
------------------------------------------------------------------------------
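A single simulated data set shows that selection can produce a "significant" predictor from pure noise; repeating the experiment shows how often. The Python sketch below (the slides use Stata) is a simplified stand-in for backward stepwise selection: in each replicate it regresses a noise response on each of nine noise predictors separately and keeps the one with the largest |t|. The critical value 2.0687 (t with 23 degrees of freedom, two-sided 5%) is a table value, and the seed, replicate count and function names are arbitrary illustrative choices.

```python
import math
import random

N_OBS = 25
N_PREDICTORS = 9
# Assumption: table value of the 97.5th percentile of t with 23 df.
T_CRIT_23 = 2.0687


def simple_regression_t(x, y):
    """t statistic for the slope when y is regressed on a single x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def best_predictor_is_significant(rng):
    """One replicate: is the best of nine noise predictors 'significant'?"""
    y = [rng.gauss(0.0, 1.0) for _ in range(N_OBS)]
    largest_abs_t = 0.0
    for _ in range(N_PREDICTORS):
        x = [rng.gauss(0.0, 1.0) for _ in range(N_OBS)]
        largest_abs_t = max(largest_abs_t, abs(simple_regression_t(x, y)))
    return largest_abs_t > T_CRIT_23


def false_positive_rate(n_sims=2000, seed=245):
    """Share of replicates in which the selected predictor has p < 0.05."""
    rng = random.Random(seed)
    hits = sum(best_predictor_is_significant(rng) for _ in range(n_sims))
    return hits / n_sims
```

Calling `false_positive_rate()` should give a rate far above the nominal 0.05. With nine independent tests the chance that at least one passes is 1 - 0.95^9, roughly 0.37, which is why a p-value reported after selection cannot be read at face value.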