Analysis of Variance (ANOVA)
SPH 247 Statistical Analysis of Laboratory Data
April 2, 2013

ANOVA: Fixed and Random Effects
  • We will review the analysis of variance (ANOVA) and then move to random and fixed effects models.
  • Nested models are used to look at levels of variability (days within subjects, replicate measurements within days).
  • Crossed models are often used when there are both fixed and random effects.

The Basic Idea
  • The analysis of variance is a way of testing whether observed differences between groups are too large to be explained by chance variation.
  • One-way ANOVA is used when there are k ≥ 2 groups for one factor, and no other quantitative variable or classification factor.

 A   B   C
 9  10  12
 7   9  14
 7   8  14
 9   9  12

Data = Grand Mean + Column Deviations from grand mean + Cell Deviations from column mean

Are the column deviations from the grand mean too big to be accounted for by the cell deviations from the column means?

Data

 A   B   C
 9  10  12
 7   9  14
 7   8  14
 9   9  12

Column Means

 A   B   C
 8   9  13

Deviations from Column Means

 A   B   C
 1   1  -1
-1   0   1
-1  -1   1
 1   0  -1
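The decomposition on the preceding slides can be checked by hand. The sketch below is written in Python with only the standard library (the slides themselves use R), and the variable names are illustrative rather than taken from the course materials. It computes the grand mean, the column means, the between-column and within-column sums of squares, and the resulting F ratio on 2 and 9 degrees of freedom.

```python
# One-way ANOVA decomposition for the A/B/C example, standard library only.
data = {
    "A": [9, 7, 7, 9],
    "B": [10, 9, 8, 9],
    "C": [12, 14, 14, 12],
}

all_values = [x for column in data.values() for x in column]
grand_mean = sum(all_values) / len(all_values)
column_means = {name: sum(col) / len(col) for name, col in data.items()}

# Between-column sum of squares: column deviations from the grand mean,
# counted once for every observation in the column.
ss_between = sum(len(col) * (column_means[name] - grand_mean) ** 2
                 for name, col in data.items())

# Within-column sum of squares: cell deviations from the column means.
ss_within = sum((x - column_means[name]) ** 2
                for name, col in data.items() for x in col)

df_between = len(data) - 1                 # k - 1 groups
df_within = len(all_values) - len(data)    # N - k observations
f_statistic = (ss_between / df_between) / (ss_within / df_within)

print("grand mean:", grand_mean)
print("column means:", column_means)
print("SS between:", ss_between, "SS within:", ss_within)
print("F on", df_between, "and", df_within, "df:", round(f_statistic, 2))
```

The column deviations (-2, -1, 3) are large compared with the cell deviations (all between -1 and 1), which is why the F ratio comes out far above 1.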

red.cell.folate    package:ISwR    R Documentation

Red cell folate data

Description:
  The 'folate' data frame has 22 rows and 2 columns. It contains data on red cell folate levels in patients receiving three different methods of ventilation during anesthesia.

Format:
  This data frame contains the following columns:
  folate: a numeric vector. Folate concentration (μg/l).
  ventilation: a factor with levels:
    'N2O+O2,24h': 50% nitrous oxide and 50% oxygen, continuously for 24 hours;
    'N2O+O2,op': 50% nitrous oxide and 50% oxygen, only during operation;
    'O2,24h': no nitrous oxide, but 35-50% oxygen for 24 hours.

> library(ISwR)
> data(red.cell.folate)
> help(red.cell.folate)
> summary(red.cell.folate)
     folate           ventilation
 Min.   :206.0   N2O+O2,24h:8
 1st Qu.:249.5   N2O+O2,op :9
 Median :274.0   O2,24h    :5
 Mean   :283.2
 3rd Qu.:305.5
 Max.   :392.0
> attach(red.cell.folate)
> plot(folate ~ ventilation)

> folate.lm <- lm(folate ~ ventilation)
> summary(folate.lm)

Call:
lm(formula = folate ~ ventilation)

Residuals:
    Min      1Q  Median      3Q     Max
-73.625 -35.361  -4.444  35.625  75.375

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)            316.62      16.16  19.588 4.65e-14 ***
ventilationN2O+O2,op   -60.18      22.22  -2.709   0.0139 *
ventilationO2,24h      -38.62      26.06  -1.482   0.1548
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 45.72 on 19 degrees of freedom
Multiple R-Squared: 0.2809,     Adjusted R-squared: 0.2052
F-statistic: 3.711 on 2 and 19 DF,  p-value: 0.04359

> anova(folate.lm)
Analysis of Variance Table

Response: folate
            Df Sum Sq Mean Sq F value  Pr(>F)
ventilation  2  15516    7758  3.7113 0.04359 *
Residuals   19  39716    2090
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
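The quantities in the regression summary and the ANOVA table are tied together by simple arithmetic. As a rough check (in Python rather than R, with the sums of squares copied from the table as printed, so small rounding differences are expected), the mean squares, F value, R-squared, and residual standard error can be recomputed as follows. The variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_ventilation, df_ventilation = 15516, 2
ss_residual, df_residual = 39716, 19

# Mean square = sum of squares / degrees of freedom.
ms_ventilation = ss_ventilation / df_ventilation
ms_residual = ss_residual / df_residual

# F is the ratio of the factor mean square to the residual mean square.
f_value = ms_ventilation / ms_residual

# R-squared is the share of the total sum of squares explained by the factor.
r_squared = ss_ventilation / (ss_ventilation + ss_residual)

# The residual standard error is the square root of the residual mean square.
residual_se = math.sqrt(ms_residual)

print(f"MS ventilation = {ms_ventilation:.0f}, MS residual = {ms_residual:.0f}")
print(f"F = {f_value:.3f}, R-squared = {r_squared:.4f}, residual SE = {residual_se:.2f}")
```

These agree with the F statistic, Multiple R-Squared, and residual standard error reported by summary(folate.lm) to the precision shown there.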

Two- and Multi-way ANOVA
  • If there is more than one factor, the sum of squares can be decomposed according to each factor, and possibly according to interactions.
  • One can also have factors and quantitative variables in the same model (cf. analysis of covariance).
  • All have similar interpretations.

Heart rates after enalaprilat (ACE inhibitor)

Description:
  36 rows and 3 columns. Data for nine patients with congestive heart failure before and shortly after administration of enalaprilat, in a balanced two-way layout.

Format:
  hr: a numeric vector. Heart rate in beats per minute.
  subj: a factor with levels '1' to '9'.
  time: a factor with levels '0' (before), '30', '60', and '120' (minutes after administration).

> data(heart.rate)
> attach(heart.rate)
> heart.rate
    hr subj time
1   96    1    0
2  110    2    0
3   89    3    0
4   95    4    0
5  128    5    0
6  100    6    0
7   72    7    0
8   79    8    0
9  100    9    0
10  92    1   30
...
18 106    9   30
19  86    1   60
...
27 104    9   60
28  92    1  120
...
36 102    9  120

> plot(hr~subj)
> plot(hr~time)
> hr.lm <- lm(hr~subj+time)
> anova(hr.lm)
Analysis of Variance Table

Response: hr
          Df Sum Sq Mean Sq F value    Pr(>F)
subj       8 8966.6  1120.8 90.6391 4.863e-16 ***
time       3  151.0    50.3  4.0696   0.01802 *
Residuals 24  296.8    12.4
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> sres <- resid(lm(hr~subj))
> plot(sres~time)

Note that when the design is orthogonal, the ANOVA results don't depend on the order of terms.

Fixed and Random Effects
  • A fixed effect is a factor that can be duplicated (dosage of a drug).
  • A random effect is one that cannot be duplicated:
      • Patient/subject
      • Repeated measurement
  • There can be important differences in the analysis of data with random effects.
  • The error term is always a random effect.

Estradiol data from Rosner
  • 5 subjects from the Nurses' Health Study
  • One blood sample each
  • Each sample assayed twice for estradiol (and three other hormones)
  • The within variability is strictly technical/assay
  • Variability within a person over time will be much greater

> anova(lm(Estradiol ~ Subject, data=endocrin))
Analysis of Variance Table

Response: Estradiol
          Df Sum Sq Mean Sq F value   Pr(>F)
Subject    4 593.31 148.329  24.546 0.001747 **
Residuals  5  30.21   6.043
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  • Replication error variance is 6.043, so the standard deviation of replicates is 2.46 pg/mL.
  • This compares with average levels across subjects from 8.05 to 18.80.
  • Estimated variance across subjects is (148.329 − 6.043)/2 = 71.143.
  • Standard deviation across subjects is 8.43 pg/mL.
  • If we average the replicates, we get five values, the variance of which is 148.329/2 = 74.16, slightly larger than 71.14 because it includes a component of the replicate variance.
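The variance estimates quoted for the estradiol data follow from the expected mean squares of a balanced one-way random effects layout: the residual mean square estimates the replicate variance, and the subject mean square estimates the replicate variance plus n times the subject variance, where n is the number of replicates per subject. A minimal sketch of that calculation is given below in Python. The function name is hypothetical, and truncating a negative estimate at zero is an assumption of this sketch, not something stated on the slide.

```python
import math

def one_way_components(ms_between, ms_within, n_per_group):
    """Method-of-moments variance components for a balanced one-way layout.

    Returns (between-group variance, within-group variance).
    A negative between-group estimate is truncated at zero (an assumption
    made here for illustration).
    """
    var_within = ms_within
    var_between = max((ms_between - ms_within) / n_per_group, 0.0)
    return var_between, var_within

# Mean squares from the estradiol ANOVA table; 2 replicate assays per subject.
var_subject, var_replicate = one_way_components(148.329, 6.043, 2)

print(f"replicate variance = {var_replicate:.3f}, SD = {math.sqrt(var_replicate):.2f} pg/mL")
print(f"subject variance   = {var_subject:.3f}, SD = {math.sqrt(var_subject):.2f} pg/mL")
```

The same function applies to any balanced one-way table, given the two mean squares and the number of observations per group.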

Fasting Blood Glucose
  • Part of a larger study that also examined glucose tolerance during pregnancy
  • Here we have 53 subjects with 6 tests each at intervals of at least a year
  • The response is glucose as mg/100 mL

> anova(lm(FG ~ Subject, data=fg2))
Analysis of Variance Table

Response: FG
           Df Sum Sq Mean Sq F value    Pr(>F)
Subject    52  10936 210.310  2.9235 9.717e-09 ***
Residuals 265  19064  71.938
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  • Estimated within-Subject variance is 71.938, so the standard deviation is 8.48 mg/100 mL.
  • Estimated between-Subject variance is (210.310 − 71.938)/6 = 23.062, sd = 4.80 mg/100 mL.
  • The variance of the 53 means is 35.05, which is larger because it includes a component of the within-subject variance.
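The last remark can be made concrete. With a balanced design, the variance of the subject means equals the between-subject variance plus the within-subject variance divided by the number of tests per subject, and that sum is exactly the subject mean square divided by the number of tests. The Python sketch below (illustrative names, numbers copied from the table above) shows how 35.05 splits into the two pieces.

```python
# Mean squares from the fasting glucose ANOVA table; 6 tests per subject.
ms_subject, ms_residual, tests_per_subject = 210.310, 71.938, 6

var_within = ms_residual
var_between = (ms_subject - ms_residual) / tests_per_subject

# A subject mean averages 6 tests, so it still carries 1/6 of the
# within-subject variance on top of the true between-subject variance.
within_share = var_within / tests_per_subject
var_of_means = var_between + within_share

print(f"between-subject variance      = {var_between:.3f}")
print(f"within-subject share (1/6)    = {within_share:.3f}")
print(f"variance of the subject means = {var_of_means:.2f}")
```

Only about two thirds of the variance of the subject means reflects real differences between subjects; the rest is leftover test-to-test variation.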

Nested Random Effects Models
  • Cooperative trial with 6 laboratories, one analyte (7 in the full data set), 3 batches per lab (a month apart), and 2 replicates per batch
  • Estimate the variance components due to labs, batches, and replicates
  • Test for significance if possible
  • Effects are lab, batch-in-lab, and error

Analysis using lm or aov

> anova(lm(Conc ~ Lab + Lab:Bat, data=coop2))
Analysis of Variance Table

Response: Conc
          Df  Sum Sq Mean Sq F value    Pr(>F)
Lab        5 1.89021 0.37804 60.0333 1.354e-10 ***
Lab:Bat   12 0.20440 0.01703  2.7049   0.02768 *
Residuals 18 0.11335 0.00630

The test for batch-in-lab is correct, but the test for lab is not; the denominator should be the Lab:Bat MS, so F(5, 12) = 0.37804/0.01703 = 22.198 and p = 3.47e-4, still significant.

Effect      Variance   SD
Residual    0.00630    0.0794
Batch       0.00537    0.0733
Lab         0.06017    0.2453
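For this balanced nested design the variance components and the corrected F test can be obtained from the mean squares alone. Under the usual expected mean squares for a two-level nested random effects model (stated here as an assumption, since the slide does not write them out), the residual MS estimates the replicate variance, the Lab:Bat MS estimates the replicate variance plus 2 times the batch variance, and the Lab MS estimates the replicate variance plus 2 times the batch variance plus 6 times the lab variance. A Python sketch with illustrative names:

```python
import math

# Mean squares copied from the ANOVA table.
ms_lab, ms_batch, ms_resid = 0.37804, 0.01703, 0.00630
batches_per_lab, reps_per_batch = 3, 2

# Solve the expected-mean-square equations from the bottom up.
var_resid = ms_resid
var_batch = (ms_batch - ms_resid) / reps_per_batch
var_lab = (ms_lab - ms_batch) / (batches_per_lab * reps_per_batch)

# Labs are tested against batch-in-lab, not against the residual.
f_lab = ms_lab / ms_batch

for name, variance in [("Residual", var_resid), ("Batch", var_batch), ("Lab", var_lab)]:
    print(f"{name:<9}{variance:.5f}  {math.sqrt(variance):.4f}")
print(f"F(5, 12) for Lab = {f_lab:.3f}")
```

The lab component computed this way is the square of the lab standard deviation, 0.2453.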

Analysis using lme
  • R package nlme
  • Two separate formulas, one for the fixed effects and one for the random effects
  • In this case, no fixed effects
  • Nested random effects use the / notation

lme(Conc ~ 1, random = ~1 | Lab/Bat, data = coop2)

lme(Conc ~ 1, random = ~1 | Lab/Bat, data = coop2)
Linear mixed-effects model fit by REML
  Data: coop2
  Log-restricted-likelihood: 21.02158
  Fixed: Conc ~ 1
(Intercept)
  0.5080556                       Average concentration

Random effects:
 Formula: ~1 | Lab
        (Intercept)
StdDev:   0.2452922               SD of labs

 Formula: ~1 | Bat %in% Lab
        (Intercept)   Residual
StdDev:  0.07326702 0.07935504    SD of batches and replicates

Number of Observations: 36
Number of Groups:
         Lab Bat %in% Lab
           6           18
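The lme output reports standard deviations, while variance components add on the variance scale. As a sketch of how one might turn this output into total variability at each level of the design, the Python snippet below squares the three standard deviations, accumulates them, and expresses each total as a coefficient of variation (CV) relative to the fitted mean. The level labels and variable names are illustrative, not part of the lme output.

```python
import math

# Values copied from the lme output above.
mean_conc = 0.5080556
sd_lab, sd_batch, sd_resid = 0.2452922, 0.07326702, 0.07935504

var_lab, var_batch, var_resid = sd_lab ** 2, sd_batch ** 2, sd_resid ** 2

# Variance of a single measurement relative to another measurement that
# shares the same batch, the same lab only, or nothing at all.
levels = {
    "same batch": var_resid,
    "different batches, same lab": var_resid + var_batch,
    "different labs": var_resid + var_batch + var_lab,
}

cv_percent = {}
for label, variance in levels.items():
    sd = math.sqrt(variance)
    cv_percent[label] = 100 * sd / mean_conc
    print(f"{label}: variance {variance:.5f}, SD {sd:.4f}, CV {cv_percent[label]:.0f}%")
```

Because the components are summed before taking the square root, the lab term dominates the between-lab total even though the batch and residual standard deviations are each about a third of the lab one.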

Hypothesis Tests
  • When data are balanced, one can compute expected mean squares, and many times can compute a valid F test.
  • In more complex cases, or when data are unbalanced, this is more difficult.
  • One requirement for certain hypothesis tests to be valid is that the null hypothesis value is not on the edge of the possible values:
      • For H0: α = 0, we have that α could be either positive or negative.
      • For H0: σ² = 0, negative variances are not possible.

Effect      Variance   SD
Residual    0.00630    0.0794
Batch       0.00537    0.0733
Lab         0.06017    0.2453

  • The variance among replicates a month apart (0.00630 + 0.00537 = 0.01167) is about twice that of replicates on the same day (0.00630); the standard deviations are 0.1080 and 0.0794. Relative to the average concentration (0.508), these are CVs of 21% and 16% respectively.
  • The variance among values from different labs is about 0.0718 (0.00630 + 0.00537 + 0.06017), with a standard deviation of 0.2680 and a CV of about 53%.

More complex models
  • When data are balanced and the expected mean squares can be computed, this is a valid approach to testing and estimation.
  • Programs like lme and lmer in R and PROC MIXED in SAS can handle complex models.
  • But most likely this is a time when you may need to consult an expert.