Math Interlude: Signs, Symbols, and ANOVA nuts and bolts
Remember these?
- Σ = Summation
- Π = Product
- ! = Factorial
Just shorthand!
Example
- Take 5 data points: 5, 6, 7, 9, 10
- Represent them generally as $X_1$ to $X_5$ (the observations are indexed by subscripts i):
$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 5 + 6 + 7 + 9 + 10 = 37$
- Just a shorthand way to write “add up all five data points”!
More summation…
In general,
Working with summation…
In general,
Practice (just for fun!)…
Products…
5 data points again: 5, 6, 7, 9, 10
$\prod_{i=1}^{5} X_i = 5(6)(7)(9)(10) = 18{,}900$
Factorials…
5! = 5(4)(3)(2)(1) = 120
Note: 0! = 1 by convention.
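The three shorthands map directly onto Python built-ins. A minimal sketch using the same five data points (the variable names are only illustrative):

```python
import math

# The five data points from the example, X_1 .. X_5
x = [5, 6, 7, 9, 10]

total = sum(x)                  # summation: 5 + 6 + 7 + 9 + 10
product = math.prod(x)          # product:   5(6)(7)(9)(10)
five_fact = math.factorial(5)   # factorial: 5(4)(3)(2)(1)
zero_fact = math.factorial(0)   # 0! is 1 by convention

print(total, product, five_fact, zero_fact)  # 37 18900 120 1
```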
Review of math symbols…
- X = often used to indicate the independent (predictor) variable
- Y = often used to indicate the dependent (or outcome) variable
- $X_i$ = the ith observation of X
- $X_{ij}$ = in a table, the observation in row i and column j
- $\mu_y$ = the “true” (population or theoretical) mean of Y
- $\bar{Y}$ = the sample mean of Y
- $\sigma_x^2$ = the true variance of X; $\sigma_x$ = the true standard deviation
- $S_x^2$ = the sample variance; $S_x$ = the sample standard deviation
ANOVA: Concepts Review
- When do you use ANOVA?
- What assumptions are you making when you use ANOVA?
- What is the null hypothesis in ANOVA?
- What does a “significant” ANOVA mean?
- Why not just do a bunch of t-tests to compare means?
ANOVA example
Does this example meet the assumptions of ANOVA?

Mean micronutrient intake from the school lunch by school

| School | Calcium (mg) Mean | Calcium SD(e) | Iron (mg) Mean | Iron SD | Folate (μg) Mean | Folate SD | Zinc (mg) Mean | Zinc SD |
|---|---|---|---|---|---|---|---|---|
| S1(a), n=28 | 117.8 | 62.4 | 2.0 | 0.6 | 26.6 | 13.1 | 1.9 | 1.0 |
| S2(b), n=25 | 158.7 | 70.5 | 2.0 | 0.6 | 38.7 | 14.5 | 1.2 | |
| S3(c), n=21 | 206.5 | 86.2 | 2.0 | 0.6 | 42.6 | 15.1 | 1.3 | 0.4 |
| P-value(d) | 0.000 | | 0.854 | | 0.000 | | 0.055 | |

(a) School 1 (most deprived; 40% subsidized lunches). (b) School 2 (medium deprived; <10% subsidized). (c) School 3 (least deprived; no subsidization, private school). (d) ANOVA; significant differences are highlighted in bold (P<0.05).

FROM: Gould R, Russell J, Barker ME. School lunch menus and 11 to 12 year old children's food choice in three secondary schools in England-are the nutritional standards being met? Appetite. 2006 Jan;46(1):86-92.
ANOVA: Concepts Review
On a piece of paper, represent the relationship between school group and calcium as a linear regression equation (no, you do not need a computer to calculate this!):
- predictor = school group
- outcome = calcium
- no other predictors in the model
- set the most deprived school as your reference group
- round numbers for ease!
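One way to check a paper answer: when the only predictors are group indicators, the least-squares intercept is the reference group's mean and each slope is that group's mean minus the reference mean. The sketch below applies this to the calcium means from the table. The function name `dummy_coded_coefficients` is hypothetical, chosen only for illustration.

```python
# Mean calcium intake (mg) by school, taken from the school lunch table.
calcium_means = {"School 1": 117.8, "School 2": 158.7, "School 3": 206.5}

def dummy_coded_coefficients(group_means, reference):
    """Return (intercept, {group: slope}) for a regression on group dummies.

    With no other predictors, the least-squares intercept is the reference
    group's mean and each slope is that group's mean minus the reference mean.
    """
    intercept = group_means[reference]
    slopes = {g: m - intercept for g, m in group_means.items() if g != reference}
    return intercept, slopes

# School 1 is the most deprived school, so it serves as the reference group.
intercept, slopes = dummy_coded_coefficients(calcium_means, "School 1")

# Rounded for ease: calcium = 118 + 41*(School 2) + 89*(School 3)
print(round(intercept), {g: round(b) for g, b in slopes.items()})
```

Each indicator is 1 for children in that school and 0 otherwise, so the predicted value for every child is simply their school's mean.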
New Stuff: The Math Behind ANOVA
It’s like this: if I have three groups to compare:
- I could do three pairwise t-tests, but this would increase my type I error.
- So, instead I want to look at the pairwise differences “all at once.”
- To do this, I can recognize that variance is a statistic that lets me look at more than one difference at a time…
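To see roughly how much the type I error grows, here is a small sketch. It assumes a per-test alpha of 0.05 and treats the three tests as independent. Pairwise tests that share groups are not truly independent, so read the result as an approximation rather than an exact rate.

```python
from math import comb

alpha = 0.05          # assumed per-test type I error rate
k = 3                 # number of groups
n_tests = comb(k, 2)  # number of pairwise comparisons among k groups

# Chance of at least one false positive if the tests were independent
familywise = 1 - (1 - alpha) ** n_tests
print(n_tests, round(familywise, 3))  # 3 0.143
```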
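The ANOVA “F-test”
Is the difference in the means of the groups more than background noise (= variability within groups)?
$F = \dfrac{\text{variability between groups}}{\text{variability within groups}}$
- Numerator: summarizes the mean differences between all groups at once.
- Denominator: analogous to the pooled variance from a t-test.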
Side Note: The F-distribution
The F-distribution is a continuous probability distribution that depends on two parameters n and m (numerator and denominator degrees of freedom, respectively):
http://www.econtools.com/jevons/java/Graphics2D/FDist.html
The F-distribution
A ratio of variances follows an F-distribution:
$\dfrac{s^2_{\text{between}}}{s^2_{\text{within}}} \sim F_{n,m}$
- The F-test tests the hypothesis that two variances are equal.
- F will be close to 1 if the sample variances are equal.
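A quick simulation can make the "close to 1" claim concrete. This is a sketch under stated assumptions: both samples are drawn from the same standard normal population, each has 11 observations (so 10 and 10 degrees of freedom), and the seed and number of repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def variance_ratio(n1, n2):
    """Ratio of two sample variances drawn from the same normal population."""
    a = [random.gauss(0, 1) for _ in range(n1)]
    b = [random.gauss(0, 1) for _ in range(n2)]
    return statistics.variance(a) / statistics.variance(b)

# 2000 simulated ratios with 10 numerator and 10 denominator d.f.
ratios = [variance_ratio(11, 11) for _ in range(2000)]
median_ratio = statistics.median(ratios)

# The ratios scatter around 1 when the true variances are equal
print(round(median_ratio, 2))
```

Individual ratios still vary a good deal with samples this small, which is why the F-distribution is needed to judge how large a ratio must be before it counts as more than chance.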
For example…
- Randomize 33 subjects to three groups: 800 mg calcium supplement vs. 1500 mg calcium supplement vs. placebo.
- Compare the spine bone density of all 3 groups after 1 year.
Group means and standard deviations
- Placebo group (n=11):
  - Mean spine BMD = 0.92 g/cm²
  - Standard deviation = 0.10 g/cm²
- 800 mg calcium supplement group (n=11):
  - Mean spine BMD = 0.94 g/cm²
  - Standard deviation = 0.08 g/cm²
- 1500 mg calcium supplement group (n=11):
  - Mean spine BMD = 1.06 g/cm²
  - Standard deviation = 0.11 g/cm²
[Figure: Spine bone density vs. treatment. Y-axis: spine BMD (0.7 to 1.2); x-axis: placebo, 800 mg calcium, 1500 mg calcium. Annotations mark the within-group variability and the between-group variation.]
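The F-Test
$F = \dfrac{n \sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 / (k-1)}{\frac{1}{k}\sum_{i=1}^{k} s_i^2}$
- Numerator: the between-group variation, built from the size of the groups (n) and the difference of each group’s mean ($\bar{y}_{i\cdot}$) from the overall mean ($\bar{y}_{\cdot\cdot}$).
- Denominator: the average amount of variation within groups, built from each group’s variance ($s_i^2$).
A large F value indicates that the between-group variation exceeds the within-group variation (= the background noise).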
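As a worked illustration, the F ratio described above can be computed from the summary statistics of the calcium-supplement example. This sketch assumes the three means and standard deviations are listed in the order placebo, 800 mg, 1500 mg, and it relies on the equal group sizes (n = 11) to average the group means and variances directly.

```python
# Summary statistics from the calcium-supplement example (n = 11 per group)
n = 11
means = [0.92, 0.94, 1.06]   # placebo, 800 mg, 1500 mg (spine BMD, g/cm^2)
sds = [0.10, 0.08, 0.11]
k = len(means)

# With equal group sizes, the overall mean is the average of the group means
grand_mean = sum(means) / k

# Numerator: between-group variation, scaled by the size of the groups
between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)

# Denominator: average of the group variances (the background noise)
within = sum(s ** 2 for s in sds) / k

f_stat = between / within
print(round(f_stat, 2))  # 6.64
```

The statistic would then be compared against an F-distribution with k-1 = 2 and nk-k = 30 degrees of freedom.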
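How to calculate ANOVAs by hand…

| Treatment 1 | Treatment 2 | Treatment 3 | Treatment 4 |
|---|---|---|---|
| $y_{1,1}$ | $y_{2,1}$ | $y_{3,1}$ | $y_{4,1}$ |
| $y_{1,2}$ | $y_{2,2}$ | $y_{3,2}$ | $y_{4,2}$ |
| $y_{1,3}$ | $y_{2,3}$ | $y_{3,3}$ | $y_{4,3}$ |
| $y_{1,4}$ | $y_{2,4}$ | $y_{3,4}$ | $y_{4,4}$ |
| $y_{1,5}$ | $y_{2,5}$ | $y_{3,5}$ | $y_{4,5}$ |
| $y_{1,6}$ | $y_{2,6}$ | $y_{3,6}$ | $y_{4,6}$ |
| $y_{1,7}$ | $y_{2,7}$ | $y_{3,7}$ | $y_{4,7}$ |
| $y_{1,8}$ | $y_{2,8}$ | $y_{3,8}$ | $y_{4,8}$ |
| $y_{1,9}$ | $y_{2,9}$ | $y_{3,9}$ | $y_{4,9}$ |
| $y_{1,10}$ | $y_{2,10}$ | $y_{3,10}$ | $y_{4,10}$ |

n = 10 obs./group, k = 4 groups

The group means: $\bar{y}_{i\cdot} = \dfrac{\sum_{j=1}^{10} y_{ij}}{10}$

The (within) group variances: $s_i^2 = \dfrac{\sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2}{10 - 1}$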
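Sum of Squares Within (SSW), or Sum of Squares Error (SSE)
Take the numerator of each (within) group variance, where $\bar{y}_{i\cdot}$ is the mean of group i, and add them up:
$SSW = \sum_{j=1}^{10} (y_{1j} - \bar{y}_{1\cdot})^2 + \sum_{j=1}^{10} (y_{2j} - \bar{y}_{2\cdot})^2 + \sum_{j=1}^{10} (y_{3j} - \bar{y}_{3\cdot})^2 + \sum_{j=1}^{10} (y_{4j} - \bar{y}_{4\cdot})^2 = \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2$
This is the Sum of Squares Within (SSW), or SSE, for chance error.
Terminology note:
- “Sum of squares” is just a fancy way of saying the numerator of a variance.
- Sum of squares divided by degrees of freedom = variance
- Variance times degrees of freedom = sum of squares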
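Sum of Squares Between (SSB), or Sum of Squares Regression (SSR)
Overall mean of all 40 observations (“grand mean”): $\bar{y}_{\cdot\cdot} = \dfrac{\sum_{i=1}^{4} \sum_{j=1}^{10} y_{ij}}{40}$
$SSB = 10 \times \sum_{i=1}^{4} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$, where $\bar{y}_{i\cdot}$ is the mean of group i.
This is the variability of the group means compared to the grand mean (the variability due to the treatment).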
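Total Sum of Squares (TSS, also written SST)
$TSS = \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{\cdot\cdot})^2$, where $\bar{y}_{\cdot\cdot}$ is the grand mean.
This is the squared difference of every observation from the overall mean (the numerator of the variance of Y!).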
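Partitioning of Variance
$\sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2 + 10 \times \sum_{i=1}^{4} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{\cdot\cdot})^2$
SSW + SSB = TSS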
ANOVA Table

| Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value |
|---|---|---|---|---|---|
| Between (k groups) | k-1 | SSB (sum of squared deviations of group means from grand mean) | SSB/(k-1) | [SSB/(k-1)] / [SSW/(nk-k)] | Go to $F_{k-1,\,nk-k}$ chart |
| Within (n individuals per group) | nk-k | SSW (sum of squared deviations of observations from their group mean) | $s^2$ = SSW/(nk-k) | | |
| Total variation | nk-1 | TSS (sum of squared deviations of observations from grand mean) | | | |

TSS = SSB + SSW
ANOVA = t-test

| Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value |
|---|---|---|---|---|---|
| Between (2 groups) | 1 | SSB (squared difference in means multiplied by n/2) | Squared difference in means times n/2 | $\dfrac{(\bar{X} - \bar{Y})^2}{s_p^2/n + s_p^2/n} = (t_{2n-2})^2$ | Go to $F_{1,\,2n-2}$ chart; notice values are just $(t_{2n-2})^2$ |
| Within | 2n-2 | SSW (equivalent to numerator of pooled variance) | Pooled variance ($s_p^2$) | | |
| Total variation | 2n-1 | TSS | | | |
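The equivalence can be checked numerically. The sketch below borrows the Treatment 1 and Treatment 4 columns from the numerical example elsewhere in this deck (the pairing is only for illustration) and computes both the pooled-variance t statistic and the one-way ANOVA F statistic. The two should agree as F = t².

```python
import math
import statistics

# Two groups of n = 10 heights (Treatments 1 and 4 of the numerical example)
g1 = [60, 67, 42, 67, 56, 62, 64, 59, 72, 71]
g2 = [47, 67, 54, 67, 68, 65, 65, 56, 60, 65]
n = len(g1)  # equal group sizes

m1, m2 = statistics.mean(g1), statistics.mean(g2)
grand = statistics.mean(g1 + g2)

# Pooled-variance two-sample t statistic (equal n: pooled variance is the average)
pooled_var = (statistics.variance(g1) + statistics.variance(g2)) / 2
t_stat = (m1 - m2) / math.sqrt(pooled_var * (1 / n + 1 / n))

# One-way ANOVA F statistic for the same two groups
ssb = n * (m1 - grand) ** 2 + n * (m2 - grand) ** 2
ssw = sum((y - m1) ** 2 for y in g1) + sum((y - m2) ** 2 for y in g2)
f_stat = (ssb / 1) / (ssw / (2 * n - 2))

print(round(t_stat ** 2, 4), round(f_stat, 4))  # the two values agree
```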
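Numerical Example

| Treatment 1 | Treatment 2 | Treatment 3 | Treatment 4 |
|---|---|---|---|
| 60 inches | 50 | 48 | 47 |
| 67 | 52 | 49 | 67 |
| 42 | 43 | 50 | 54 |
| 67 | 67 | 55 | 67 |
| 56 | 67 | 56 | 68 |
| 62 | 59 | 61 | 65 |
| 64 | 67 | 61 | 65 |
| 59 | 64 | 60 | 56 |
| 72 | 63 | 59 | 60 |
| 71 | 65 | 64 | 65 |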
Example
Step 1) Calculate the sum of squares between groups:
- Mean for group 1 = 62.0
- Mean for group 2 = 59.7
- Mean for group 3 = 56.3
- Mean for group 4 = 61.4
- Grand mean = 59.85
$SSB = [(62 - 59.85)^2 + (59.7 - 59.85)^2 + (56.3 - 59.85)^2 + (61.4 - 59.85)^2] \times n \text{ per group} = 19.65 \times 10 = 196.5$
Example
Step 2) Calculate the sum of squares within groups:
$(60-62)^2 + (67-62)^2 + (42-62)^2 + (67-62)^2 + (56-62)^2 + (62-62)^2 + (64-62)^2 + (59-62)^2 + (72-62)^2 + (71-62)^2 + (50-59.7)^2 + (52-59.7)^2 + (43-59.7)^2 + (67-59.7)^2 + (67-59.7)^2 + (59-59.7)^2 + \dots$ (sum of 40 squared deviations) $= 2060.6$
Step 3) Fill in the ANOVA table

| Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value |
|---|---|---|---|---|---|
| Between | 3 | 196.5 | 65.5 | 1.14 | .344 |
| Within | 36 | 2060.6 | 57.2 | | |
| Total | 39 | 2257.1 | | | |
Coefficient of Determination: How much of the variance in height is “explained by” treatment group?
$R^2$ = “Coefficient of Determination” = SSB/TSS = 196.5/2257.1 = 9%
Coefficient of Determination
$R^2 = \dfrac{SSB}{SSB + SSW} = \dfrac{SSB}{TSS}$
The proportion of variation in the outcome variable (dependent variable) that is “explained by” the predictor (independent variable).
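The three steps of the height example, plus the coefficient of determination, can be sketched end to end in plain Python. One assumption is flagged in the code: the fifth observation of Treatments 2 and 3 is not legible in the extracted table, so the values 67 and 56 are inferred from the stated group means (59.7 and 56.3). The function name `one_way_anova` is illustrative.

```python
# Heights (inches) by treatment, read column-wise from the numerical example.
# ASSUMPTION: the fifth value of Treatments 2 and 3 (67 and 56) is inferred
# from the stated group means of 59.7 and 56.3.
groups = [
    [60, 67, 42, 67, 56, 62, 64, 59, 72, 71],
    [50, 52, 43, 67, 67, 59, 67, 64, 63, 65],
    [48, 49, 50, 55, 56, 61, 61, 60, 59, 64],
    [47, 67, 54, 67, 68, 65, 65, 56, 60, 65],
]

def one_way_anova(groups):
    """Return (SSB, SSW, TSS, F, R^2) for a list of groups of observations."""
    k = len(groups)
    all_obs = [y for g in groups for y in g]
    n_total = len(all_obs)
    grand_mean = sum(all_obs) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Step 1: group means vs. grand mean, weighted by group size
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Step 2: every observation vs. its own group mean
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    # Every observation vs. the grand mean
    tss = sum((y - grand_mean) ** 2 for y in all_obs)

    # Step 3: mean squares and their ratio
    f_stat = (ssb / (k - 1)) / (ssw / (n_total - k))
    return ssb, ssw, tss, f_stat, ssb / tss

ssb, ssw, tss, f_stat, r2 = one_way_anova(groups)
print(round(ssb, 1), round(ssw, 1), round(tss, 1), round(f_stat, 2), round(r2, 3))
# 196.5 2060.6 2257.1 1.14 0.087
```

The p-value is left out because the standard library has no F-distribution. With SciPy installed, `scipy.stats.f.sf(f_stat, 3, 36)` would be one way to obtain the upper-tail probability.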
Beyond one-way ANOVA
Often, you may want to test more than 1 treatment. ANOVA can accommodate more than 1 treatment or factor, so long as they are independent. Again, the variation partitions beautifully!
TSS = SSB1 + SSB2 + SSW
Calculating ANOVA from grouped data…

Table 6. Mean micronutrient intake from the school lunch by school

| School | Calcium (mg) Mean | Calcium SD(e) | Iron (mg) Mean | Iron SD | Folate (μg) Mean | Folate SD | Zinc (mg) Mean | Zinc SD |
|---|---|---|---|---|---|---|---|---|
| S1(a), n=25 | 117.8 | 62.4 | 2.0 | 0.6 | 26.6 | 13.1 | 1.9 | 1.0 |
| S2(b), n=25 | 158.7 | 70.5 | 2.0 | 0.6 | 38.7 | 14.5 | 1.2 | |
| S3(c), n=25 | 206.5 | 86.2 | 2.0 | 0.6 | 42.6 | 15.1 | 1.3 | 0.4 |
| P-value(d) | 0.000 | | 0.854 | | 0.000 | | 0.055 | |

(a) School 1 (most deprived; 40% subsidized lunches). (b) School 2 (medium deprived; <10% subsidized). (c) School 3 (least deprived; no subsidization, private school). (d) ANOVA; significant differences are highlighted in bold (P<0.05).

FROM: Gould R, Russell J, Barker ME. School lunch menus and 11 to 12 year old children's food choice in three secondary schools in England-are the nutritional standards being met? Appetite. 2006 Jan;46(1):86-92.
Answer
Step 1) Calculate the sum of squares between groups:
- Mean for School 1 = 117.8
- Mean for School 2 = 158.7
- Mean for School 3 = 206.5
- Grand mean: 161
$SSB = [(117.8 - 161)^2 + (158.7 - 161)^2 + (206.5 - 161)^2] \times 25 \text{ per group} = 3941.78 \times 25 = 98{,}545$

Answer
Step 2) Calculate the sum of squares within groups:
- S.D. for S1 = 62.4
- S.D. for S2 = 70.5
- S.D. for S3 = 86.2
Therefore, the sum of squares within is:
$(24)[62.4^2 + 70.5^2 + 86.2^2] = 391{,}066$

Answer
Step 3) Fill in your ANOVA table

| Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value |
|---|---|---|---|---|---|
| Between | 2 | 98,545 | 49,272 | 9.1 | <.05 |
| Within | 72 | 391,066 | 5431 | | |
| Total | 74 | 489,611 | | | |

$R^2$ = 98,545/489,611 = 20%
School “explains” 20% of the variance in lunchtime calcium intake in these kids.
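The grouped-data calculation can be sketched as a small function that needs only the group means, standard deviations, and a common group size. The function name `anova_from_summary` and the dictionary keys are illustrative. Computed without intermediate rounding, SSB comes out near 98,545, F near 9.1, and R² near 20%, so small differences from hand-rounded figures are to be expected.

```python
def anova_from_summary(means, sds, n):
    """One-way ANOVA table entries from group means and SDs (equal group size n)."""
    k = len(means)
    grand_mean = sum(means) / k                  # equal n, so a simple average
    ssb = n * sum((m - grand_mean) ** 2 for m in means)
    ssw = (n - 1) * sum(s ** 2 for s in sds)     # variance times d.f. = sum of squares
    msb = ssb / (k - 1)
    msw = ssw / (n * k - k)
    return {"SSB": ssb, "SSW": ssw, "TSS": ssb + ssw,
            "MSB": msb, "MSW": msw, "F": msb / msw, "R2": ssb / (ssb + ssw)}

# Calcium (mg) means and SDs for the three schools, with n = 25 per group
table = anova_from_summary([117.8, 158.7, 206.5], [62.4, 70.5, 86.2], 25)
print({name: round(value, 2) for name, value in table.items()})
```

This shortcut works because each standard deviation already summarizes the squared deviations within its group, so the raw observations are not needed.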
ANOVA reminders…
- A statistically significant ANOVA (F-test) only tells you that at least two of the groups differ, but not which ones differ.
- Determining which groups differ (when it’s unclear) requires more sophisticated analyses to correct for the problem of multiple comparisons…
- A statistically significant ANOVA does not imply a clinically significant difference between the groups! For example, if we had found a significant 10-mg difference in calcium, this probably would not be enough to care…
- When data are not normally distributed and the sample size is small, use a nonparametric ANOVA equivalent…
- ANOVA is just linear regression!