One-Way ANOVA: Introduction to Analysis of Variance (ANOVA)


What is ANOVA?
  • ANOVA is short for ANalysis Of VAriance.
  • Used with 3 or more groups to test for mean differences.
  • E.g., a caffeine study with 3 groups:
      • No caffeine
      • Mild dose
      • Jolt group
  • Level is the value, kind, or amount of the independent variable (IV).
  • Treatment group is the people who get a specific treatment or level of the IV.
  • Treatment effect is the size of the difference in means.

Rationale for ANOVA (1)
  • We have at least 3 means to test, e.g., $H_0: \mu_1 = \mu_2 = \mu_3$. We could take them 2 at a time, but we really want to test all 3 (or more) at once.
  • Instead of using a mean difference, we can use the variance of the group means about the grand mean over all groups. The logic is the same as for the t-test: compare the observed variance among means (the observed difference in means in the t-test) to what we would expect to get by chance.

Rationale for ANOVA (2)
Suppose we drew 3 samples from the same population. Our results might look like this: Note that the means from the 3 groups are not exactly the same, but they are close, so the variance among means will be small.

Rationale for ANOVA (3)
Suppose we sample people from 3 different populations. Our results might look like this: Note that the sample means are far away from one another, so the variance among means will be large.

Rationale for ANOVA (4)
Suppose we complete a study and find the following results (either graph). How would we know or decide whether there is a real effect or not? To decide, we can compare our observed variance in means to what we would expect to get on the basis of chance, given no true difference in means.

Review
  • When would we use a t-test versus a 1-way ANOVA?
  • In ANOVA, what happens to the variance in means (between cells) if the treatment effect is large?

Rationale for ANOVA
We can break the total variance in a study into meaningful pieces that correspond to treatment effects and error. That’s why we call this Analysis of Variance.

Definitions of Terms Used in ANOVA:
  • The grand mean, taken over all observations.
  • The mean of any level of a treatment.
  • The mean of a specific level (1 in this case) of a treatment.
  • The observation or raw data for the ith person.

The ANOVA Model
A treatment effect is the difference between the overall grand mean and the mean of a cell (treatment level). Error is the difference between a score and its cell (treatment level) mean.

The ANOVA model:
An individual’s score = the grand mean + a treatment or IV effect + error
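The word equation can also be written symbolically. The notation below is assumed here for illustration only: $X_{ij}$ is the score of person $i$ in cell $j$, $\bar{X}_G$ is the grand mean, and $\bar{X}_j$ is the mean of cell $j$.

```latex
X_{ij} \;=\; \underbrace{\bar{X}_G}_{\text{grand mean}}
       \;+\; \underbrace{(\bar{X}_j - \bar{X}_G)}_{\text{treatment (IV) effect}}
       \;+\; \underbrace{(X_{ij} - \bar{X}_j)}_{\text{error}}
```

For instance, a score of 88 in a cell with mean 84, when the grand mean is 79, decomposes as 88 = 79 + 5 + 4.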

The ANOVA Model
The graph shows the terms in the equation. There are three cells or levels in this study. The IV effect and error for the highest-scoring cell are shown.
[Graph labels: the grand mean; a treatment or IV effect; error]

ANOVA Calculations
Sums of squares (squared deviations from the mean) tell the story of variance. The simple ANOVA designs have 3 sums of squares.
  • The total sum of squares comes from the distance of all the scores from the grand mean. This is the total; it’s all you have.
  • The within-group or within-cell sum of squares comes from the distance of the observations to the cell means. This indicates error.
  • The between-cells or between-groups sum of squares tells of the distance of the cell means from the grand mean. This indicates IV effects.
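The three sums of squares can be sketched as formulas. The symbols are assumptions chosen for illustration: $X_{ij}$ is the score of person $i$ in cell $j$, $\bar{X}_j$ is the mean of cell $j$, $n_j$ is the number of people in cell $j$, and $\bar{X}_G$ is the grand mean.

```latex
\begin{aligned}
SS_{\text{total}}   &= \sum_{j}\sum_{i} \left(X_{ij} - \bar{X}_G\right)^2 \\
SS_{\text{within}}  &= \sum_{j}\sum_{i} \left(X_{ij} - \bar{X}_j\right)^2 \\
SS_{\text{between}} &= \sum_{j} n_j \left(\bar{X}_j - \bar{X}_G\right)^2 \\
SS_{\text{total}}   &= SS_{\text{between}} + SS_{\text{within}}
\end{aligned}
```

The last line is the partition that gives the analysis of variance its name: the total splits into an IV-effect piece and an error piece.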

Computational Example: Caffeine on Test Scores

Test scores, each written as cell mean plus deviation:

              G1: Control    G2: Mild       G3: Jolt
              75 = 79 - 4    80 = 84 - 4    70 = 74 - 4
              77 = 79 - 2    82 = 84 - 2    72 = 74 - 2
              79 = 79 + 0    84 = 84 + 0    74 = 74 + 0
              81 = 79 + 2    86 = 84 + 2    76 = 74 + 2
              83 = 79 + 4    88 = 84 + 4    78 = 74 + 4
Means         79             84             74
SDs (N-1)     3.16           3.16           3.16

Total Sum of Squares

Group                         Score   Grand mean   Squared deviation
G1 Control (M=79, SD=3.16)    75      79           16
                              77      79           4
                              79      79           0
                              81      79           4
                              83      79           16
G2 (M=84, SD=3.16)            80      79           1
                              82      79           9
                              84      79           25
                              86      79           49
                              88      79           81
G3 (M=74, SD=3.16)            70      79           81
                              72      79           49
                              74      79           25
                              76      79           9
                              78      79           1
Sum                                                370

In the total sum of squares, we are finding the squared distance from the Grand Mean. If we took the average, we would have a variance.
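A minimal sketch of that last point, using the 15 caffeine scores from the example. Dividing by N - 1 rather than N is an assumption made here to match the "(N-1)" convention the slides use for the SDs; the variable names are illustrative.

```python
from statistics import variance

# All 15 test scores from the caffeine example, pooled across groups.
scores = [75, 77, 79, 81, 83,   # G1: Control
          80, 82, 84, 86, 88,   # G2: Mild
          70, 72, 74, 76, 78]   # G3: Jolt

grand_mean = sum(scores) / len(scores)

# Total sum of squares: squared distance of every score from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in scores)

# Averaging the squared distances (over N - 1) gives the sample variance.
total_variance = ss_total / (len(scores) - 1)

print(grand_mean, ss_total)
print(round(total_variance, 2), round(variance(scores), 2))
```

The two values on the second printed line agree, which is the sense in which a sum of squares is a variance that has not yet been averaged.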

Within Sum of Squares

Group                         Score   Cell mean   Squared deviation
G1 Control (M=79, SD=3.16)    75      79          16
                              77      79          4
                              79      79          0
                              81      79          4
                              83      79          16
G2 (M=84, SD=3.16)            80      84          16
                              82      84          4
                              84      84          0
                              86      84          4
                              88      84          16
G3 (M=74, SD=3.16)            70      74          16
                              72      74          4
                              74      74          0
                              76      74          4
                              78      74          16
Sum                                               120

The within sum of squares reflects the variation within cells, that is, the differences between scores and their cell means. SSW estimates error.

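Between Sum of Squares

Each person's score is replaced by the cell mean, so every cell contributes one row per person (5 per cell).

Group                         Cell mean   Grand mean   Squared deviation
G1 Control (M=79, SD=3.16)    79          79           0
                              79          79           0
                              79          79           0
                              79          79           0
                              79          79           0
G2 (M=84, SD=3.16)            84          79           25
                              84          79           25
                              84          79           25
                              84          79           25
                              84          79           25
G3 (M=74, SD=3.16)            74          79           25
                              74          79           25
                              74          79           25
                              74          79           25
                              74          79           25
Sum                                                    250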

The between sum of squares relates the Cell Means to the Grand Mean. This is related to the variance of the means.
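The three tables can be reproduced with a short sketch. The function and variable names are hypothetical, chosen for illustration; the data are the caffeine scores from the computational example.

```python
# Caffeine example data, keyed by group.
groups = {
    "Control": [75, 77, 79, 81, 83],
    "Mild": [80, 82, 84, 86, 88],
    "Jolt": [70, 72, 74, 76, 78],
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def sums_of_squares(groups):
    """Return (ss_between, ss_within, ss_total) for a dict of score lists."""
    all_scores = [x for cell in groups.values() for x in cell]
    grand_mean = mean(all_scores)
    # Total: every score against the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    # Within: every score against its own cell mean (error).
    ss_within = sum((x - mean(cell)) ** 2
                    for cell in groups.values() for x in cell)
    # Between: each cell mean against the grand mean, once per person.
    ss_between = sum(len(cell) * (mean(cell) - grand_mean) ** 2
                     for cell in groups.values())
    return ss_between, ss_within, ss_total


ss_b, ss_w, ss_t = sums_of_squares(groups)
print(ss_b, ss_w, ss_t)  # 250.0 120.0 370.0
```

The between and within pieces add up to the total (250 + 120 = 370), which is a useful arithmetic check on hand calculations.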

ANOVA Source Table (1)

Source            SS     df                  MS (SS/df)            F
Between Groups    250    k-1 = 2             250/2 = 125 = MSB     F = MSB/MSW = 125/10 = 12.5
Within Groups     120    N-k = 15-3 = 12     120/12 = 10 = MSW
Total             370    N-1 = 14

ANOVA Source Table (2)
  • df: degrees of freedom. Divide the sum of squares by degrees of freedom to get MS, mean squares, which are population variance estimates.
  • F is the ratio of two mean squares.
  • F is another distribution like z and t. There are tables of F used for significance testing.
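The source-table arithmetic can be sketched as a small function. The function name and dictionary keys are hypothetical; the inputs are the sums of squares, number of groups (k = 3), and total sample size (N = 15) from the caffeine example.

```python
def source_table(ss_between, ss_within, k, n_total):
    """Build the df, MS, and F entries of a one-way ANOVA source table."""
    df_between = k - 1            # number of groups minus 1
    df_within = n_total - k       # total N minus number of groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df_between": df_between,
        "df_within": df_within,
        "df_total": n_total - 1,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


table = source_table(ss_between=250, ss_within=120, k=3, n_total=15)
print(table["ms_between"], table["ms_within"], table["f"])  # 125.0 10.0 12.5
```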

The F Distribution

F Table – Critical Values

                  Numerator df (df_B)
df_W    Level     1       2       3       4       5
5       5%        6.61    5.79    5.41    5.19    5.05
        1%        16.3    13.3    12.1    11.4    11.0
10      5%        4.96    4.10    3.71    3.48    3.33
        1%        10.0    7.56    6.55    5.99    5.64
12      5%        4.75    3.89    3.49    3.26    3.11
        1%        9.33    6.94    5.95    5.41    5.06
14      5%        4.60    3.74    3.34    3.11    2.96
        1%        8.86    6.51    5.56    5.04    4.70

Review
  • What are critical values of a statistic (e.g., critical values of F)?
  • What are degrees of freedom?
  • What are mean squares?
  • What does MSW tell us?

Review 6 Steps
1. Set alpha (.05).
2. State null and alternative hypotheses: $H_0: \mu_1 = \mu_2 = \mu_3$; $H_1$: not all $\mu$ are equal.
3. Calculate test statistic: F = 12.5.
4. Determine critical value: $F_{.05}(2, 12) = 3.89$.
5. Decision rule: if test statistic > critical value, reject $H_0$.
6. Decision: test is significant (12.5 > 3.89). Means in the population are different.
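Steps 4 through 6 can be sketched as a table lookup followed by a comparison. The dictionary holds only the 5% critical values for df_W = 12 copied from the F table in these slides; the names `F_CRIT_05` and `f_test_decision` are hypothetical.

```python
# 5% critical values of F for df_W = 12, keyed by (df_B, df_W).
F_CRIT_05 = {
    (1, 12): 4.75,
    (2, 12): 3.89,
    (3, 12): 3.49,
    (4, 12): 3.26,
    (5, 12): 3.11,
}


def f_test_decision(f_observed, df_between, df_within, table=F_CRIT_05):
    """Return (reject_h0, critical_value) using the decision rule F > critical."""
    critical = table[(df_between, df_within)]
    return f_observed > critical, critical


reject, critical = f_test_decision(12.5, df_between=2, df_within=12)
print(reject, critical)  # True 3.89
```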

Post Hoc Tests
  • If the t-test is significant, you have a difference in population means.
  • If the F-test is significant, you have a difference in population means, but you don’t know where. With 3 means, it could be A=B>C or A>B=C. We need a test to tell which means are different. Many are available; we will use one.

Tukey HSD (1)
Use with equal sample size per cell. HSD means honestly significant difference.

$HSD = q_{\alpha} \sqrt{MS_W / n}$

  • $\alpha$ is the Type I error rate (.05).
  • $q_{\alpha}$ is a value from a table of the studentized range statistic based on alpha, $df_W$ (12 in our example), and $k$, the number of groups (3 in our example).
  • $MS_W$ is the mean square within groups, MSW (10).
  • $n$ is the number of people in each group (5).

Result for our example, with $q_{.05} = 3.77$ from the table: $HSD = 3.77 \sqrt{10/5} = 5.33$.
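A minimal sketch of the HSD calculation, assuming the usual equal-n form HSD = q * sqrt(MSW / n). The value q = 3.77 is the tabled studentized range value for alpha = .05, k = 3, and df_W = 12; the function name is hypothetical.

```python
import math


def tukey_hsd(q, ms_within, n_per_group):
    """Critical mean difference for Tukey's HSD with equal cell sizes."""
    return q * math.sqrt(ms_within / n_per_group)


# q from a studentized range table: alpha = .05, k = 3 groups, df_W = 12.
hsd = tukey_hsd(q=3.77, ms_within=10, n_per_group=5)
print(round(hsd, 2))  # 5.33
```

Any pair of cell means that differs by more than this value is declared significantly different.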

Tukey HSD (2)
To see which means are significantly different, we compare the observed differences among our means to the critical value of the Tukey test. The differences are:
  • 1 vs. 2: 79 - 84 = -5 (say 5 to be positive).
  • 1 vs. 3: 79 - 74 = 5.
  • 2 vs. 3: 84 - 74 = 10.
Because 10 is larger than 5.33, this result is significant (2 is different from 3). The other differences are not significant. Review 6 steps.
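The pairwise comparison can be sketched as a loop over all pairs of cell means. The critical value 5.33 and the means are taken from the example; the function name is hypothetical.

```python
from itertools import combinations

# Cell means from the caffeine example and the Tukey critical difference.
means = {"Control": 79, "Mild": 84, "Jolt": 74}
HSD = 5.33


def significant_pairs(means, hsd):
    """Map each pair of groups to (absolute mean difference, exceeds HSD?)."""
    results = {}
    for (a, mean_a), (b, mean_b) in combinations(means.items(), 2):
        diff = abs(mean_a - mean_b)
        results[(a, b)] = (diff, diff > hsd)
    return results


for pair, (diff, is_sig) in significant_pairs(means, HSD).items():
    print(pair, diff, "significant" if is_sig else "not significant")
```

Only the Mild versus Jolt comparison (difference of 10) exceeds the critical value; both comparisons with the Control group (differences of 5) fall short of it.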

Review
  • What is a post hoc test? What is its use?
  • Describe the HSD test. What does HSD stand for?

Test
Another name for mean square is _____.
1. standard deviation
2. sum of squares
3. treatment level
4. variance

Test
When do we use post hoc tests?
a. after a significant overall F test
b. after a nonsignificant overall F test
c. in place of an overall F test
d. when we want to determine the impact of different factors