ANALYSIS OF VARIANCE (ANOVA) Heibatollah Baghi and Mastee Badii
Purpose of ANOVA • Use one-way analysis of variance to test whether the mean of a variable (the dependent variable) differs among three or more groups – For example, compare whether systolic blood pressure differs among a control group and two treatment groups
Purpose of ANOVA • One-way ANOVA compares three or more groups defined by a single factor. – For example, you might compare a control group with a drug treatment group and a drug treatment plus antagonist group, or compare a control group with five different treatments. • Some experiments involve more than one factor. These data need to be analyzed by two-way ANOVA or factorial ANOVA. – For example, you might compare the effects of three different drugs administered at two times. There are two factors in that experiment: drug treatment and time.
Why not do repeated t-tests? • Rather than using one-way ANOVA, you might be tempted to use a series of t-tests, comparing two groups each time. Don't do it. • Repeated t-tests increase the chance of a type I error (the multiple comparison problem) • Comparing 5 groups pairwise requires 10 comparisons of means • When the null hypothesis is true, the probability that at least 1 of the 10 observed significance levels is less than 0.05 is about 0.29
Why not do repeated t-tests? • With 10 means (45 comparisons), the probability of finding at least one significant difference is about 0.63 • In other words, when the level of significance is .05, there is a 1 in 20 chance that a single t-test will yield a significant result even when the null hypothesis is true. • The more t-tests are run, the higher the probability of at least one false positive
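How quickly the error rate grows can be sketched with a simplified calculation. The snippet below counts the pairwise comparisons among k groups and computes the chance of at least one false positive under the assumption that the tests are independent. That assumption is a simplification: pairwise t-tests share groups and are not independent, which is presumably why the figures quoted in the slides (0.29 and 0.63) are smaller than the values this formula gives (about 0.40 and 0.90). The function names are illustrative.

```python
from math import comb


def n_pairwise_comparisons(k: int) -> int:
    """Number of distinct pairs among k group means: k(k - 1) / 2."""
    return comb(k, 2)


def familywise_error_independent(m: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive in m tests.

    Assumes the m tests are independent, which pairwise t-tests are not,
    so treat the result as a rough guide rather than an exact figure.
    """
    return 1.0 - (1.0 - alpha) ** m


for k in (5, 10):
    m = n_pairwise_comparisons(k)
    print(k, m, round(familywise_error_independent(m), 3))
```

Either way the conclusion is the same: the chance of a spurious "significant" difference climbs well above .05 as comparisons accumulate.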
What Does ANOVA Do? • ANOVA partitions the variance of the dependent variable into different components: – A. Between group variability – B. Within group variability • More specifically, the analysis of variance is a method for partitioning the total sum of squares into two additive and independent parts.
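Definition of Total Sum of Squares or Variance • Data layout: rows are cases 1 to n, columns are Group 1 to Group p, and the entries run from X 11, X 21, …, X p1 (case 1) to X 1n, X 2n, …, X pn (case n) • The total sum of squares is the sum of squared differences of each observation from the grand average, summed across all n times p observations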
Definition of Between Sum of Squares • Data layout: rows are cases 1 to n, columns are Group 1 to Group p, with the average of each group and the grand average marked • The sum of squared differences of the group means from the grand mean is SSB
Definition of Within Sum of Squares • Data layout: rows are cases 1 to n, columns are Group 1 to Group p, with the observations and the group mean marked • The within sum of squares is the sum of squared differences of the observations from their group means
Partitioning of Variance into Different Components • Total sum of squares = Between groups sum of squares + Within groups sum of squares
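Written out in symbols, the partition reads as follows. The notation is an assumption chosen for illustration, following the layout of p groups with n cases each: X_{jk} is the observation for case k in group j, \bar{X}_j is the mean of group j, and \bar{X} is the grand mean.

```latex
\underbrace{\sum_{j=1}^{p}\sum_{k=1}^{n}\left(X_{jk}-\bar{X}\right)^{2}}_{SS_{Total}}
=
\underbrace{n\sum_{j=1}^{p}\left(\bar{X}_{j}-\bar{X}\right)^{2}}_{SS_{Between}}
+
\underbrace{\sum_{j=1}^{p}\sum_{k=1}^{n}\left(X_{jk}-\bar{X}_{j}\right)^{2}}_{SS_{Within}}
```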
Test Statistic in ANOVA • The test statistic for ANOVA is based on the between and within groups sums of squares
Test Statistic in ANOVA • F = Between group variability / Within group variability – Within group variability comes from individual differences, that is, sampling error across the cases. – Between group variability comes from the effect of the independent (grouping) variable.
Steps in Test of Hypothesis 1. Determine the appropriate test 2. Establish the level of significance: α 3. Determine whether to use a one-tailed or two-tailed test 4. Calculate the test statistic 5. Determine the degrees of freedom 6. Compare the computed test statistic against a tabled (critical) value • Steps 1 to 3 are the same as before
1. Determine the Appropriate Test • Independent random samples have been taken from each population • The dependent variable is normally distributed in each population (ANOVA is robust with regard to this assumption) • Population variances are equal (ANOVA is robust with regard to this assumption) • Subjects in each group have been independently sampled
2. Establish Level of Significance • α is a predetermined value • The conventional choices are – α = .05 – α = .01 – α = .001
3. Use a Two Tailed Test • Ho: µ1 = µ2 = µ3 = µ4, where – µ1 = population mean for group 1 – µ2 = population mean for group 2 – µ3 = population mean for group 3 – µ4 = population mean for group 4 • H1: not Ho
3. Use a Two Tailed Test • Ha: not Ho • The alternative hypothesis does not specify whether – µ1 ≠ µ2 or – µ2 ≠ µ3 or – µ1 ≠ µ3
4. Calculating Test Statistics • F = (SSb / dfB) / (SSw / dfw) – SSb = sum of squares between – dfB = degrees of freedom between – SSw = sum of squares within – dfw = degrees of freedom within
4. Calculating Test Statistics • By dividing the sum of the squared deviations by the degrees of freedom, we are essentially computing an "average" (or mean) amount of variation • The specific name for the numerator of the F statistic is the mean square between (the average amount of between-group variation) • The specific name for the denominator of the F statistic is the mean square within (the average amount of within-group variation)
5. Determine Degrees of Freedom • Degrees of freedom between – dfB = k – 1 – k = number of groups • Degrees of freedom within – dfw = N – k – N = total number of subjects in the study
6. Compare the Computed Test Statistic Against a Tabled Value • α = .05 • If Fc > Fα, reject H0 • If Fc ≤ Fα, cannot reject H0
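Steps 4 to 6 can be sketched in a few lines. This is a minimal illustration, not a full ANOVA routine: the function names are hypothetical, the sums of squares in the usage lines are made-up numbers, and the critical value is assumed to have been read from an F table by the user.

```python
def f_statistic(ss_between: float, ss_within: float, k: int, n_total: int):
    """Return (F, df_between, df_within) from the two sums of squares.

    k is the number of groups and n_total the total number of subjects.
    """
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between  # mean square between
    ms_within = ss_within / df_within     # mean square within
    return ms_between / ms_within, df_between, df_within


def decide(f_computed: float, f_critical: float) -> str:
    """Apply the decision rule: reject H0 only when Fc exceeds the critical value."""
    return "reject H0" if f_computed > f_critical else "cannot reject H0"


# Illustrative numbers only: 4 groups, 24 subjects, made-up sums of squares.
f_value, df_b, df_w = f_statistic(30.0, 60.0, k=4, n_total=24)
# 3.10 stands in for the tabled F value at alpha = .05 with (3, 20) df.
print(round(f_value, 2), df_b, df_w, decide(f_value, f_critical=3.10))
```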
Example • Suppose we had patients with myocardial infarction in the following groups: – Group 1: A music therapy group – Group 2: A relaxation therapy group – Group 3: A control group • Fifteen patients are randomly assigned to the 3 groups, and their stress levels are then measured to determine whether the interventions were effective in reducing stress.
Example • Dependent Variable – The stress scores, ranging from zero (no stress) to 10 (extreme stress) • Independent Variable or Factor – Treatment conditions (3 levels)
Observations
Sum of Squares for Each Group
Group 1    Group 2    Group 3
0          1          5
6          4          6
2          3          10
4          2          8
3          0          6
SS1 = 20   SS2 = 10   SS3 = 16
n1 = 5     n2 = 5     n3 = 5
SS Within • SSWithin = SS1 + SS2 + SS3 = 20 + 10 + 16 = 46
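SS Between • For each group, multiply the number of cases by the squared difference between the group average and the grand average, then sum over groups • Group 1 average = 3, Group 2 average = 2, Group 3 average = 7, grand average = 4, number of cases = 5 per group • SSBetween = 5(3 – 4)² + 5(2 – 4)² + 5(7 – 4)² = 5 + 20 + 45 = 70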
Sum of Squares Total • SSTotal = sum of squared differences of all 15 observations from the grand average (4) = 116
Components of Variance • SSTotal = SSBetween + SSWithin • 116 = 70 + 46
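The decomposition for the stress example can be checked numerically. The sketch below recomputes the three sums of squares from the raw scores, using the group membership shown in the SPSS input format later in the deck; the dictionary keys and variable names are illustrative.

```python
# Stress scores by treatment group (group assignment as in the SPSS input slide).
groups = {
    "music": [0, 6, 2, 4, 3],
    "relaxation": [1, 4, 3, 2, 0],
    "control": [5, 6, 10, 8, 6],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Within: squared deviations of each score from its own group mean.
ss_within = sum((x - mean(scores)) ** 2 for scores in groups.values() for x in scores)
# Between: squared deviation of each group mean from the grand mean, weighted by group size.
ss_between = sum(len(scores) * (mean(scores) - grand_mean) ** 2 for scores in groups.values())
# Total: squared deviations of every score from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

print(ss_total, ss_between, ss_within)  # 116.0 70.0 46.0
```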
Degrees of Freedom • dfB = k – 1 = 3 – 1 = 2 • dfw = N – k = 15 – 3 = 12
Test Statistic • MSBetween = 70 / 2 = 35 • MSWithin = 46 / 12 = 3.83 • Fc = MSBetween / MSWithin = 35 / 3.83 = 9.13
Lookup Critical Value • Fα = 3.88 (α = .05, dfB = 2, dfw = 12)
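In this particular example the tabled value can be cross-checked without a table. When the numerator degrees of freedom equal 2, the upper tail of the F distribution has the closed form P(F > f) = (1 + 2f / dfw)^(–dfw / 2), which can be solved for the critical value. The sketch below covers only that special case; the function name is illustrative, and other degrees of freedom need a table or a statistics library.

```python
def f_critical_df1_is_2(alpha: float, df_within: int) -> float:
    """Critical F value for numerator df = 2 (special case only).

    Obtained by solving (1 + 2f / df_within) ** (-df_within / 2) = alpha for f.
    """
    return (df_within / 2.0) * (alpha ** (-2.0 / df_within) - 1.0)


# alpha = .05 with 12 within-group degrees of freedom, as in the stress example.
print(round(f_critical_df1_is_2(0.05, 12), 3))  # about 3.885; the slides list 3.88
```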
Conclusions • Fc = 9.13 > Fα = 3.88 • Fc > Fα, therefore reject H0
One-way ANOVA Summary
Source    SS    DF   MS     Fc     Fα
Between   70    2    35     9.13   3.88
Within    46    12   3.83
Total     116   14
Multiple Comparison Groups • The F test does not tell which pairs of means are not equal • Additional analysis is necessary to determine which pairs differ
Fisher's LSD Test • These are the null and alternative hypotheses being tested – Ho1: µ1 = µ2, Ha1: µ1 ≠ µ2 – Ho2: µ1 = µ3, Ha2: µ1 ≠ µ3 – Ho3: µ2 = µ3, Ha3: µ2 ≠ µ3
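Fisher's LSD Test • Known as the protected t-test • LSD is the least difference between means needed for significance • df = N – k • Use the following formula: LSD = t(α/2, N – k) × √(MSWithin × (1/ni + 1/nj)), where ni and nj are the sizes of the two groups being compared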
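Calculation of LSD • With t = 2.179 (α = .05, two-tailed, df = 12), MSWithin = 3.83 and 5 cases per group: LSD = 2.179 × √(3.83 × (1/5 + 1/5)) ≈ 2.70 • All pairs of means differing by at least 2.70 points on the stress scale are significantly different from one another.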
Application to Three Samples • Mean 1 – Mean 2 = 1 (less than 2.70) • Mean 3 – Mean 1 = 4 (greater than 2.70) • Mean 3 – Mean 2 = 5 (greater than 2.70) • Null hypotheses: – Ho1: µ1 = µ2, not rejected – Ho2: µ1 = µ3, rejected – Ho3: µ2 = µ3, rejected
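The LSD comparison can be reproduced with a short script. This is a sketch under stated assumptions: the critical t value 2.179 (two-tailed, α = .05, df = 12) is typed in from a t table rather than computed, the group means and MSWithin are taken from the example, and the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

ms_within = 46 / 12   # mean square within from the ANOVA example
t_critical = 2.179    # assumed table value: two-tailed t, alpha = .05, df = 12
n_per_group = 5
means = {"music": 3.0, "relaxation": 2.0, "control": 7.0}

# Standard error of a difference between two group means.
se_diff = sqrt(ms_within * (1 / n_per_group + 1 / n_per_group))
# Least significant difference: smallest mean difference that reaches significance.
lsd = t_critical * se_diff

results = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    results[(a, b)] = diff > lsd
    print(a, "vs", b, diff, "significant" if diff > lsd else "not significant")

print(round(se_diff, 3), round(lsd, 2))  # 1.238 2.7
```

The standard error printed here matches the 1.238 reported in the SPSS multiple comparisons table.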
Use of SPSS in ANOVA
Stress Score Data in SPSS Input Format
Stress Levels   Groups
0               1
6               1
2               1
4               1
3               1
1               2
4               2
3               2
2               2
0               2
5               3
6               3
10              3
8               3
6               3
SPSS Output for ANOVA: Descriptives (Stress Levels)
Group                N    Mean   Std. Deviation   Std. Error   95% CI Lower Bound   95% CI Upper Bound   Minimum   Maximum
Music Therapy        5    3.00   2.236            1.000        .22                  5.78                 0         6
Relaxation Therapy   5    2.00   1.581            .707         .04                  3.96                 0         4
Control Group        5    7.00   2.000            .894         4.52                 9.48                 5         10
Total                15   4.00   2.878            .743         2.41                 5.59                 0         10
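The descriptive statistics can be reproduced from the raw scores. The sketch below uses Python's standard `statistics` module; the helper name `describe` is illustrative, and the critical t value 2.776 (two-tailed, α = .05, df = 4) is an assumed table value for groups of five cases.

```python
from math import sqrt
from statistics import mean, stdev


def describe(scores, t_critical):
    """Return mean, sample SD, standard error and 95% CI bounds for one group.

    t_critical is the two-tailed critical t for len(scores) - 1 degrees of
    freedom, supplied by the caller from a t table.
    """
    m = mean(scores)
    sd = stdev(scores)            # sample standard deviation (n - 1 denominator)
    se = sd / sqrt(len(scores))   # standard error of the mean
    return m, sd, se, m - t_critical * se, m + t_critical * se


groups = {
    "Music Therapy": [0, 6, 2, 4, 3],
    "Relaxation Therapy": [1, 4, 3, 2, 0],
    "Control Group": [5, 6, 10, 8, 6],
}

for name, scores in groups.items():
    m, sd, se, lower, upper = describe(scores, t_critical=2.776)
    print(name, m, round(sd, 3), round(se, 3), round(lower, 2), round(upper, 2))
```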
SPSS Output for ANOVA

Test of Homogeneity of Variances (Stress Levels)
Levene Statistic   df1   df2   Sig. level or p-value
.242               2     12    .788
P > .05; therefore, the assumption of homogeneity of variance is met.

ANOVA (Stress Levels)
Source           Sum of Squares   df   Mean Square   F       Sig. level or p-value
Between Groups   70.000           2    35.000        9.130   .004
Within Groups    46.000           12   3.833
Total            116.000          14
P < .05; therefore, we reject the null hypothesis and continue with the multiple comparison table.
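The p-value in the ANOVA table can be checked by hand in this example, because the between-groups degrees of freedom equal 2. In that special case the upper tail of the F distribution is P(F > f) = (1 + 2f / df2)^(–df2 / 2), where df2 is the within-groups degrees of freedom. The sketch below applies only to that case; the function name is illustrative, and a statistics package is needed for other degrees of freedom.

```python
def f_upper_tail_df1_is_2(f_value: float, df_within: int) -> float:
    """P(F > f_value) for an F distribution with (2, df_within) degrees of freedom."""
    return (1.0 + 2.0 * f_value / df_within) ** (-df_within / 2.0)


f_value = 35 / (46 / 12)   # MS between / MS within from the ANOVA table
p_value = f_upper_tail_df1_is_2(f_value, 12)
print(round(f_value, 3), round(p_value, 3))  # 9.13 0.004
```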
SPSS Output for ANOVA: Multiple Comparisons (Dependent Variable: Stress Levels, LSD)
(I) Groups           (J) Groups           Mean Difference (I-J)   Std. Error   Sig. Level   95% CI Lower   95% CI Upper
Music Therapy        Relaxation Therapy   1.000                   1.238        .435         -1.70          3.70
Music Therapy        Control Group        -4.000(*)               1.238        .007         -6.70          -1.30
Relaxation Therapy   Music Therapy        -1.000                  1.238        .435         -3.70          1.70
Relaxation Therapy   Control Group        -5.000(*)               1.238        .002         -7.70          -2.30
Control Group        Music Therapy        4.000(*)                1.238        .007         1.30           6.70
Control Group        Relaxation Therapy   5.000(*)                1.238        .002         2.30           7.70
* The mean difference is significant at the .05 level.
Take home lesson • How to compare means of three or more samples