Two Factor ANOVA Copyright (c) 2008 by The McGraw-Hill Companies. This material is intended solely for educational purposes by licensed users of LearningStats. It may not be copied or resold for profit.
When Do We Need Two-Factor ANOVA? When we believe that the response variable Y is affected by more than one factor.
Y = response variable, A = first factor, B = second factor
Model: Y = f(A, B), that is, A → Y ← B
Two-Factor ANOVA (Randomized Block)
Linear model form: $Y_{jk} = \mu + \tau_j + \phi_k + \varepsilon_{jk}$
Definitions:
$Y_{jk}$ = data in treatment j and block k
$\mu$ = common mean
$\tau_j$ = effect due to treatment j
$\phi_k$ = effect due to block k
$\varepsilon_{jk}$ = random error
Hypotheses:
$H_0: \tau_j = 0$ (no treatment effect exists) vs. $H_1: \tau_j \neq 0$ (treatment effect exists)
$H_0: \phi_k = 0$ (no block effect exists) vs. $H_1: \phi_k \neq 0$ (block effect exists)
Note: This notation is for a fixed effects model. If $\tau_j = 0$ and $\phi_k = 0$, the model collapses to $Y_{jk} = \mu + \varepsilon_{jk}$, which says that each observed data value is the mean perturbed by some random error.
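The randomized block model can be illustrated with a short calculation. The following is a minimal Python sketch, not the procedure used by Excel or Minitab; the function name, the dictionary keys, and the 3 x 3 data values are hypothetical and chosen only so the arithmetic is easy to follow. It assumes blocks are in the rows and treatments are in the columns.

```python
def two_factor_anova_unreplicated(grid):
    """grid[k][j] = observation in block (row) k and treatment (column) j."""
    r = len(grid)       # number of rows (blocks)
    c = len(grid[0])    # number of columns (treatments)
    n = r * c           # one observation per cell
    grand = sum(sum(row) for row in grid) / n
    row_means = [sum(row) / c for row in grid]
    col_means = [sum(grid[k][j] for k in range(r)) / r for j in range(c)]

    # Partition the total variation into rows, columns, and error.
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((y - grand) ** 2 for row in grid for y in row)
    ss_error = ss_total - ss_rows - ss_cols

    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "df": (df_rows, df_cols, df_error),
        "SS": (ss_rows, ss_cols, ss_error, ss_total),
        "F_rows": (ss_rows / df_rows) / ms_error,
        "F_cols": (ss_cols / df_cols) / ms_error,
    }


# Made-up illustrative data (not from the slides).
example_grid = [
    [1, 2, 3],
    [2, 3, 7],
    [3, 7, 8],
]
result = two_factor_anova_unreplicated(example_grid)
print(result)
```

Each F statistic would then be compared with the critical value for its degrees of freedom, which this sketch does not compute.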
Data Format
Example of 3 x 3 format (two-factor, unreplicated), shown in row-column format (Excel) and in stacked format (Minitab).
Note: Often only one factor (treatment) is of research interest, while the other variable (block) serves only to control for a second factor. The calculations are the same, no matter how you view the design. Usually, the blocking factor is placed in the rows.
Example: Pollution
ANOVA Table: 2-Factor
[Table: general format and illustration]
Definitions: c = number of columns, r = number of rows, n = number of observations.
If F exceeds Fcrit, there is a significant difference between treatment groups at the chosen α.
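The degrees of freedom in the unreplicated two-factor table follow directly from r and c. A minimal Python sketch, assuming one observation per cell so that n = r x c; the function name and dictionary keys are hypothetical:

```python
def df_unreplicated(r, c):
    """Degrees of freedom for a two-factor ANOVA without replication."""
    n = r * c  # one observation per cell
    return {
        "rows": r - 1,
        "columns": c - 1,
        "error": (r - 1) * (c - 1),
        "total": n - 1,
    }


# Hypothetical design with 5 row levels and 4 column levels.
example_df = df_unreplicated(r=5, c=4)
print(example_df)
```

A design with 5 row levels and 4 column levels gives numerator d.f. of 4 and 3 and an error d.f. of 12, so the F tests use d.f. = (4, 12) and (3, 12).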
Interpretation
F Statistics: For freeway, F = 24.903 exceeds Fcrit = 3.490 (for d.f. = 3, 12), so there is a significant difference between freeways at α = 0.05. For time of day, F = 21.506 exceeds Fcrit = 3.259 (for d.f. = 4, 12), so there is a significant difference between times of day at α = 0.05.
p-Values: Both p-values are 0.000 to three decimals, which says that F statistics as large as these would be very unlikely to arise by chance if the null hypothesis were true (i.e., if the treatment means were the same). We could reject the hypothesis of equal group means even at α = 0.001.
Bottom Line: Both freeway and time of day affect the pollution level. Both effects are extremely significant.
Two-Factor ANOVA (replicated)
Model form: $X_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \varepsilon_{ijk}$
Note: $\alpha_j$ and $\beta_k$ are called the main effects.
Definitions:
$X_{ijk}$ = ith observation in row j and column k
$\mu$ = common mean
$\alpha_j$ = effect due to row j
$\beta_k$ = effect due to column k
$(\alpha\beta)_{jk}$ = effect due to interaction
$\varepsilon_{ijk}$ = random error
Hypotheses:
$H_0: \alpha_j = 0$ (no row effect exists) vs. $H_1: \alpha_j \neq 0$ (row effect exists)
$H_0: \beta_k = 0$ (no column effect exists) vs. $H_1: \beta_k \neq 0$ (column effect exists)
$H_0: (\alpha\beta)_{jk} = 0$ (no interaction exists) vs. $H_1: (\alpha\beta)_{jk} \neq 0$ (interaction effect exists)
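To make the replicated model concrete, here is a minimal Python sketch of the sums of squares for the two main effects, the interaction, and the error. It assumes a balanced design (the same number of observations in every cell). The function name, dictionary keys, and the 2 x 2 data with two observations per cell are hypothetical, chosen only to keep the arithmetic simple.

```python
def two_factor_anova_replicated(cells):
    """cells[j][k] = list of the m observations in row j and column k."""
    r = len(cells)         # number of rows
    c = len(cells[0])      # number of columns
    m = len(cells[0][0])   # observations per cell (assumed equal in all cells)
    n = r * c * m
    grand = sum(x for row in cells for cell in row for x in cell) / n
    cell_means = [[sum(cell) / m for cell in row] for row in cells]
    row_means = [sum(means) / c for means in cell_means]
    col_means = [sum(cell_means[j][k] for j in range(r)) / r for k in range(c)]

    ss_a = c * m * sum((rm - grand) ** 2 for rm in row_means)   # row main effect
    ss_b = r * m * sum((cm - grand) ** 2 for cm in col_means)   # column main effect
    ss_ab = m * sum(                                            # interaction
        (cell_means[j][k] - row_means[j] - col_means[k] + grand) ** 2
        for j in range(r) for k in range(c)
    )
    ss_e = sum(                                                 # within-cell error
        (x - cell_means[j][k]) ** 2
        for j in range(r) for k in range(c) for x in cells[j][k]
    )

    df_a, df_b = r - 1, c - 1
    df_ab, df_e = df_a * df_b, r * c * (m - 1)
    ms_e = ss_e / df_e
    return {
        "df": (df_a, df_b, df_ab, df_e),
        "SS": (ss_a, ss_b, ss_ab, ss_e),
        "F_A": (ss_a / df_a) / ms_e,
        "F_B": (ss_b / df_b) / ms_e,
        "F_AB": (ss_ab / df_ab) / ms_e,
    }


# Made-up illustrative data (not from the slides): 2 rows, 2 columns, m = 2.
example_cells = [
    [[1, 3], [5, 7]],
    [[4, 6], [12, 14]],
]
rep_result = two_factor_anova_replicated(example_cells)
print(rep_result)
```

Replication is what makes the interaction term estimable: with only one observation per cell there is no within-cell variation left to serve as the error term.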
Two-Factor ANOVA (replicated)
Example with 3 groups, shown in row-column format (Excel) and in stacked format (Minitab).
For clarity, only two observations per cell are shown, but you can have as many as you want. Subscripts and symbols are omitted for clarity.
Replicated: DVD Sales
ANOVA Table: Replicated
[Table: general format and illustration]
Definitions: c = number of columns, r = number of rows, m = number of observations per cell.
If F exceeds Fcrit, there is a significant difference between treatment groups at the chosen α.
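With replication, the degrees of freedom depend on r, c, and m. A minimal Python sketch, assuming a balanced design; the function name and dictionary keys are hypothetical:

```python
def df_replicated(r, c, m):
    """Degrees of freedom for a balanced two-factor ANOVA with replication."""
    n = r * c * m
    return {
        "rows": r - 1,
        "columns": c - 1,
        "interaction": (r - 1) * (c - 1),
        "error": r * c * (m - 1),
        "total": n - 1,
    }


# Hypothetical 3 x 3 design with two observations per cell.
example_rep_df = df_replicated(r=3, c=3, m=2)
print(example_rep_df)
```

A 3 x 3 design with two observations per cell gives d.f. = (2, 9) for each main effect and d.f. = (4, 9) for the interaction.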
Interpretation
F Statistics: Both main effects are significant at α = 0.05, since the F statistics (200.412 and 14.054) exceed their critical values (4.256 in both cases, using d.f. = 2, 9). For the interaction effect, F = 2.398 does not exceed the critical value Fcrit = 3.633 (for d.f. = 4, 9) at α = 0.05.
p-Values: For both main effects, the p-values of 0.000 and 0.002 are highly significant, whereas the interaction p-value of 0.127 is significant at α = 0.20 but not at α = 0.10.
Bottom Line: Both store size and display location affect weekly sales. Both effects are extremely significant, though store size is a stronger effect (smaller p-value). Interaction exists only at a very weak level of significance.
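The p-value comparisons above amount to checking each p-value against a list of significance levels. A minimal Python sketch; the function name and the default list of α values are hypothetical choices, and the two p-values are the ones quoted on this slide:

```python
def significant_at(p_value, alphas=(0.20, 0.10, 0.05, 0.01)):
    """Return the significance levels at which the null hypothesis is rejected."""
    return [alpha for alpha in alphas if p_value < alpha]


interaction_levels = significant_at(0.127)  # interaction p-value from the slide
display_levels = significant_at(0.002)      # main-effect p-value from the slide
print(interaction_levels)
print(display_levels)
```

Reporting the p-value itself lets each reader apply their own α instead of relying on a single critical value.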
ANOVA Notation Textbooks and computer software use various symbols for treatments and subscripts. We follow Excel's practice of using r for rows and c for columns. In one-factor ANOVA, we use SSB for variation between columns and SSW for variation within columns. In two-factor ANOVA, we use SSA and SSB for the main effects and SSAB for the interaction.
Stacked Format
Most computer packages expect the dependent variable to be in a column, and each factor to be in a column, like this:

Obs   Y     Factor 1   Factor 2
1     y1    1          3
2     y2    3          1
3     y3    2          4
...   ...   ...        ...
n     yn    1          3
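Converting a row-column table into this stacked layout is mechanical. A minimal Python sketch using only the standard library; the function name, the record keys, and the 2 x 2 data are hypothetical, and factor levels are simply numbered from 1:

```python
def stack(grid):
    """Turn a row-column table into stacked records (one observation per record)."""
    records = []
    for row_level, row in enumerate(grid, start=1):
        for col_level, y in enumerate(row, start=1):
            records.append({"Y": y, "Factor1": row_level, "Factor2": col_level})
    return records


# Made-up 2 x 2 table (not from the slides).
stacked = stack([[10, 20], [30, 40]])
for obs, rec in enumerate(stacked, start=1):
    print(obs, rec["Y"], rec["Factor1"], rec["Factor2"])
```

Each stacked record carries its own factor levels, so the layout works for any number of factors and for unequal group sizes.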
Example: One-Factor Stacked
Key: AmtPaid is the amount to be paid to the provider by the insurance carrier. ProdID refers to the patient's insurance type: M = Medicare, D = Medicaid, S = Commercial 1, P = Commercial 2.
Note: These observations (n = 498) were chosen at random from a database containing 1,508 records.
Minitab Procedure
Copyright Notice: Portions of MINITAB Statistical Software input and output contained in this document are printed with permission of Minitab, Inc. MINITAB™ is a trademark of Minitab Inc. in the United States and other countries and is used herein with the owner's permission.
Minitab Results: Hospital Charges
The p-value indicates no significant group difference at α = 0.10. Minitab shows overlapping C.I.s for each group mean.
Stacked or Unstacked? Comment In Excel, the one-factor ANOVA test requires that the data be grouped into separate columns. In this case, there are 4 insurance types so we would need 4 separate data columns. This is awkward for large data sets. Minitab can convert unstacked data into stacked data (and vice versa). In databases, the variables usually are coded in stacked format (one column for each factor).
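The conversion from stacked to unstacked data can be sketched in a few lines of Python. This is an illustration only, not Minitab's procedure; the function name, the column spellings `AmtPaid` and `ProdID`, and the dollar amounts are assumptions made for the example.

```python
from collections import defaultdict


def unstack(records, value_key, group_key):
    """Group stacked records into one list of values per factor level."""
    columns = defaultdict(list)
    for rec in records:
        columns[rec[group_key]].append(rec[value_key])
    return dict(columns)


# Made-up stacked records (amounts are hypothetical, not the hospital data).
stacked_records = [
    {"AmtPaid": 100.0, "ProdID": "M"},
    {"AmtPaid": 80.0, "ProdID": "D"},
    {"AmtPaid": 250.0, "ProdID": "M"},
    {"AmtPaid": 120.0, "ProdID": "S"},
    {"AmtPaid": 95.0, "ProdID": "P"},
]
unstacked = unstack(stacked_records, value_key="AmtPaid", group_key="ProdID")
print(unstacked)
```

The resulting lists can have different lengths, which is one reason the unstacked layout becomes awkward in a spreadsheet as the data set grows.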
Excel Procedure: Hospital Charges
Excel Results: Hospital Charges
There are 4 insurance groups, so d.f. = 3 for the "treatment" effect. The p-value would not be significant even at α = 0.10. At the 5% level of significance, F does not exceed Fcrit.
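For reference, the one-factor F statistic can be computed directly from the unstacked groups. A minimal Python sketch, using the SSB (between groups) and SSW (within groups) notation; the function name and the four small groups are hypothetical and are not the hospital charges data:

```python
def one_factor_anova(groups):
    """One-factor ANOVA from a list of groups (each group is a list of values)."""
    c = len(groups)                    # number of groups
    n = sum(len(g) for g in groups)    # total number of observations
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    ssb = sum(len(g) * (mean - grand) ** 2 for g, mean in zip(groups, means))
    ssw = sum((x - mean) ** 2 for g, mean in zip(groups, means) for x in g)

    df_between, df_within = c - 1, n - c
    f_stat = (ssb / df_between) / (ssw / df_within)
    return {"df": (df_between, df_within), "SSB": ssb, "SSW": ssw, "F": f_stat}


# Made-up data with four groups (not from the slides).
anova_1f = one_factor_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5], [6, 7, 8]])
print(anova_1f)
```

With four groups the numerator d.f. is always 3; with n = 498 observations the denominator d.f. would be 498 - 4 = 494.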
More Than Two Factors? ANOVA can have any number of factors (main effects) and their interactions. For example:
3-factor ANOVA model with all possible interactions: X = f(A, B, C, AB, AC, BC, ABC)
4-factor ANOVA model with all possible interactions: X = f(A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, ABCD)
Sample sizes are rarely large enough to estimate such models, so higher-order interactions often are not examined.
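The full list of terms for any number of factors can be enumerated mechanically, which also shows how quickly the models grow. A minimal Python sketch using the standard library; the function name is hypothetical:

```python
from itertools import combinations


def model_terms(factors):
    """List every main effect and interaction for the given factor labels."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append("".join(combo))
    return terms


three_factor = model_terms("ABC")
four_factor = model_terms("ABCD")
print(three_factor)
print(len(four_factor), four_factor)
```

A model with k factors has 2^k - 1 terms: 7 for three factors and 15 for four, so each added factor roughly doubles the number of effects to estimate.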
Summary
ANOVA compares means in several groups. Each group (or combination of factors) is a treatment. 1-factor ANOVA is most common, comparing c groups. 2-factor ANOVA without replication omits interactions. 2-factor ANOVA with replication allows interactions. k-factor ANOVA is conceptually simple (but Excel doesn't do it). One factor often suffices!