Slides Prepared by JueiChao Chen Fu Jen Catholic

Chapter 13 STATISTICS in PRACTICE
Burke Marketing Services, Inc., is one of the most experienced market research firms in the industry. In one study, a firm retained Burke to evaluate potential new versions of a children’s dry cereal. Analysis of variance was the statistical method used to study the data obtained from the taste tests. The experimental design employed by Burke and the subsequent analysis of variance were helpful in making a product design recommendation.
© 2006 by Thomson Learning, a division of Thomson Asia Pte Ltd.

Chapter 13, Part A Analysis of Variance and Experimental Design
• Introduction to Analysis of Variance
• Analysis of Variance: Testing for the Equality of k Population Means
• Multiple Comparison Procedures

Introduction to Analysis of Variance
Analysis of Variance (ANOVA) can be used to test for the equality of three or more population means. Data obtained from observational or experimental studies can be used for the analysis. We want to use the sample results to test the following hypotheses:
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_a$: Not all population means are equal

Introduction to Analysis of Variance
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_a$: Not all population means are equal
If $H_0$ is rejected, we cannot conclude that all population means are different. Rejecting $H_0$ means that at least two population means have different values.

Introduction to Analysis of Variance
• Sampling Distribution of $\bar{x}$ Given $H_0$ is True
Sample means are close together because there is only one sampling distribution when $H_0$ is true.

Introduction to Analysis of Variance
• Sampling Distribution of $\bar{x}$ Given $H_0$ is False
Sample means come from different sampling distributions and are not as close together when $H_0$ is false.
[Figure: three sampling distributions centered at $\mu_3$, $\mu_1$, and $\mu_2$]

Assumptions for Analysis of Variance
• For each population, the response variable is normally distributed.
• The variance of the response variable, denoted $\sigma^2$, is the same for all of the populations.
• The observations must be independent.

Analysis of Variance: Testing for the Equality of k Population Means
• Between-Treatments Estimate of Population Variance
• Within-Treatments Estimate of Population Variance
• Comparing the Variance Estimates: The F Test
• ANOVA Table

Analysis of Variance: Testing for the Equality of k Population Means
• Analysis of variance can be used to test for the equality of k population means.
• The hypotheses tested are
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_a$: Not all population means are equal
where $\mu_j$ = mean of the jth population.

Analysis of Variance: Testing for the Equality of k Population Means
• Sample data
$x_{ij}$ = value of observation i for treatment j
$n_j$ = number of observations for treatment j
$\bar{x}_j$ = sample mean for treatment j
$s_j^2$ = sample variance for treatment j
$s_j$ = sample standard deviation for treatment j

Analysis of Variance: Testing for the Equality of k Population Means
• Statistics
The sample mean for treatment j:
$\bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{ij}}{n_j}$
The sample variance for treatment j:
$s_j^2 = \frac{\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2}{n_j - 1}$

Analysis of Variance: Testing for the Equality of k Population Means
• The overall sample mean
$\bar{\bar{x}} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}}{n_T}$
where $n_T = n_1 + n_2 + \cdots + n_k$.
• If the size of each sample is n, then $n_T = kn$ and
$\bar{\bar{x}} = \frac{\sum_{j=1}^{k} \bar{x}_j}{k}$

Analysis of Variance: Testing for the Equality of k Population Means
• Between-Treatments Estimate of Population Variance
The sum of squares due to treatments (SSTR):
$SSTR = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$
The mean square due to treatments (MSTR):
$MSTR = \frac{SSTR}{k - 1}$

Analysis of Variance: Testing for the Equality of k Population Means
• Within-Treatments Estimate of Population Variance
The sum of squares due to error (SSE):
$SSE = \sum_{j=1}^{k} (n_j - 1) s_j^2$
The mean square due to error (MSE):
$MSE = \frac{SSE}{n_T - k}$

Between-Treatments Estimate of Population Variance
• A between-treatments estimate of $\sigma^2$ is called the mean square due to treatments and is denoted MSTR.
$MSTR = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$
The numerator is the sum of squares due to treatments and is denoted SSTR. The denominator, $k - 1$, represents the degrees of freedom associated with SSTR.

Within-Samples Estimate of Population Variance
• The estimate of $\sigma^2$ based on the variation of the sample observations within each sample is called the mean square error and is denoted by MSE.
$MSE = \frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{n_T - k}$
The numerator is the sum of squares due to error and is denoted SSE. The denominator, $n_T - k$, represents the degrees of freedom associated with SSE.
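
The two variance estimates can be computed directly from raw samples. The following is a minimal sketch, not part of the original slides: the function name `one_way_anova` and the small data set `toy` are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
from statistics import mean, variance

def one_way_anova(samples):
    """Return (SSTR, SSE, MSTR, MSE, F) for a list of samples, one list per treatment."""
    k = len(samples)                        # number of treatments
    n_T = sum(len(s) for s in samples)      # total number of observations
    grand_mean = sum(sum(s) for s in samples) / n_T

    # Between-treatments: squared deviations of the sample means, weighted by n_j
    sstr = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-treatments: sample variances weighted by their degrees of freedom
    sse = sum((len(s) - 1) * variance(s) for s in samples)

    mstr = sstr / (k - 1)
    mse = sse / (n_T - k)
    return sstr, sse, mstr, mse, mstr / mse

# Hypothetical data: three treatments with three observations each
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
sstr, sse, mstr, mse, f_stat = one_way_anova(toy)
print(sstr, sse, mstr, mse, f_stat)
```

For this toy data the sample means are 2, 3, and 6 and every sample variance is 1, so SSTR is 26, SSE is 6, and the ratio MSTR/MSE is 13 (up to floating-point rounding).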

Comparing the Variance Estimates: The F Test
• If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSTR/MSE is an F distribution with MSTR d.f. equal to $k - 1$ and MSE d.f. equal to $n_T - k$.
• If the means of the k populations are not equal, the value of MSTR/MSE will be inflated because MSTR overestimates $\sigma^2$.
• Hence, we will reject $H_0$ if the resulting value of MSTR/MSE appears to be too large to have been selected at random from the appropriate F distribution.

Test for the Equality of k Population Means
• Hypotheses
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_a$: Not all population means are equal
• Test Statistic
$F = MSTR/MSE$

Test for the Equality of k Population Means
• Rejection Rule
p-value Approach: Reject $H_0$ if p-value < $\alpha$
Critical Value Approach: Reject $H_0$ if $F > F_\alpha$
where the value of $F_\alpha$ is based on an F distribution with $k - 1$ numerator d.f. and $n_T - k$ denominator d.f.

Sampling Distribution of MSTR/MSE
• Rejection Region
[Figure: sampling distribution of MSTR/MSE; "Do Not Reject $H_0$" to the left of the critical value $F_\alpha$, "Reject $H_0$" to the right]

ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares   F
Treatment             SSTR             k - 1                MSTR           MSTR/MSE
Error                 SSE              n_T - k              MSE
Total                 SST              n_T - 1
SST is partitioned into SSTR and SSE. SST’s degrees of freedom (d.f.) are partitioned into SSTR’s d.f. and SSE’s d.f.

ANOVA Table
SST divided by its degrees of freedom $n_T - 1$ is the overall sample variance that would be obtained if we treated the entire set of observations as one data set. With the entire data set as one sample, the formula for computing the total sum of squares, SST, is:
$SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2 = SSTR + SSE$

ANOVA Table
ANOVA can be viewed as the process of partitioning the total sum of squares and the degrees of freedom into their corresponding sources: treatments and error. Dividing the sum of squares by the appropriate degrees of freedom provides the variance estimates and the F value used to test the hypothesis of equal population means.

Test for the Equality of k Population Means
• Example: National Computer Products, Inc. (NCP), manufactures printers and fax machines at plants located in Atlanta, Dallas, and Seattle.
• Objective: To measure how much employees at these plants know about total quality management. A random sample of six employees was selected from each plant and given a quality awareness examination.

Test for the Equality of k Population Means
• Data
[Table: examination scores for the three plants]
• Let
$\mu_1$ = mean examination score for population 1
$\mu_2$ = mean examination score for population 2
$\mu_3$ = mean examination score for population 3

Test for the Equality of k Population Means
• Hypotheses
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: Not all population means are equal
• In this example
1. Dependent or response variable: examination score
2. Independent variable or factor: plant location
3. Levels of the factor or treatments: the values of a factor selected for investigation. In the NCP example, the three treatments (or three populations) are Atlanta, Dallas, and Seattle.

Test for the Equality of k Population Means
Three assumptions
1. For each population, the response variable is normally distributed. The examination scores (response variable) must be normally distributed at each plant.
2. The variance of the response variable, $\sigma^2$, is the same for all of the populations. The variance of examination scores must be the same for all three plants.
3. The observations must be independent. The examination score for each employee must be independent of the examination score for any other employee.

Test for the Equality of k Population Means
• ANOVA Table
• p-value =

Test for the Equality of k Population Means
• Example: Reed Manufacturing
Janet Reed would like to know if there is any significant difference in the mean number of hours worked per week for the department managers at her three manufacturing plants (in Buffalo, Pittsburgh, and Detroit).

Test for the Equality of k Population Means
• Example: Reed Manufacturing
A simple random sample of five managers from each of the three plants was taken, and the number of hours worked by each manager for the previous week is shown on the next slide. Conduct an F test using $\alpha = .05$.

Test for the Equality of k Population Means
Observation       Plant 1 Buffalo   Plant 2 Pittsburgh   Plant 3 Detroit
1                 48                73                   51
2                 54                63                   63
3                 57                66                   61
4                 54                64                   54
5                 62                74                   56
Sample Mean       55                68                   57
Sample Variance   26.0              26.5                 24.5

Test for the Equality of k Population Means
• p-Value and Critical Value Approaches
1. Develop the hypotheses.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: Not all the means are equal
where:
$\mu_1$ = mean number of hours worked per week by the managers at Plant 1
$\mu_2$ = mean number of hours worked per week by the managers at Plant 2
$\mu_3$ = mean number of hours worked per week by the managers at Plant 3

Test for the Equality of k Population Means
• p-Value and Critical Value Approaches
2. Specify the level of significance. $\alpha = .05$
3. Compute the value of the test statistic.
Mean Square Due to Treatments (sample sizes are all equal)
$\bar{\bar{x}} = (55 + 68 + 57)/3 = 60$
$SSTR = 5(55 - 60)^2 + 5(68 - 60)^2 + 5(57 - 60)^2 = 490$
$MSTR = 490/(3 - 1) = 245$

Test for the Equality of k Population Means
• p-Value and Critical Value Approaches
3. Compute the value of the test statistic. (continued)
Mean Square Due to Error
$SSE = 4(26.0) + 4(26.5) + 4(24.5) = 308$
$MSE = 308/(15 - 3) = 25.667$
$F = MSTR/MSE = 245/25.667 = 9.55$
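
The arithmetic in step 3 can be reproduced from the sample means and variances alone. This is a sketch for checking the slide values; the variable names are illustrative and not from the original slides.

```python
# Summary statistics for the Reed Manufacturing example
n = 5                              # managers sampled per plant
means = [55, 68, 57]               # Plant 1, Plant 2, Plant 3
variances = [26.0, 26.5, 24.5]

k = len(means)
n_T = k * n
grand_mean = sum(means) / k        # simple average is valid because sample sizes are equal

sstr = sum(n * (m - grand_mean) ** 2 for m in means)
sse = sum((n - 1) * v for v in variances)
mstr = sstr / (k - 1)
mse = sse / (n_T - k)
f_stat = mstr / mse

print(f"SSTR={sstr:.0f} MSTR={mstr:.0f} SSE={sse:.0f} MSE={mse:.3f} F={f_stat:.2f}")
```

The printed values match the slide: SSTR 490, MSTR 245, SSE 308, MSE 25.667, and F 9.55.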

Test for the Equality of k Population Means
• ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares   F
Treatment             490              2                    245            9.55
Error                 308              12                   25.667
Total                 798              14
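
The partition behind this table can be checked against the raw observations. The sketch below assumes the hours listed on the Reed Manufacturing data slide are assigned to plants as shown in the dictionary; that assignment reproduces the reported sample means and variances.

```python
# Hours worked per manager, read from the Reed Manufacturing data slide
plants = {
    "Buffalo": [48, 54, 57, 54, 62],
    "Pittsburgh": [73, 63, 66, 64, 74],
    "Detroit": [51, 63, 61, 54, 56],
}

all_obs = [x for obs in plants.values() for x in obs]
n_T, k = len(all_obs), len(plants)
grand_mean = sum(all_obs) / n_T

# Total sum of squares: the whole data set treated as one sample
sst = sum((x - grand_mean) ** 2 for x in all_obs)
# Between-treatments and within-treatments pieces
sstr = sum(len(o) * (sum(o) / len(o) - grand_mean) ** 2 for o in plants.values())
sse = sum((x - sum(o) / len(o)) ** 2 for o in plants.values() for x in o)

print(sst, sstr, sse)                        # sums of squares
print(n_T - 1, k - 1, n_T - k)               # matching degrees of freedom
```

Both partitions hold here: 798 = 490 + 308 for the sums of squares, and 14 = 2 + 12 for the degrees of freedom.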

Test for the Equality of k Population Means
• p-Value Approach
4. Compute the p-value.
With 2 numerator d.f. and 12 denominator d.f., the p-value is .01 for F = 6.93. Therefore, the p-value is less than .01 for F = 9.55.
5. Determine whether to reject $H_0$.
The p-value < .05, so we reject $H_0$. We have sufficient evidence to conclude that the mean number of hours worked per week by department managers is not the same at all three plants.
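
The table lookup in step 4 can be checked without statistical software in this particular case. When the numerator has exactly 2 degrees of freedom, the upper-tail area of the F distribution has the closed form $(1 + 2F/d_2)^{-d_2/2}$, where $d_2$ is the denominator d.f. The sketch below relies on that special case only; the function name is hypothetical, and a general F distribution routine would be needed for other numerator d.f.

```python
def f_upper_tail_2_numerator_df(f_value, denom_df):
    """P(F > f_value) for an F distribution with 2 numerator d.f. (special case only)."""
    return (1 + 2 * f_value / denom_df) ** (-denom_df / 2)

p_at_table_value = f_upper_tail_2_numerator_df(6.93, 12)   # table entry quoted on the slide
p_value = f_upper_tail_2_numerator_df(9.55, 12)            # observed test statistic

print(round(p_at_table_value, 3), round(p_value, 4))
print(p_value < 0.05)   # True means H0 is rejected at alpha = .05
```

The first value rounds to .01, in agreement with the table entry for F = 6.93, and the p-value for F = 9.55 is roughly .003.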

Test for the Equality of k Population Means
• Critical Value Approach
4. Determine the critical value and rejection rule.
Based on an F distribution with 2 numerator d.f. and 12 denominator d.f., $F_{.05} = 3.89$. Reject $H_0$ if F > 3.89.
5. Determine whether to reject $H_0$.
Because F = 9.55 > 3.89, we reject $H_0$. We have sufficient evidence to conclude that the mean number of hours worked per week by department managers is not the same at all three plants.
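
The critical value 3.89 can likewise be recovered by hand for this example. With 2 numerator d.f., setting the upper-tail area $(1 + 2F/d_2)^{-d_2/2}$ equal to $\alpha$ and solving for F gives $F_\alpha = (d_2/2)(\alpha^{-2/d_2} - 1)$. The sketch below uses that special-case inversion; the function name is hypothetical and the formula does not apply to other numerator d.f.

```python
def f_critical_2_numerator_df(alpha, denom_df):
    """Critical value F_alpha for an F distribution with 2 numerator d.f. (special case only)."""
    return (denom_df / 2) * (alpha ** (-2 / denom_df) - 1)

f_crit = f_critical_2_numerator_df(0.05, 12)
f_stat = 245 / (308 / 12)          # MSTR / MSE from the ANOVA table

print(round(f_crit, 2), round(f_stat, 2))
print(f_stat > f_crit)             # True means H0 is rejected
```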

Test for the Equality of k Population Means
• Summary

Multiple Comparison Procedures
• Suppose that analysis of variance has provided statistical evidence to reject the null hypothesis of equal population means.
• Fisher’s least significant difference (LSD) procedure can be used to determine where the differences occur.

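Fisher’s LSD Procedure
• Hypotheses
$H_0: \mu_i = \mu_j$
$H_a: \mu_i \neq \mu_j$
• Test Statistic
$t = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}$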

Fisher’s LSD Procedure
• Rejection Rule
p-value Approach: Reject $H_0$ if p-value < $\alpha$
Critical Value Approach: Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
where the value of $t_{\alpha/2}$ is based on a t distribution with $n_T - k$ degrees of freedom.

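Fisher’s LSD Procedure Based on the Test Statistic $\bar{x}_i - \bar{x}_j$
• Hypotheses
$H_0: \mu_i = \mu_j$
$H_a: \mu_i \neq \mu_j$
• Test Statistic
$\bar{x}_i - \bar{x}_j$
• Rejection Rule
Reject $H_0$ if $|\bar{x}_i - \bar{x}_j| > LSD$
where
$LSD = t_{\alpha/2} \sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$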

Fisher’s LSD Procedure Based on the Test Statistic $\bar{x}_i - \bar{x}_j$
• Example: Reed Manufacturing
Recall that Janet Reed wants to know if there is any significant difference in the mean number of hours worked per week for the department managers at her three manufacturing plants. Analysis of variance has provided statistical evidence to reject the null hypothesis of equal population means. Fisher’s least significant difference (LSD) procedure can be used to determine where the differences occur.

Fisher’s LSD Procedure Based on the Test Statistic $\bar{x}_i - \bar{x}_j$
For $\alpha = .05$ and $n_T - k = 15 - 3 = 12$ degrees of freedom, $t_{.025} = 2.179$.
$LSD = t_{\alpha/2} \sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} = 2.179 \sqrt{25.667\left(\frac{1}{5} + \frac{1}{5}\right)} = 6.98$
The MSE value, 25.667, was computed earlier.

Fisher’s LSD Procedure Based on the Test Statistic $\bar{x}_i - \bar{x}_j$
• LSD for Plants 1 and 2
  • Hypotheses (A): $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \neq \mu_2$
  • Rejection Rule: Reject $H_0$ if $|\bar{x}_1 - \bar{x}_2| > 6.98$
  • Test Statistic: $|\bar{x}_1 - \bar{x}_2| = |55 - 68| = 13$
  • Conclusion: Because 13 > 6.98, we reject $H_0$. The mean number of hours worked at Plant 1 is not equal to the mean number worked at Plant 2.

Fisher’s LSD Procedure Based on the Test Statistic $\bar{x}_i - \bar{x}_j$
• LSD for Plants 1 and 3
  • Hypotheses (B): $H_0: \mu_1 = \mu_3$, $H_a: \mu_1 \neq \mu_3$
  • Rejection Rule: Reject $H_0$ if $|\bar{x}_1 - \bar{x}_3| > 6.98$
  • Test Statistic: $|\bar{x}_1 - \bar{x}_3| = |55 - 57| = 2$
  • Conclusion: Because 2 < 6.98, we do not reject $H_0$. There is no significant difference between the mean number of hours worked at Plant 1 and the mean number of hours worked at Plant 3.

Fisher’s LSD Procedure Based on the Test Statistic $\bar{x}_i - \bar{x}_j$
• LSD for Plants 2 and 3
  • Hypotheses (C): $H_0: \mu_2 = \mu_3$, $H_a: \mu_2 \neq \mu_3$
  • Rejection Rule: Reject $H_0$ if $|\bar{x}_2 - \bar{x}_3| > 6.98$
  • Test Statistic: $|\bar{x}_2 - \bar{x}_3| = |68 - 57| = 11$
  • Conclusion: Because 11 > 6.98, we reject $H_0$. The mean number of hours worked at Plant 2 is not equal to the mean number worked at Plant 3.
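
The three pairwise comparisons can be run in one pass. This sketch takes the critical value $t_{.025} = 2.179$ as given on the slides rather than computing it, and assumes equal sample sizes of five; the variable names are illustrative.

```python
import math
from itertools import combinations

mse = 308 / 12             # MSE from the ANOVA table (25.667)
t_crit = 2.179             # t_.025 with 12 d.f., taken from the slides
n = 5                      # managers sampled per plant
means = {"Plant 1": 55, "Plant 2": 68, "Plant 3": 57}

# Least significant difference for two samples of size n
lsd = t_crit * math.sqrt(mse * (1 / n + 1 / n))

significant = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    significant[(a, b)] = diff > lsd
    verdict = "reject H0" if diff > lsd else "do not reject H0"
    print(f"{a} vs {b}: |difference| = {diff}, LSD = {lsd:.2f}, {verdict}")
```

Only the Plant 1 versus Plant 3 comparison falls below the LSD of about 6.98, which agrees with conclusions (A), (B), and (C).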

Type I Error Rates
• The comparisonwise Type I error rate $\alpha$ indicates the level of significance associated with a single pairwise comparison.
• The experimentwise Type I error rate $\alpha_{EW}$ is the probability of making a Type I error on at least one of the $k(k - 1)/2$ pairwise comparisons.
$\alpha_{EW} = 1 - (1 - \alpha)^{k(k - 1)/2}$
• The experimentwise Type I error rate gets larger for problems with more populations (larger k).
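
A short calculation shows how quickly the experimentwise rate grows. The sketch below counts the comparisons as the number of distinct pairs among k populations, $k(k - 1)/2$, and treats the comparisons as independent, which is an approximation; the function name is hypothetical.

```python
def experimentwise_error_rate(alpha, k):
    """Approximate probability of at least one Type I error across all pairwise comparisons."""
    comparisons = k * (k - 1) // 2          # number of distinct pairs among k populations
    return 1 - (1 - alpha) ** comparisons

for k in (3, 4, 5):
    print(k, round(experimentwise_error_rate(0.05, k), 4))
```

For the three Reed Manufacturing plants and $\alpha = .05$, there are three comparisons and the rate is $1 - .95^3$, about .14, nearly three times the comparisonwise rate.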