Dr. Ka-fu Wong, ECON 1003 Analysis of Economic Data
Ka-fu Wong © 2003, Chap 10

Chapter Ten: Analysis of Variance
GOALS
1. Discuss the general idea of analysis of variance.
2. List the characteristics of the F distribution.
3. Conduct a test of hypothesis to determine whether the variances of two populations are equal.
4. Organize data into a one-way and a two-way ANOVA table.
5. Define and understand the terms treatments and blocks.
6. Conduct a test of hypothesis among three or more treatment means.
7. Develop confidence intervals for the difference between treatment means.
8. Conduct a test of hypothesis to determine if there is a difference among block means.

Two Sample Tests
[Figure: two panels, "Test for equal variances" and "Test for equal means", each contrasting Population 1 and Population 2 under H0 and H1.]

Characteristics of the F Distribution
• There is a “family” of F distributions.
• Each member of the family is determined by two parameters: the numerator degrees of freedom and the denominator degrees of freedom.
• F cannot be negative, and it is a continuous distribution.
• The F distribution is positively skewed.
• Its values range from 0 to $\infty$. As $F \to \infty$, the curve approaches the X-axis.

The F Distribution, F(m, n)
• Not symmetric (skewed to the right).
• Nonnegative values only.
• Each member of the family is determined by two parameters: the numerator degrees of freedom (m) and the denominator degrees of freedom (n).
[Figure: density of F(m, n) plotted against F, with axis marks at 0 and 1.0 and a tail area labeled $\alpha$.]

Test for Equal Variances
• For the two-tailed test, the test statistic is given by
  $F = s_1^2 / s_2^2$
  where $s_1^2$ and $s_2^2$ are the sample variances for the two samples, with the larger sample variance placed in the numerator.
• The null hypothesis is rejected at the $\alpha$ level of significance if the computed value of the test statistic is greater than the critical value of the F distribution with upper-tail area $\alpha/2$ and the corresponding numerator and denominator degrees of freedom.

Test for Equal Variances
• For the one-tailed test, the test statistic is given by
  $F = s_1^2 / s_2^2$
  where $s_1^2$ and $s_2^2$ are the sample variances for the two samples.
• The null hypothesis is rejected at the $\alpha$ level of significance if the computed value of the test statistic is greater than the critical value of the F distribution with upper-tail area $\alpha$ and the corresponding numerator and denominator degrees of freedom.

EXAMPLE 1
• Colin, a stockbroker at Critical Securities, reported that the mean rate of return on a sample of 10 internet stocks was 12.6 percent with a standard deviation of 3.9 percent. The mean rate of return on a sample of 8 utility stocks was 10.9 percent with a standard deviation of 3.5 percent. At the .05 significance level, can Colin conclude that there is more variation in the internet stocks?

EXAMPLE 1 continued
• Step 1: The hypotheses are $H_0: \sigma_I^2 \le \sigma_U^2$ versus $H_1: \sigma_I^2 > \sigma_U^2$, where I denotes internet stocks and U denotes utility stocks.
• Step 2: The significance level is .05.
• Step 3: The test statistic follows the F distribution.
• Step 4: H0 is rejected if F > 3.68. The degrees of freedom are 9 in the numerator and 7 in the denominator.
• Step 5: The value of F is $F = (3.9)^2 / (3.5)^2 = 1.2416$. H0 is not rejected. There is insufficient evidence to show more variation in the internet stocks.
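The five steps of Example 1 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `f_ratio` is hypothetical, and the critical value 3.68 is simply copied from Step 4 rather than computed.

```python
def f_ratio(sd_numerator: float, sd_denominator: float) -> float:
    """Return the F statistic s1^2 / s2^2 from two sample standard deviations."""
    return sd_numerator ** 2 / sd_denominator ** 2


# Example 1: internet stocks (s = 3.9, n = 10) vs. utility stocks (s = 3.5, n = 8).
f_stat = f_ratio(3.9, 3.5)
df_num, df_den = 10 - 1, 8 - 1

# Critical value taken from the slide (.05 level, 9 and 7 degrees of freedom).
critical_value = 3.68
reject_h0 = f_stat > critical_value

print(f"F = {f_stat:.4f}, df = ({df_num}, {df_den}), reject H0: {reject_h0}")
```

Because the computed ratio is well below the critical value, the sketch reaches the same decision as Step 5.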

Analysis of Variance (ANOVA)

Underlying Assumptions for ANOVA
• The F distribution is also used for testing whether two or more sample means came from the same or equal populations.
  • It tests whether any group mean differs from the mean of all groups combined.
  • It answers: “Are all groups equal or not?”
• This technique is called analysis of variance, or ANOVA.
• ANOVA requires the following conditions:
  • The sampled populations follow the normal distribution.
  • The populations have equal standard deviations.
  • The samples are randomly selected and are independent.

The Hypothesis
• Suppose that we have independent samples of $n_1, n_2, \ldots, n_K$ observations from K populations. If the population means are denoted by $\mu_1, \mu_2, \ldots, \mu_K$, the one-way analysis of variance framework is designed to test the null hypothesis
  $H_0: \mu_1 = \mu_2 = \cdots = \mu_K$

Sample Observations from Independent Random Samples of K Populations
Population: 1, 2, ..., K
Mean: $\mu_1, \mu_2, \ldots, \mu_K$
Variance: $\sigma^2, \sigma^2, \ldots, \sigma^2$ (the same for every population)
Sample observations from population i: $x_{i1}, x_{i2}, \ldots, x_{in_i}$, for $i = 1, \ldots, K$
Sample size: $n_1, n_2, \ldots, n_K$ (in general, an unequal number of observations in the K samples)
Total number of observations: $n_T = n_1 + \cdots + n_K$

Sum of Squares Decomposition for One-Way Analysis of Variance
• Suppose that we have independent samples of $n_1, n_2, \ldots, n_K$ observations from K populations.
• Denote by $\bar{x}_1, \ldots, \bar{x}_K$ the K group sample means and by $\bar{x}$ the overall sample mean. We define the following sums of squares:
  $SSE = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$ (within groups)
  $SST = \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2$ (between groups)
  $SSTotal = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$ (total)
  where $x_{ij}$ denotes the jth sample observation in the ith group.

A Numerical Example of Sum of Squares Decomposition
Population: 1, 2, 3
Mean: $\mu_1, \mu_2, \mu_3$
Variance: $\sigma^2, \sigma^2, \sigma^2$
Sample observations ($x_{ij}$), as they survive in the slide text: 1 2 3 4 5 1 3 5
Sample size ($n_j$): 3, 4, 3
Sample mean: 2, 3.5, 3
Grand mean: 2.9
SSTotal = SST + SSE
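The decomposition can be checked numerically. The sketch below is illustrative only: the observation lists are an assumption, chosen so that they reproduce the slide's sample sizes (3, 4, 3), sample means (2, 3.5, 3), and grand mean (2.9); the function name `sums_of_squares` is hypothetical.

```python
def sums_of_squares(groups):
    """Return (SST, SSE, SSTotal) for a list of samples, one list per group."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    # Between-group variation: squared deviation of each group mean, weighted by group size.
    sst = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: squared deviation of each observation from its own group mean.
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    # Total variation around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    return sst, sse, ss_total


# Assumed observations (not confirmed by the slide text); see the note above.
samples = [[1, 2, 3], [2, 3, 4, 5], [1, 3, 5]]
sst, sse, ss_total = sums_of_squares(samples)
print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SSTotal = {ss_total:.2f}")
```

Under these assumed observations the two components add up to the total, as the identity SSTotal = SST + SSE requires.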

A Proof of SSTotal = SST + SSE
Population i (i = 1, ..., K) has sample observations $x_{i1}, x_{i2}, \ldots, x_{in_i}$ and sample size $n_i$.
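The algebra of the proof did not survive in the slide text. One standard way to derive the identity, sketched here with $\bar{x}_i$ for the ith group mean and $\bar{x}$ for the grand mean (notation assumed, not taken from the slide), is to add and subtract the group mean inside the square:

```latex
\begin{align*}
SSTotal
  &= \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2
   = \sum_{i=1}^{K} \sum_{j=1}^{n_i}
     \bigl[(x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x})\bigr]^2 \\
  &= \underbrace{\sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}_{SSE}
   + \underbrace{\sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2}_{SST}
   + 2 \sum_{i=1}^{K} (\bar{x}_i - \bar{x})
       \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) \\
  &= SSE + SST,
\end{align*}
```

The cross term vanishes because deviations from a group's own mean sum to zero within each group, that is, $\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) = 0$ for every i.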

Two Ways to Estimate the Population Variance
• Note that the variance is assumed to be identical across populations.
• If the population means are identical, we have two ways to estimate the population variance:
  • Based on the K sample variances.
  • Based on the deviation of the K sample means from the grand mean.

An Estimate of the Population Variance Based on Sample Variances
• Any one of the K sample variances can be used to estimate the population variance.
• We can get a more precise estimate if we use all the information from the K samples; pooling them gives the estimate $SSE/(n_T - K)$.

An Estimate of the Population Variance Based on the Deviation of the K Sample Means from the Grand Sample Mean
• If the sample sizes are the same for all samples, the Central Limit Theorem suggests that each sample mean will be approximately normally distributed, with mean equal to the population mean and variance equal to the population variance divided by the sample size.
• When sample sizes differ across samples, we have to weight each squared deviation by its sample size, which leads to the estimate $SST/(K - 1)$.

Comparing the Variance Estimates: The F Test
• If the null hypothesis is true and the ANOVA assumptions are valid, the ratio of the two variance estimates follows an F distribution with $K - 1$ and $n_T - K$ degrees of freedom.
• If the means of the K populations are not equal, the value of the F statistic will be inflated because $SST/(K-1)$ will overestimate $\sigma^2$.
• Hence, we reject H0 if the resulting F statistic appears too large to have been drawn at random from the appropriate F distribution.

Test for the Equality of K Population Means
• Hypotheses
  $H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_K$
  H1: Not all population means are equal
• Test statistic
  $F = [SST/(K-1)] / [SSE/(n_T-K)]$
• Rejection rule
  Reject H0 if $F > F_\alpha$, where the value of $F_\alpha$ is based on an F distribution with $K - 1$ numerator degrees of freedom and $n_T - K$ denominator degrees of freedom.

Sampling Distribution of MST/MSE
The figure below shows the rejection region associated with a level of significance equal to $\alpha$, where $F_\alpha$ denotes the critical value.
[Figure: density of MST/MSE, with "Do Not Reject H0" to the left of the critical value $F_\alpha$ and "Reject H0" to the right.]

The ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares        F
Treatment             SST              K - 1                MST = SST/(K - 1)   MST/MSE
Error                 SSE              n_T - K              MSE = SSE/(n_T - K)
Total                 SSTotal          n_T - 1

Does learning method affect students’ exam scores?
• Consider 3 methods:
  • standard
  • osmosis
  • shock therapy
• Convince 15 students to take part. Assign 5 students randomly to each method.
• Wait eight weeks. Then test the students to get exam scores.
• Are the three learning methods equally effective?
  • That is, are their population mean exam scores the same?

“Analysis of Variance” (Study #1)
The variation between the group means and the grand mean is larger than the variation within each of the groups.

ANOVA Table for Study #1
One-way Analysis of Variance
Source   DF   SS       MS       F       P
Factor    2   2510.5   1255.3   93.44   0.000
Error    12    161.2     13.4
Total    14   2671.7
• “Source” means “find the components of variation in this column”
• “DF” means “degrees of freedom”
• “SS” means “sums of squares”
• “MS” means “mean squared”
• “F” means “F test statistic”
• “P” means “P-value”

ANOVA Table for Study #1
One-way Analysis of Variance
Source   DF   SS       MS       F       P
Factor    2   2510.5   1255.3   93.44   0.000
Error    12    161.2     13.4
Total    14   2671.7
• “Factor” means “Variability between groups” or “Variability due to the factor of interest”
• “Error” means “Variability within groups” or “unexplained random variation”
• “Total” means “Total variation from the grand mean”

ANOVA Table for Study #1
One-way Analysis of Variance
Source   DF   SS       MS       F       P
Factor    2   2510.5   1255.3   93.44   0.000
Error    12    161.2     13.4
Total    14   2671.7
• DF: 14 = 2 + 12
• SS: 2671.7 = 2510.5 + 161.2
• MS: 1255.3 ≈ 2510.5/2 and 13.4 ≈ 161.2/12
• F: 93.44 ≈ 1255.3/13.4, computed from the unrounded mean squares
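The arithmetic linking the columns of the Study #1 table can be reproduced directly from the sums of squares and degrees of freedom. This is a small sketch; the helper name `anova_row_stats` is hypothetical.

```python
def anova_row_stats(ss_factor, df_factor, ss_error, df_error):
    """Return (MS factor, MS error, F) from sums of squares and degrees of freedom."""
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return ms_factor, ms_error, ms_factor / ms_error


# Values from the Study #1 table: SS and DF for the Factor and Error rows.
ms_factor, ms_error, f_stat = anova_row_stats(2510.5, 2, 161.2, 12)
print(f"MS factor = {ms_factor:.2f}, MS error = {ms_error:.2f}, F = {f_stat:.2f}")
```

Working from the unrounded mean squares explains why the table reports F = 93.44 even though dividing the rounded entries 1255.3 and 13.4 gives a slightly different figure.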

“Analysis of Variance” (Study #2)
The variation between the group means and the grand mean is smaller than the variation within each of the groups.

ANOVA Table for Study #2
One-way Analysis of Variance
Source   DF   SS       MS     F      P
Factor    2     80.1   40.1   0.46   0.643
Error    12   1050.8   87.6
Total    14   1130.9
The P-value is large, so we cannot reject the null hypothesis. There is insufficient evidence to conclude that the average exam scores differ for the three learning methods.

Do Holocaust survivors have more sleep problems than others?

ANOVA Table for Sleep Study
One-way Analysis of Variance
Source   DF    SS       MS      F       P
Factor     2   1723.8   861.9   61.69   0.000
Error    117   1634.8    14.0
Total    119   3358.6
The P-value is so small that we reject the null hypothesis of equal population means in favor of the alternative hypothesis that at least one pair of population means is different.

Potential problem with the analysis
• What is driving the rejection of the null hypothesis of equal population means?
• From the plot, the Healthy and Depress groups seem to have different mean sleep quality. It looks as if the rejection is due to the difference between these two groups.
• If we pooled Healthy and Depress, the distribution would look more like that of Survivor. That is, a failure to reject the null would be more likely.
• This example illustrates that we have to be careful about our analysis and interpretation of the result when we conduct a test of equal population means.

EXAMPLE 2
• Rosenbaum Restaurants specialize in meals for senior citizens. Katy Polsby, President, recently developed a new meat loaf dinner. Before making it a part of the regular menu, she decides to test it in several of her restaurants. She would like to know if there is a difference in the mean number of dinners sold per day at the Aynor, Loris, and Lander restaurants. Use the .05 significance level.

Example 2 continued
Number of dinners sold per day
Obs   Aynor   Loris   Lander
1     13      10      18
2     12      12      16
3     14      13      17
4     12      11      17
5                     17

EXAMPLE 2 continued
• Step 1: $H_0: \mu_1 = \mu_2 = \mu_3$ versus H1: Treatment means are not the same.
• Step 2: H0 is rejected if F > 4.10. There are 2 df in the numerator and 10 df in the denominator.

Example 2 continued
• To find the value of F:
Source      SS      df   MS       F       p-value
Treatment   76.25    2   38.125   39.10   1.87E-05
Error        9.75   10    0.975
Total       86.00   12
• The decision is to reject the null hypothesis.
• The treatment means are not the same.
• The mean number of meals sold at the three locations is not the same.
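The ANOVA table for Example 2 can be rebuilt from raw data with a short function. This is a sketch under one assumption: the assignment of the 13 observations to the three restaurants is reconstructed from the flattened slide text, and it is adopted here because it reproduces the table's sums of squares. The function name `one_way_anova` is hypothetical.

```python
def one_way_anova(groups):
    """Return SST, SSE, degrees of freedom, mean squares, and F for a one-way layout."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Treatment (between-group) and error (within-group) sums of squares.
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mst = sst / (k - 1)
    mse = sse / (n_total - k)
    return {"SST": sst, "SSE": sse, "df": (k - 1, n_total - k),
            "MST": mst, "MSE": mse, "F": mst / mse}


# Dinners sold per day (column assignment reconstructed; see note above).
aynor = [13, 12, 14, 12]
loris = [10, 12, 13, 11]
lander = [18, 16, 17, 17, 17]

result = one_way_anova([aynor, loris, lander])
print(result)
```

The computed F can then be compared with the critical value 4.10 from Step 2 to reach the same decision as the slide.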

Inferences About Treatment Means
• When we reject the null hypothesis that the means are equal, we may want to know which treatment means differ.
• One of the simplest procedures is through the use of confidence intervals.

Confidence Interval for the Difference Between Two Means
• $(\bar{x}_1 - \bar{x}_2) \pm t \sqrt{MSE \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$
  where t is obtained from the t table with $n_T - K$ degrees of freedom.
• $MSE = SSE/(n_T - K)$ is used because the populations are assumed to share a common variance, which MSE estimates.

EXAMPLE 3
• From EXAMPLE 2, develop a 95% confidence interval for the difference in the mean number of meat loaf dinners sold in Lander and Aynor. Can Katy conclude that there is a difference between the two restaurants?
  $(17 - 12.75) \pm 2.228 \sqrt{0.975 \left( \frac{1}{5} + \frac{1}{4} \right)} = 4.25 \pm 1.48 = (2.77, 5.73)$
• Because zero is not in the interval, we conclude that this pair of means differs.
• The mean number of meals sold in Aynor is different from that in Lander.
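The interval for Lander versus Aynor can be reproduced numerically. In this sketch the sample means (17 and 12.75) and sample sizes (5 and 4) follow from the Example 2 data as reconstructed from the slide text, MSE = 0.975 comes from the Example 2 ANOVA table, and the t value 2.228 is the two-sided 95% table value for 10 degrees of freedom. The function name `ci_difference` is hypothetical.

```python
import math


def ci_difference(mean_1, mean_2, n_1, n_2, mse, t_value):
    """Confidence interval for mean_1 - mean_2 using the pooled MSE from ANOVA."""
    margin = t_value * math.sqrt(mse * (1 / n_1 + 1 / n_2))
    diff = mean_1 - mean_2
    return diff - margin, diff + margin


# Lander (mean 17, n = 5) versus Aynor (mean 12.75, n = 4).
lower, upper = ci_difference(17.0, 12.75, 5, 4, mse=0.975, t_value=2.228)
print(f"95% CI for the difference: ({lower:.2f}, {upper:.2f})")
```

Since both endpoints are positive, zero lies outside the interval, which matches the conclusion that the two means differ.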

Two-Factor ANOVA
• For the two-factor ANOVA we test whether there is a significant difference among the treatment effects and whether there is a difference among the block effects.

Sample Observations from Independent Random Samples of K Populations
          TREATMENT
BLOCK     1          2          ...   K
1         $x_{11}$   $x_{21}$   ...   $x_{K1}$
2         $x_{12}$   $x_{22}$   ...   $x_{K2}$
...
B         $x_{1B}$   $x_{2B}$   ...   $x_{KB}$

Sum of Squares Decomposition for Two-Way Analysis of Variance
• Suppose that we have a sample of observations with $x_{ij}$ denoting the observation in the ith group and jth block. Suppose that there are K groups and B blocks, for a total of n = KB observations. Denote the group sample means by $\bar{x}_{i\cdot}$, the block sample means by $\bar{x}_{\cdot j}$, and the overall sample mean by $\bar{x}$.
• SSTotal = SSE + SST + SSB
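The two-way identity can be illustrated with a small calculation. The sketch below uses the usual definitions for a randomized block layout with one observation per cell, which are assumed here because the slide's formulas did not survive; the data and the function name `two_way_sums_of_squares` are hypothetical.

```python
def two_way_sums_of_squares(table):
    """table[i][j] is the observation for treatment i and block j.

    Returns (SST, SSB, SSE, SSTotal).
    """
    k, b = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (k * b)
    treat_means = [sum(row) / b for row in table]
    block_means = [sum(table[i][j] for i in range(k)) / k for j in range(b)]
    # Treatment and block variation, each weighted by the number of cells per mean.
    sst = b * sum((m - grand) ** 2 for m in treat_means)
    ssb = k * sum((m - grand) ** 2 for m in block_means)
    # Residual variation after removing treatment and block effects.
    sse = sum((table[i][j] - treat_means[i] - block_means[j] + grand) ** 2
              for i in range(k) for j in range(b))
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    return sst, ssb, sse, ss_total


# Hypothetical data: 3 treatments (rows) by 4 blocks (columns).
data = [[10, 12, 11, 13], [14, 15, 13, 16], [9, 11, 10, 12]]
sst, ssb, sse, ss_total = two_way_sums_of_squares(data)
print(f"SST = {sst:.3f}, SSB = {ssb:.3f}, SSE = {sse:.3f}, SSTotal = {ss_total:.3f}")
```

Whatever data are supplied, the three components should sum to the total up to floating-point rounding.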

General Format of Two-Way Analysis of Variance Table
Source of Variation   Sums of Squares   Degrees of Freedom   Mean Squares                      F Ratios
Treatments            SST               K - 1                MST = SST/(K - 1)                 MST/MSE
Blocks                SSB               B - 1                MSB = SSB/(B - 1)                 MSB/MSE
Error                 SSE               (K - 1)(B - 1)       MSE = SSE/[(K - 1)(B - 1)]
Total                 SSTotal           n_T - 1

EXAMPLE 4
• The Bieber Manufacturing Co. operates 24 hours a day, five days a week. The workers rotate shifts each week. Todd Bieber, the owner, is interested in whether there is a difference in the number of units produced when the employees work on various shifts. A sample of five workers is selected and their output recorded on each shift. At the .05 significance level, can we conclude there is a difference in the mean production by shift and in the mean production by employee?

EXAMPLE 4 continued
• TREATMENT EFFECT
• Step 1: $H_0: \mu_1 = \mu_2 = \mu_3$ versus H1: Not all means are equal.
• Step 2: H0 is rejected if F > 4.46. The degrees of freedom are 2 and 8.

Example 4 continued
• Step 3: Compute the various sums of squares:
Source       SS       df   MS       F      p-value
Treatments    62.53    2   31.267   5.75   .0283
Blocks        33.73    4    8.433   1.55   .2762
Error         43.47    8    5.433
Total        139.73   14
• Step 4: H0 is rejected. There is a difference in the mean number of units produced for the different time periods.

EXAMPLE 4 continued
• BLOCK EFFECT
• Step 1: $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$ versus H1: Not all means are equal.
• Step 2: H0 is rejected if F > 3.84. The degrees of freedom are 4 and 8.
• Step 3: F = [33.73/4]/[43.47/8] = 1.55
• Step 4: H0 is not rejected, since 1.55 < 3.84. There is no significant difference in the average number of units produced by the different employees.
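Both F tests in Example 4 follow from the three sums of squares in the table. The sketch below recomputes them and applies the critical values quoted on the slides (4.46 and 3.84); the raw production data are not available in the slide text, so only the tabulated sums of squares are used. The function name `two_way_f_tests` is hypothetical.

```python
def two_way_f_tests(sst, ssb, sse, k, b):
    """Return (F for treatments, F for blocks) for K treatments and B blocks."""
    mse = sse / ((k - 1) * (b - 1))
    f_treatment = (sst / (k - 1)) / mse
    f_block = (ssb / (b - 1)) / mse
    return f_treatment, f_block


# Sums of squares from the Example 4 table: 3 shifts (treatments), 5 employees (blocks).
f_treatment, f_block = two_way_f_tests(62.53, 33.73, 43.47, k=3, b=5)

# Critical values copied from the slides (.05 level).
reject_treatment = f_treatment > 4.46  # df = (2, 8)
reject_block = f_block > 3.84          # df = (4, 8)

print(f"F(treatment) = {f_treatment:.2f}, reject H0: {reject_treatment}")
print(f"F(block) = {f_block:.2f}, reject H0: {reject_block}")
```

The sketch reproduces the two decisions above: the shift effect is significant at the .05 level, while the employee (block) effect is not.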

Chapter Ten: Analysis of Variance - END -