CHAPTER 11 Inference for Distributions of Categorical Data
11.2 Inference for Two-Way Tables
The Practice of Statistics, 5th Edition
Starnes, Tabor, Yates, Moore
Bedford Freeman Worth Publishers
Comparing Distributions of a Categorical Variable
Market researchers suspect that background music may affect the mood and buying behavior of customers. One study in a Mediterranean restaurant compared three randomly assigned treatments: no music, French accordion music, and Italian string music. Under each condition, the researchers recorded the numbers of customers who ordered French, Italian, and other entrees.
Expected Counts and the Chi-Square Statistic
Finding Expected Counts
When H0 is true, the expected count in any cell of a two-way table is
$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
Conditions for Performing a Chi-Square Test for Homogeneity
• Random: The data come from a well-designed random sample or from a randomized experiment.
  o 10%: When sampling without replacement, check that n ≤ (1/10)N.
• Large Counts: All expected counts are at least 5.
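As a rough illustration of the expected-count rule (row total times column total, divided by the table total), here is a minimal Python sketch. The 2 × 3 table of observed counts is made up for illustration and is not data from the restaurant study; the function name `expected_counts` is also hypothetical.

```python
def expected_counts(observed):
    """Expected counts under H0: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

# Hypothetical observed counts (rows = groups, columns = categories).
observed = [[20, 30, 50],
            [30, 30, 40]]

expected = expected_counts(observed)
for row in expected:
    print(row)

# For the Large Counts condition, compare the smallest expected count with 5.
smallest = min(e for row in expected for e in row)
print("smallest expected count:", smallest)
```

Note that the check uses the expected counts, not the observed counts.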
Expected Counts and the Chi-Square Statistic
Just as we did with the chi-square goodness-of-fit test, we compare the observed counts with the expected counts using the statistic
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
This time, the sum is over all cells (not including the totals!) in the two-way table.
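The statistic adds up (observed − expected)² / expected over every cell of the table. The sketch below shows one way to compute it, keeping each cell's contribution so the largest ones can be inspected. The observed counts are hypothetical, and `chi_square_statistic` is an illustrative name rather than a library function.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, per-cell contributions) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    contributions = []
    for i, row in enumerate(observed):
        contrib_row = []
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            contrib_row.append((obs - expected) ** 2 / expected)
        contributions.append(contrib_row)
    # Sum over the cells only; the totals are never treated as cells.
    chi2 = sum(sum(r) for r in contributions)
    return chi2, contributions

# Hypothetical observed counts, for illustration only.
observed = [[20, 30, 50],
            [30, 30, 40]]

chi2, contributions = chi_square_statistic(observed)
print(round(chi2, 3))
```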
Chi-Square Test for Homogeneity
Suppose the conditions are met. You can use the chi-square test for homogeneity to test
H0: There is no difference in the distribution of a categorical variable for several populations or treatments.
Ha: There is a difference in the distribution of a categorical variable for several populations or treatments.
Start by finding the expected count for each cell assuming that H0 is true. Then calculate the chi-square statistic
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
where the sum is over all cells (not including totals) in the two-way table. If H0 is true, the χ² statistic has approximately a chi-square distribution with degrees of freedom = (number of rows − 1)(number of columns − 1). The P-value is the area to the right of χ² under the corresponding chi-square density curve.
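To make the last two steps concrete, the sketch below computes the degrees of freedom and the right-tail area using only Python's standard library. The table size (3 treatments × 3 entree categories) follows the restaurant study described earlier, but the statistic value 10.0 is hypothetical, and `chi2_sf` is an illustrative helper, not a library function.

```python
import math

def chi2_sf(x, df):
    """Area to the right of x under the chi-square density with df degrees of freedom."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))   # exact tail area for df = 1
    else:
        a, q = 1.0, math.exp(-z)              # exact tail area for df = 2
    # Step up two degrees of freedom at a time using
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

rows, cols = 3, 3                 # three treatments, three entree categories
df = (rows - 1) * (cols - 1)

chi_square = 10.0                 # hypothetical statistic, for illustration only
p_value = chi2_sf(chi_square, df)
print(df, round(p_value, 4))
```

In practice a statistics package or calculator would normally supply this tail area; the helper is written out here only to show what "area to the right" means numerically.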
The Chi-Square Test for Independence
The 10% and Large Counts conditions for the chi-square test for independence are the same as for the homogeneity test. There is a slight difference in the Random condition for the two tests: a test for independence uses data from one sample, but a test for homogeneity uses data from two or more samples or groups.
Conditions for Performing a Chi-Square Test for Independence
• Random: The data come from a well-designed random sample or from a randomized experiment.
  o 10%: When sampling without replacement, check that n ≤ (1/10)N.
• Large Counts: All expected counts are at least 5.
Chi-Square Test for Independence
Suppose the conditions are met. You can use the chi-square test for independence to test
H0: There is no association between two categorical variables in the population of interest.
Ha: There is an association between two categorical variables in the population of interest.
Start by finding the expected count for each cell assuming that H0 is true. Then calculate the chi-square statistic
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
where the sum is over all cells (not including totals) in the two-way table. If H0 is true, the χ² statistic has approximately a chi-square distribution with degrees of freedom = (number of rows − 1)(number of columns − 1). The P-value is the area to the right of χ² under the corresponding chi-square density curve.
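One way to see why the same expected-count rule applies when H0 says "no association" is the short derivation sketched below. The symbols n (table total), R_i (total for row i), and C_j (total for column j) are introduced here for illustration and do not appear on the slides.

```latex
\begin{align*}
P(\text{row } i \text{ and column } j)
  &= P(\text{row } i)\,P(\text{column } j)
  && \text{(independence under } H_0\text{)} \\
  &\approx \frac{R_i}{n}\cdot\frac{C_j}{n}
  && \text{(estimated from the table totals)} \\
\text{expected count in cell } (i, j)
  &= n\cdot\frac{R_i}{n}\cdot\frac{C_j}{n}
   = \frac{R_i\,C_j}{n}
\end{align*}
```

The final expression is the row total times the column total divided by the table total, which is why the calculations for the two tests are identical even though the hypotheses and data collection differ.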
Example: Choosing the right type of chi-square test
Are men and women equally likely to suffer lingering fear from watching scary movies as children? Researchers asked a random sample of 117 college students to write narrative accounts of their exposure to scary movies before the age of 13. More than one-fourth of the students said that some of the fright symptoms are still present when they are awake. The following table breaks down these results by gender.
Example: Choosing the right type of chi-square test
Minitab output for a chi-square test using these data is shown below.
Example: Choosing the right type of chi-square test
Problem: Assume that the conditions for performing inference are met.
(a) Explain why a chi-square test for independence and not a chi-square test for homogeneity should be used in this setting.
The data were produced using a single random sample of college students, who were then classified by gender and whether or not they had lingering fright symptoms. The chi-square test for homogeneity requires independent random samples from each population.
Example: Choosing the right type of chi-square test
Problem: Assume that the conditions for performing inference are met.
(b) State an appropriate pair of hypotheses for researchers to test in this setting.
The null hypothesis is H0: There is no association between gender and ongoing fright symptoms in the population of college students.
The alternative hypothesis is Ha: There is an association between gender and ongoing fright symptoms in the population of college students.
Example: Choosing the right type of chi-square test
Problem: Assume that the conditions for performing inference are met.
(c) Which cell contributes most to the chi-square statistic? In what way does this cell differ from what the null hypothesis suggests?
Men who admit to having lingering fright symptoms account for the largest component of the chi-square statistic: 1.883 of the total 4.028. Far fewer men in the sample admitted to fright symptoms (7) than we would expect if H0 were true (11.69).
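The cell-by-cell comparison in part (c) can be sketched in Python. The two-way table itself is not reproduced in this text, so the counts below are an assumed reconstruction, chosen because they agree with the figures quoted in the example (117 students, 7 men with symptoms, expected count 11.69, component 1.883, total 4.028). Treat them as an assumption, not as the published table.

```python
genders = ["Male", "Female"]
answers = ["Yes", "No"]   # lingering fright symptoms?

# ASSUMED counts (the original table is not shown in this text).
observed = {("Male", "Yes"): 7, ("Male", "No"): 31,
            ("Female", "Yes"): 29, ("Female", "No"): 50}

n = sum(observed.values())
row_total = {g: sum(observed[(g, a)] for a in answers) for g in genders}
col_total = {a: sum(observed[(g, a)] for g in genders) for a in answers}

# Expected count and chi-square component for every cell.
expected = {(g, a): row_total[g] * col_total[a] / n
            for g in genders for a in answers}
contribution = {cell: (observed[cell] - expected[cell]) ** 2 / expected[cell]
                for cell in observed}

chi2 = sum(contribution.values())
largest = max(contribution, key=contribution.get)

print("n =", n)
print("largest component:", largest, round(contribution[largest], 3))
print("observed vs expected:", observed[largest], round(expected[largest], 2))
print("chi-square =", round(chi2, 3))
```

Under these assumed counts, the largest component belongs to the (Male, Yes) cell, where the observed count falls below the expected count, matching the direction described in the answer.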
Example: Choosing the right type of chi-square test
Problem: Assume that the conditions for performing inference are met.
(d) Interpret the P-value in context. What conclusion would you draw at α = 0.01?
If gender and ongoing fright symptoms really are independent in the population of interest, there is a 0.045 chance of obtaining a random sample of 117 students that gives a chi-square statistic of 4.028 or higher. Because the P-value, 0.045, is greater than 0.01, we would fail to reject H0. We do not have convincing evidence that there is an association between gender and fright symptoms in the population of college students.
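The P-value and the decision in part (d) can be checked with a few lines of Python. This sketch assumes the table is 2 × 2 (two genders, symptoms yes or no), so that df = (2 − 1)(2 − 1) = 1; for one degree of freedom the right-tail area has the closed form erfc(√(χ²/2)).

```python
import math

chi_square = 4.028   # statistic quoted in the example
df = 1               # assumes a 2 x 2 table: (2 - 1) * (2 - 1)
alpha = 0.01

# Right-tail area of the chi-square distribution; this formula holds for df = 1 only.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(round(p_value, 3))
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The same data would lead to rejecting H0 at α = 0.05, which is why the significance level needs to be fixed before looking at the result.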
Inference for Two-Way Tables
Section Summary
In this section, we learned how to…
✓ COMPARE conditional distributions for data in a two-way table.
✓ STATE appropriate hypotheses and COMPUTE expected counts for a chi-square test based on data in a two-way table.
✓ CALCULATE the chi-square statistic, degrees of freedom, and P-value for a chi-square test based on data in a two-way table.
✓ PERFORM a chi-square test for homogeneity.
✓ PERFORM a chi-square test for independence.
✓ CHOOSE the appropriate chi-square test.