AP Stats Check In Where we’ve been… Chapter 7… Chapter 8… Where we are going… Significance Tests!! • Ch 9 Tests about a population proportion • Ch 9 Tests about a population mean • Ch 10 Tests about 2 population proportions • Ch 10 Tests about 2 population means • Ch 11 Tests for Goodness of Fit (chi-square) • Ch 11 Test for Homogeneity (chi-square) • Ch 11 Test for Independence (chi-square) • Ch 12… Linear Regression
Significance Tests: Chi-Square Test for Independence …Test for Association Section 11.2 (PART 2) Reference Text: The Practice of Statistics, Fourth Edition. Starnes, Yates, Moore
Objectives Conducting a Chi-Square Significance Test! Test for Independence State: H0: Two categorical variables are independent in the population of interest. Ha: Two categorical variables are not independent in the population of interest. Plan: Random: Do the data come from a single random sample, with each individual classified according to both categorical variables? Large Sample Size: All expected counts must be at least 5. Independent: 10% condition. Do: Calculate the test statistic, degrees of freedom, and P-value. Conclude: Based on a P-value of about 0.0003, which is less than α = 0.05, we reject H0. We have statistically significant evidence to conclude that anger level and heart disease are not independent in the population of people with normal blood pressure.
Let’s turn to the board to run through our 8th hypothesis test!
Going from GOF to Homogeneity When we conduct a chi-square GOF test, we are comparing the distribution of a categorical variable from one sample with a hypothesized distribution. However, in this section, we are comparing the distributions of a categorical variable from several samples (or groups) with each other. Deciding on Homogeneity or Independence? Both the chi-square test for homogeneity and the chi-square test for association/independence start with a two-way table of observed counts. They even calculate the test statistic, degrees of freedom, and P-value in the same way. The questions that these two tests answer are different, however.
Deciding on Homogeneity or Independence? A chi-square test for homogeneity tests whether the distribution of a categorical variable is the same for each of several populations or treatments. The chi-square test for association/independence tests whether two categorical variables are associated/independent in some population of interest.
Deciding on Homogeneity or Independence? Instead of focusing on the question asked, it’s much easier to look at how the data were produced. • If the data come from two or more independent random samples or treatment groups in a randomized experiment, then do a chi-square test for homogeneity. • If the data come from a single random sample, with the individuals classified according to two categorical variables, use a chi-square test for association/independence.
Introduction We can decide whether the distribution of a categorical variable differs for two or more populations or treatments using a chi-square test for homogeneity. In doing so, we will often organize our data in a two-way table. It is also possible to use the information in a two-way table to study the relationship between two categorical variables. The chi-square test for association/independence allows us to determine if there is convincing evidence of an association between the variables in the population at large. Chi-Square Goodness-of-Fit Tests In the previous chapter, we discussed inference procedures for comparing the proportion of successes for two populations or treatments. Sometimes we want to examine the distribution of a single categorical variable in a population. The chi-square goodness-of-fit test allows us to determine whether a hypothesized distribution seems valid.
Calculator:
Inference for Relationships • The Chi-Square Test for Association/Independence If the Random, Large Sample Size, and Independent conditions are met, the χ2 statistic calculated from a two-way table can be used to perform a test of
H0: There is no association between two categorical variables in the population of interest.
Ha: There is an association between two categorical variables in the population of interest.
Or, alternatively,
H0: Two categorical variables are independent in the population of interest.
Ha: Two categorical variables are not independent in the population of interest.
This new procedure is known as a chi-square test for association/independence. Start by finding the expected counts. Then calculate the chi-square statistic $\chi^2 = \sum \dfrac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$, where the sum is over all cells (not including totals) in the two-way table. If H0 is true, the χ2 statistic has approximately a chi-square distribution with degrees of freedom = (number of rows - 1)(number of columns - 1). The P-value is the area to the right of χ2 under the corresponding chi-square density curve.
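The recipe above (expected counts, then the statistic, then the degrees of freedom) can be sketched in a few lines of Python. This is a minimal illustration, not a calculator replacement: the function name `chi_square_two_way` and the 2 x 2 table are made up for the example and do not come from the textbook.

```python
def chi_square_two_way(observed):
    """Return (expected counts, chi-square statistic, df) for a two-way table.

    `observed` is a list of rows of observed counts, with totals left out.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)

    # Expected count = (row total * column total) / table total
    expected = [[r * c / table_total for c in col_totals] for r in row_totals]

    # Sum (Observed - Expected)^2 / Expected over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # df = (number of rows - 1)(number of columns - 1)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return expected, statistic, df


# Hypothetical table, invented only to show the arithmetic.
toy_table = [[30, 20],
             [20, 30]]
expected, statistic, df = chi_square_two_way(toy_table)
print(expected, statistic, df)
```

In the toy table every expected count is 25, each cell contributes (5)^2 / 25 = 1, and the statistic is 4 with df = 1.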
• The Chi-Square Test for Association/Independence We often gather data from a random sample and arrange them in a two-way table to see if two categorical variables are associated. The sample data are easy to investigate: turn them into percents and look for a relationship between the variables. Our null hypothesis is that there is no association between the two categorical variables. The alternative hypothesis is that there is an association between the variables. For the observational study of anger level and coronary heart disease, we want to test the hypotheses
H0: There is no association between anger level and heart disease in the population of people with normal blood pressure.
Ha: There is an association between anger level and heart disease in the population of people with normal blood pressure.
No association between two variables means that the values of one variable do not tend to occur in common with values of the other. That is, the variables are independent. An equivalent way to state the hypotheses is therefore
H0: Anger and heart disease are independent in the population of people with normal blood pressure.
Ha: Anger and heart disease are not independent in the population of people with normal blood pressure.
• Example: Angry People and Heart Disease Here is the complete table of observed and expected counts for the CHD and anger study side by side. Do the data provide convincing evidence of an association between anger level and heart disease in the population of interest? State: We want to perform a test of
H0: There is no association between anger level and heart disease in the population of people with normal blood pressure.
Ha: There is an association between anger level and heart disease in the population of people with normal blood pressure.
We will use α = 0.05.
• Example: Angry People and Heart Disease Plan: If the conditions are met, we should conduct a chi-square test for association/independence. • Random The data came from a random sample of 8474 people with normal blood pressure. • Large Sample Size All the expected counts are at least 5, so this condition is met. • Independent Knowing the values of both variables for one person in the study gives us no meaningful information about the values of the variables for another person. So individual observations are independent. Because we are sampling without replacement, we need to check that the total number of people in the population with normal blood pressure is at least 10(8474) = 84,740. This seems reasonable to assume.
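The two conditions that involve arithmetic can be checked mechanically. The sketch below is only illustrative: the helper names are hypothetical, and the expected counts passed in are made up to show a failing case, since the study's expected counts are not reproduced on this slide. Only n = 8474 comes from the example. The Random condition still has to be judged from how the data were produced.

```python
SAMPLE_SIZE = 8474  # sample size stated in the example


def smallest_allowed_population(sample_size):
    """10% condition: the population must be at least 10 times the sample."""
    return 10 * sample_size


def large_sample_ok(expected_counts):
    """Large Sample Size condition: every expected count is at least 5."""
    return all(count >= 5 for row in expected_counts for count in row)


print(smallest_allowed_population(SAMPLE_SIZE))

# Made-up expected counts: the 4.2 cell violates the condition.
hypothetical_expected = [[12.0, 7.5],
                         [6.0, 4.2]]
print(large_sample_ok(hypothetical_expected))
```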
• Example: Angry People and Heart Disease Do: Since the conditions are satisfied, we can perform a chi-square test for association/independence. We begin by calculating the test statistic: χ2 = 16.077. P-Value: The two-way table of anger level versus heart disease has 2 rows and 3 columns. We will use the chi-square distribution with df = (2 - 1)(3 - 1) = 2 to find the P-value. Table: Look at the df = 2 line in Table C. The observed statistic χ2 = 16.077 is larger than the critical value 15.20 for α = 0.0005. So the P-value is less than 0.0005. Technology: The command χ2cdf(16.077, 1000, 2) gives 0.00032. Conclude: Because the P-value is clearly less than α = 0.05, we reject H0 and conclude that anger level and heart disease are associated in the population of people with normal blood pressure.
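The P-value can also be checked without a table or calculator. The sketch below relies on a special fact that holds only for df = 2: the area to the right of x under the chi-square density is exp(-x/2). The function name is hypothetical, and this shortcut does not carry over to other degrees of freedom.

```python
import math


def chi_square_right_tail_df2(x):
    """Area to the right of x under the chi-square density with df = 2.

    Valid ONLY for df = 2, where the right-tail area is exp(-x / 2).
    """
    return math.exp(-x / 2)


chi_sq = 16.077  # test statistic from the example

# P-value: area to the right of the observed statistic
p_value = chi_square_right_tail_df2(chi_sq)

# Calculator-style area between 16.077 and the large upper bound 1000
calculator_style = p_value - chi_square_right_tail_df2(1000)

# Critical value with right-tail area 0.0005 (the Table C entry for df = 2)
critical_value = -2 * math.log(0.0005)

print(round(p_value, 5), round(critical_value, 2))
```

The upper bound of 1000 changes nothing that can be seen at this precision, which is why the calculator command uses it as a stand-in for infinity.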
• Using Chi-Square Tests Wisely Both the chi-square test for homogeneity and the chi-square test for association/independence start with a two-way table of observed counts. They even calculate the test statistic, degrees of freedom, and P-value in the same way. The questions that these two tests answer are different, however. • The chi-square test for association/independence tests whether two categorical variables are associated in some population of interest. • A chi-square test for homogeneity tests whether the distribution of a categorical variable is the same for each of several populations or treatments. Instead of focusing on the question asked, it’s much easier to look at how the data were produced. • If the data come from two or more independent random samples or treatment groups in a randomized experiment, then do a chi-square test for homogeneity. • If the data come from a single random sample, with the individuals classified according to two categorical variables, use a chi-square test for association/independence.
Objectives Conducting a Chi-Square Significance Test! Test for Independence State: H0: Two categorical variables are independent in the population of interest. Ha: Two categorical variables are not independent in the population of interest. Plan: Random: Do the data come from a single random sample, with each individual classified according to both categorical variables? Large Sample Size: All expected counts must be at least 5. Independent: 10% condition. Do: Calculate the test statistic, degrees of freedom, and P-value. Conclude: Based on a P-value of about 0.0003, which is less than α = 0.05, we reject H0. We have statistically significant evidence to conclude that anger level and heart disease are not independent in the population of people with normal blood pressure.
Homework: 11.2 (Independence) Homework Worksheet. Finish working on the Ch. 11 Reading Guide.
Section 11.2 Inference for Relationships In this section, we learned that… • We can use a two-way table to summarize data on the relationship between two categorical variables. To analyze the data, we first compute percents or proportions that describe the relationship of interest. • If data are produced using independent random samples from each of several populations of interest or the treatment groups in a randomized comparative experiment, then each observation is classified according to a categorical variable of interest. The null hypothesis is that the distribution of this categorical variable is the same for all the populations or treatments. We use the chi-square test for homogeneity to test this hypothesis. • If data are produced using a single random sample from a population of interest, then each observation is classified according to two categorical variables. The chi-square test of association/independence tests the null hypothesis that there is no association between the two categorical variables in the population of interest. Another way to state the null hypothesis is H0: The two categorical variables are independent in the population of interest.
Section 11.2 Inference for Relationships Summary • The expected count in any cell of a two-way table when H0 is true is $\text{expected count} = \dfrac{\text{row total} \times \text{column total}}{\text{table total}}$. • The chi-square statistic is $\chi^2 = \sum \dfrac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$, where the sum is over all cells in the two-way table. • The chi-square test compares the value of the statistic χ2 with critical values from the chi-square distribution with df = (number of rows - 1)(number of columns - 1). Large values of χ2 are evidence against H0, so the P-value is the area under the chi-square density curve to the right of χ2.
Section 11.2 Inference for Relationships Summary • The chi-square distribution is an approximation to the distribution of the statistic χ2. You can safely use this approximation when all expected cell counts are at least 5 (the Large Sample Size condition). • Be sure to check that the Random, Large Sample Size, and Independent conditions are met before performing a chi-square test for a two-way table. • If the test finds a statistically significant result, do a follow-up analysis that compares the observed and expected counts and that looks for the largest components of the chi-square statistic.
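A follow-up analysis might look like the sketch below, which lists each cell's component (Observed - Expected)^2 / Expected from largest to smallest. Important assumption: the slides do not reproduce the anger and heart disease table, so the observed counts here are recalled from the textbook example and should be checked against the book. They do agree with the two numbers the slides give, n = 8474 and χ2 = 16.077.

```python
# ASSUMPTION: observed counts recalled from the textbook example,
# not taken from these slides. Verify against the book before relying on them.
observed = {
    "CHD":    {"Low": 53,   "Moderate": 110,  "High": 27},
    "No CHD": {"Low": 3057, "Moderate": 4621, "High": 606},
}

row_totals = {row: sum(cells.values()) for row, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count
n = sum(row_totals.values())

# Component for each cell: (Observed - Expected)^2 / Expected
components = {}
for row, cells in observed.items():
    for col, obs in cells.items():
        exp = row_totals[row] * col_totals[col] / n
        components[(row, col)] = (obs - exp) ** 2 / exp

chi_sq = sum(components.values())
largest_cell = max(components, key=components.get)

# Print components from largest to smallest
for cell, value in sorted(components.items(), key=lambda kv: kv[1], reverse=True):
    print(cell, round(value, 3))
print("chi-square:", round(chi_sq, 3))
```

Under these assumed counts, the high-anger CHD cell contributes the most: more high-anger people had heart disease than independence would predict.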