STAT 250
Dr. Kari Lock Morgan

Testing Goodness-of-Fit for a Single Categorical Variable
SECTION 7.1
• Testing the distribution of a single categorical variable: χ² goodness of fit (7.1)

Statistics: Unlocking the Power of Data, Lock5

Statistics!
• Statistics might be the most important class you take in college
  http://college.usatoday.com/2015/04/08/voices-statistics-might-be-the-most-important-class-you-take-in-college/ (4/8/15)
• Why you need to study statistics
  https://www.youtube.com/watch?v=wV0Ks7aS7YI (4/2/15)

Multiple Categories
• So far, we’ve learned how to do inference for categorical variables with only two categories
• Today, we’ll learn how to do hypothesis tests for categorical variables with multiple categories

Genetic Variants for Fast-Twitch Muscles
• A gene called ACTN3 encodes a protein that functions in fast-twitch muscles
• There are three different variants of the gene: RR, RX, and XX
• In a sample, we observe 130 RR, 226 RX, and 80 XX
• If both R and X are equally likely, then by the Hardy-Weinberg principle about 50% of the population should be heterozygotes (RX) and about 25% should be each of the homozygotes (25% RR, 25% XX)
• Do our data contradict these hypothesized proportions?

Yang, N. et al. (2003). “ACTN3 genotype is associated with human elite athletic performance,” American Journal of Human Genetics, 73: 627-631.

Hypothesis Testing
1. State hypotheses
2. Calculate a statistic, based on your sample data
3. Create a distribution of this statistic, as it would be observed if the null hypothesis were true
4. Measure how extreme your test statistic from (2) is, as compared to the distribution generated in (3)

Hypotheses
Define the null hypothesized proportions in each category:

H0: p_RR = 0.25, p_RX = 0.5, p_XX = 0.25
Ha: At least one p_i is not as specified in H0

Observed Counts
• The observed counts are the actual counts observed in the study

            RR    RX    XX
Observed   130   226    80

Test Statistic
Why can’t we use the familiar formula, (sample statistic – null value) / SE, to get the test statistic?
We need something a bit more complicated…

Expected Counts
• The expected counts are the counts we would expect to see if the null hypothesis were true
• For each cell, the expected count is the sample size (n) times the null proportion, p_i

Expected Counts
n = 436

                    RR     RX     XX
Null Proportion   0.25    0.5   0.25
Expected           109    218    109
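The expected-count rule (sample size times null proportion) can be sketched in a few lines of Python. This is only an illustration using the counts and null proportions from the slides; the variable names are assumptions made for this sketch, not part of the course materials.

```python
# Observed counts and null proportions taken from the slides.
observed = {"RR": 130, "RX": 226, "XX": 80}
null_props = {"RR": 0.25, "RX": 0.50, "XX": 0.25}

# Sample size n is the total of the observed counts.
n = sum(observed.values())

# Expected count for each category = n * null proportion.
expected = {genotype: n * p for genotype, p in null_props.items()}

print("n =", n)
for genotype, count in expected.items():
    print(genotype, count)
```

Note that the expected counts add up to n, just as the observed counts do.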

Chi-Square Statistic

            RR    RX    XX
Observed   130   226    80
Expected   109   218   109

• Need a way to measure how far the observed counts are from the expected counts…
• Use the chi-square statistic:

  χ² = Σ (observed – expected)² / expected

Chi-Square Statistic

            RR    RX    XX
Observed   130   226    80
Expected   109   218   109
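As a rough sketch of the calculation, the chi-square statistic for the gene data can be computed by summing (observed – expected)² / expected over the three categories. The variable names below are illustrative assumptions; the counts come from the table above.

```python
# Observed and expected counts for RR, RX, XX (from the slides).
observed = [130, 226, 80]
expected = [109, 218, 109]

# Each category contributes (observed - expected)^2 / expected.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# The chi-square statistic is the sum of the contributions.
chi_sq = sum(contributions)

print([round(c, 3) for c in contributions])
print(round(chi_sq, 3))
```

The XX category contributes the most, because its observed count (80) is farthest from its expected count (109) relative to the size of the expected count.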

What Next?
We have a test statistic. What else do we need to perform the hypothesis test?
How do we get this? Two options:
1) Simulation
2) Distributional theory

Upper-Tail p-value
• To calculate the p-value for a chi-square test, we always look in the upper tail
• Why?
  o Values of the χ² statistic are never negative
  o The higher the χ² statistic is, the farther the observed counts are from the expected counts, and the stronger the evidence against the null

Simulation p-value
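One way a simulation p-value could be obtained is sketched below: draw many samples of size n from the null proportions, compute the χ² statistic for each, and report the proportion of simulated statistics at least as large as the observed one. This is a minimal sketch using Python's standard `random` module, not the software used in the course; the function names, the number of repetitions, and the seed are all assumptions made for illustration.

```python
import random

def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulate_p_value(observed, null_props, reps=2000, seed=0):
    """Upper-tail proportion of simulated chi-square statistics
    (generated under the null) that are >= the observed statistic."""
    rng = random.Random(seed)            # fixed seed: an arbitrary choice
    n = sum(observed)
    expected = [n * p for p in null_props]
    observed_stat = chi_square(observed, expected)
    categories = list(range(len(null_props)))

    extreme = 0
    for _ in range(reps):
        # Draw n category labels with probabilities given by the null.
        draws = rng.choices(categories, weights=null_props, k=n)
        counts = [0] * len(null_props)
        for d in draws:
            counts[d] += 1
        if chi_square(counts, expected) >= observed_stat:
            extreme += 1
    return extreme / reps

# Gene-variant data from the slides: 130 RR, 226 RX, 80 XX.
p_sim = simulate_p_value([130, 226, 80], [0.25, 0.50, 0.25])
print(p_sim)
```

The result varies with the seed and the number of repetitions, so it should be read as an approximation to the p-value rather than an exact figure.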

Chi-Square (χ²) Distribution
• If each of the expected counts is at least 5, AND if the null hypothesis is true, then the χ² statistic follows a χ² distribution, with degrees of freedom df = number of categories – 1
• Gene variants: df = 3 – 1 = 2

Chi-Square Distribution

p-value using χ² distribution
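For the gene-variant example, the upper-tail area can be sketched without a statistics library, because a χ² distribution with df = 2 is an exponential distribution with mean 2, so the area above x is exp(–x/2). The shortcut below applies only to df = 2; for other degrees of freedom a library routine such as `scipy.stats.chi2.sf` would be the usual choice. Variable names are illustrative assumptions.

```python
import math

# Gene-variant data and null proportions from the slides.
observed = [130, 226, 80]
null_props = [0.25, 0.50, 0.25]

n = sum(observed)
expected = [n * p for p in null_props]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom = number of categories - 1.
df = len(observed) - 1
if df != 2:
    raise ValueError("the exp(-x/2) shortcut is valid only for df = 2")

# Upper-tail area of a chi-square distribution with 2 df.
p_value = math.exp(-chi_sq / 2)

print(round(chi_sq, 3), df, round(p_value, 4))
```

The resulting p-value is well below 0.01, which agrees with a χ² statistic that sits far out in the upper tail.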

Conclusion
Do our data provide evidence that the population proportions differ from 25% RR, 50% RX, and 25% XX?
a) Yes
b) No

Chi-Square Test for Goodness of Fit
1. State null hypothesized proportions for each category, p_i. The alternative is that at least one of the proportions is different from that specified in the null.
2. Calculate the expected counts for each cell as n·p_i.
3. Calculate the χ² statistic: χ² = Σ (observed – expected)² / expected
4. Compute the p-value as the proportion above the χ² statistic for either a randomization distribution or a χ² distribution with df = (# of categories – 1) if all expected counts are at least 5.
5. Interpret the p-value in context.

Mendel’s Pea Experiment
• In 1866, Gregor Mendel, the “father of genetics,” published the results of his experiments on peas
• He found that his experimental distribution of peas closely matched the theoretical distribution predicted by his theory of genetics (involving alleles, and dominant and recessive genes)

Source: Mendel, Gregor. (1866). Versuche über Pflanzen-Hybriden. Verh. Naturforsch. Ver. Brünn 4: 3–47 (in English in 1901, Experiments in Plant Hybridization, J. R. Hortic. Soc. 26: 1–32)

Mendel’s Pea Experiment
Mate SSYY with ssyy => 1st generation: all SsYy
Mate 1st generation => 2nd generation
S, Y: dominant; s, y: recessive

Second generation:

Phenotype           Theoretical Proportion
Round, Yellow       9/16
Round, Green        3/16
Wrinkled, Yellow    3/16
Wrinkled, Green     1/16

Mendel’s Pea Experiment

Phenotype           Theoretical Proportion   Observed Counts
Round, Yellow       9/16                     315
Round, Green        3/16                     101
Wrinkled, Yellow    3/16                     108
Wrinkled, Green     1/16                      32

Let’s test these data against the null hypothesis that each p_i equals its theoretical value, based on genetics

Mendel’s Pea Experiment

Phenotype           Null p_i   Observed Counts   Expected Counts
Round, Yellow       9/16       315
Round, Green        3/16       101
Wrinkled, Yellow    3/16       108
Wrinkled, Green     1/16        32

The expected count for the round, yellow phenotype is
a) 177.2
b) 310.5
c) 312.75
d) 318.25

Mendel’s Pea Experiment

Phenotype           Null p_i   Observed Counts   Expected Counts   Contribution to χ²
Round, Yellow       9/16       315               312.75
Round, Green        3/16       101               104.25            0.101
Wrinkled, Yellow    3/16       108               104.25            0.135
Wrinkled, Green     1/16        32                34.75            0.218

The contribution to the χ² statistic for the round, yellow phenotype is
a) 0.012
b) 0.014
c) 0.016
d) 0.018
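The expected counts and per-phenotype contributions in the table can be checked with a short sketch like the one below. It uses the observed counts and theoretical proportions from the slides; the variable names and the use of `fractions.Fraction` to hold the sixteenths exactly are choices made for this illustration.

```python
from fractions import Fraction

phenotypes = ["Round, Yellow", "Round, Green",
              "Wrinkled, Yellow", "Wrinkled, Green"]
null_props = [Fraction(9, 16), Fraction(3, 16),
              Fraction(3, 16), Fraction(1, 16)]
observed = [315, 101, 108, 32]

# Sample size and expected counts (n times each null proportion).
n = sum(observed)
expected = [float(n * p) for p in null_props]

# Contribution of each phenotype to the chi-square statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)

for name, e, c in zip(phenotypes, expected, contributions):
    print(f"{name:18s} expected={e:7.2f} contribution={c:.3f}")
print("chi-square =", round(chi_sq, 2))
```

Every expected count is well above 5, so the condition for using the χ² distribution is met for these data.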

Mendel’s Pea Experiment
• χ² = 0.47
• Two options:
  o Simulate a randomization distribution
  o Compare to a χ² distribution with 4 – 1 = 3 df
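For the second option, the upper-tail area of a χ² distribution with 3 df has a closed form that needs only the standard library: P(X > x) = erfc(√(x/2)) + √(2x/π)·exp(–x/2). The sketch below applies that formula to χ² = 0.47. It is valid only for df = 3, and the function name is an assumption made for this illustration; a general-purpose routine such as `scipy.stats.chi2.sf` would normally be used instead.

```python
import math

def chi2_upper_tail_df3(x):
    """Upper-tail area P(X > x) for a chi-square distribution with 3 df.

    Closed form for df = 3 only:
    erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)
    """
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

chi_sq = 0.47          # statistic reported on the slide
p_value = chi2_upper_tail_df3(chi_sq)
print(round(p_value, 3))
```

A statistic this small sits near the left end of the distribution, so almost all of the area lies above it and the p-value is large.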

Mendel’s Pea Experiment
p-value = 0.925

Does this prove Mendel’s theory of genetics? Or at least prove that his theoretical proportions for pea phenotypes were correct?
a) Yes
b) No

To Do
• Read Section 7.1
• Do HW 7.1 (due Friday, 4/17)