STAT 250 Dr Kari Lock Morgan Hypothesis Testing

STAT 250, Dr. Kari Lock Morgan
Hypothesis Testing: Hypotheses
SECTION 4.1, 4.2
• Hypothesis test (4.1)
• Null and alternative hypotheses (4.1)
• Randomization distribution (4.2)
Statistics: Unlocking the Power of Data, Lock 5

Review Sessions
• GSG Review Session: Tuesday (February 21st), 6:00-8:00 in 108 Chambers
• Open Q+A following class today
• Office hour schedule here

Question of the Day
Does drinking tea boost your immune system?

Tea and the Immune System
• L-theanine is an amino acid found in tea
  • Black tea: about 20 mg per cup
  • Green tea (standard): varies, as low as 5 mg per cup
  • Green tea (shade grown): varies, up to 46 mg per cup (shade-grown green tea examples: Gyokuro, Matcha)
• Gamma delta T cells are important for helping the immune system fend off infection
• It is thought that L-theanine primes T cells, activating them to a state of readiness and making them better able to respond to future antigens
• Does drinking tea actually boost your immunity?

Tea and Immune Response
• Participants were randomized to drink five or six cups of either tea (black) or coffee every day for two weeks (both drinks have caffeine, but only tea has L-theanine)
• After two weeks, blood samples were exposed to an antigen, and production of interferon gamma (an immune system response) was measured
• Explanatory variable:
• Response variable:
Kamath et al., "Antigens in Tea-Beverage Prime Human Vγ2Vδ2 T Cells in vitro and in vivo for Memory and Nonmemory Antibacterial Cytokine Responses," Proceedings of the National Academy of Sciences, May 13, 2003.

Tea and the Immune System
In a study comparing tea and coffee and levels of interferon gamma, if tea drinkers have significantly higher levels of interferon gamma, can we conclude that drinking tea rather than coffee caused an increase in this aspect of the immune response?
a) Yes
b) No

Tea and Immune System
[Figure]

Hypothesis Test
• One mean is higher than the other in the sample
• Is this difference large enough to conclude that it is real, i.e., that it holds for the true population parameters?
A hypothesis test uses data from a sample to assess a claim about a population.

Hypotheses
• Hypothesis tests are framed formally in terms of two competing hypotheses:
  Null Hypothesis (H0): Claim that there is no effect or difference.
  Alternative Hypothesis (Ha): Claim for which we seek evidence.

Tea and Immune Response
• Null Hypothesis (H0): No difference between drinking tea and coffee regarding interferon gamma (no “effect” or no “difference”)
• Alternative Hypothesis (Ha): Drinking tea increases interferon gamma production more than drinking coffee (the claim we seek evidence for)

Null Hypothesis
µT = true mean interferon gamma after drinking tea
µC = true mean interferon gamma after drinking coffee
How would we write the null hypothesis in terms of parameters?
a) H0: µT = µC
b) H0: µT ≠ µC
c) H0: µT < µC
d) H0: µT > µC

Alternative Hypothesis
µT = true mean interferon gamma after drinking tea
µC = true mean interferon gamma after drinking coffee
How would we write the alternative hypothesis in terms of parameters?
a) Ha: µT = µC
b) Ha: µT ≠ µC
c) Ha: µT < µC
d) Ha: µT > µC

Difference in Hypotheses
• Note: the following two sets of hypotheses are equivalent and can be used interchangeably:
  H0: µ1 = µ2     Ha: µ1 ≠ µ2
  H0: µ1 − µ2 = 0     Ha: µ1 − µ2 ≠ 0

Hypothesis Helpful Hints
• Hypotheses are always about population parameters, not sample statistics
• The null hypothesis always contains an equality

Statistical Hypotheses
Usually the null is a very specific statement.
[Figure: ALL POSSIBILITIES, divided into the Null Hypothesis and the Alternative Hypothesis]
Can we reject the null hypothesis?

Null Hypothesis
[Comic: http://xkcd.com/892/]

Hypothesis Helpful Hints
• Hypotheses are always about population parameters, not sample statistics
• The null hypothesis always contains an equality
• The alternative hypothesis always contains an inequality (<, >, ≠)
• The type of inequality in the alternative comes from the wording of the question of interest

Alternative Hypothesis
If the researchers were simply comparing tea and coffee, with no a priori hypothesis about which would yield a higher immune response, what would the alternative hypothesis be?
a) Ha: µT = µC
b) Ha: µT < µC
c) Ha: µT > µC
d) Ha: µT ≠ µC

Two Plausible Explanations
• If the sample data support the alternative, there are two plausible explanations:
  1. The alternative hypothesis (Ha) is true
  2. The null hypothesis (H0) is true, and the sample results were just due to random chance
• Key question: Do the data provide enough evidence to rule out #2?
• Key answer: What kind of statistics would we see, just by random chance, if the null hypothesis were true?

Actual Experiment
1. Randomize units to treatment groups
2. Conduct experiment
3. Measure response variable
4. Calculate statistic
5. Simulate statistics we could get, just by random chance, if the null hypothesis were true
Tea: 5, 13, 18, 20, 11, 48, 52, 55, 56, 58, 47
Coffee: 0, 3, 15, 11, 21, 38, 52, 16
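Since the parameters are the means µT and µC, the statistic for step 4 is the difference in sample means, x̄T − x̄C. A minimal Python sketch of that step, assuming the interferon gamma values are exactly the Tea and Coffee lists shown above (if your copy of the data has additional observations, substitute the full lists); the helper names `sample_mean` and `diff_in_means` are hypothetical.

```python
# Interferon gamma values as listed on the slide (assumed to be the full data).
tea = [5, 13, 18, 20, 11, 48, 52, 55, 56, 58, 47]
coffee = [0, 3, 15, 11, 21, 38, 52, 16]


def sample_mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def diff_in_means(group_1, group_2):
    """Sample statistic x̄1 − x̄2 (here: tea minus coffee)."""
    return sample_mean(group_1) - sample_mean(group_2)


observed = diff_in_means(tea, coffee)
print(f"x̄T = {sample_mean(tea):.2f}, x̄C = {sample_mean(coffee):.2f}")
print(f"observed statistic x̄T − x̄C = {observed:.2f}")
```

Subtracting in the order tea minus coffee keeps the sign aligned with Ha: a positive statistic points toward tea producing more interferon gamma.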

Measuring Evidence against H0
To see if a statistic provides evidence against H0, we need to see what kind of sample statistics we would observe, just by random chance, if H0 were true.

Simulation
• “By random chance” means the random assignment to the two treatment groups
• “If H0 were true” means that interferon gamma levels would be the same, regardless of whether you drink tea or coffee
• To simulate what would happen just by random chance, if H0 were true:
  • Re-randomize units to treatment groups, keeping the response values unchanged
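One re-randomization can be sketched in a few lines of Python: pool the responses, shuffle them, and deal them back into groups of the original sizes, so the response values stay fixed and only the group labels move. This assumes the slide's Tea and Coffee lists; the function name `rerandomize` and the seed are illustrative choices, not part of the course materials.

```python
import random

tea = [5, 13, 18, 20, 11, 48, 52, 55, 56, 58, 47]
coffee = [0, 3, 15, 11, 21, 38, 52, 16]


def rerandomize(group_1, group_2, rng):
    """Randomly reassign the pooled responses to two groups of the original sizes.

    The response values themselves are never changed; only which group each
    value lands in is random, mimicking a fresh random assignment under H0.
    """
    pooled = group_1 + group_2      # new list, so the inputs are not modified
    rng.shuffle(pooled)
    n1 = len(group_1)
    return pooled[:n1], pooled[n1:]


rng = random.Random(250)            # arbitrary seed, only for reproducibility
new_tea, new_coffee = rerandomize(tea, coffee, rng)
print("simulated tea group:   ", new_tea)
print("simulated coffee group:", new_coffee)
```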

Simulation
[Figure: the response values from both groups pooled together, next to the original Tea and Coffee groups]

Simulation
1. Re-randomize units to treatment groups
[Figure: the pooled response values randomly reassigned to new Tea and Coffee groups]

Simulation
1. Re-randomize units to treatment groups
2. Calculate statistic
[Figure: one set of re-randomized Tea and Coffee groups]
Repeat many times!

Distribution of Statistic Under H0
[Figure]

Randomization Distribution
A randomization distribution is a collection of statistics from samples simulated assuming the null hypothesis is true.
• The randomization distribution shows what types of statistics would be observed, just by random chance, if the null hypothesis were true
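Repeating the re-randomize-and-calculate cycle many times and keeping every statistic produces a randomization distribution. A minimal Python sketch, assuming the slide's Tea and Coffee lists, 10,000 repetitions, and an arbitrary seed; `randomization_distribution` is a hypothetical helper name.

```python
import random

tea = [5, 13, 18, 20, 11, 48, 52, 55, 56, 58, 47]
coffee = [0, 3, 15, 11, 21, 38, 52, 16]


def sample_mean(values):
    return sum(values) / len(values)


def randomization_distribution(group_1, group_2, reps=10_000, seed=1):
    """Return `reps` simulated statistics x̄1 − x̄2, each computed after
    randomly reassigning the pooled responses to groups of the original sizes
    (i.e., simulated assuming H0: the treatment makes no difference)."""
    rng = random.Random(seed)
    pooled = group_1 + group_2
    n1 = len(group_1)
    stats = []
    for _ in range(reps):
        rng.shuffle(pooled)         # one re-randomization
        stats.append(sample_mean(pooled[:n1]) - sample_mean(pooled[n1:]))
    return stats


null_stats = randomization_distribution(tea, coffee)
print(f"{len(null_stats)} simulated statistics, from "
      f"{min(null_stats):.2f} to {max(null_stats):.2f}")
```

Shuffling the same pooled list in place on every repetition is fine here, because each shuffle is a fresh random permutation regardless of the list's current order.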

Hypothesis Testing
1. What kinds of statistics might we see, just by random chance, if the null hypothesis were true? [Randomization Distribution]
2. How extreme is our original statistic? [formalize next Monday]
3. Is it “extreme enough” to reject the null? [formalize next Wednesday]

Distribution of Statistic Under H0
How extreme is the observed statistic?
Is the null hypothesis a plausible explanation?
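For a first, informal sense of how extreme the observed statistic is, one can count how many simulated statistics are at least as large as the observed x̄T − x̄C. The count looks at the upper tail because the alternative here is that tea increases interferon gamma (Ha: µT > µC). This is only a sketch, assuming the slide's data, 10,000 repetitions, and hypothetical helper names; the formal version of this idea comes later in the course.

```python
import random

tea = [5, 13, 18, 20, 11, 48, 52, 55, 56, 58, 47]
coffee = [0, 3, 15, 11, 21, 38, 52, 16]


def sample_mean(values):
    return sum(values) / len(values)


def randomization_distribution(group_1, group_2, reps=10_000, seed=1):
    """Simulated statistics x̄1 − x̄2 under H0, via repeated re-randomization."""
    rng = random.Random(seed)
    pooled = group_1 + group_2
    n1 = len(group_1)
    stats = []
    for _ in range(reps):
        rng.shuffle(pooled)
        stats.append(sample_mean(pooled[:n1]) - sample_mean(pooled[n1:]))
    return stats


observed = sample_mean(tea) - sample_mean(coffee)
null_stats = randomization_distribution(tea, coffee)

# Upper tail, matching Ha: µT > µC.
# The small tolerance guards against floating-point round-off in the comparison.
at_least_as_large = sum(1 for s in null_stats if s >= observed - 1e-9)
print(f"observed statistic: {observed:.2f}")
print(f"{at_least_as_large} of {len(null_stats)} simulated statistics are >= observed "
      f"(proportion {at_least_as_large / len(null_stats):.3f})")
```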

Green Tea and Prostate Cancer
• A study was conducted on 60 men with PIN lesions, some of which turn into prostate cancer
• Half of these men were randomized to take 600 mg of green tea extract daily, while the other half were given a placebo pill
• The study was double-blind: neither the participants nor the doctors knew who was actually receiving green tea
• After one year, only 1 person taking green tea had gotten cancer, while 9 taking the placebo had gotten cancer

Green Tea and Prostate Cancer
In the study about green tea and prostate cancer, if the difference is statistically significant, could we conclude that green tea really does help prevent prostate cancer?
(a) Yes
(b) No

Green Tea and Prostate Cancer
The explanatory variable is green tea extract or placebo; the response variable is whether or not the person developed prostate cancer. What statistic and parameter are most relevant?
a) Mean
b) Proportion
c) Difference in means
d) Difference in proportions
e) Correlation

Green Tea and Prostate Cancer
p1 = proportion of green tea consumers to get prostate cancer
p2 = proportion of placebo consumers to get prostate cancer
State the null hypothesis.
a) H0: p1 = p2
b) H0: p1 < p2
c) H0: p1 > p2
d) H0: p1 ≠ p2
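The sample statistic that estimates p1 − p2 is p̂1 − p̂2. With 30 men per group (half of the 60), the arithmetic is short; this sketch uses Python's standard `fractions.Fraction` to keep it exact. The variable names are illustrative.

```python
from fractions import Fraction

n_per_group = 60 // 2        # half of the 60 men in each group
cancer_green_tea = 1         # cases after one year, green tea extract
cancer_placebo = 9           # cases after one year, placebo

p_hat_1 = Fraction(cancer_green_tea, n_per_group)   # 1/30
p_hat_2 = Fraction(cancer_placebo, n_per_group)     # 9/30, reduced to 3/10
statistic = p_hat_1 - p_hat_2

print(statistic, round(float(statistic), 3))   # -4/15 -0.267
```

The negative sign means the green tea group had the lower observed cancer rate.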

Green Tea and Prostate Cancer
p1 = proportion of green tea consumers to get prostate cancer
p2 = proportion of placebo consumers to get prostate cancer
State the alternative hypothesis.
a) Ha: p1 = p2
b) Ha: p1 < p2
c) Ha: p1 > p2
d) Ha: p1 ≠ p2

Randomization Test
1. State hypotheses
2. Collect data
3. Calculate statistic
4. Simulate statistics that could be observed, just by random chance, if the null hypothesis were true (create a randomization distribution)
5. How extreme is the observed statistic?
6. Is the null hypothesis (random chance) a plausible explanation?
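Steps 2 through 5 can be sketched end to end for the green tea study by coding each man's outcome as 1 (developed cancer) or 0 (did not), so that a group's sample proportion is simply its mean. This is a minimal illustration under stated assumptions: the alternative of interest is taken to be Ha: p1 < p2 (green tea lowers the cancer rate), and the helper names, 10,000 re-randomizations, and seed are hypothetical choices.

```python
import random

# Step 2: data coded as 1 = got prostate cancer, 0 = did not (30 men per group)
green_tea = [1] * 1 + [0] * 29
placebo = [1] * 9 + [0] * 21


def proportion(outcomes):
    """Sample proportion of 1s."""
    return sum(outcomes) / len(outcomes)


def randomization_test(group_1, group_2, reps=10_000, seed=1):
    """Steps 3-5: observed p̂1 − p̂2, its randomization distribution under
    H0: p1 = p2, and the share of simulated statistics at least as small as
    the observed one (lower tail, assuming Ha: p1 < p2)."""
    observed = proportion(group_1) - proportion(group_2)
    rng = random.Random(seed)
    pooled = group_1 + group_2
    n1 = len(group_1)
    stats = []
    for _ in range(reps):
        rng.shuffle(pooled)
        stats.append(proportion(pooled[:n1]) - proportion(pooled[n1:]))
    # Small tolerance guards against floating-point round-off in the comparison
    lower_tail = sum(1 for s in stats if s <= observed + 1e-9) / reps
    return observed, stats, lower_tail


observed, stats, lower_tail = randomization_test(green_tea, placebo)
print(f"observed p̂1 − p̂2 = {observed:.3f}")
print(f"share of simulated statistics <= observed: {lower_tail:.4f}")
```

Step 6, judging whether random chance is a plausible explanation, is left to the reader here, since the slides formalize that decision in a later lecture.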

Randomization Distribution
Based on the randomization distribution, would the observed statistic be extreme if the null hypothesis were true?
a) Yes
b) No

Randomization Distribution
Do you think the null hypothesis is a plausible explanation for these results?
a) Yes
b) No

Randomization Distribution
[Figure]

Randomization Distribution
[Figure]
a) 10.2
b) 12
c) 45
d) 1.8

Randomization Distribution Center
• A randomization distribution simulates samples assuming the null hypothesis is true, so a randomization distribution is centered at the value of the parameter given in the null hypothesis.
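A quick simulation illustrates this for both examples. In each, H0 says the difference in parameters is 0, so the average of the simulated statistics should land close to 0. The sketch assumes the slide's tea/coffee lists and codes the green tea outcomes as 1 = cancer, 0 = no cancer (a proportion is then just a mean, so one function serves both); `randomization_center`, the repetition count, and the seed are hypothetical choices.

```python
import random


def sample_mean(values):
    return sum(values) / len(values)


def randomization_center(group_1, group_2, reps=10_000, seed=1):
    """Average of `reps` simulated statistics x̄1 − x̄2 under H0."""
    rng = random.Random(seed)
    pooled = group_1 + group_2
    n1 = len(group_1)
    total = 0.0
    for _ in range(reps):
        rng.shuffle(pooled)
        total += sample_mean(pooled[:n1]) - sample_mean(pooled[n1:])
    return total / reps


tea = [5, 13, 18, 20, 11, 48, 52, 55, 56, 58, 47]
coffee = [0, 3, 15, 11, 21, 38, 52, 16]
green_tea = [1] * 1 + [0] * 29      # 1 = developed prostate cancer
placebo = [1] * 9 + [0] * 21

tea_center = randomization_center(tea, coffee)
green_tea_center = randomization_center(green_tea, placebo)
print(f"tea vs coffee center: {tea_center:.3f} (null value µT − µC = 0)")
print(f"green tea vs placebo center: {green_tea_center:.4f} (null value p1 − p2 = 0)")
```

The simulated centers will not be exactly 0, because they come from a finite number of random re-randomizations, but they should sit very close to the null value.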

Randomization Distribution
a) How extreme 10.2 is
b) How extreme 12 is
c) How extreme 45 is
d) What the standard error is
e) How many randomization samples we collected

Randomization Distribution
[Figure]

Randomization Distribution
a) 0
b) 1
c) 21
d) 26
e) 5

Randomization Distribution
a) The standard error
b) The center point
c) How extreme 26 is
d) How extreme 21 is
e) How extreme 5 is

To Do
• Study for EXAM 1!
• HW 3.3 and 3.4 due Wednesday, 2/22
• HW 4.1, 4.2 due Wednesday, 3/1