EPI546 Block I Lecture 4 Statistics Hypothesis Testing

EPI-546 Block I
Lecture 4 – Statistics: Hypothesis Testing and Estimation
Michael Brown, MD, MSc
Professor, Epidemiology and Emergency Medicine
Credit to Roger J. Lewis, MD, PhD, Department of Emergency Medicine, Harbor-UCLA Medical Center, and Jeff Jones, Grand Rapids MERC / MSU Program in Emergency Medicine

Today’s Topics
• Classical Hypothesis Testing
  – Type I Error
  – Type II Error, Power, Sample Size
• Point Estimates and Confidence Intervals
• Multiple Comparisons

Classical Hypothesis Testing: Steps
1. Define the null hypothesis
2. Define the alternative hypothesis
3. Calculate a p value
4. Accept (that is, fail to reject) or reject the null hypothesis based on the p value
5. If the null hypothesis is rejected, then accept the alternative hypothesis
Dr. Michael Brown © Epidemiology Dept., Michigan State Univ.

Classical Hypothesis Testing:
• The Null Hypothesis: there is no difference between the two groups to be compared

Classical Hypothesis Testing:
• The Alternative Hypothesis: there is a difference between the two groups to be compared

Classical Hypothesis Testing: Defining the Alternative Hypothesis
• The size of the expected difference should be defined prior to data collection (a priori)
• The difference defined by the alternative hypothesis should be clinically significant
• Example: Difference in Pain Score on 100 mm VAS of 13 mm or greater

Classical Hypothesis Testing:
• The p value: the probability of obtaining results at least as extreme as those observed, if the null hypothesis were true

Classical Hypothesis Testing: p value
• If p = 0.01, then the chance of obtaining results at least as extreme as those of the experiment, if the null hypothesis were true, is 1%
  – Very unlikely to be due to chance!
  – So we reject the null hypothesis

Classical Hypothesis Testing: p value
• If p = 0.01, then the chance of obtaining results at least as extreme as those of the experiment, if the null hypothesis were true, is 1%
  – Very unlikely to be due to chance!
  – So we reject the null hypothesis
• If p = 0.7, then the chance of obtaining results at least as extreme as those of the experiment, if the null hypothesis were true, is 70%
  – So we accept (fail to reject) the null hypothesis

Classical Hypothesis Testing: Rejecting the Null Hypothesis
• The cut-point for rejecting the null hypothesis (α) is arbitrary
• Typically, α = 0.05
• If the null hypothesis is rejected, then the alternative hypothesis is accepted as true
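The definition of the p value and the α cut-point can be made concrete with an exact permutation test. The sketch below is not part of the lecture: the pain scores are hypothetical and the function name `permutation_p_value` is illustrative. It counts, over every possible relabeling of the patients, how often the difference between groups is at least as large as the one observed, which is one way to read "if the null hypothesis were true".

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p value for a difference in means.

    Assumes equal group sizes, so comparing group sums is equivalent
    to comparing group means.
    """
    if len(group_a) != len(group_b):
        raise ValueError("this sketch assumes equal group sizes")
    pooled = list(group_a) + list(group_b)
    total = sum(pooled)
    n = len(group_a)
    observed = abs(sum(group_a) - sum(group_b))

    extreme = 0
    relabelings = 0
    for indices in combinations(range(len(pooled)), n):
        sum_a = sum(pooled[i] for i in indices)
        # Difference between the relabeled "group A" and the remainder
        if abs(sum_a - (total - sum_a)) >= observed:
            extreme += 1
        relabelings += 1
    return extreme / relabelings


# Hypothetical VAS pain scores in mm (illustrative data only)
drug = [12, 15, 18, 20, 22]
placebo = [28, 30, 33, 35, 40]

alpha = 0.05  # the conventional, arbitrary cut-point
p = permutation_p_value(drug, placebo)
reject_null = p < alpha
print(f"p = {p:.4f}, reject null hypothesis: {reject_null}")
```

With these made-up scores every drug value is below every placebo value, so only 2 of the 252 relabelings are as extreme as the observed split.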

Clinical Trial (statistical testing) | Jury Trial (criminal law)
Assume the null hypothesis | Presume innocent
Goal: detect a true difference (reject the null hypothesis) | Goal: convict the guilty
“Level of significance”: p < .05 | “Beyond reasonable doubt”
Requires: adequate sample size | Requires: convincing testimony

Similar to a Trial by Jury...
• A clinical trial ends in exactly 1 of 4 possible outcomes:
  – 2 are correct: TP, TN
  – 2 are errors: FP, FN

SIGNF. TEST | TRUTH: Guilty | TRUTH: Innocent
REJECT Ho (P < 0.05) | TP | FP
ACCEPT Ho (P > 0.05) | FN | TN

Clinical Trial (statistical testing) | Jury Trial (criminal law)
Appropriately reject the null hypothesis (TP) | Correct verdict: convict a guilty person
Appropriately accept the null hypothesis (TN) | Correct verdict: acquit the innocent

Clinical Trial (statistical testing) | Jury Trial (criminal law)
Correct inference: reject the null hypothesis | Correct verdict: convict a guilty person
Correct inference: accept the null hypothesis | Correct verdict: acquit the innocent
Incorrect inference (FP): Type I error | Incorrect verdict: hang innocent person
Incorrect inference (FN): Type II error | Incorrect verdict: guilty skates free

Errors
SIGNF. TEST | TRUTH: Guilty | TRUTH: Innocent
REJECT Ho (P < 0.05) | TP | FP: Type I (alpha)
ACCEPT Ho (P > 0.05) | FN: Type II (beta) | TN

Classical Hypothesis Testing: Type II Error
• A false-negative result
• A p value > .05 is obtained, yet the two groups are different
• The risk of a type II error = β

Type II Error
• Although there was a trend toward benefit, p value > .05
• Null hypothesis accepted
• Truth: a larger study demonstrated that the two groups were actually different
  – Committed a Type II Error
• A typical pilot study has low power to detect a difference

Classical Hypothesis Testing: Power
• Power = 1 - β
• If Power = 80%:
  – 80% probability of detecting a true difference if it exists
• Power is determined by sample size, the magnitude of the difference sought, and by α
• The pilot study had a small sample size, therefore “low” power
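How power depends on sample size can be sketched with a normal approximation for a two-sided comparison of two means. This is an illustration under stated assumptions, not the lecture's own calculation: the 13 mm difference comes from the VAS example earlier in the lecture, while the standard deviation of 25 mm and the function name `approx_power` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) to detect a true mean difference `delta`.

    Normal approximation for two equal-sized groups with a common
    standard deviation; the negligible far tail is ignored.
    """
    z = NormalDist()
    se = sd * sqrt(2 / n_per_group)       # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)     # two-sided critical value
    return z.cdf(delta / se - z_crit)


# delta = 13 mm (from the VAS example); sd = 25 mm is an assumed value
for n in (10, 20, 40, 59, 100):
    print(f"n per group = {n:3d}  power = {approx_power(13, 25, n):.2f}")
```

Under these assumptions a small pilot-sized study sits well below 80% power, and power climbs as the groups grow.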

Steps in Sample Size Determination
1. Define the type of data (continuous, ordinal, categorical, etc.)

A Few Examples of Statistical Tests
Test | Comparison | Principal Assumptions
Student's t test | Means of two groups | Continuous variable, normally distributed, equal variance
Wilcoxon rank sum | Medians of two groups | Continuous variable
Chi-square | Proportions | Categorical variable, more than 5 patients in any particular "cell"
Fisher's exact | Proportions | Categorical variable

Steps in Sample Size Determination
1. Define the type of data (continuous, ordinal, categorical, etc.)
2. Define the size of the difference sought
3. Define α (usually 0.05)
4. Determine power desired (often 0.80)
5. Look up the sample size: tables, formulas or software
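The five steps above can be sketched for one common case: a continuous outcome compared between two equal-sized groups, using the standard normal-approximation formula n = 2 × ((z for α + z for power) × SD / difference)². The 13 mm difference is the lecture's VAS example; the standard deviation of 25 mm and the function name `n_per_group` are assumptions made only for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a mean difference `delta`.

    Step 1: continuous data, two groups (this formula applies only to that case).
    Step 2: `delta` is the size of the difference sought.
    Step 3: `alpha` is the two-sided type I error risk.
    Step 4: `power` is the desired power (1 - beta).
    Step 5: apply the normal-approximation formula and round up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# delta = 13 mm (from the lecture); sd = 25 mm is an assumed value
print("n per group at 80% power:", n_per_group(13, 25))
print("n per group at 90% power:", n_per_group(13, 25, power=0.90))
```

Halving the difference sought roughly quadruples the required sample size, which is why the difference should be fixed a priori.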

Today’s Topics
• Classical Hypothesis Testing
  – Type I Error
  – Type II Error, Power, Sample Size
• Point Estimates and Confidence Intervals
• Multiple Comparisons

Limitations of the p Value
• p < 0.05 tells us that the observed treatment difference is “statistically significant”
• p < 0.05 does not tell us:
  – The uncertainty around the point estimate
  – The likelihood that the true treatment effect is clinically important

Confidence Intervals: Example
• Purpose: to compare the effects of vasopressor A (VA) and vasopressor B (VB) based on post-treatment SBP in hypotensive patients
• Endpoint: post-treatment SBP
• Null hypothesis: mean SBP_A = mean SBP_B
• Results:
  – mean SBP_A = 70 mm Hg (after VA)
  – mean SBP_B = 95 mm Hg (after VB)
• Observed difference = 25 mm Hg (p < 0.05)
• The 25 mm Hg difference is the “point estimate”

The Point Estimate and the CI
• When using CIs, we report the point estimate and the limits of the CI surrounding the point estimate:
  25 mm Hg (95% CI: 5 to 44 mm Hg)

Interpretation of the CI
• Consider the comparison of vasopressor A and vasopressor B
• Since the 95% CI, 5 to 44 mm Hg, doesn’t include 0, this is equivalent to p < 0.05
[Figure: number line marking the CI limits 5 and 44 mm Hg and the point estimate 25 mm Hg]

Interpretation of the CI
• Although the point estimate for the difference is 25 mm Hg, the results are consistent with the true difference being anywhere between 5 and 44 mm Hg
[Figure: number line marking the CI limits 5 and 44 mm Hg and the point estimate 25 mm Hg]

Why a 95% CI?
• The selection of 95% CIs (as opposed to 99% CIs, for example) is arbitrary
  – like the selection of 0.05 as the cutoff for a statistically significant p value
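The effect of the chosen confidence level can be illustrated with the usual large-sample interval, estimate ± z × SE. The lecture does not report a standard error for the vasopressor example, so the value of 10 mm Hg below is a hypothetical figure picked only because it gives a 95% interval close to the one on the slide; the function name `confidence_interval` is likewise illustrative.

```python
from statistics import NormalDist


def confidence_interval(estimate, se, level=0.95):
    """Large-sample CI: estimate +/- z * se, with z set by the confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z * se, estimate + z * se


point_estimate = 25.0  # mm Hg, from the lecture example
assumed_se = 10.0      # mm Hg, hypothetical: not reported in the lecture

for level in (0.95, 0.99):
    low, high = confidence_interval(point_estimate, assumed_se, level)
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f} mm Hg")
```

Under this assumed standard error the 99% interval is wider than the 95% interval and would extend below 0, so whether a result counts as "significant" depends on a cutoff that is itself a convention.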

Middle Ear Squeeze Study
• For a power of 80%, we needed a sample size of approximately 120 subjects
• N = 116
  – 60 treatment
  – 56 control
Ann Emerg Med July 1992; 21: 849-852.

Middle Ear Squeeze Study Using p value
• For a power of 80%, we needed a sample size of approximately 120 subjects
• N = 116
  – 60 treatment
  – 56 control
• Outcome - ear discomfort:
  – Treatment group 8%
  – Control group 32%
  – p = .001
• Sudafed works!
Ann Emerg Med July 1992; 21: 849-852.
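A p value of this order can be approximated with a two-proportion z-test. The slide does not say which test the study used or give raw counts, so this is only a sketch: the counts 5/60 and 18/56 are assumed because they round to the reported 8% and 32%, and the function name `two_proportion_p_value` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(events_1, n_1, events_2, n_2):
    """Two-sided p value from a pooled two-proportion z-test."""
    p_1 = events_1 / n_1
    p_2 = events_2 / n_2
    pooled = (events_1 + events_2) / (n_1 + n_2)
    se = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Assumed counts: 5/60 is about 8% (treatment), 18/56 is about 32% (control)
p_value = two_proportion_p_value(5, 60, 18, 56)
print(f"two-sided p = {p_value:.4f}")
```

The result lands near the reported p = .001; an exact match is not expected, since the test and the counts are assumptions.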

Middle Ear Squeeze Study Using Point Estimate and 95% CI
• Ear discomfort:
  – Treatment group 8%
  – Control group 32%
• Absolute Risk Reduction 24% (95% CI: 9.9 to 38.3%)
• NNT 4.2 (95% CI: 2.6 to 10.1)
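The absolute risk reduction (ARR) and number needed to treat (NNT) can be roughly reproduced from the group proportions. As a sketch under stated assumptions: the raw counts 5/60 and 18/56 are not given on the slide and are assumed because they round to 8% and 32%, the interval uses the simple Wald formula (the study may have used another method), and the function name `arr_and_nnt` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def arr_and_nnt(events_treat, n_treat, events_ctrl, n_ctrl, level=0.95):
    """Return ARR, its Wald CI, NNT (= 1 / ARR), and the NNT CI."""
    p_treat = events_treat / n_treat
    p_ctrl = events_ctrl / n_ctrl
    arr = p_ctrl - p_treat
    se = sqrt(p_treat * (1 - p_treat) / n_treat + p_ctrl * (1 - p_ctrl) / n_ctrl)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    arr_low, arr_high = arr - z * se, arr + z * se
    # NNT limits are the reciprocals of the ARR limits, in reverse order.
    # This is only valid when the ARR interval excludes 0.
    return arr, (arr_low, arr_high), 1 / arr, (1 / arr_high, 1 / arr_low)


# Assumed counts: 5/60 is about 8% (treatment), 18/56 is about 32% (control)
arr, arr_ci, nnt, nnt_ci = arr_and_nnt(5, 60, 18, 56)
print(f"ARR = {arr:.1%} (95% CI: {arr_ci[0]:.1%} to {arr_ci[1]:.1%})")
print(f"NNT = {nnt:.1f} (95% CI: {nnt_ci[0]:.1f} to {nnt_ci[1]:.1f})")
```

The output is close to, but not identical with, the figures on the slide; small differences are expected from rounding and from the choice of interval method.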

Cochrane Library. Wood-Baker, RR; Gibson, PG; Hannay, M; Walters, EH; Walters, JAE. Date of Most Recent Update: 26-July-2005.

Clinical vs. Statistical Significance
• Oral ondansetron vs. placebo
• 215 children with gastroenteritis
• Primary outcome: vomiting during oral hydration
  – RR = 0.4 (95% CI: 0.26 to 0.61)
  – NNT = 4.9 (95% CI: 3.1 to 10.3)
• Both clinically significant and statistically significant
N Engl J Med 2006; 354: 1698-705

Clinical vs. Statistical Significance
• Secondary outcome: oral intake in ED
  – 239 ml vs. 196 ml
  – p = 0.001 (statistically significant)
• But is a difference of 9 tsp clinically significant?
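The 9 tsp figure follows from the difference in mean oral intake, assuming a US teaspoon of about 4.93 mL (a conversion factor not stated on the slide):

```latex
\[
239\ \text{mL} - 196\ \text{mL} = 43\ \text{mL},
\qquad
\frac{43\ \text{mL}}{4.93\ \text{mL/tsp}} \approx 8.7\ \text{tsp} \approx 9\ \text{tsp}
\]
```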

Today’s Topics
• Classical Hypothesis Testing
  – Type I Error
  – Type II Error, Power, Sample Size
• Point Estimates and Confidence Intervals
• Multiple Comparisons

Multiple Comparisons
• When two identical groups of patients are compared, there is a chance (α) that a statistically significant p value will be obtained (type I error)
• When multiple comparisons are performed, the risk of one or more false-positive p values increases
• Multiple comparisons include:
  – Pair-wise comparisons of more than two groups
  – The comparison of multiple characteristics between two groups (e.g., sub-group analyses)
  – The comparison of two groups at multiple time points

Multiple Comparisons: Risk of 1 False Positive
Number of Comparisons | Probability of at Least One Type I Error
1 | 0.05
2 | 0.10
3 | 0.14
4 | 0.19
5 | 0.23
10 | 0.40
20 | 0.64
30 | 0.79
Assumes α = 0.05, uncorrelated comparisons
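The values in this table follow from the probability of at least one false positive among k independent tests, 1 - (1 - α)^k. A minimal sketch that recomputes them (the function name `familywise_error_rate` is illustrative, not from the lecture):

```python
def familywise_error_rate(n_comparisons, alpha=0.05):
    """Probability of at least one type I error across independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons


# Same numbers of comparisons as in the table above
table = {k: round(familywise_error_rate(k), 2) for k in (1, 2, 3, 4, 5, 10, 20, 30)}

for k, risk in table.items():
    print(f"{k:2d} comparisons: {risk:.2f}")
```

The formula relies on the same assumption stated in the table: the comparisons are uncorrelated.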

Multiple Comparisons: Bonferroni Correction
• A method for reducing the overall risk of a type I error when making multiple comparisons
• The overall (study-wise) type I error risk desired (e.g., 0.05) is divided by the number of tests, and this new value is used as the α for each individual test
• Controls the type I error risk, but reduces the power (increased type II error risk)

Results: We tested these 24 associations in the independent validation cohort. Residents born under Leo had a higher probability of gastrointestinal hemorrhage (P = .04), while Sagittarians had a higher probability of humerus fracture (P = .01) compared to all other signs combined. After adjusting the significance level to account for multiple comparisons, none of the identified associations remained significant in either the derivation or validation cohort.
Bonferroni correction: .05/24 = 0.002 for statistical significance
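The Bonferroni arithmetic in this example can be checked directly. The sketch below uses only the numbers quoted above (24 tests, overall α of 0.05, and the two reported P values); the function and variable names are illustrative.

```python
def bonferroni_threshold(overall_alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return overall_alpha / n_tests


threshold = bonferroni_threshold(0.05, 24)

# P values quoted in the results above
reported = {
    "Leo / gastrointestinal hemorrhage": 0.04,
    "Sagittarius / humerus fracture": 0.01,
}
significant = {name: p < threshold for name, p in reported.items()}

print(f"per-test threshold = {threshold:.4f}")
for name, is_significant in significant.items():
    print(f"{name}: significant after correction = {is_significant}")
```

Both associations pass the uncorrected 0.05 cutoff but fall short of the corrected threshold, which matches the conclusion quoted in the results.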

Statistical Issues to Consider if Planning a Study
• Define the most important question to be answered – the “primary objective”
• Define the size of the difference you wish to detect
• Get as much information as possible about what you expect to see in the control group

Statistical Issues to Consider if Planning a Study
• Define values for α and power, and the maximum sample size that is realistic
• Define clinically important subgroups of the population (a priori sub-group analyses)
• Determine whether there are important multiple comparisons

When You Visit the Statistician:
• Bring examples of published studies that illustrate the type of analysis you would like to perform at the end of the study