Causal inference from observational studies: emulating a target trial
Miguel Hernán
Departments of Epidemiology and Biostatistics

The situation: We need to make decisions NOW
   n Treat with A or with B?
   n Treat now or later?
   n When to switch to C?
o A relevant randomized trial would, in principle, answer each comparative effectiveness and safety question
o Interference/scaling up issues aside
Hernán - Target trial 2

But we rarely have randomized trials
   n expensive, untimely, unethical, impractical
o And deferring decisions is not an option
   n no decision is a decision: “Keep status quo”
o Question:
   n What do we do?

Answer: We analyze observational data (pre-existing or collected for research)
o Epidemiologic studies
o Electronic medical records
o Administrative claims databases
o National registers
o Disease registries
o Other

We analyze observational data
   n but only because we cannot conduct a randomized trial
o Observational studies are not our preferred choice
   n For each observational study, we can imagine a hypothetical randomized trial that we would prefer to conduct
      o If only it were possible

The Target Trial
o An analysis of observational data (e.g., large health care database) can be viewed as an attempt to emulate a hypothetical pragmatic randomized trial
o As suggested more or less explicitly by many authors, including Cochran, Rubin, Feinstein, Dawid, Robins…
o Hernán, Robins. Am J Epidemiol 2016
o If the observational analysis succeeds at emulating the target trial, both studies would yield identical effect estimates
   n except for random variability

Procedure to answer clinical/policy questions
o Step #1
   n Describe the protocol of the target trial
o Step #2
   n Option A: Conduct the target trial
   n Option B
      o Use observational data to explicitly emulate the target trial
      o Apply appropriate causal inference analytics to estimate the effects of interest

Key elements of target trial protocol (Observational study needs to emulate)
o Eligibility criteria
o Strategies
   n assigned at start of follow-up
   n followed from start of follow-up
o Randomized assignment
o Start/End follow-up
o Outcomes
o Causal contrast(s) of interest
o Analysis plan

Example
o Suppose we use observational data
   n a large health care claims database
o to emulate a target trial
   n of hormone therapy and heart disease
o First we need to outline the protocol of the target trial

Target trial: Hormone therapy and coronary heart disease
Protocol summary
Eligibility criteria: Postmenopausal women within 5 years of menopause between the years 2005 and 2010, and with no history of cancer and no use of hormone therapy in the last 2 years.
Treatment strategies:
   1. Initiate estrogen plus progestin hormone therapy at baseline and remain on it during the follow-up, unless deep vein thrombosis, pulmonary embolism, myocardial infarction, or cancer are diagnosed
   2. Refrain from taking hormone therapy during the follow-up
Assignment procedures: Participants will be randomly assigned to either strategy at baseline, and will be aware of the strategy they have been assigned to.
Follow-up period: Starts at randomization and ends at diagnosis of coronary heart disease, death, loss to follow-up, or 5 years after baseline, whichever occurs earlier.
Outcome: Coronary heart disease diagnosed by a cardiologist
Causal contrasts: Intention-to-treat effect, per-protocol effect
Analysis plan: Intention-to-treat analysis, non-naïve per-protocol analysis (to be discussed)
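
One way to keep the seven protocol components above explicit when coding an emulation is to store them in a small record. The following is only a sketch: the class name `TargetTrialProtocol`, its field names, and the helper `missing_components` are hypothetical and do not come from the slides; the field values are abbreviated from the protocol summary.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TargetTrialProtocol:
    """Hypothetical container for the seven protocol components."""
    eligibility_criteria: str
    treatment_strategies: tuple
    assignment_procedures: str
    follow_up_period: str
    outcome: str
    causal_contrasts: tuple
    analysis_plan: tuple


hormone_therapy_trial = TargetTrialProtocol(
    eligibility_criteria=(
        "Postmenopausal women within 5 years of menopause, 2005-2010, "
        "no history of cancer, no hormone therapy in the last 2 years"
    ),
    treatment_strategies=(
        "Initiate estrogen plus progestin at baseline and remain on it",
        "Refrain from taking hormone therapy during the follow-up",
    ),
    assignment_procedures="Random assignment at baseline, unblinded",
    follow_up_period=(
        "From randomization to CHD, death, loss to follow-up, or 5 years"
    ),
    outcome="Coronary heart disease diagnosed by a cardiologist",
    causal_contrasts=("intention-to-treat", "per-protocol"),
    analysis_plan=("intention-to-treat", "non-naive per-protocol"),
)


def missing_components(protocol):
    """Return the names of components left empty, to flag before emulation."""
    return [f.name for f in fields(protocol) if not getattr(protocol, f.name)]
```

Writing the protocol down as data makes it easy to check, component by component, what the observational database can and cannot emulate.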

Target trial emulation requires data experts
o When observational data were not collected for research purposes
   n e.g., “coronary heart disease” may be recorded when a woman was diagnosed with it, or when her physician suspected it and ordered a diagnostic test
o Must consult with knowledgeable data users
   n Time-varying clinical workflows, idiosyncratic coding practices, software versions…

Besides expert knowledge of the data
   n Validation studies to quantify data accuracy
   n Internal consistency checks to detect problems
   n Cross-dataset comparisons to better understand coding differences
o Let’s say we have consulted with experts and done the above before attempting to emulate the target trial

Eligibility criteria: Emulation
o Apply same criteria as the target trial to women who at baseline have been included in the database for at least 2 years
o Potential problems
   n Insufficient data to characterize individuals eligible for the target trial
   n Example: If target trial required baseline screening to exclude prevalent cases, emulation may be hard if database records the performance of a test (for billing purposes) but not its findings

Treatment strategies: Emulation
o Eligible individuals assigned to the strategy consistent with their baseline data
   n Strategy 1: women who start estrogen plus progestin therapy
   n Strategy 2: women who do not start hormone therapy
   n Excluded: women who start a different hormone therapy
o Target trial is typically a pragmatic trial
   n observational data cannot be used to emulate trials with tight monitoring and enforcement of adherence to the study protocol
   n cannot emulate a placebo-controlled trial
      o at most a trial with a “usual care” group
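
The eligibility and strategy-assignment steps above can be sketched in a few lines. This is a minimal illustration, not the analysis code behind the slides: the `Woman` record, its field names, the string label "estrogen+progestin", and the exact handling of boundary values (for example, treating "within 5 years" as `<= 5`) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Woman:
    """Hypothetical baseline record extracted from the database."""
    baseline_year: int
    years_since_menopause: float
    cancer_history: bool
    years_since_last_ht: Optional[float]  # None = never used hormone therapy
    years_in_database: float
    therapy_started_at_baseline: Optional[str]  # None = no therapy started


def is_eligible(w):
    """Apply the target trial's criteria plus 2 years of prior database history."""
    recent_ht = w.years_since_last_ht is not None and w.years_since_last_ht < 2
    return (
        2005 <= w.baseline_year <= 2010
        and w.years_since_menopause <= 5
        and not w.cancer_history
        and not recent_ht
        and w.years_in_database >= 2
    )


def assign_strategy(w):
    """Return 1 or 2 for the emulated strategy, or None if excluded."""
    if not is_eligible(w):
        return None
    if w.therapy_started_at_baseline is None:
        return 2  # did not start hormone therapy
    if w.therapy_started_at_baseline == "estrogen+progestin":
        return 1  # started estrogen plus progestin therapy
    return None  # started a different hormone therapy: excluded
```

Note that the strategy is determined only by data available at baseline, which is what keeps the classification analogous to assignment in the target trial.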

Assignment procedures: Emulation of blinding
o Generally impossible
   n individuals in the dataset, and their health care workers, are usually aware of the treatment they receive
o Observational data can only emulate target trials without blind assignment
   n standard for pragmatic trials
   n not a limitation if the goal is comparing real-world treatment strategies

Randomized assignment: Emulation of randomization
o Generally requires adjustment for all confounding factors
   n via matching, stratification or regression, standardization or inverse probability (IP) weighting, g-estimation…
o If we have insufficient information on baseline confounders, or fail to identify them, then successful emulation of the target trial’s random assignment is not possible
   n Confounding bias
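
As an illustration of one of the adjustment methods named above, the sketch below applies nonparametric inverse probability (IP) weighting with a single binary baseline confounder. The counts are made up so that treatment has no effect within levels of the confounder; the function names and data layout are assumptions for the example, and a real emulation would typically estimate the treatment probabilities with a model over many confounders.

```python
from collections import Counter
from fractions import Fraction

# Made-up counts keyed by (confounder L, treatment A, outcome Y).
# Within each level of L the risk is the same in both arms (0.1 and 0.5).
counts = {
    (0, 1, 1): 2, (0, 1, 0): 18, (0, 0, 1): 8, (0, 0, 0): 72,
    (1, 1, 1): 40, (1, 1, 0): 40, (1, 0, 1): 10, (1, 0, 0): 10,
}
rows = [key for key, n in counts.items() for _ in range(n)]


def crude_risk(rows, arm):
    """Unadjusted risk of the outcome among individuals with A == arm."""
    outcomes = [y for (_, a, y) in rows if a == arm]
    return Fraction(sum(outcomes), len(outcomes))


def ip_weighted_risk(rows, arm):
    """Risk in the pseudo-population created by weights 1 / P(A = a | L = l)."""
    n_by_l = Counter(l for (l, _, _) in rows)
    n_by_la = Counter((l, a) for (l, a, _) in rows)
    numerator = denominator = Fraction(0)
    for l, a, y in rows:
        if a != arm:
            continue
        prob_treatment = Fraction(n_by_la[(l, a)], n_by_l[l])
        weight = 1 / prob_treatment
        numerator += weight * y
        denominator += weight
    return numerator / denominator


print("crude:", crude_risk(rows, 1), "vs", crude_risk(rows, 0))
print("IP weighted:", ip_weighted_risk(rows, 1), "vs", ip_weighted_risk(rows, 0))
```

In this toy dataset the crude comparison suggests harm only because high-risk individuals are treated more often; after weighting, the two arms have the same risk. The adjustment works here only because the single confounder is measured, which is exactly the condition the slide warns may fail.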

Start of follow-up: When is time zero (baseline)?
o In true trials
   n the time of eligibility and randomization
o In emulated trials
   n the time of eligibility and treatment assignment
o Failure to assign time zero correctly may lead to misunderstandings
   n e.g., immortal time bias, others to be discussed
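
To see how a misplaced time zero produces immortal time, consider a toy cohort with made-up numbers in which two people start treatment 6 months after becoming eligible. The sketch below is an assumption-laden illustration: the tuple layout and function names are hypothetical, and splitting person-time at initiation is used only to expose the misattributed months, not as the approach taken in the slides.

```python
from fractions import Fraction

# (start_month, end_month, had_event), in months since eligibility.
# start_month is None for people who never start treatment. Made-up data.
cohort = [
    (None, 3, True),
    (None, 6, True),
    (None, 12, False),
    (6, 9, True),
    (6, 12, False),
]


def immortal_person_months(cohort):
    """Months before initiation during which 'ever treated' people cannot have had the event."""
    return sum(start for start, _, _ in cohort if start is not None)


def naive_rate_ratio(cohort):
    """Ever-treated vs. never-treated, both followed from eligibility."""
    treated_pm = sum(end for start, end, _ in cohort if start is not None)
    untreated_pm = sum(end for start, end, _ in cohort if start is None)
    treated_ev = sum(ev for start, _, ev in cohort if start is not None)
    untreated_ev = sum(ev for start, _, ev in cohort if start is None)
    return Fraction(treated_ev, treated_pm) / Fraction(untreated_ev, untreated_pm)


def split_rate_ratio(cohort):
    """Same comparison, but pre-initiation months count as untreated time."""
    treated_pm = sum(end - start for start, end, _ in cohort if start is not None)
    untreated_pm = sum(end if start is None else start for start, end, _ in cohort)
    treated_ev = sum(ev for start, _, ev in cohort if start is not None)
    untreated_ev = sum(ev for start, _, ev in cohort if start is None)
    return Fraction(treated_ev, treated_pm) / Fraction(untreated_ev, untreated_pm)
```

In the naive version, the 12 event-free months before initiation are credited to the treated group, which makes treatment look protective. With so few people the numbers mean nothing in themselves; the point is the direction in which the misattributed person-time pushes the estimate.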

Outcome: Emulation
o Use the database to identify women with a diagnosis of coronary heart disease during the follow-up
o Potential problem: observational data generally cannot be used to emulate a target trial with systematic and blind outcome ascertainment
   n Except if outcome ascertainment cannot be affected by treatment history, e.g., if the outcome is mortality independently ascertained from a death registry

The observational analysis needs to emulate
o Eligibility criteria
o Treatment strategies
   n randomly assigned at start of follow-up
o Start/End of follow-up
o Outcomes
o Causal contrast(s) of interest
o Analysis plan

Analysis plan
o Identical for true and emulated trials
   n Except that no adjustment for baseline confounding is expected in intention-to-treat analysis of true trials
o Both true and emulated trials require adjustment for post-baseline confounding and selection bias
   n Possibly using Robins’s g-methods
   n Which means longitudinal data on treatment, confounders, and outcomes are required

The target trial will be a compromise
   n between the ideal trial we would really like to conduct and the trial we may reasonably emulate using the available data
o The drafting of the protocol of the target trial is typically an iterative process
   n That requires detailed knowledge of the database

Examples of trial emulation using Big Data
1. Classic cohort study
   n Nurses’ Health Study
   n Postmenopausal hormone therapy and coronary heart disease
2. Electronic medical records - THIN
   n Statins and coronary heart disease
   n Static strategies
      o treat vs. no treat
3. Clinical cohorts – HIV-CAUSAL Collaboration (not today)
   n Antiretroviral therapy and mortality
   n Static and dynamic strategies
      o intervention depends on response to previous intervention

EXAMPLE #1: Hormone therapy and heart disease
o Question
   n What is the intention-to-treat effect of hormone therapy on the risk of coronary heart disease in postmenopausal women?

Answers (shocking discrepancy)
o Observational studies
   n >30% lower risk in current users compared with never users
      o e.g., HR 0.68 in Nurses’ Health Study
   n Grodstein et al. J Women’s Health 2006
o Randomized trial
   n >20% higher risk in initiators compared with noninitiators
      o HR 1.24 in Women’s Health Initiative
   n Manson et al. NEJM 2003

The WHI randomized trial
Manson et al, NEJM 2003
o Double-blind
o Placebo-controlled
o Large
   n >16,000 U.S. women aged 50-79 yrs
o Randomly assigned to estrogen plus progestin therapy or placebo
o Women followed approximately every year, as in many large observational studies
   n No intervention after baseline

WHI: Effect estimates
Intention-to-treat hazard ratio (95% CI) of CHD

o Overall                    1.23 (0.99, 1.53)
o Years of follow-up
   n 0-2                     1.51 (1.06, 2.14)
   n >2-5                    1.31 (0.93, 1.83)
   n >5                      0.67 (0.41, 1.09)
o Years since menopause
   n <10                     0.89 (0.54, 1.44)
   n 10-20                   1.24 (0.86, 1.80)
   n >20                     1.65 (1.14, 2.40)

Why did observational studies get it “wrong”?
o Popular theory: residual confounding
   n insufficient adjustment for lifestyle and socioeconomic indicators
   n Corollary: causal inference from observational data is a hopeless undertaking
o An alternative theory: Observational and randomized studies asked different questions

Randomized trial estimated the intention-to-treat effect
o What is the CHD risk in women who initiate hormone therapy compared with women who do not?
o Design and analysis:
   n Women randomly assigned to initiation of hormone therapy or placebo
   n Analytic approach
      o Compare risk between incident users and nonusers of hormone therapy

Observational studies did not estimate intention-to-treat effect
o What is the CHD risk in women who are currently taking hormone therapy compared with women who are not?
o Design and analysis:
   n Women are asked about therapy use
   n Analytic approach
      o Compare risk between prevalent users and nonusers of hormone therapy (current users vs. never users)

“Current vs. never users” contrast is not clinically relevant
o Consider a woman wondering whether to start hormone therapy
   n The current vs. never contrast does not provide the information she needs
o Consider a woman wondering whether to stop hormone therapy
   n The current vs. never contrast does not provide the information she needs

What if we re-analyze the observational study…
o … to compare the risk in incident users vs. nonusers?
o That is, what if we use the observational data to answer the same question as the randomized trial?
   n estimate the observational analog of the intention-to-treat effect
      o Hernán et al. Biometrics 2005
      o Hernán et al. Epidemiology 2008

Effect estimates (ITT hazard ratios)

                             Randomized                   Observational
                             Women’s Health Initiative    Nurses’ Health Study
o Overall                    1.23 (0.99, 1.53)            1.05 (0.82, 1.34)
o Years of follow-up
   n 0-2                     1.51 (1.06, 2.14)            1.43 (0.92, 2.23)
   n >2                      1.07 (0.81, 1.41)            0.91 (0.72, 1.16)
o Years since menopause
   n <10                     0.89 (0.54, 1.44)            0.88 (0.63, 1.21)
   n 10-20                   1.24 (0.86, 1.80)            1.13 (0.85, 1.49)
   n >20                     1.65 (1.14, 2.40)            --
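
When reading the hazard ratios above, it can help to remember that such intervals are usually symmetric on the log scale, so the point estimate sits near the geometric mean of the limits rather than their arithmetic midpoint. The sketch below checks this for the two overall estimates; it assumes Wald-type 95% intervals (z = 1.96), which the slides do not state, and the function names are illustrative.

```python
import math


def geometric_midpoint(lower, upper):
    """Point estimate implied by limits that are symmetric on the log scale."""
    return math.sqrt(lower * upper)


def log_hr_standard_error(lower, upper, z=1.96):
    """Standard error of log(HR) recovered from a Wald-type interval (assumed)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# (reported HR, lower limit, upper limit) for the overall ITT estimates
estimates = {
    "Women's Health Initiative": (1.23, 0.99, 1.53),
    "Nurses' Health Study": (1.05, 0.82, 1.34),
}

for study, (hr, lower, upper) in estimates.items():
    print(
        study,
        "reported HR:", hr,
        "geometric midpoint:", round(geometric_midpoint(lower, upper), 2),
        "SE of log HR:", round(log_hr_standard_error(lower, upper), 3),
    )
```

The recovered standard errors are of similar size in the two studies, which is one way to see that both intervals are wide relative to the difference between the point estimates.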

When the same question is asked
o No shocking observational-randomized discrepancies for ITT estimates
   n though wide CIs in both studies
o What about the popular hypothesis? Any residual confounding?
   n Probably, but insufficient to explain the original discrepancy

Aside: Analysis can/should be extended in two ways
o Causal contrast
   n Estimate per-protocol effect
      o Rather than intention-to-treat effect only
o Effect measure
   n Estimate survival (or cumulative risk) curves
      o Rather than hazard ratios only

ITT effect is problematic
Hernán, Hernández-Díaz. Clinical Trials 2012
o Depends on adherence patterns
   n Substantial non-adherence in both randomized trial and observational study
o Inappropriate for safety outcomes
o Not patient-centered
o We also estimated per-protocol effect
   n via IP weighting (more later)
   n Again no randomized-observational discrepancies
      o Toh et al. Ann Intern Med 2010

End of aside

Conclusion: CER (comparative effectiveness research) based on epidemiologic studies is possible
o If high-quality observational data on treatment, outcome, and confounders are available
   n e.g., the Nurses’ Health Study
o But most observational CER relies on large databases (big, pre-existing data)
   n Health claims, electronic medical records
o Can emulation of a target trial work in that setting?

EXAMPLE #2: Statins and coronary heart disease
o Question
   n What is the effect of statin therapy on the risk of coronary heart disease?
o Extreme example of confounding

Target trial: Statin therapy and coronary heart disease
Protocol summary
Eligibility criteria: Individuals aged 55–84 in the years 2000-2006 with no prior history of CHD, stroke, peripheral vascular disease, heart failure, cancer, schizophrenia or dementia, no symptoms of subclinical CHD, and no use of statin therapy in the last 2 years.
Treatment strategies:
   1. Initiate statin therapy at baseline and remain on it during the follow-up, unless contraindications arise
   2. Refrain from taking statin therapy during the follow-up
Assignment procedures: Participants will be randomly assigned to either strategy at baseline, and will be aware of the strategy they have been assigned to.
Follow-up period: Starts at randomization and ends at diagnosis of coronary heart disease, death, loss to follow-up, or January 2007, whichever occurs earlier.
Outcome: Coronary heart disease diagnosed by a cardiologist
Causal contrasts: Intention-to-treat effect, per-protocol effect
Analysis plan: Intention-to-treat analysis, non-naïve per-protocol analysis

Observational data: The Health Improvement Network
o THIN is a database of electronic medical records
   n 6.2 million individuals from 350 general practices in the UK (2009)
o For each individual
   n demographic and socioeconomic characteristics
   n symptoms, signs and diagnoses, referrals, laboratory test results
   n some lifestyle information
   n Vital status and cause of death data

Target trial emulation
o Use observational data from THIN to emulate the components of the target trial
o Eligible individuals classified into
   n Strategy 1 if they initiated statin treatment during the baseline month
   n Strategy 2 if they did not initiate statin treatment during the baseline month
o Baseline month January 2000
   n 3178 individuals met all the eligibility criteria
   n 18 were initiators
   n 1 initiator developed CHD

Sequence of target trials: Emulation
o Emulate a target trial starting each calendar month between January 2000 and November 2006
   n 83 target trials with a 1-month enrollment period
o For each trial
   n Follow-up starts at the trial-specific baseline and ends at diagnosis of CHD, death, loss to follow-up, or January 2007
   n Eligibility criteria applied at each baseline

Sequence of target trials: Emulation
o 74,806 individuals eligible for at least 1 trial
o On average each eligible individual participated in the emulation of 11 trials
   n many non-initiators in the January 2000 trial still met all eligibility criteria in February
   n All 18 initiators in January 2000 were ineligible in February 2000 because they received treatment during the washout period for the February 2000 trial, and so on
o Danaei et al. Statistical Methods in Medical Research 2013
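
The expansion of individuals into a sequence of monthly trials can be sketched as follows. This is a simplified illustration, not the code used by Danaei et al.: the function names and data layout are hypothetical, initiation is represented by the first day of the month in which statins were started, and the sketch assumes a person stays eligible every month until initiation (in the real emulation all criteria are re-applied at each baseline, and the washout exclusion lasts 2 years rather than indefinitely).

```python
from datetime import date


def monthly_baselines(first, last):
    """First day of every calendar month from `first` to `last`, inclusive."""
    baselines = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        baselines.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return baselines


def person_trials(person_id, eligible_from, initiation_month, baselines):
    """Expand one person into (person_id, trial_index, arm) rows.

    eligible_from: first baseline at which the person meets the criteria.
    initiation_month: first day of the month of statin initiation, or None.
    """
    rows = []
    for trial_index, baseline in enumerate(baselines):
        if baseline < eligible_from:
            continue
        if initiation_month is not None and baseline > initiation_month:
            break  # prior statin use falls in the washout period
        arm = "initiator" if baseline == initiation_month else "non-initiator"
        rows.append((person_id, trial_index, arm))
    return rows


baselines = monthly_baselines(date(2000, 1, 1), date(2006, 11, 1))
print(len(baselines))
```

For example, a person eligible from January 2000 who initiates in March 2000 contributes to the January and February trials as a non-initiator, to the March trial as an initiator, and to no later trial.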

CONSORT flowchart of emulated trials

Adherence to treatment
[Figure: probability of continuing initial treatment by months of follow-up, with separate curves for initiators and non-initiators]

Intention-to-treat effect: Emulation
o Compare CHD incidence in initiators vs. non-initiators at baseline of each emulated trial
   n Regardless of their subsequent treatment
o Pool data across emulated trials to obtain a more precise effect estimate
   n Robust variance because of within-subject correlation

Hazard ratio (95% CI) of CHD
THIN trials 2000-2006

                              Intention-to-treat analysis    Per-protocol analysis
Unique cases                  635                            488
Unique persons                74,806                         74,806
Cases                         6,335                          4,849
Person-trials                 844,800                        844,800
Adjusted for age and sex      1.29 (1.06, 1.56)              1.54 (1.09, 2.18)
Adjusted for all covariates   0.89 (0.73, 1.09)              0.84 (0.54, 1.30)
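
The distinction between unique persons and person-trials in the table above comes from pooling: the same person, and the same CHD event, can appear in several emulated trials. The sketch below shows the bookkeeping on a few made-up rows; the row layout and function name are assumptions for illustration, and it does not compute the hazard ratios or the robust variance.

```python
from collections import defaultdict

# (person_id, trial_index, arm_at_trial_baseline, developed_chd_during_follow_up)
# Made-up rows. p1 is a non-initiator in trials 0 and 1, an initiator in trial 2,
# and later develops CHD, so the same event is counted in all three trials.
stacked_rows = [
    ("p1", 0, "non-initiator", True),
    ("p1", 1, "non-initiator", True),
    ("p1", 2, "initiator", True),
    ("p2", 0, "non-initiator", False),
    ("p2", 1, "non-initiator", False),
    ("p2", 2, "non-initiator", False),
    ("p3", 1, "initiator", False),
]


def pooled_summary(rows):
    """Counts for a pooled intention-to-treat comparison across emulated trials."""
    by_arm = defaultdict(lambda: {"cases": 0, "person_trials": 0})
    for _, _, arm, chd in rows:
        by_arm[arm]["cases"] += int(chd)
        by_arm[arm]["person_trials"] += 1
    return {
        "unique_persons": len({pid for pid, _, _, _ in rows}),
        "person_trials": len(rows),
        "unique_cases": len({pid for pid, _, _, chd in rows if chd}),
        "cases": sum(int(chd) for _, _, _, chd in rows),
        "by_arm": dict(by_arm),
    }


summary = pooled_summary(stacked_rows)

# Average number of trials per person implied by the table: about 11.3
mean_trials_per_person = 844_800 / 74_806
```

Because rows from the same person are correlated, a variance estimator that treats the person-trial rows as independent would be too optimistic, which is why the pooled analysis uses a robust variance.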

What if we had compared prevalent (not incident) users vs. nonusers?
o Current users
   n HR: 1.42 (1.16, 1.73)
o Persistent (1 yr) current users
   n HR: 1.05
o Persistent (2 yrs) current users
   n HR: 0.77 (0.51, 1.18)
o We can get any result we want by changing the definition of current user!
   n Confounding-selection bias tradeoff

Mortality hazard ratio for statins in CHD secondary prevention studies
o RCTs: 0.84 (0.77, 0.91)
o Observational studies
   n Incident users: 0.77 (0.65, 0.91)
   n Prevalent-incident mix: 0.70 (0.64, 0.78)
   n Prevalent users: 0.54 (0.45, 0.66)
o Danaei et al. Am J Epidemiol 2012

Other static comparisons: same analytic approach
o Head-to-head comparisons:
   n Example: Lipophilic statins (atorvastatin, simvastatin) vs other statins
   n Danaei et al. Diabetes Care 2013
o Joint interventions
   n Example: statins plus antihypertensives vs standard care
   n Danaei et al. J Clin Epidemiol 2016 (in press)

Advantages of the target trial approach (I)
o Provides ready access to the application of formal counterfactual theory and concepts to Big Data
   n without the need for technical jargon
o Organizing principle for causal inference methods
   n which implicitly rely on counterfactual reasoning
   n e.g., new user design, active comparators, outcome controls

Advantages of the target trial approach (II)
o Facilitates the comparison of complex strategies that are sustained over time and may depend on a patient’s evolving characteristics
   n Dynamic treatment strategies
   n Not “treat vs no treat” but rather “when to treat, when to switch, when to monitor” depending on time-varying factors

Advantages of the target trial approach (III)
o Establishes a link between methods for the analysis and reporting of randomized trials and Big Data analytics
   n Observational studies analyzed like randomized trials, and vice versa
o Provides a scaffolding to organize discussions about which data are required/missing

Advantages of the target trial approach (IV)
o Naturally leads to analytic approaches that prevent apparent paradoxes and common biases
   n Selection bias related to prevalent users
   n Immortal time bias
   n Birth weight paradox, obesity paradox
   n Etc.

Advantages of the target trial approach (V)
o Facilitates a systematic methodologic evaluation of observational studies
      o which components of the target trial were we unable to mimic, even approximately?
      o which components of the target trial would be problematic even if we were able to conduct a truly randomized trial?
   n An approach adopted by the Cochrane Collaboration Risk of Bias Tool for Nonrandomised Studies and the IOM Report on the Safety of Approved Drugs

Advantages of the target trial approach (VI)
o Helps understand why estimates differ across studies
   n assess the sensitivity of estimates to different design choices for the target trial
   n focus research efforts on “sensitive” choices

Advantages of the target trial approach (last)
o If we can influence how data are recorded
   n the target trial approach helps record them
o If we are using data as they exist
   n the target trial approach guides the validation studies and the development and evolution of the Data Model
o The target trial approach allows you to systematically articulate the tradeoffs that you are willing to accept
   n regarding eligibility criteria, interventions, confounders, outcomes

A common misinterpretation
o You are saying that observational studies are as good as RCTs?
   n “This is a cohort study that tries to turn itself into a clinical trial. This involves a series of assumptions and manoeuvres which lack credibility.”
      o Anonymous JAMA reviewer, April 2014
o No, the point is not that observational studies can turn themselves into randomized experiments
   n They can’t

The point is that we can do better
o by using observational data to explicitly emulate randomized trials
o The limitations of observational studies (e.g., confounding, mismeasurement) remain, but we do not compound them with additional problems

Remember
o Observational studies are what we do when we cannot conduct a randomized trial
   n In the absence of practical and ethical constraints, sane people will always prefer a randomized trial
o No alternative to observational studies
   n So we better keep improving them
      o because people will keep using (Big and Small) observational data to guide their decisions