- Slides: 54
Introduction to Causal Inference Methods
Miguel Hernán
Departments of Epidemiology and Biostatistics

A way to clarify causal inference analyses
- Separate the scientific from the technical
- The scientific is the most important part; the technical is just a recipe
Hernán - Causal Inference Methods 2

Separating the scientific from the technical in causal inference
1. Formulation of questions
   - Specification of the target trial
2. Identification of data requirements
   - To emulate the target trial
3. Choice of statistical methods
   - Same as for the target trial, plus adjustment for baseline confounding
- Steps 1 and 2 are scientific; step 3 is technical

1. Formulation of questions
- Well-defined causal inference questions can be mapped into a target trial
- First step in causal inference:
  - Specify the protocol of the target trial
- As suggested more or less explicitly by many authors, including Cochran, Rubin, Feinstein, Robins…
- Hernán, Robins. Am J Epidemiol 2016

Key elements of the target trial protocol (the observational study needs to emulate each)
- Eligibility criteria
- Strategies
  - assigned at start of follow-up
  - followed from start of follow-up
- Randomized assignment
- Start/end of follow-up
- Outcomes
- Causal contrast(s) of interest
- Analysis plan

Classification of treatment strategies according to their time course
- Point interventions
  - Intervention occurs at a single time
  - Examples: one-dose vaccination, short-lived traumatic event, surgery…
- Intention-to-treat effects in RCTs are about point interventions
- Sustained strategies
  - Interventions occur at several times
  - Examples: medical treatments, lifestyle, environmental exposures…
  - Many (most?) questions are about sustained exposures

Classification of sustained treatment strategies
- Static
  - a fixed strategy for everyone
  - Example: treat with 150 mg of daily aspirin for 5 years
- Dynamic
  - a strategy that assigns different values to different individuals as a function of their evolving characteristics
  - Example: start aspirin treatment if coronary heart disease, stop if stroke

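One way to make the static/dynamic distinction concrete is to write each strategy as a function that maps a person's history to today's treatment. The sketch below is only an illustration of the two slide examples: the `History` class, the function names, and the 150 mg dose for the dynamic strategy are assumptions, not part of the talk.

```python
from dataclasses import dataclass


@dataclass
class History:
    """Hypothetical summary of what has been observed for one person so far."""
    had_chd: bool = False     # coronary heart disease diagnosed at some earlier time
    had_stroke: bool = False  # stroke diagnosed at some earlier time


def static_aspirin(history: History) -> int:
    """Static strategy: same daily dose (mg) for everyone during follow-up."""
    return 150  # the history is deliberately ignored


def dynamic_aspirin(history: History) -> int:
    """Dynamic strategy: start once CHD has occurred, stop once a stroke has occurred."""
    if history.had_stroke:
        return 0
    return 150 if history.had_chd else 0  # 150 mg is an assumed dose


# Two people with different histories get the same static dose
# but different treatment under the dynamic strategy.
healthy = History()
with_chd = History(had_chd=True)
print(static_aspirin(healthy), static_aspirin(with_chd))
print(dynamic_aspirin(healthy), dynamic_aspirin(with_chd))
```

The point of the representation is that a static strategy never reads its argument, whereas a dynamic strategy does, which is why emulating it requires data on the evolving characteristics.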
An advantage of explicit emulation of a target trial
- Clarity
  - Well-defined interventions lead to well-defined questions
  - From which everything else follows
- Identification of ill-defined questions
  - Questions for which a target trial would be too vague
    - Effect of excess body weight
    - Direct effect of statins not mediated through cholesterol

No intervention is perfectly well-defined
Robins and Greenland (2000), Hernán et al (2008, 2011)
- There is a spectrum
  - Aspirin relatively well-defined, HDL-cholesterol not so much, physical exercise somewhere in the middle…
  - The goal is to reduce ambiguity, not to rejoice in it
- Statistical methods are agnostic to the spectrum
  - We feed them data, we get a number
    - a so-called “effect” estimate
  - The challenge is how to interpret that number

Separating the scientific from the technical in causal inference problems
1. Formulation of questions
   - Specification of the target trial
2. Identification of data requirements
   - To emulate the target trial
3. Choice of statistical methods
   - The technical part

2. Identification of data requirements
- Observational analyses need the data required to emulate each component of the target trial
  - No more, no less
- Emulation is not straightforward

Target trial: Hormone therapy and coronary heart disease (protocol summary)
- Eligibility criteria: Postmenopausal women within 5 years of menopause between the years 2005 and 2010, with no history of cancer and no use of hormone therapy in the last 2 years.
- Treatment strategies: (1) Initiate estrogen plus progestin hormone therapy at baseline and remain on it during the follow-up, unless deep vein thrombosis, pulmonary embolism, myocardial infarction, or cancer is diagnosed. (2) Refrain from taking hormone therapy during the follow-up.
- Assignment procedures: Participants will be randomly assigned to either strategy at baseline, and will be aware of the strategy they have been assigned to.
- Follow-up period: Starts at randomization and ends at diagnosis of coronary heart disease, death, loss to follow-up, or 5 years after baseline, whichever occurs earlier.
- Outcome: Coronary heart disease diagnosed by a cardiologist.
- Causal contrasts: Intention-to-treat effect, per-protocol effect.
- Analysis plan: Intention-to-treat analysis, non-naïve per-protocol analysis (to be discussed).

Target trial: Statin therapy and coronary heart disease (protocol summary)
- Eligibility criteria: Individuals aged 55–84 in the years 2000-2006 with no prior history of CHD, stroke, peripheral vascular disease, heart failure, cancer, schizophrenia or dementia, no symptoms of subclinical CHD, and no use of statin therapy in the last 2 years.
- Treatment strategies: (1) Initiate statin therapy at baseline and remain on it during the follow-up, unless contraindications arise. (2) Refrain from taking statin therapy during the follow-up.
- Assignment procedures: Participants will be randomly assigned to either strategy at baseline, and will be aware of the strategy they have been assigned to.
- Follow-up period: Starts at randomization and ends at diagnosis of coronary heart disease, death, loss to follow-up, or January 2007, whichever occurs earlier.
- Outcome: Coronary heart disease diagnosed by a cardiologist.
- Causal contrasts: Intention-to-treat effect, per-protocol effect.
- Analysis plan: Intention-to-treat analysis, non-naïve per-protocol analysis.

Causal inference methods are methods that emulate randomization
- Causal inference methods are methods to reduce confounding and selection bias
- Under certain assumptions
  - Assumptions are just ways of filling in the data that we don’t have to emulate randomization in the target trial

Two approaches to emulate randomization (two approaches to adjust for confounding)
1. Correctly measure and appropriately adjust for all confounders
   - Stratification/regression, matching, propensity scores
   - G-methods: standardization/g-formula, g-estimation, IP weighting
2. Exploit sources of randomness in the data to adjust for confounding without measuring the confounders
   - Instrumental variable estimation, regression discontinuity, etc.

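As a small illustration of the first approach, the sketch below applies standardization and IP weighting to a point treatment A with a single binary confounder L. The counts are made up for the example, and the function names are hypothetical. Because nothing is modeled parametrically here, the two methods should agree with each other and differ from the crude comparison.

```python
# (L, A) -> (number of individuals, number who develop the outcome).
# Made-up counts: treatment is more common when L = 1, and L = 1 raises risk.
counts = {
    (0, 0): (400, 40),
    (0, 1): (100, 20),
    (1, 0): (100, 30),
    (1, 1): (400, 160),
}


def crude_risk(a):
    """Risk among those who received A = a, ignoring L."""
    n = sum(counts[(l, a)][0] for l in (0, 1))
    d = sum(counts[(l, a)][1] for l in (0, 1))
    return d / n


def standardized_risk(a):
    """Stratum-specific risks under A = a, averaged over the distribution of L."""
    total = sum(n for n, _ in counts.values())
    risk = 0.0
    for l in (0, 1):
        n_l = counts[(l, 0)][0] + counts[(l, 1)][0]
        n_la, d_la = counts[(l, a)]
        risk += (d_la / n_la) * (n_l / total)
    return risk


def ip_weighted_risk(a):
    """Risk in the pseudo-population created by weights 1 / P(A = a | L)."""
    weighted_n = 0.0
    weighted_d = 0.0
    for l in (0, 1):
        n_l = counts[(l, 0)][0] + counts[(l, 1)][0]
        n_la, d_la = counts[(l, a)]
        weight = n_l / n_la  # equals 1 / P(A = a | L = l)
        weighted_n += weight * n_la
        weighted_d += weight * d_la
    return weighted_d / weighted_n


for name, risk in [("crude", crude_risk),
                   ("standardized", standardized_risk),
                   ("IP weighted", ip_weighted_risk)]:
    print(name, "risk difference:", round(risk(1) - risk(0), 4))
```

Both adjusted estimates are only as good as the assumption that L is the sole confounder, which is exactly the "correctly measure all confounders" condition on the slide.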
Separating the scientific from the technical in causal inference problems
1. Formulation of questions
   - Specification of the target trial
2. Identification of data requirements
   - To emulate the target trial
3. Choice of statistical methods
   - To analyze the data

3. Statistical methods
- Causal inference is often associated with fancy, and unfamiliar, methods
  - e.g., inverse probability weighting of marginal structural models
- But statistical methods are only the last step in the process of causal inference
  - and often no fancy methods are needed
  - all methods are causal under certain assumptions

Choice of method depends on type of strategies
- Comparison of strategies involving point interventions only
  - All methods work if all confounders are measured or the instrumental variable conditions hold
- Comparison of sustained strategies
  - Generally only g-methods work
  - Developed by Robins and collaborators since 1986

Comparative effects of point interventions
- Time-fixed treatment implies time-fixed (i.e., baseline) confounding
- Any adjustment method will correctly adjust for measured baseline confounders
  - e.g., outcome regression such as logistic or Cox regression

Comparative effect of sustained strategies
- Time-varying treatments imply time-varying confounders
  - possible treatment-confounder feedback
- Conventional methods may introduce bias even when sufficient data are available on time-varying treatments and time-varying confounders
- G-methods can appropriately handle treatment-confounder feedback
  - Sometimes referred to as “causal” methods

Treatment-confounder feedback
[Causal diagram with nodes A0, L1, A1, Y, and U]
A_t: Antiretroviral therapy; L_t: CD4 cell count; Y: Outcome; U: Immunologic status
- There is treatment-confounder feedback if the time-varying confounders are affected by previous treatment
  - Confounder on the causal pathway NOT necessary for bias

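To see why conventional adjustment can fail under treatment-confounder feedback, the sketch below enumerates a toy distribution over U, A0, L1, A1, and Y exactly. All probabilities are invented for illustration, and treatment is given no effect on Y at all, so any non-null comparison is bias. In this assumed structure A0 affects L1, U affects both L1 and Y, and L1 affects A1; L1 is not on any causal pathway to Y.

```python
from itertools import product


def joint():
    """Yield (u, a0, l1, a1, probability, P(Y=1 | u)) for every combination."""
    for u, a0, l1, a1 in product((0, 1), repeat=4):
        p_l1 = 0.2 + 0.3 * a0 + 0.4 * u  # L1 depends on prior treatment and on U
        p_a1 = 0.3 + 0.4 * l1            # A1 depends on L1 only
        p = 0.5 * 0.5                    # U and A0 are fair coin flips
        p *= p_l1 if l1 else 1 - p_l1
        p *= p_a1 if a1 else 1 - p_a1
        yield u, a0, l1, a1, p, 0.1 + 0.6 * u  # Y depends on U only (no treatment effect)


def mean_y(**fixed):
    """E[Y | fixed values of a0, l1, a1]; the unmeasured U is never conditioned on."""
    num = den = 0.0
    for u, a0, l1, a1, p, p_y in joint():
        row = {"a0": a0, "l1": l1, "a1": a1}
        if all(row[key] == value for key, value in fixed.items()):
            num += p * p_y
            den += p
    return num / den


def prob_l1(l1, a0):
    """P(L1 = l1 | A0 = a0)."""
    num = den = 0.0
    for _, a0_, l1_, _, p, _ in joint():
        if a0_ == a0:
            den += p
            if l1_ == l1:
                num += p
    return num / den


def g_formula(a0, a1):
    """Standardized mean of Y under the static strategy (a0, a1)."""
    return sum(mean_y(a0=a0, l1=l, a1=a1) * prob_l1(l, a0) for l in (0, 1))


always = g_formula(1, 1)
never = g_formula(0, 0)
# Conventional contrasts: adjusting for L1 opens a path through U for A0,
# while not adjusting for L1 leaves A1 confounded.
stratified_a0 = mean_y(a0=1, l1=1, a1=1) - mean_y(a0=0, l1=1, a1=1)
unadjusted_a1 = mean_y(a0=1, a1=1) - mean_y(a0=1, a1=0)

print("g-formula, always vs never:", round(always - never, 4))
print("A0 contrast within L1 = 1:", round(stratified_a0, 4))
print("A1 contrast ignoring L1:", round(unadjusted_a1, 4))
```

Under these assumed probabilities the g-formula contrast is null, as it should be, while both conventional contrasts are not: whether or not L1 is adjusted for, a single regression-style comparison goes wrong.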
G-methods
- Parametric g-formula
  - Robins 1986
- G-estimation of nested structural models
  - Robins 1989, 1991
- Inverse probability weighting of marginal structural models
  - Robins 1998
- Doubly-robust versions
  - Robins, van der Laan, Rotnitzky…
  - e.g., collaborative targeted maximum likelihood estimation

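For reference, the g-formula and the IP weights can be written out for the two-time-point setting of the treatment-confounder feedback diagram (A0, L1, A1, Y). This is a sketch under the usual identifying assumptions (sequential exchangeability given the measured history, positivity, and consistency), for a static strategy (a_0, a_1) and a discrete L_1; the notation Y^{a_0,a_1}, W, and f is added here and does not appear on the slides.

```latex
% g-formula: standardize over the confounder distribution that follows a_0
\[
  E\left[Y^{a_0,a_1}\right]
  = \sum_{l_1} E\left[Y \mid A_0 = a_0,\, L_1 = l_1,\, A_1 = a_1\right]
    \Pr\left[L_1 = l_1 \mid A_0 = a_0\right]
\]

% Unstabilized IP weight for an individual with observed (A_0, L_1, A_1)
\[
  W = \frac{1}{f(A_0)\, f(A_1 \mid A_0, L_1)}
\]
```

The first expression averages over L_1 rather than conditioning on it, which is what lets it handle a confounder that is itself affected by earlier treatment.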
In summary, if we want to use observational data to support decision making
- First, we need to know exactly what question is being asked
  - Specify the protocol of the target trial
- Second, we need to describe how we are going to emulate the target trial
- Last, we can discuss the analysis of the observational data

Examples of target trial emulation using observational data
1. Point intervention (randomized trial)
   - Colorectal cancer screening and death
   - NORCCAP, a randomized trial in Norway
   - Instrumental variable estimation
2. Sustained strategy (claims database)
   - Epoetin and mortality
   - USRDS Medicare
   - G-methods

EXAMPLE #1 Colorectal cancer screening
- Question: What is the effect of one-time sigmoidoscopy on the risk of colorectal cancer, colorectal cancer mortality, and all-cause mortality?

- ~100,000 individuals randomly assigned to screening (~20%) or control (~80%) groups
- 10-year risk difference from intention-to-treat analysis was
  - −0.22% (−0.38% to −0.06%) for colorectal cancer
  - −0.06% (−0.14% to 0.03%) for colorectal cancer death
  - −0.22% (−0.65% to 0.22%) for all deaths

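An intention-to-treat risk difference of this kind compares risks by assigned arm. The sketch below shows the simplest version of that calculation, with a normal-approximation (Wald) confidence interval. The counts are invented, chosen only to mimic a 20%/80% allocation, and are not the trial's data; the published analysis may well have used a different estimator (for example, one based on survival curves).

```python
from math import sqrt


def risk_difference(events_1, n_1, events_0, n_0, z=1.96):
    """Risk difference (arm 1 minus arm 0) with a Wald confidence interval."""
    r1 = events_1 / n_1
    r0 = events_0 / n_0
    rd = r1 - r0
    se = sqrt(r1 * (1 - r1) / n_1 + r0 * (1 - r0) / n_0)
    return rd, rd - z * se, rd + z * se


# Hypothetical counts: 200 events among 20,000 assigned to screening,
# 1,000 events among 80,000 assigned to control.
rd, lower, upper = risk_difference(200, 20_000, 1_000, 80_000)
print(f"risk difference {100 * rd:.2f}% ({100 * lower:.2f}% to {100 * upper:.2f}%)")
```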
37% of individuals in the screening group did not undergo screening
- Intention-to-treat effect far from per-protocol effect
- Intention-to-treat effect not patient-centered
  - Patients planning to undergo screening want to know the effect of screening without contamination from those who rejected screening
- To estimate the per-protocol effect, how about a naïve per-protocol analysis?
  - Compare outcomes between compliers in the screening group and everyone in the control group

Naïve per-protocol analysis biased for some outcomes

Adjusted per-protocol analysis failed to remove the differences in mortality
- Ideal setting for instrumental variable analysis

Bounds for the per-protocol effect: 10-year risk difference (age-standardized)
[Figure: bounds under assumption sets a to g]
a. no assumptions
b. instrumental conditions only
c. + assumed maximum risk under screening in “never-takers” of 2%, 1%, and 40% for CRC incidence, CRC mortality, and all-cause mortality, respectively
d. + assumed maximum risk under screening in “never-takers” of 1.5%, 0.75%, and 30% for CRC incidence, CRC mortality, and all-cause mortality, respectively
e. + assumed maximum risk under screening in “never-takers” of 1%, 0.5%, and 20% for CRC incidence, CRC mortality, and all-cause mortality, respectively
f. + additive effect homogeneity
g. + multiplicative effect homogeneity

Instrumental variable analysis under alternative assumptions (monotonicity): 10-year risk differences

The role of instrumental variable estimation in causal inference
- Well suited to estimate per-protocol effects in randomized trials with one-time treatments and all-or-nothing compliance
  - Underutilized method in this setting
- Conditions for validity often questionable in observational studies
  - True for all other methods too
- Generally not appropriate when comparing sustained interventions

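With a one-time treatment and all-or-nothing compliance, the standard instrumental variable (Wald) estimator divides the intention-to-treat effect by the difference in treatment uptake between arms. The sketch below plugs in the two numbers quoted earlier in the talk (an intention-to-treat risk difference of −0.22 percentage points for colorectal cancer, and 37% of the screening arm not screened) and assumes, as a labeled simplification, that nobody in the control arm was screened. The result is back-of-the-envelope arithmetic, not the estimate reported in the talk's figures, and under monotonicity it refers to the effect among compliers.

```python
def wald_iv_estimate(itt_effect, uptake_assigned, uptake_control=0.0):
    """ITT effect rescaled by the between-arm difference in treatment uptake."""
    uptake_difference = uptake_assigned - uptake_control
    if uptake_difference <= 0:
        raise ValueError("assignment must increase uptake for the ratio to be meaningful")
    return itt_effect / uptake_difference


# ITT risk difference in percentage points; 63% of the screening arm was screened.
# uptake_control=0.0 is an assumption made for this illustration.
estimate = wald_iv_estimate(itt_effect=-0.22, uptake_assigned=1 - 0.37)
print(f"IV estimate: {estimate:.2f} percentage points")
```

The rescaling makes the logic of the slide visible: the larger the share of non-screened individuals in the screening arm, the further the intention-to-treat effect sits from the effect of actually being screened.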
EXAMPLE #2 Epoetin dosing and mortality
- Question: What is the effect of different doses of epoetin therapy on the mortality risk of patients undergoing hemodialysis?
- Data: US Renal Data System (Medicare claims database)
  - ~18,000 eligible elderly patients
  - Zhang et al. CJASN 2009; 21: 638-644

The target trial
- Eligibility criteria
  - End-stage renal disease
- Strategies
  - Fixed weekly dose of intravenous epoetin
  - 15,000, 30,000, or 45,000 units
- Follow-up
  - From 3 months after hemodialysis onset until death, loss to follow-up, or administrative end of the study (1 year)
- Outcome
  - All-cause mortality
- …

Methodological challenge
- Time-varying treatment
  - Use and dose of epoetin vary over the course of the disease
- Time-varying confounders
  - Hematocrit level, comorbidities
  - may be affected by prior treatment
- Treatment-confounder feedback
  - Need “causal” g-methods
  - e.g., IP weighting of marginal structural models

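For a time-varying treatment, IP weights are built up as a running product over time. The sketch below computes stabilized weights for one patient from two hypothetical sequences of probabilities: the probability of the treatment level actually received given treatment history only (numerator), and given treatment history plus time-varying confounders such as hematocrit (denominator). In a real analysis these probabilities would come from fitted models; the function name and the numbers are assumptions for illustration.

```python
def stabilized_weights(p_numerator, p_denominator):
    """Cumulative stabilized IP weight at each time point for one individual.

    p_numerator[t]:   P(received treatment at t | past treatment)
    p_denominator[t]: P(received treatment at t | past treatment, confounders)
    """
    if len(p_numerator) != len(p_denominator):
        raise ValueError("both sequences must cover the same time points")
    weights = []
    running = 1.0
    for num, den in zip(p_numerator, p_denominator):
        if not 0 < den <= 1:
            raise ValueError("denominator probabilities must lie in (0, 1]")
        running *= num / den  # contribution of this time point
        weights.append(running)
    return weights


# Hypothetical patient followed for three intervals.
weights = stabilized_weights([0.5, 0.6, 0.6], [0.5, 0.8, 0.3])
print([round(w, 3) for w in weights])
```

A weight above 1 marks a treatment history that was unlikely given the patient's confounder values, so that patient stands in for similar patients who received something else.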
Treatment-confounder feedback
[Causal diagram with nodes A0, L1, A1, Y, and U]
A_t: Epoetin dose; L_t: Hematocrit; Y: Death; U: Disease severity
- There is treatment-confounder feedback because the time-varying confounder is affected by previous treatment

Survival under 3 epoetin dosing regimes
Zhang et al. CJASN 2009; 21: 638-644

But this is a silly target trial
- In clinical practice, patients do not receive a fixed weekly dose of epoetin
  - That would be clinical malpractice
- Rather, actual clinical strategies are dynamic
  - A patient’s weekly dose depends on her hemoglobin or hematocrit, which in turn depends on her prior weekly dose

More reasonable strategies for a target trial
1. Mid-hematocrit strategy
   - epoetin to maintain Hct between 34.5% and 39.0%
2. Low-hematocrit strategy
   - epoetin to maintain Hct between 30.0% and 34.5%
- Under both strategies, epoetin dose is
  - increased by >10% if previous Hct below target
  - decreased by <10% times [previous Hct minus lower end of range] or increased by <10% times [upper end of range minus Hct] if Hct within target
  - decreased by >25% if Hct above target

More reasonable strategies imply more work
- Need to specify a more detailed protocol for the target trial
- Need to specify how to emulate that protocol
  - Appropriate adjustment for time-varying confounders becomes critical
  - Zhang et al. Medical Care 2014

Survival under these 2 dynamic strategies

Fundamental problem
- We never know whether we successfully adjusted for confounding
  - We never know whether our causal estimates are valid
- Need to be cautious and conduct sensitivity analyses
  - Including the use of negative controls
- Also need to better criticize observational analyses
  - and separate the scientific from the technical

- Among individuals assigned to placebo, the 5-year mortality risk was higher among those who did not adhere than among those who did adhere to the placebo pills
- This finding is taught in courses around the world
  - often quoted as a reminder of the dangers of analyses that deviate from the intention-to-treat principle
  - chilling effect on subsequent attempts to conduct per-protocol (observational) analyses in trials

A 21st century update of the CDP analysis

Difference in 5-year mortality between adherers and nonadherers to placebo
- Replication of 1980 analysis
  - Unadjusted: 14.3% (95% CI 10.8 to 17.8)
  - Adjusted: 10.9% (95% CI 7.5 to 14.4)
    - for baseline variables only
- 2015 update
  - Unadjusted: 11.0% (95% CI 6.5 to 15.6)
  - Adjusted: 2.5% (95% CI -2.1 to 7.0)
    - for baseline and time-varying variables

So there is some hope
- CDP trial, hormone therapy and heart disease, antiretroviral therapy (Lodi’s talk)…
- Sound application of causal inference methods often leads to reasonable causal inferences
  - Explicit emulation of the target trial
  - Appropriate choice of statistical methods

If you are ever confused at a “causal inference” talk... ask the speaker to
- separate the scientific part
  - Specification of the target trial
  - Description of the trial emulation
- from the technical part
  - Statistical methods