Translational Data Science L. J. Wei, Harvard University

Many thanks to
• Lu Tian, Stanford
• Tianxi Cai, Harvard
• Brian Claggett, Harvard
• Hajime Uno, Harvard
• Takahiro Hasegawa, Shionogi, Japan
• Scott Evans, Harvard
• Lihui Zhao, Northwestern
• Danyu Lin, UNC
• Zhiliang Ying, Columbia
• Zhezhen Jin, Columbia
• Colleagues in the pharmaceutical industry

What is the goal of a clinical study?
• To obtain a robust, clinically interpretable estimate of the treatment effect, from a risk-benefit perspective at the patient level, via efficient and reliable quantitative procedures

What are the issues?
• The conventional way to conduct trials gives us fragmentary information
• Lack of clinically meaningful totality of evidence
• Difficult to use the trial results for managing future patients

A Few Methodology Issues
1. Estimation vs. testing
• The p-value provides little clinical information about the treatment effect/risk
• The size of the effect matters
• Goodness-of-fit test? Use prediction to assess model fit

TREAT study for EPO CV safety
• If we follow the patients up to 48 months, the average stroke-free time is 46.9 months in the control arm and 46 months in the Darb (darbepoetin alfa) arm. The difference is 0.9 months with a 0.95 CI of (0.4, 1.4) months and p < 0.001 (very significant).
• The p-value can exaggerate the treatment difference: a small increase in the Z-value may drastically decrease the p-value. The confidence interval estimate is much more stable and interpretable.
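The point about Z-values and p-values can be illustrated with a minimal sketch. It backs out an approximate standard error from the confidence interval quoted above, assuming a normal approximation (an assumption of this sketch, not a detail stated on the slide), and then shows how quickly the two-sided p-value shrinks as Z grows.

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Numbers quoted on the slide: difference 0.9 months, 95% CI (0.4, 1.4).
diff, lower, upper = 0.9, 0.4, 1.4

# Assumption: the CI is diff +/- 1.96 * SE (normal approximation).
se = (upper - lower) / (2 * 1.96)
z = diff / se

print(f"SE = {se:.3f}, Z = {z:.2f}, p = {two_sided_p(z):.5f}")

# Modest increases in Z shrink the p-value by orders of magnitude,
# while the CI half-width (1.96 * SE) does not change at all.
for z_val in (2.0, 2.5, 3.0, 3.5, 4.0):
    print(f"Z = {z_val:.1f} -> p = {two_sided_p(z_val):.6f}")
```

The interval (0.4, 1.4) months conveys a difference of about one month out of 48, which the p-value alone does not.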

What is a clinically meaningful treatment effect via estimation?
• Reimbursement is an issue beyond getting the medical product approved by regulatory agencies
• What is the “estimand”?
• If the overall treatment effect is not “clinically impressive,” we may identify a “high value” subgroup via a pre-specified procedure

2. How do we define a primary endpoint with multiple outcomes?
• What is current practice?
• Define primary endpoints and secondary endpoints
• Efficacy and toxicity (how to connect them together?)
• Disease burden measure?
• The conventional component-specific analysis: informative missingness, censoring or competing risks

Example: A large cardiovascular study

What is the general clinical practice for treating a patient with cardiovascular diseases?
• Following the patient over time
• Having periodic clinical/lab exams/tests
• Recording the time to multiple clinical/lab outcomes (heart attack, stroke, CV hospitalization, CV death, ... BP, HbA1c, toxicity, ...)
• Assessing the disease burden/progression over time via the totality of multiple outcomes
• Making decisions on treatment selection

A typical cardiovascular (CV) study
• Comparing a new therapy with standard care
• The question is whether the new treatment would prevent bad CV outcomes/toxicity
• Following each patient over time
• Times to multiple clinical events are collected

Conventional approaches for clinical trials
• Choosing a single outcome (e.g., time to clinical event) as the primary endpoint
• Applying univariate analysis for the treatment difference
• Figuring out how to handle informative censoring (competing risks)
• Considering other outcomes (risk, benefit) as secondary endpoints
• Not sure how to treat future patients from study results via those separate summary measures for efficacy/safety

Example: Beta-Blocker Evaluation of Survival (BEST) Trial (NEJM, 2001)
• Study: bucindolol vs. placebo
• Patients with advanced chronic heart failure, n = 2707
• Average follow-up: 2 years
• Primary endpoint: overall survival
• Hazard ratio for death = 0.90 (p-value = 0.1)

BEST Trial

Possible solutions?
• Using the patient’s disease burden or progression information during the entire follow-up to define the “responder”
• Creating more than one response category: an ordinal categorical response
• Brian Claggett’s thesis paper (published in Biostatistics)

BEST Example: 8 Categories
• 1: No events
• 2: Alive, non-HF hospitalization only
• 3: Alive, 1 HF hosp.
• 4: Alive, >1 HF hosp.
• 5: Late non-CV death (>12 months)
• 6: Late CV death (>12 months)
• 7: Early non-CV death (<12 months)
• 8: Early CV death (<12 months)
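As a rough illustration of how such an ordinal response could be assigned from a patient's follow-up record, here is a minimal sketch. The record layout (`PatientRecord` and its fields) is hypothetical, and the handling of a death at exactly 12 months is an assumption, since the slide only specifies "<12" and ">12".

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PatientRecord:
    """Hypothetical follow-up summary for one patient."""
    death_month: Optional[float]  # None if alive at end of follow-up
    cv_death: bool                # only meaningful when death_month is set
    hf_hosp: int                  # number of heart-failure hospitalizations
    non_hf_hosp: int              # number of non-HF hospitalizations

def best_category(p: PatientRecord, cutoff: float = 12.0) -> int:
    """Map a record to the 8-level ordinal response (1 = best, 8 = worst)."""
    if p.death_month is not None:
        # Assumption: a death at exactly `cutoff` months counts as late.
        if p.death_month < cutoff:
            return 8 if p.cv_death else 7
        return 6 if p.cv_death else 5
    if p.hf_hosp > 1:
        return 4
    if p.hf_hosp == 1:
        return 3
    if p.non_hf_hosp > 0:
        return 2
    return 1

# Usage with made-up patients
print(best_category(PatientRecord(None, False, 0, 0)))   # no events
print(best_category(PatientRecord(None, False, 2, 1)))   # alive, >1 HF hosp.
print(best_category(PatientRecord(6.0, True, 1, 0)))     # early CV death
```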

Example: Treatment for HIV-infected children
• Primary endpoint: viral load reduction
• Major secondary endpoint: growth profile over 48 weeks

Example: DMD rare disease
• Nonsense mutation Duchenne muscular dystrophy (nmDMD) is a rare, X-linked, neuromuscular, childhood disorder.

Ambulatory Boys with Nonsense Mutation Muscular Dystrophy
• Outcomes for quantifying muscle function
  • 6MWD
  • 10-meter walk/run
  • 4-stair climb
  • 4-stair descend

Comparative studies for DMD
• Two trials done by PTC
• The primary endpoint is 6MWD
• Various secondary endpoints
• Each study was a 48-week, multicenter, randomized, double-blind, placebo-controlled trial comparing the efficacy and safety of ataluren vs. placebo in ambulatory boys with nmDMD.

Graphical display for patient level data
[Figure: patient-level displays of the outcomes for the Treatment and Placebo groups, one row per patient]

How to analyze multiple outcome data?
• For each column (specific outcome), obtain the treatment difference D
• Combine the D’s linearly (weighted average)
• Evaluate how “unlikely” it is to get the observed combined statistic
• Wei-Lachin (JASA, 1984) and Wei-Johnson (Biometrika, 1985)
• Powerful if all the test statistics are in the “right direction”
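The combination step can be sketched as follows. This is a simplified illustration in the spirit of a linear combination of outcome-specific statistics, not a reproduction of the cited procedures. It assumes each D has already been standardized by its standard error, that the covariance matrix of the standardized D's is available, and that the combined statistic is approximately normal. All numbers are made up.

```python
import math

def combined_z(d, cov, w=None):
    """Weighted average of per-outcome statistics and its standardized value.

    d   : list of standardized treatment differences, one per outcome
    cov : covariance matrix of d (list of lists)
    w   : weights; equal weights by default
    """
    k = len(d)
    if w is None:
        w = [1.0 / k] * k
    combined = sum(wi * di for wi, di in zip(w, d))
    var = sum(w[i] * cov[i][j] * w[j] for i in range(k) for j in range(k))
    z = combined / math.sqrt(var)
    # One-sided p-value: how unlikely is a value this large under the null?
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    return combined, z, p_one_sided

# Hypothetical example: four outcomes, each modestly favoring treatment,
# with pairwise correlation 0.5 between the standardized differences.
d = [1.5, 1.2, 1.0, 1.4]
cov = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]

combined, z, p = combined_z(d, cov)
print(f"combined = {combined:.3f}, Z = {z:.3f}, one-sided p = {p:.4f}")
```

In this made-up example the combined Z is larger than any single component, which is the sense in which the combination gains power when all statistics point the same way.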

How unlikely to observe this pattern under null hypothesis?
[Figure: forest plots for Study 007 (ITT; ataluren 10, 20 mg/kg, N=57; placebo, N=57) and Study 020 (ITT; ataluren, N=114; placebo, N=114). Each panel shows the change at Week 48 (LS mean, 95% CI) for four endpoints: 6MWD (m), 10-meter walk/run, 4-stair climb, and 4-stair descend. Axis directions: Favors Placebo / Favors Ataluren.]

Another way to combine
• For each outcome, we rank the observations over patients in each treatment group
• Add the ranks across each row (for each patient) so each patient has a rank score
• Conduct a test using those scores
• (O’Brien test)
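A minimal sketch of the rank-sum idea follows. It assumes that ranks are computed over the pooled patients from both arms, that higher values are better for every outcome, and that the per-patient scores are compared with a Welch t-statistic. These are choices made for this sketch, and the data are made up.

```python
import math

def average_ranks(values):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_scores(rows):
    """rows[i][k] = outcome k for patient i; returns one summed rank per patient."""
    scores = [0.0] * len(rows)
    for k in range(len(rows[0])):
        for i, r in enumerate(average_ranks([row[k] for row in rows])):
            scores[i] += r
    return scores

def welch_t(a, b):
    """Two-sample t-statistic allowing unequal variances."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# Hypothetical data: two outcomes per patient, three patients per arm.
treated = [[5, 3], [6, 4], [7, 5]]
placebo = [[1, 1], [2, 2], [3, 0]]

scores = rank_sum_scores(treated + placebo)   # ranks taken over the pooled sample
t_stat = welch_t(scores[:len(treated)], scores[len(treated):])
print(scores, round(t_stat, 3))
```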

Limitation of this combination approach
• Different outcomes have different scales, so it may be useful only as a powerful test procedure
• How to get an overall estimate for the treatment effect?

3. Identifying a high value subgroup of patients?
• A negative trial does not mean the treatment is no good for anyone
• A positive trial does not mean it works for everyone
• The usual subgroup analysis is not adequate to address this issue
• Need a built-in pre-specified procedure for identifying patients who benefit from treatment
• FDA’s guidance on predictive enrichment (2012)

4. How to monitor trials “quantitatively” via prediction?
• The usual practice is to use the p-value (O'Brien-Fleming stopping boundaries, etc.)
• Use conditional power?
• Use prediction confidence interval estimate (EAST new version)
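For readers unfamiliar with conditional power, here is a minimal sketch using the standard B-value (Brownian motion) formulation. The interim Z-value, the information fraction, and the one-sided 0.025 level are all assumptions chosen for illustration. The slides do not give a worked example.

```python
import math
from statistics import NormalDist

def conditional_power(z_t, t, theta=None, alpha=0.025):
    """Probability that the final Z exceeds the critical value, given interim data.

    z_t   : interim Z-statistic
    t     : information fraction at the interim look (0 < t < 1)
    theta : assumed drift (expected final Z); defaults to the current trend
    alpha : one-sided significance level for the final test
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha)
    b_t = z_t * math.sqrt(t)             # B-value at the interim look
    if theta is None:
        theta = z_t / math.sqrt(t)       # "current trend" drift estimate
    mean_final = b_t + theta * (1 - t)   # expected final B-value
    return 1 - nd.cdf((z_crit - mean_final) / math.sqrt(1 - t))

# Hypothetical interim look: Z = 2.0 at half of the planned information.
print(round(conditional_power(2.0, 0.5), 3))             # under the current trend
print(round(conditional_power(2.0, 0.5, theta=0.0), 3))  # under the null
```

The large gap between the two values shows how strongly conditional power depends on the assumed drift, which is one reason to look at a predicted confidence interval for the effect size as well.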

5. How to monitor safety?
• What is the conventional way?
• Component-wise tabulation or analysis?
• No information about multiple AEs at the patient level
• Graphical method to show the temporal toxicity profile?

6. Quantifying treatment contrast (difference)?
• Should be a model-free parameter
• Using difference of means, medians, etc.
• For censored data, using a constant hazard ratio (heavily model-based)?
• A model-based measure is difficult to interpret or validate

Issues for the hazard ratio estimate
• The hazard ratio estimate is routinely used for designing, monitoring and analyzing clinical studies in survival analysis

Model-Free Parameter for Treatment Difference
• Considering a two-treatment comparison study in “survival analysis”
• How do we quantify the treatment difference?
  • Median failure time (may not be estimable)
  • t-year survival rate (not an overall measure)?
  • A constant hazard ratio over time with the logrank test

Eastern Cooperative Oncology Group
• E4A03 trial to compare low- and high-dose dexamethasone for naïve patients with multiple myeloma
• The primary endpoint is the survival time
• n = 445
• The trial stopped early at the second interim analysis; the low dose was superior.
• Patients on the high-dose arm then received the low dose, and follow-up for overall survival was continued.

A Cancer Study Example
[Figure: Group 1 vs. Group 2]

• The proportional hazards assumption is not valid
• The PH estimator is estimating a quantity which cannot be interpreted and, worse, depends on the study-specific censoring distributions
• Any model-based treatment contrast has such issues (need a model-free parameter)
• The logrank test is not powerful

• Conventional analysis:
  • Log-rank test: p = 0.47
  • Hazard ratio: HR = 0.87 (0.60, 1.27)

What is the alternative way for survival analysis?
• Using the area under the curve of the Kaplan-Meier estimate up to a fixed time point
• Restricted mean survival time (RMST)
• Model-free and a global measure of efficacy
• Can be estimated even under heavy censoring
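The area-under-the-curve idea can be made concrete with a short sketch that computes the Kaplan-Meier step function and integrates it up to a truncation time. The function name `km_rmst` and the toy data are illustrative. The sketch assumes the truncation time lies within the range where the Kaplan-Meier curve is defined and does not compute a standard error.

```python
def km_rmst(times, events, tau):
    """Restricted mean survival time: area under the Kaplan-Meier curve on [0, tau].

    times  : observed follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    tau    : truncation time (assumed to lie within the follow-up range)
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, last_t, area = 1.0, 0.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        deaths = ties = 0
        while i < len(data) and data[i][0] == t:   # handle tied times together
            deaths += data[i][1]
            ties += 1
            i += 1
        area += surv * (t - last_t)                # rectangle up to this time
        if deaths:
            surv *= 1.0 - deaths / n_at_risk       # Kaplan-Meier step down
        n_at_risk -= ties
        last_t = t
    area += surv * (tau - last_t)                  # final rectangle up to tau
    return area

# Toy data (made up): times in months, two of four observations censored.
print(km_rmst([2, 3, 5, 8], [1, 0, 1, 0], tau=6))
```

With no censoring and a truncation time at the largest observed time, the result reduces to the ordinary sample mean, which is a convenient sanity check.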

The area under Kaplan-Meier as a summary of survival distribution
[Figure: area under the Kaplan-Meier curve for each arm (one panel labeled “Treated”); RMST: 33.3 m and 35.4 m]

Cancer Study Example
Restricted Mean (up to 40 months):
• 35.4 months vs. 33.3 months
• Δ = 2.1 (0.1, 4.2) months; p = 0.04
• Ratio of survival time = 35.4/33.3 = 1.06 (1.00, 1.13)
• Ratio of time lost = 6.7/4.6 = 1.46 (1.02, 2.13)
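The point estimates above follow directly from the two restricted means and the 40-month truncation time, as the short check below shows. The confidence intervals and the p-value require patient-level data and are not reproduced here. The variable names are illustrative.

```python
tau = 40.0                      # truncation time in months
rmst_a, rmst_b = 35.4, 33.3     # restricted mean survival times from the slide

diff = rmst_a - rmst_b          # difference in RMST
ratio_surv = rmst_a / rmst_b    # ratio of restricted mean survival times

# "Time lost" is the part of the 40-month window not survived on average.
lost_a, lost_b = tau - rmst_a, tau - rmst_b
ratio_lost = lost_b / lost_a

print(f"difference    = {diff:.1f} months")
print(f"ratio (time)  = {ratio_surv:.2f}")
print(f"time lost     = {lost_b:.1f} vs. {lost_a:.1f} months")
print(f"ratio (lost)  = {ratio_lost:.2f}")
```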

7. Post-marketing/safety studies?
• It is not appropriate to use an event-driven procedure to conduct a safety study
• The event rate is low, so the exposure time matters
• Requires a lot of resources (large or long-term study)

CV safety study for anti-diabetes drugs
• Event-driven studies: we need a pre-specified number of events so that the resulting confidence interval for the treatment difference is “narrow”
• For example, the upper bound of the 95% confidence interval for the hazard ratio is less than 1.3
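To give a sense of scale, the sketch below uses the common approximation that, with 1:1 randomization, the variance of the log hazard ratio estimate is about 4 divided by the number of events. The 90% power and the true hazard ratio of 1 are assumptions chosen for illustration. The slides do not state the design parameters.

```python
import math
from statistics import NormalDist

def events_needed(margin, alpha=0.05, power=0.90, true_hr=1.0):
    """Approximate number of events so that the upper two-sided (1 - alpha) CI
    bound for the hazard ratio falls below `margin` with the given power.

    Assumes 1:1 allocation and var(log HR estimate) ~= 4 / events.
    """
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)
    z_b = nd.inv_cdf(power)
    effect = math.log(margin) - math.log(true_hr)
    return math.ceil(4 * (z_a + z_b) ** 2 / effect ** 2)

d = events_needed(1.3)
print(d)
```

With a low event rate, accumulating several hundred events implies a very large or very long study, which is the resource concern raised above.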

The EXAMINE trial (alogliptin) NEJM, October 3, 2013

RMST with 95% confidence intervals

Truncation time   Placebo             Alogliptin          Difference            Ratio
24 months         21.9 (21.7, 22.2)   22.0 (21.8, 22.3)   -0.08 (-0.39, 0.24)   1.00 (0.98, 1.01)
30 months         27.1 (26.7, 27.4)   27.2 (26.9, 27.5)   -0.12 (-0.56, 0.33)   1.00 (0.98, 1.01)

What if a smaller study?
95% confidence intervals for various measures

Sample               Hazard ratio    Difference in event rate at Day 900 [%]   Difference in RMST at Day 900 [days]
All data (N=16492)   (0.89, 1.12)    (-1.2, 0.9)                               (-5, 4)
25% (N=4123)         (0.80, 1.26)    (-2.3, 2.0)                               (-9, 9)
20% (N=3298)         (0.78, 1.28)    (-2.6, 2.2)                               (-11, 10)
15% (N=2427)         (0.76, 1.36)    (-2.9, 2.6)                               (-12, 12)

8. Evaluating new treatment for rare diseases
• Utilizing the registry data or natural history data
• Single-arm trial?
• Multiple outcomes?
• It is not at all clear how to quantify disease burden over time

How to make treatments comparable across studies?
• Which patient population are we referring to?
• This is not clear when using the propensity score procedure
• Fit a model relating outcome to covariates with the registry data, then move the fitted model to the clinical trial population?

9. Meta-analysis for safety issues

• Nissen and Wolski (2007) performed a meta-analysis to examine whether rosiglitazone (Avandia, GSK), a drug for treating type 2 diabetes mellitus, significantly increases the risk of MI or CVD-related death.

Example: Effect of Rosiglitazone on MI or CVD Deaths
• Avandia was introduced in 1999 and is widely used as monotherapy or in fixed-dose combinations with either Avandamet or Avandaryl.
• The original approval of Avandia was based on its ability to reduce blood glucose and glycated hemoglobin levels.
• Initial studies were not adequately powered to determine the effects of this agent on micro- or macrovascular complications of diabetes, including cardiovascular morbidity and mortality.

Example: Effect of Rosiglitazone on MI or CVD Deaths
• However, the effect of any anti-diabetic therapy on cardiovascular outcomes is particularly important because more than 65% of deaths in patients with diabetes are from cardiovascular causes.
• Of 116 screened studies, 48 satisfied the inclusion criteria for the analysis proposed in Nissen and Wolski (2007).
  • 42 studies were reported in Nissen and Wolski (2007); the remaining 6 studies have zero MI or CVD deaths
  • 10 studies with zero MI events
  • 25 studies with zero CVD-related deaths

• Event rates from 0% to 2.70% for MI
• Event rates from 0% to 1.75% for CVD death

[Figure: forest plots on the log odds ratio scale for MI and CVD death]
• MI: 95% CI (1.03, 1.98); p-value = 0.03
• CVD death: 95% CI (0.98, 2.74); p-value = 0.06 (in favor of the control)

Questions
• Rare events?
• How to utilize studies with 0/0 events?
• Validity of asymptotic inference?
• Exact inference?
• Choice of effect measure?
• Between-study heterogeneity?
• Common treatment effect or study-specific treatment effect?
• The number of studies not large?
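One way to see why the choice of effect measure matters for 0/0 studies is a pooled risk difference with Mantel-Haenszel-type weights, sketched below. This is an illustration with made-up counts, not the analysis behind the slides. The function name and data layout are hypothetical, and no confidence interval is computed. A study with no events in either arm has an undefined odds ratio, but its risk difference is 0 and it still carries weight through its sample size.

```python
def mh_risk_difference(studies):
    """Pooled risk difference with Mantel-Haenszel-type weights.

    studies : list of (events_trt, n_trt, events_ctl, n_ctl)
    Weight for each study is n_trt * n_ctl / (n_trt + n_ctl).
    """
    num = den = 0.0
    for e1, n1, e0, n0 in studies:
        w = n1 * n0 / (n1 + n0)
        num += w * (e1 / n1 - e0 / n0)
        den += w
    return num / den

# Hypothetical studies; the second one has zero events in both arms.
studies = [
    (2, 100, 1, 100),
    (0, 200, 0, 200),
    (1, 50, 0, 50),
]

rd_all = mh_risk_difference(studies)
rd_drop = mh_risk_difference([s for s in studies if s[0] + s[2] > 0])

print(f"risk difference, all studies:      {100 * rd_all:.2f}%")
print(f"risk difference, 0/0 study dropped: {100 * rd_drop:.2f}%")
```

Dropping the 0/0 study more than doubles the pooled estimate in this toy example, so discarding such studies is not an innocuous choice.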

MI: Asymptotic Inference / Exact Inference
• 95% CI: (-0.08, 0.38)%; p-value = 0.27
• 95% CI: (0.02, 0.42)%; p-value = 0.03

CVD Death: Asymptotic Inference / Exact Inference
• 95% CI: (-0.13, 0.23)%; p-value = 0.83
• 95% CI: (0.00, 0.31)%; p-value = 0.05

10. ANCOVA/stratified analysis?
• The conventional procedures may be biased (CMH for binary data, Cox PH model for event time data)
• Using the augmentation method (model-free)
• The gain from ANCOVA over the two-sample inference procedures is very small unless the covariates and response are highly correlated or there is non-trivial covariate imbalance

Summary
• Could we modify our statistical training beyond classroom teaching?
• Try to figure out “how, where and what to learn?”
• Learning from doing a project with mentoring?
• Could we have a coherent approach from the beginning to the end for a research project?
• Making statistics more translational clinically/biologically?
• George Box: Instead of figuring out the optimal solution to a wrong problem, try to get A solution to the right problem.
• Asking ourselves “What is the question(s)?”