Advantages & Disadvantages of Adaptive Sample Size Re-Estimation
































Advantages & Disadvantages of Adaptive Sample Size Re-Estimation Demonstrated with

Webinar Host
HOSTED BY: Ronan Fitzpatrick
- Head of Statistics
- nQuery Lead Researcher
- FDA Guest Speaker
- Guest Lecturer

Webinar Overview
- Adaptive Design Advantages & Disadvantages
- Unblinded SSR Evaluation & Demo
- Blinded SSR Evaluation & Demo
- Discussion and Conclusions

Worked Examples Overview
- Two Means Group Sequential Trial
- Unblinded SSR Example
- Two Means Conditional Power
- Two Means Blinded SSR

WORKED EXAMPLES
- Unblinded Two Means SSR Example
- Unblinded incl. group sequential design & conditional power
- Two Means Blinded SSR

PART 1 Adaptive Design Evaluation

Adaptive Trials Overview
- Adaptive trials are any trial where a change or decision is made to the trial while it is still on-going
- Encompasses a wide variety of potential adaptations, e.g. early stopping, SSR, enrichment, seamless, dose-finding
- Adaptive trials seek to give control to the trialist to improve the trial based on all available information
- Adaptive trials can decrease costs & give better inferences
- Increasing interest but desire for more regulatory certainty

FDA CBER/CDER Adaptive Guidance (2018)
- New draft guidance published in Oct 2018 (PDUFA VI requirement)
- Comments up to Nov. 30th
- Far less categorical than 2010 draft
- Emphasizes early collaboration with FDA
- Focus on design issues and Type I error, e.g. pre-specification, blinding, simulation
- In-depth on certain adaptive designs: SSR, enrichment, switching, multiple treatments
- Also views on Bayesian and Complex

“Adaptive designs have the potential to improve ... study power and reduce the sample size and total cost” for investigational drugs, including “targeted medicines that are being put into development today”
Scott Gottlieb (FDA Commissioner)

Adaptive Trials Evaluation

Opportunities
1. Earlier Decisions
2. Reduced Potential Cost
3. Higher Potential Success
4. Greater Generalizability
5. Stakeholder Buy-in

Risks
1. Complex/Different Stats
2. Logistical Costs and Issues
3. Bias/Unblinding (IDMC)
4. Type I Error Inflation
5. Potential Lower Efficiency

Advantages of Adaptive Design
1. Earlier Decisions: Can stop trial early, drop treatments or doses early, test more treatments at the same time, ethical consideration
2. Reduced Potential Cost: Earlier decisions (e.g. futility), greater up-front flexibility, compare more treatments
3. Higher Potential Success: Better statistical efficiency, improve trial in response to “live” data
4. Greater Generalizability: Can answer more questions (e.g. enrich subpopulations), evaluate unviable treatments
5. Stakeholder Buy-in: Higher subject buy-in with higher prob. of getting best treatment, lower potential costs/high success rate

Disadvantages of Adaptive Design
1. Complex/Different Stats: Reduced direct comparability to previous trials, may require specialised software & expertise
2. Logistical Costs and Issues: Additional software/expertise, IDMC & blinding (see below), additional design & consultation
3. Bias/Unblinding: May incur additional costs, major risk to integrity of trial, greater reliance on IDMC
4. Type I Error Inflation: Risk must be accounted for via simulation (see resources), undermines results if design implemented poorly
5. Potential Lower Efficiency: Additional costs, outlier cases may need more resources, long lead-ins, other constraints
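To make the Type I error point concrete, here is a minimal simulation sketch. It assumes a deliberately naive design that is not taken from the slides: two equally spaced looks, each tested at the unadjusted two-sided 5% critical value, with data generated under the null. The function name and defaults are illustrative.

```python
import random
from statistics import NormalDist


def simulate_type1(n_sims=20000, crit=None, seed=1):
    """Estimate the overall two-sided type I error for two equally spaced
    looks that both use the same unadjusted critical value (null is true)."""
    rng = random.Random(seed)
    if crit is None:
        crit = NormalDist().inv_cdf(0.975)  # unadjusted two-sided 5% boundary
    rejections = 0
    for _ in range(n_sims):
        # Under H0 the stage-wise z-statistics are independent N(0, 1).
        stage1 = rng.gauss(0.0, 1.0)
        stage2 = rng.gauss(0.0, 1.0)
        z_interim = stage1
        z_final = (stage1 + stage2) / 2 ** 0.5  # equal information per stage
        if abs(z_interim) > crit or abs(z_final) > crit:
            rejections += 1
    return rejections / n_sims


rate = simulate_type1()
print(f"Estimated overall type I error: {rate:.3f}")
```

The estimate should sit clearly above the nominal 5% (theory gives roughly 8% for this naive two-look rule), which is why adjusted boundaries or simulation checks are needed for any design with interim decisions.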

Sample Size Re-estimation (SSR)
- Will focus here on the specific adaptive design of SSR
- Adaptive trial focused on a higher sample size if needed
- Obvious adaptation target due to intrinsic uncertainty in sample size determination (SSD)
- Note that SSR is more suited to trials with knowable/short follow-up
- Could also adaptively lower N but this is not encouraged
- Two primary types: 1) Unblinded SSR; 2) Blinded SSR

About nQuery
In 2017, 90% of organizations with clinical trials approved by the FDA used nQuery for sample size and power calculation

PART 2 Unblinded SSR Pros & Cons

Conditional Power (CP) & Group Sequential Trials (GST)

Group Sequential Design (GSD)
- GSD: Early stopping at interim analyses
- Interim = analysis while trial is on-going; analyses at pre-specified times
- 2 criteria for early stopping:
  1. Efficacy (α-spending)
  2. Futility (β-spending)
- Must account for effect of interim analyses
- Lan & DeMets “spending function” for errors
- Spends proportion of error at each look
- Multiple error spending functions available

Conditional Power (CP)
- CP gives P(Rejecting null given interim ES)
- Depends on what the “true” assumed ES is
- Often used as criteria for futility testing
- More flexible than β-spending
- CP used as measure of “promising” results
- “Promising” = less than but close to target CP
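As an illustration of how CP depends on the assumed effect size, the following sketch uses the standard B-value formulation for a one-sided test. The interim z-statistic of 1.5 and the 80%-power design drift are hypothetical inputs chosen for the example, not values from the slides, and the function is not nQuery's implementation.

```python
from statistics import NormalDist

_norm = NormalDist()


def conditional_power(z_interim, info_frac, alpha_final, drift=None):
    """One-sided conditional power via the B-value formulation.

    z_interim   : interim z-statistic
    info_frac   : information fraction t at the interim (0 < t < 1)
    alpha_final : one-sided nominal significance level at the final look
    drift       : assumed expected final z; None extrapolates the interim trend
    """
    crit = _norm.inv_cdf(1 - alpha_final)
    b_value = z_interim * info_frac ** 0.5
    if drift is None:
        drift = z_interim / info_frac ** 0.5  # "current trend" assumption
    mean_remaining = drift * (1 - info_frac)
    sd_remaining = (1 - info_frac) ** 0.5
    return 1 - _norm.cdf((crit - b_value - mean_remaining) / sd_remaining)


# Hypothetical interim: z = 1.5 halfway through, final one-sided alpha 0.025.
cp_trend = conditional_power(1.5, 0.5, 0.025)
# Same data, but assuming the design effect (drift for 80% power at 0.025).
design_drift = _norm.inv_cdf(0.975) + _norm.inv_cdf(0.80)
cp_design = conditional_power(1.5, 0.5, 0.025, drift=design_drift)
print(f"CP under current trend:  {cp_trend:.3f}")
print(f"CP under design effect: {cp_design:.3f}")
```

The two results differ even though the interim data are identical, which is the sense in which CP "depends on what the true assumed ES is".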

Unblinded Sample Size Reestimation
- SSR suggested when interim effect size is “promising” (Chen et al)
- “Promising” is user-defined but based on the unblinded effect size
- Extends GSD with a 3rd option: continue, stop early, increase N
- Power for optimistic effect but increase N for lower relevant effects
- Updated FDA Guidance: Design which “can provide efficiency”
- Common criterion proposed for unblinded SSR is conditional power (CP)
- Probability of significance given interim data (more detail on next slide)
- 2 methods here: Chen, DeMets & Lan; Cui, Hung & Wang
- 1st uses GSD statistics but only penultimate look & high CP

Advantages of Unblinded SSR
1. Earlier Decisions: Inherit GST chance to stop early for efficacy/futility, adds ability to quickly adjust trial to reality
2. Reduced Potential Cost: Can plan for lower effect size/sample size up-front, still have GST ability to stop trials early
3. Higher Potential Success: Higher chance of powering properly vs. fixed term and GST
4. Greater Generalizability: Wider range of findable potential effects. Optionality gives ability to test additional treatments
5. Stakeholder Buy-in: Optionality can make up-front investment lower, better chance of success valuable to

Disadvantages of Unblinded SSR
1. Complex/Different Stats: CDL uses standard GST statistics but CHW uses weighted statistic. Underweighting of post-SSR cohort
2. Logistical Costs and Issues: May need new software/routines (esp. for CHW), need flexible protocol/resources for max N
3. Bias/Unblinding: Reliant on IDMC for decisions (though also true for GST), must take steps to ensure no back-calculation of effect
4. Type I Error Inflation: Risk must be accounted for via simulation (esp. CDL), ensure correct boundaries and stats used
5. Potential Lower Efficiency: Max N may be higher than equivalent fixed term, initial N higher with GST (esp. liberal SF)

Example 1: Means Group Sequential Example

“A sample size of 242 subjects (121 per treatment group) provides at least 80% power to detect a relative difference of 53% between botulinum toxin A and standardized anticholinergic therapy, assuming a treatment difference of -0.80 and a common SD of 2.1 (effect size = 0.381), and a two-sided type I error rate of 5%. Sample size has been adjusted to allow for a 10% loss to follow-up over the 6-months of treatment as well as one interim analysis to stop early for benefit.”

Parameter                        Value
Significance Level (2 sided)     0.05
OnabotulinumtoxinA Mean          -2.3
Anticholinergic Mean             -1.5
Standard Deviation (Both)        2.1
Power                            80%
# Interim Analyses               1
α Spending Function              O’Brien-Fleming
Expected Dropout                 10%

Source: NEJM (2012)
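The design inputs above can be sanity-checked with a short sketch. It assumes the textbook normal-approximation formula for two means and the Lan-DeMets O'Brien-Fleming-type spending function; it is not the calculation used in the paper or in nQuery, and the function names are illustrative.

```python
import math
from statistics import NormalDist

_norm = NormalDist()


def n_per_group_two_means(delta, sd, alpha_two_sided, power):
    """Normal-approximation sample size per group for comparing two means."""
    z_alpha = _norm.inv_cdf(1 - alpha_two_sided / 2)
    z_beta = _norm.inv_cdf(power)
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2


def obf_type_alpha_spent(info_frac, alpha_one_sided):
    """Cumulative alpha spent at information fraction t under the
    Lan-DeMets O'Brien-Fleming-type spending function."""
    z = _norm.inv_cdf(1 - alpha_one_sided / 2)
    return 2 * (1 - _norm.cdf(z / math.sqrt(info_frac)))


effect_size = 0.8 / 2.1
n_fixed = math.ceil(n_per_group_two_means(0.8, 2.1, 0.05, 0.80))
n_enrolled = math.ceil(n_fixed / (1 - 0.10))  # inflate for 10% dropout
alpha_at_half = obf_type_alpha_spent(0.5, 0.025)

print(f"Effect size: {effect_size:.3f}")
print(f"Fixed-design n per group (before dropout): {n_fixed}")
print(f"After 10% dropout inflation: {n_enrolled}")
print(f"One-sided alpha spent at t = 0.5: {alpha_at_half:.4f}")
```

The dropout-inflated figure lands close to, but not exactly on, the 121 per group quoted, which is expected given the approximation and rounding choices here. The small amount of alpha spent at the halfway look shows why an O'Brien-Fleming-type interim analysis costs very little sample size.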

Example 1: Unblinded Means SSR Example

Assume same design as GSD Example (Example 3) with HSD (γ = 1.5) futility variant (n = 114). Assume interim difference = 0.6 (from 0.8), interim common SD = 2.31 (from 2.1) and interim n of 57 per group with nominal alpha of 0.0245 for final look. What will the required N be for SSR for Chen-DeMets-Lan and Cui-Hung-Wang, assuming multiplier = 2?

Parameter                        Value
Nominal Final Look Sig. Level    0.0245
Interim Difference               -0.6
Interim SD (Both)                2.31
Initial N per Group              114
Interim N per Group              57
Maximum N per Group              228
Lower CP Bound                   Derived/40%
Upper CP Bound                   80%
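A rough sketch of the promising-zone logic for this example follows. It assumes conditional power is evaluated under the interim trend with the conventional (unweighted) final z-statistic, treats 0.0245 as a one-sided nominal level, uses 40% as the lower CP bound, and takes the interim difference as 0.6 in the direction of benefit. The search loop is an illustrative assumption; it does not implement the CHW weighted statistic and is not nQuery's algorithm, so reported values may differ.

```python
import math
from statistics import NormalDist

_norm = NormalDist()


def cp_current_trend(z1, n1, n_total, alpha_final):
    """Conditional power under the interim trend when the final analysis
    uses n_total per group and the conventional (unweighted) z-statistic."""
    crit = _norm.inv_cdf(1 - alpha_final)
    n_rem = n_total - n1
    arg = ((crit * math.sqrt(n_total) - z1 * math.sqrt(n1)) / math.sqrt(n_rem)
           - z1 * math.sqrt(n_rem) / math.sqrt(n1))
    return 1 - _norm.cdf(arg)


# Interim values from the example (difference taken as a benefit of 0.6).
diff, sd, n1, n_planned, n_max = 0.6, 2.31, 57, 114, 228
alpha_final, cp_lower, cp_target = 0.0245, 0.40, 0.80

z1 = diff / (sd * math.sqrt(2 / n1))  # interim z-statistic
cp_planned = cp_current_trend(z1, n1, n_planned, alpha_final)

n_new = n_planned
if cp_lower <= cp_planned < cp_target:  # interim result is "promising"
    # Smallest n per group that restores the target CP, capped at n_max.
    while n_new < n_max and cp_current_trend(z1, n1, n_new, alpha_final) < cp_target:
        n_new += 1

print(f"Interim z: {z1:.3f}")
print(f"CP at planned n = {n_planned}: {cp_planned:.3f}")
print(f"Re-estimated n per group: {n_new}")
```

Under these assumptions the interim CP falls inside the 40% to 80% promising zone, so the sample size is increased but stays below the maximum of 228 per group.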

PART 3 Blinded SSR Pros & Cons

Blinded Sample Size Reestimation
- BSSR uses an interim blinded nuisance parameter estimate
- Use of blinded data reduces logistical/regulatory issues
- A “well understood” & “attractive choice” for adaptive design
- Multiple methods but focus on internal pilot approach
- Update N based on parameter estimate from internal pilot
- Use same methods as fixed term trial incl. pilot data
- Small error inflation but negligible for most cases
- Other methods control error but use p-value combination

Blinded SSR nQuery Summary (Winter 2018)

Blinded SSR Means
- SSR Criteria: Variance
- Three σ² Estimate Methods
  1. Two Sample Inequality
  2. Two Sample NI
  3. Two Sample Equiv

Blinded SSR Props
- SSR Criteria: Overall Success Rate
- Assumes effect size true
  1. Two Sample Inequality
  2. Two Sample NI

Advantages of Blinded SSR
1. Earlier Decisions: Able to adjust study to empirical estimate rather than pre-specified estimate for nuisance parameter
2. Reduced Potential Cost: Cheaper than external pilot by using data, potential to decrease sample size if desired
3. Higher Potential Success: Increases prob. of sufficient power, more used N than external pilot, less reliance on conservative guesses
4. Greater Generalizability: Better powered estimates vs. fixed term but greater confidence may make approval easier
5. Stakeholder Buy-in: Advantages as above in terms of

Disadvantages of Blinded SSR
1. Complex/Different Stats: Minimal unless using p-value combination, can debate over best blinded parameter estimate
2. Logistical Costs and Issues: Minimal additional costs and logistical issues beyond retaining the blind
3. Bias/Unblinding: Ensuring blinding is retained is vital to ensure approval. Challenging for some designs (e.g. open-label trials)
4. Type I Error Inflation: Risk should be tested for via simulation. Generally minimal for inequality, can be an issue for NI/equiv
5. Potential Lower Efficiency: Internal pilot estimate may be

Example 2: Two Sample Mean Blinded SSR Example

“We estimated that we would need to enrol 160 patients, given an expected mean (±SD) annual decline in the FVC of 9±16 percent of the predicted value and a dropout rate of 15 percent, to achieve a two-sided alpha level of 0.05 and a statistical power of 90 percent.”

Parameter                        Value
Significance Level (2 Sided)     0.05
Mean Difference (%)              -9
Standard Deviation (%)           16
Dropout Rate                     15%
Target Power                     90%
Nuisance Parameter?              Standard Deviation

Source: NEJM (2006)
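The internal pilot idea can be sketched for this example as follows. The planned values (difference 9, SD 16, 90% power) come from the quote; the interim data are HYPOTHETICAL, generated with a true SD of 20 purely for illustration. The adjusted one-sample variance and the "never go below the planned N" rule are assumptions for this sketch, not necessarily any of the three nQuery estimate methods.

```python
import math
import random
from statistics import NormalDist, variance

_norm = NormalDist()


def n_per_group(delta, sd, alpha_two_sided=0.05, power=0.90):
    """Normal-approximation sample size per group for two means."""
    z_sum = _norm.inv_cdf(1 - alpha_two_sided / 2) + _norm.inv_cdf(power)
    return math.ceil(2 * (z_sum * sd / delta) ** 2)


def blinded_sd(pooled, delta=None):
    """SD from pooled interim data with treatment labels hidden.

    If delta is given, subtract the between-group component that the assumed
    effect adds to the one-sample variance under 1:1 allocation.
    """
    n = len(pooled)
    var = variance(pooled)
    if delta is not None:
        var = max(var - n * delta ** 2 / (4 * (n - 1)), 0.0)
    return math.sqrt(var)


n_planned = n_per_group(delta=9, sd=16)

# HYPOTHETICAL internal pilot: 60 pooled observations, true SD 20.
rng = random.Random(2006)
pilot = [rng.gauss(-9 if i % 2 else 0, 20) for i in range(60)]

sd_interim = blinded_sd(pilot, delta=9)
# Restricted design: only allow the sample size to increase.
n_updated = max(n_planned, n_per_group(delta=9, sd=sd_interim))

print(f"Planned n per group: {n_planned}")
print(f"Blinded interim SD estimate: {sd_interim:.1f}")
print(f"Updated n per group: {n_updated}")
```

Inflating the planned per-group figure for 15% dropout gives a total in the neighbourhood of the 160 patients quoted. Only the pooled data are used at the interim, so no treatment-group information is revealed when N is updated.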

PART 4 Sample Size Re-estimation

Discussion and Conclusions
- Adaptive trials expected to become more common
- Regulatory & legislative environment increasingly positive
- SSR continues to be a common form of adaptive trial
- Blinded already widely accepted, unblinded growing
- Unblinded SSR can “provide efficiency” over GSD
- Plan up-front for expected effects but option to find lower ones
- But potential for logistical issues (blinding/IDMC) & efficiency
- Blinded SSR is “attractive choice” for trial design
- Deal with uncertainty in nuisance parameters with

nQuery Winter 2018 Update
Winter 2018 release adds nQuery Adapt module, 32 new tables & undo/redo
- Proportions + Crossover: 20 New Core Tables
- Assurance: 12 nQuery Bayes Tables
- Conditional Power, GST + SSR: 15 nQuery Adapt Tables

Q&A
Any Questions?
For further details, contact at: info@statsols.com
Thanks for listening!

Resources
- Summary of what’s new in nQuery’s Adaptive module: https://www.statsols.com/whats-new
- FDA Draft Guidance: https://www.fda.gov/downloads/drugs/guidances/ucm201790.pdf
- Statsols Blog on FDA Guidance: https://blog.statsols.com/new-fda-guidance-on-adaptive-clinical-trialdesign
- More detail: See references and nQuery 8.3.0.0 Manual - Chapter 4

References
Jennison, C., & Turnbull, B. W. (1999). Group sequential methods with applications to clinical trials. CRC Press.
Chen, Y. J., DeMets, D. L., & Gordon Lan, K. K. (2004). Increasing the sample size when the unblinded interim result is promising. Statistics in Medicine, 23(7), 1023-1038.
Cui, L., Hung, H. J., & Wang, S. J. (1999). Modification of sample size in group sequential clinical trials. Biometrics, 55(3), 853-857.
Mehta, C. R., & Pocock, S. J. (2011). Adaptive increase in sample size when interim results are promising: a practical guide with examples. Statistics in Medicine, 30(28), 3267-3284.
Visco, A. G., et al. (2012). Anticholinergic therapy vs. onabotulinumtoxinA for urgency urinary incontinence. New England Journal of Medicine, 367(19), 1803-1813.

References (cont.)
Friede, T., & Kieser, M. (2006). Sample size recalculation in internal pilot study designs: a review. Biometrical Journal: Journal of Mathematical Methods in Biosciences, 48(4), 537-555.
Friede, T., & Kieser, M. (2004). Sample size recalculation for binary data in internal pilot study designs. Pharmaceutical Statistics: The Journal of Applied Statistics in the Pharmaceutical Industry, 3(4), 269-279.
Wittes, J., & Brittain, E. (1990). The role of internal pilot studies in increasing the efficiency of clinical trials. Statistics in Medicine, 9(1-2), 65-72.
Tashkin, D. P., Elashoff, R., Clements, P. J., Goldin, J., Roth, M. D., Furst, D. E., ... & Seibold, J. R. (2006). Cyclophosphamide versus placebo in scleroderma lung disease. New England Journal of Medicine, 354(25), 2655-2666.