Statistical & Design Considerations for Non-inferiority Trials
Andrew Nunn, MRC Clinical Trials Unit, London
TB Forum, December 2005

Outline
• What is a non-inferiority trial?
• How do they differ from superiority trials?
• What do the regulators say?
• How large do the trials need to be?
• How should the trials be conducted and analysed?

What is a non-inferiority trial?
• How is it different from an equivalence trial?
• Does non-significant imply non-inferior?
• Does non-inferior imply non-significant?
• Are non-inferiority trials always larger than superiority trials?
• Can a failed superiority trial be turned into a non-inferiority trial?
• Why do we need these trials in TB?

A little bit of history - 35 yrs ago
• Study R, the first East African/BMRC trial of short course chemotherapy, could be regarded as a non-inferiority trial.
• 2STH/16TH worked well under strict trial conditions.
• The main objective was to see if a six month regimen was at least as good as the standard treatment; better would be a bonus.
S = streptomycin, T = thiacetazone, H = isoniazid

Study R – 30 month relapse free rates
[Figure: confidence intervals for the difference from the control 2STH/16TH regimen, shown for the 6SHR, 6SHZ, 6SHT and 6SH regimens. Axis ticks at -35%, -5%, 0 (no difference) and 5%; a possible δ is marked.]

A point to remember
• “It is never correct to claim that treatments have no effect or that there is no difference in the effects of treatments. It is impossible to prove … that two treatments have the same effect. There will always be uncertainty surrounding estimates of treatment effects, and a small difference can never be excluded.”
Alderson P, Chalmers I. BMJ 2003: 326: 1691-8.

Why non-inferiority for new TB drugs?
• Under a wide variety of trial conditions the gold standard 2EHRZ/4HR regimen is at least 95% effective.
– Nomads in the Algerian Sahara
– Recently published IUATLD study in African and Asian centres.
• We will be very unlikely to better it.
– We would require a total of 2600 evaluable patients to demonstrate a reduction from 5% to 2.5% relapses.

Our goal
• Our goal is to reduce treatment duration to a maximum of 4 months and preferably less.
• How much are we prepared to pay, if anything, for such a reduction?
– Must the new regimen be as good as the standard?
– Would we be satisfied with a regimen that was almost as good?
– If so, how good is almost?

EMEA quote
• “If no degree of possible inferiority of the test [new regimen] to the reference [control] is acceptable, then the development of products with equal efficacy to a comparator by means of non-inferiority trials would become impossible.”
EMEA/CPMP/EWP/2158/99

Does non-significant = non-inferior?
• No! Definitely not.
• Common sense will tell us that a non-significant result from an under-powered study is, in the extreme case, of little value.
• BUT, non-inferior does not necessarily mean non-significant!

How do we do it?
• We need a null hypothesis.
• The situation is the reverse of what is required in a superiority design.
• For superiority, H0 is that there is no difference.
• For non-inferiority, H0 is that there is a difference.
• The alternative hypothesis is also reversed.
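The reversal described above can be written out formally. The following is a sketch under assumed notation that does not appear on the slides: π_new and π_control denote the true success (e.g. relapse-free) rates on the new and control regimens, and δ > 0 is the non-inferiority margin.

```latex
% Superiority (two-sided): the null hypothesis is "no difference".
\[
H_0:\ \pi_{\text{new}} - \pi_{\text{control}} = 0
\qquad\text{vs}\qquad
H_1:\ \pi_{\text{new}} - \pi_{\text{control}} \neq 0
\]
% Non-inferiority with margin \delta > 0: the null hypothesis is
% "the new regimen is worse by at least \delta".
\[
H_0:\ \pi_{\text{new}} - \pi_{\text{control}} \le -\delta
\qquad\text{vs}\qquad
H_1:\ \pi_{\text{new}} - \pi_{\text{control}} > -\delta
\]
```

Under this convention, rejecting the non-inferiority H0 corresponds to the lower confidence limit for the difference lying above -δ.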

Equivalence & non-inferiority
What’s the difference?

Determining equivalence
• First step in establishing equivalence: define ‘limits of equivalence’ (±δ).
• Having conducted the trial, calculate the 95% confidence interval for the difference between the control and the new treatment.
• If the confidence interval is entirely within ±δ then equivalence is established.

Non-inferiority
• Equivalence requires that the difference (control - new intervention) is both > -δ and < δ, i.e. the new treatment must be neither worse nor better than the control by a fixed amount.
• In contrast to equivalence, with non-inferiority we are only interested in determining whether the new treatment is no worse by an amount δ.

Non-inferiority
The 95% CI for the difference between the control and the intervention lies entirely above -δ, i.e. non-inferiority is demonstrated.
[Figure: confidence interval plotted on an axis marked at -δ and 0 (no difference).]

Non-inferiority
• The lower 95% confidence limit for the difference between the control and the intervention is > -δ, i.e. non-inferiority is demonstrated.
• The lower 95% confidence limit is < -δ: non-inferiority has not been demonstrated.
[Figure: two confidence intervals plotted on an axis marked at -δ and 0 (no difference).]

Non-inferiority and superiority
• The 95% CI for the difference between the control and the intervention lies entirely above -δ, i.e. non-inferiority is demonstrated.
• Where the interval also lies entirely above 0, both non-inferiority and superiority have been demonstrated.
[Figure: confidence intervals plotted on an axis marked at -δ and 0 (no difference).]

Non-inferiority and inferiority
• The 95% CI for the difference between the control and the intervention lies entirely above -δ, i.e. non-inferiority is demonstrated.
• Where the interval also lies entirely above 0, both non-inferiority and superiority have been demonstrated.
• Where the interval lies entirely between -δ and 0, both non-inferiority and inferiority have been demonstrated.
[Figure: confidence intervals plotted on an axis marked at -δ and 0 (no difference).]
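The reading of a confidence interval against -δ and 0 described on these slides can be sketched in a few lines of Python. This is an illustration only: the function name `classify_interval`, the sign convention (a negative difference means the new treatment is worse) and the example intervals are assumptions made for the sketch, not part of the talk.

```python
def classify_interval(ci_low, ci_high, delta):
    """Read a two-sided CI for the treatment difference against -delta and 0.

    Assumed convention: negative values mean the new treatment is worse,
    and delta is the positive non-inferiority margin.
    Returns (non_inferior, significance_label).
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")

    # Non-inferiority: the whole interval lies above -delta.
    non_inferior = ci_low > -delta

    # Significance relative to "no difference" (zero).
    if ci_low > 0:
        label = "superior"
    elif ci_high < 0:
        label = "inferior"
    else:
        label = "no significant difference"
    return non_inferior, label


# Hypothetical intervals (proportions), with delta = 0.10.
examples = {
    "non-inferior only": (-0.04, 0.06),
    "not demonstrated": (-0.14, 0.02),
    "non-inferior and superior": (0.01, 0.09),
    "non-inferior and inferior": (-0.08, -0.01),
}
for name, (low, high) in examples.items():
    print(name, classify_interval(low, high, delta=0.10))
```

The last hypothetical interval shows the case in the slide title: the new treatment is statistically worse than the control, yet still within the margin.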

Choosing δ
• The value of δ must be chosen before the trial begins.
• Its value will depend on clinical, statistical and possibly regulatory considerations.

Example: 2NN Study
• van Leth, Phanuphak et al (Lancet 2004), a study of first-line antiretroviral therapy in HIV.
• Main comparison between nevirapine twice daily and efavirenz (plus stavudine and lamivudine) in terms of ‘treatment failure’ (based on virology, disease progression, therapy change).
• Primary objective was to establish the non-inferiority of nevirapine twice daily (δ = 10%).

Example: 2NN Study
• Confidence intervals for failure rates (E-2NN)
– All data: (-12.8%, 0.9%)
– Only those starting med.: (-14.6%, -0.8%)
– Concurrently randomised: (-11.9%, 3.4%)
• None of these intervals lies completely above the δ value of -10%; one interval also excludes zero.
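The statement about the three intervals can be checked directly. The short sketch below uses the interval limits quoted on the slide (in percentage points) and the trial's δ of 10%; the variable names are illustrative assumptions.

```python
DELTA = 10.0  # non-inferiority margin, percentage points

# Confidence limits as quoted on the slide (percentage points).
intervals = {
    "All data": (-12.8, 0.9),
    "Only those starting med.": (-14.6, -0.8),
    "Concurrently randomised": (-11.9, 3.4),
}

# An interval supports non-inferiority only if it lies wholly above -DELTA.
above_margin = [name for name, (low, _) in intervals.items() if low > -DELTA]

# An interval excludes zero if both limits fall on the same side of it.
excludes_zero = [
    name for name, (low, high) in intervals.items() if low > 0 or high < 0
]

print("Wholly above -delta:", above_margin)
print("Excludes zero:", excludes_zero)
```

With these numbers no population has its interval wholly above -10%, and the "Only those starting med." interval is the one that excludes zero.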

Example: 2NN Study
• BUT, the authors concluded: ‘Antiviral therapy with nevirapine or efavirenz showed similar efficacy, so triple-drug regimens with either … are valid for first-line treatment’
Lancet 2004, 363: 1253-63

Does it matter?
• A non-inferiority trial can demonstrate significant benefit from the new treatment (cf Study A).
• But is it possible to have non-inferiority and a significantly worse outcome in the new treatment?
• Yes! Provided δ is acceptable to clinicians.
– If N is large enough any difference can be shown to be significant!

Adverse effects
• Assessment of adverse effects is particularly important in equivalence trials.
• It is not enough to prove non-inferiority in terms of efficacy.
• A new treatment must be as safe as, or safer than, the old one.

Choosing δ
• On 27th July 2005 the European Medicines Agency (EMEA) issued a new European “Guideline on the choice of the non-inferiority margin”.
• This guideline comes into effect in January 2006.
EMEA/CPMP/EWP/2158/99

Quote from EMEA document
“The lower limit of the confidence interval [of the difference between the new regimen and the control] ... represents a lower bound and is usually interpreted as the degree of inferiority to the reference that can be excluded based on the data presented…
EMEA/CPMP/EWP/2158/99

Quote from EMEA document
“Of course this is not an actual lower bound and the magnitude of inferiority could be greater. However it is generally considered that the chance of the true difference being worse than that suggested by this bound is acceptably small.”
EMEA/CPMP/EWP/2158/99

General EMEA recommendations
• If possible three study arms should be included (test, reference and placebo); this allows validation of the non-inferiority margin.
• The margin should be such that there is assurance that the test arm has a clinically relevant effect.
• The primary focus is the relative effect of the test arm and the reference arm.
• The choice of the margin should be justified in the protocol.
• The choice of the margin should be independent of power considerations.

Design consideration
• It is important to ensure that the design of equivalence trials, including definitions of a favourable response, is as similar as possible to that of earlier trials assessing the control regimen.

Internal validity
• In a superiority trial there is a strong incentive to ensure high quality of conduct.
• In contrast, in a non-inferiority trial the conclusion of non-inferiority could be reached because of poor discriminatory power.
• In a TB trial this could occur if follow-up rates were poor and/or there was failure in the lab to detect all relapses.

But
• If there are already many treatments being used interchangeably for the disease under consideration, a possible approach might be to consider the information available from all of them. From this a delta may be constructed which summarises the information known about the relative efficacy of these products, and the new trial can be designed to provide a similar level of knowledge of the relative efficacy of the new product.

Accepting a larger δ
• “In the situation where the test product is anticipated to have a safety advantage over the reference it is likely that a larger delta could be justified as some loss of efficacy might be accepted in exchange for the safety benefits”

Is there a case for a larger δ if treatment can be shortened?
• “It may be possible to justify a wider non-inferiority margin for efficacy if the product has an advantage in some other aspect of its profile. This margin should not, however, be so wide that superiority to placebo is left in doubt”

How large a δ would you accept?
• If treatment could be shortened from 6 to 4 months would an increase in the failure/relapse rate from 5% to 10% be acceptable?
– Provided that the failures and relapses could be satisfactorily retreated.

FDA position
• FDA (as described in FDA’s 1992 Points to Consider document) originally used a ‘step function’:

Cure rate    δ
≥ 90%        10%
80 - 89%     15%
< 80%        20%

• A more flexible approach has since been adopted.

Example: Pediatric Meningitis Trial
Investigational Drug vs. Active Control

                               Sponsor’s Proposal   FDA Proposal
Projected response rate        80%                  80%
Delta                          15%                  10%
Evaluable total sample size    224                  504
Projected % evaluable          70%                  70%
Total to be enrolled           320                  720
Projected enrollment time      2-4 years            4-6 years

FDA proposed study considered not to be feasible.
Note: α = 5%, power = 80%
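The sample sizes in this example can be reproduced with a standard normal-approximation formula for comparing two proportions. The sketch below is an assumption about how such figures could be derived, not a statement of how the sponsor or FDA calculated them: it assumes both arms share the same true response rate, a two-sided α of 5% and 80% power. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def ni_sample_size_per_arm(p, delta, alpha=0.05, power=0.80):
    """Evaluable patients per arm for a non-inferiority comparison.

    Assumes both arms have true response rate p, a two-sided alpha,
    and the normal approximation to the difference of two proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = (z_alpha + z_beta) ** 2 * 2 * p * (1 - p) / delta ** 2
    return math.ceil(n)


def total_to_enrol(n_evaluable, pct_evaluable):
    """Inflate an evaluable total for the expected percentage evaluable."""
    return math.ceil(n_evaluable * 100 / pct_evaluable)


for label, delta in [("Sponsor (delta = 15%)", 0.15), ("FDA (delta = 10%)", 0.10)]:
    evaluable = 2 * ni_sample_size_per_arm(p=0.80, delta=delta)
    print(label, "evaluable:", evaluable, "to enrol:", total_to_enrol(evaluable, 70))
```

Under these assumptions the evaluable totals come out at 224 and 504, and the totals to enrol at 320 and 720, matching the table. Because the required size scales with 1/δ², tightening δ from 15% to 10% more than doubles the trial.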

What confidence level?
• Traditionally we use 95% confidence in superiority trials (thanks to RA Fisher!).
• Guidelines for pharmacokinetic equivalence have traditionally used 90% CI.
• In regulatory situations the choice is based on the level of risk regulators are prepared to accept.
• Could be appropriate to use 90%, 95% or even 99%.
• Need for flexibility.

Calculating power - an example
• Given the expected range of, say, 3-6% relapse rates in the control 2EHRZ/4HR regimen:
• What study size would we require for a range of δ?

[Figure: power calculation results for a 5% control relapse rate: δ = 5%, 400 per arm; δ = 10%, 100 per arm.]
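To see how the quoted numbers per arm relate to power, the sketch below computes approximate power for a non-inferiority comparison of two proportions. It rests on assumptions the slides do not state: a two-sided α of 5%, equal true relapse rates in both arms, and the normal approximation. The function name `ni_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ni_power(p, delta, n_per_arm, alpha=0.05):
    """Approximate power to demonstrate non-inferiority within margin delta.

    Assumes the true relapse rate is p in both arms and that
    non-inferiority is declared when the two-sided (1 - alpha) CI for
    the difference excludes the margin.
    """
    se = sqrt(2 * p * (1 - p) / n_per_arm)  # SE of the difference
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta / se - z_alpha)


for delta, n in [(0.05, 400), (0.10, 100)]:
    print(f"5% relapse, delta = {delta:.0%}, {n} per arm: "
          f"power = {ni_power(0.05, delta, n):.2f}")
```

Under these assumptions both combinations give power of roughly 90%. Halving δ quadruples the number needed per arm, which is why the choice of margin dominates the size of the trial.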

But...
• These power calculations do not allow for additional numbers required for a Per Protocol analysis, or patients excluded because they do not have TB, or because they have MDR disease.
• Neither do they allow for losses to follow-up.

How should we analyse non-inferiority trials?
• Superiority trials are analysed by ITT because it is the most conservative and least likely to be biased.
• ITT analysis of non-inferiority trials is not conservative: there is a bias towards no difference.
• PP is biased since not all randomised patients are included.
• It is recommended that non-inferiority trials should be analysed by both ITT and per protocol (PP).

Defining ITT and PP
• Definitions vary.
– For ITT some definitions exclude patients who either do not have a confirmed diagnosis, or who never received treatment.
– PP includes all receiving a full course of treatment with no major protocol violations.
• What definitions are appropriate for TB trials?
• CPMP: ‘similar conclusions from both the ITT and PP are required in a non-inferiority trial.’
• ‘Sample size computations should ensure sufficient numbers in the PP population’.
CPMP: Committee for Proprietary Medicinal Products (2000)

CAVE!
• Drop-outs from the two regimens need to be carefully evaluated.
• Suppose patients not responding dropped out early from one treatment arm, or
• Possibly because of differential withdrawal rate for adverse events.
• This would suggest there may be important differences between the treatments.

Interim analyses
• Do we need them?
• Probably not if it is to consider stopping early for strong evidence of non-inferiority.
• Such evidence would support a case for the possible superiority of the new treatment to the control: a strong incentive to keep on.

Conclusions 1
• A major concern among regulators in many NI trials is that the efficacy of the control is not well established.
• This is NOT the case with the control regimen 2EHRZ/4HR. One advantage of no new drugs for 40 years!!
• In the event of establishing a 4 month regimen to be non-inferior it would be unwise to use that regimen as the control in the next NI trial: biocreep.
Biocreep: a slightly inferior treatment becomes the control for the next generation of NI trials.

Conclusions 2
• NI trials must be conducted with rigour.
• The value of δ needs to be determined before the start of the trial and should take into account both clinical and statistical considerations.
• Both the value of δ and other aspects of design need to be discussed with regulators.
• Non-inferiority needs to be demonstrated not only for efficacy but also for safety.

Regulatory Guidance
• ICH E9 ‘Note for Guidance on Statistical Principles for Clinical Trials’, September 1998
• ICH E10 ‘Note for Guidance on Choice of Control Group’, July 2000
• CPMP ‘Note for Guidance on the Investigation of Bioavailability and Bioequivalence’, July 2001
• CPMP ‘Points to Consider on Switching between Superiority and Non-Inferiority’, July 2000
• CHMP ‘Guideline on the Choice of the Non-Inferiority Margin’, July 2005

Selected references
• D’Agostino RB, Massaro JM et al: Non-inferiority trials: design concepts and issues - the encounters of academic consultants in statistics. Statist Med 2003; 22: 169-186.
• Altman DG, Bland JM: Absence of evidence is not evidence of absence. BMJ 1995; 311: 485.
• Blackwelder WC: Current issues in equivalence trials. J Dent Res 2004; 83: C113-115.
• Jones B, Jarvis P et al: Trials to assess equivalence: the importance of rigorous methods. BMJ 1996; 313: 36-9.
