
CPMP/EWP/1776/99: Points to Consider on Missing Data
Feb-2007. Ferran.Torres@uab.es

Progress of subjects through the study

Missing data (1)
• What are missing data? Empty boxes in the CRFs!
• They violate the strict ITT principle.
• Possible causes include:
  – Loss to follow-up
  – Treatment failure or success
  – Adverse event
  – Subject relocation
• Not all reasons for withdrawal are related to the treatment.

Missing data (2)
• Missingness may affect:
  – A single data point
  – Several data points within one visit
  – An entire visit
  – Several visits
  – An entire variable
  – All visits after inclusion

Missing data (3)
• Why are they a problem? They are a potential source of bias in the analysis:
  – The larger the proportion of data affected, the greater the bias
  – The less random the missingness, the greater the bias
  – The more closely related to treatment, the greater the interference
• They prevent a true ITT analysis.

EXAMPLES

Example: Description of populations (1)
Patient disposition:

Population                      Definition                                     N (%)
All randomized                  Patients with a randomization code             1208 (100%)
Safety                          Receiving any study medication                 1190 (99%)
Intent to treat                 Receiving study medication and a baseline VA   1186 (98%)
Per protocol                    ...and without a major protocol violation      1144 (95%)
Per protocol, Week 54 observed  ...and with a Week 54 VA                       1055 (87%)

Side labels on the slide: patients withdrawing before treatment; patients without baseline VA; no major protocol violation; e.g., cataract; e.g., only a baseline VA.

Example 2: Incorrect use of populations (1)
• Design: surgery vs medical treatment in bilateral carotid stenosis (Sackett et al., 1985)
• Primary endpoint: number of patients with TIA, stroke or death
• Patient disposition:
  – Randomized patients: 167
  – Surgical treatment: 94
  – Medical treatment: 73
• Patients who did not complete the study because of a stroke during the initial hospitalization:
  – Surgical treatment: 15 patients
  – Medical treatment: 1 patient

Example 2: Incorrect use of populations (2)
First analysis performed:
• Per-protocol (PP) population: patients who completed the study
• Analysis:
  – Surgical treatment: 43 / (94 - 15) = 43 / 79 = 54%
  – Medical treatment: 53 / (73 - 1) = 53 / 72 = 74%
  – Risk reduction: 27%, p = 0.02

Example 2: Incorrect use of populations (3)
The definitive analysis is as follows:
• Intention-to-treat (ITT) population: all randomized patients
• Analysis:
  – Surgical treatment: 58 / 94 = 62%
  – Medical treatment: 54 / 73 = 74%
  – Risk reduction: 18%, p = 0.09 (PP: 27%, p = 0.02)
• Conclusions:
  – The correct analysis population is the ITT population
  – Surgical treatment has not been shown to be significantly superior to medical treatment
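
The arithmetic of Example 2 can be re-traced with a short Python sketch. The counts come from the slides; the function names are illustrative. Working from unrounded proportions, the relative risk reductions come out at about 26% (PP) and 17% (ITT), slightly different from the figures quoted on the slides, possibly because of rounding. The p-values are not recomputed here.

```python
def event_rate(events, patients):
    """Proportion of patients with an event (TIA, stroke or death)."""
    return events / patients


def relative_risk_reduction(rate_surgery, rate_medical):
    """Relative reduction in risk for surgery compared with medical treatment."""
    return 1.0 - rate_surgery / rate_medical


# Per-protocol: the 15 + 1 early-stroke patients are dropped from the denominators.
pp_surgery = event_rate(43, 94 - 15)
pp_medical = event_rate(53, 73 - 1)
rrr_pp = relative_risk_reduction(pp_surgery, pp_medical)

# Intention-to-treat: the same patients stay in and count as events.
itt_surgery = event_rate(43 + 15, 94)
itt_medical = event_rate(53 + 1, 73)
rrr_itt = relative_risk_reduction(itt_surgery, itt_medical)

print(f"PP : {pp_surgery:.1%} vs {pp_medical:.1%}, risk reduction {rrr_pp:.1%}")
print(f"ITT: {itt_surgery:.1%} vs {itt_medical:.1%}, risk reduction {rrr_itt:.1%}")
```

Excluding the early strokes removes 15 events from one arm and only 1 from the other, which is why the PP analysis makes surgery look better than the ITT analysis does.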

Relationship of missing values with
1) Treatment
2) Outcome

Types of missingness

MCAR – Missing completely at random
• The probability of a value being missing is completely independent of:
  – Observed values: baseline variables, other measurements of the same variable...
  – Unobserved (missing) values
• Example: change of geographical location

MAR – Missing at random
• The probability of a value being missing:
  – Depends on observed values
  – Does not depend on unobserved (missing) values
• Example: subjects with a worse baseline score drop out of the study regardless of the outcome

Non-ignorable
• The probability of a value being missing depends on the unobserved (missing) values
• Example: poor or excellent responses are associated with a higher drop-out rate
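
The three mechanisms can be contrasted with a small simulation. This is a sketch under stated assumptions, not part of the original slides: the data are invented (a baseline score and an outcome correlated with it), the retention probabilities are arbitrary, and `simulate` and `fit_line` are illustrative names. For each mechanism it reports the complete-case mean and a mean adjusted for baseline by simple regression.

```python
import random
import statistics

random.seed(2007)

N = 20000
# Hypothetical data: a baseline score and an outcome correlated with it.
baseline = [random.gauss(50, 10) for _ in range(N)]
outcome = [b + random.gauss(0, 10) for b in baseline]
true_mean = statistics.mean(outcome)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(keep_probability):
    """Return (complete-case mean, baseline-adjusted mean) for one mechanism."""
    kept = [(b, y) for b, y in zip(baseline, outcome)
            if random.random() < keep_probability(b, y)]
    xs = [b for b, _ in kept]
    ys = [y for _, y in kept]
    intercept, slope = fit_line(xs, ys)
    # Predict the outcome for every subject from the (always observed) baseline.
    adjusted = statistics.mean([intercept + slope * b for b in baseline])
    return statistics.mean(ys), adjusted


# MCAR: retention ignores all data.
mcar_cc, mcar_adj = simulate(lambda b, y: 0.7)
# MAR: retention depends only on the observed baseline.
mar_cc, mar_adj = simulate(lambda b, y: 0.9 if b >= 50 else 0.5)
# Non-ignorable: retention depends on the outcome itself.
mnar_cc, mnar_adj = simulate(lambda b, y: 0.9 if y >= 50 else 0.5)

for name, cc, adj in [("MCAR", mcar_cc, mcar_adj),
                      ("MAR", mar_cc, mar_adj),
                      ("Non-ignorable", mnar_cc, mnar_adj)]:
    print(f"{name:14s} complete-case bias {cc - true_mean:+.2f}, "
          f"adjusted bias {adj - true_mean:+.2f}")
```

In this setup the complete-case mean is close to the truth only under MCAR. Under MAR the bias largely disappears once the observed baseline is taken into account. Under the non-ignorable mechanism it persists, because the information needed to correct it is exactly what is missing.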

Handling of missing values

General strategies
• Complete-case analysis
• “Weighting methods”
• Imputation methods
• Analysing data as incomplete
• Other methods

Complete-case analysis
• Analyse only subjects with complete data
• Problems:
  – Loss of power
  – Bias
• Valid only if MCAR may be assumed
• Against the ITT principle

“Weighting methods” (sometimes considered a form of imputation)
• Construct weights for incomplete cases:
  – Each patient belongs to a subgroup in which all subjects have the same characteristics
  – A proportion within each subgroup are destined to complete the study
• Heyting et al.
• Robins et al.

Missing data: handling methods (2)
Subjects with missing values in the efficacy variable
[Figure; timeline labels: Randomization, Start of treatment]

Missing data: handling methods (3)
The LOCF (Last Observation Carried Forward) method is applied
[Figure; timeline labels: Randomization, Start of treatment]

Missing data: handling methods (4)
The BOCF (Baseline Observation Carried Forward) method is applied
[Figure; timeline labels: Randomization, Start of treatment]
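
A minimal Python sketch of the two carry-forward rules, assuming each subject's efficacy values are stored as a list ordered by visit, with `None` marking a missing value and the first element being an observed baseline. The function names and the example scores are illustrative.

```python
def locf(values):
    """Last Observation Carried Forward: repeat the latest observed value."""
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


def bocf(values):
    """Baseline Observation Carried Forward: replace missing values with baseline."""
    baseline_value = values[0]  # assumed to be observed
    return [baseline_value if value is None else value for value in values]


# Hypothetical subject who drops out after the third visit.
scores = [30, 28, 25, None, None]
locf_scores = locf(scores)
bocf_scores = bocf(scores)
print(locf_scores)
print(bocf_scores)
```

LOCF freezes the subject at the last observed state, while BOCF assumes a return to the starting point, so the two can lead to different conclusions for the same drop-out.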

Example: LOCF and linear extrapolation
[Figure: ADAS-Cog (0 to 36) versus time in months (0 to 18), comparing extrapolation by linear regression with LOCF. LOCF = information bias]

Imputation methods
• LOCF and variants
  – Bias: depends on the amount and timing of drop-outs
    • e.g. the condition under study has a worsening course
  – Conservative: drop-outs because of lack of efficacy in the control group
  – Anticonservative: drop-outs because of intolerance in the test group
  – Others: interpolation, extrapolation

Example: the ADAS-Cog result is missing at some time points
[Figure: ADAS-Cog (0 to 36) versus time in months (0 to 18); imputation by regression]

Imputation methods
• Worst case analysis:
  – Impute:
    • The worst response to the test
    • The best response to the control
  – Ultraconservative. Increases the variability.
  – Robustness of results:
    • Second approach: “sensitivity analysis”
    • Lower bound of efficacy

Group means
• Continuous variable:
  – Group mean derived from a grouping variable
• Categorical or ordinal variable:
  – Mode
  – If there is no unique mode:
    • Nominal: a value is randomly selected
    • Ordinal: the ‘middle’ category, or a value randomly chosen from the middle two (even case)
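
For the continuous case, group-mean imputation can be sketched as follows. The record layout (a list of dictionaries), the field names and the values are assumptions made for illustration, and the mode rules for categorical variables are not covered.

```python
import statistics
from collections import defaultdict


def impute_group_mean(records, group_key, value_key):
    """Replace missing values with the mean of the observed values in the same group."""
    observed = defaultdict(list)
    for record in records:
        if record[value_key] is not None:
            observed[record[group_key]].append(record[value_key])
    group_means = {group: statistics.mean(vals) for group, vals in observed.items()}

    completed = []
    for record in records:
        record = dict(record)  # copy so the input is left untouched
        if record[value_key] is None:
            record[value_key] = group_means[record[group_key]]
        completed.append(record)
    return completed


# Hypothetical data: the grouping variable is the study centre.
subjects = [
    {"centre": "A", "score": 10},
    {"centre": "A", "score": 14},
    {"centre": "A", "score": None},
    {"centre": "B", "score": 20},
    {"centre": "B", "score": None},
]
completed_subjects = impute_group_mean(subjects, "centre", "score")
print([s["score"] for s in completed_subjects])
```

Because every imputed value sits exactly on its group mean, this approach shrinks the variability within each group, which is one reason single imputation of this kind tends to overstate precision.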

Predicted mean
• Continuous or ordinal variables: a least-squares multiple regression algorithm is used to impute the most likely value.
• Binary or categorical variables: a discriminant method is applied to impute the most likely value.
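
A minimal sketch of predicted-mean imputation for a continuous variable. As a simplification it uses a single predictor (a baseline value) rather than the multiple regression named on the slide; the function names and the data are illustrative.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute_predicted_mean(predictor, outcome):
    """Fit the regression on complete pairs, then fill each gap with its prediction."""
    complete = [(x, y) for x, y in zip(predictor, outcome) if y is not None]
    intercept, slope = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [intercept + slope * x if y is None else y
            for x, y in zip(predictor, outcome)]


# Hypothetical data in which the outcome is exactly twice the predictor.
baseline_values = [1, 2, 3, 4]
final_values = [2, 4, None, 8]
imputed_values = impute_predicted_mean(baseline_values, final_values)
print(imputed_values)
```

The imputed value lies exactly on the regression line, so, as with group means, the natural scatter around the prediction is lost.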

Imputation class methods
• Imputed values come from respondents that are similar with respect to a set of auxiliary variables.
  – Clinical experience
  – Statistical methods: hot-decking
    • Respondents and non-respondents are sorted into a number of imputation subsets according to a user-specified set of covariates.
    • An imputation subset comprises cases with the same values of the user-specified covariates.
    • Missing values are then replaced with values taken from matching respondents.
  – Options:
    • The first respondent’s value (similar in time)
    • A respondent’s randomly selected value
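
The hot-deck steps described above can be sketched in Python. The record layout, the covariate names and the `pick_first` switch are assumptions made for illustration; a subject whose imputation subset contains no respondent is left missing.

```python
import random
from collections import defaultdict


def hot_deck(records, covariates, value_key, rng=None, pick_first=False):
    """Fill missing values with a donor value from the same imputation subset."""
    rng = rng or random.Random(0)
    donors = defaultdict(list)
    for record in records:
        if record[value_key] is not None:
            subset = tuple(record[c] for c in covariates)
            donors[subset].append(record[value_key])

    completed = []
    for record in records:
        record = dict(record)  # copy so the input is left untouched
        if record[value_key] is None:
            pool = donors.get(tuple(record[c] for c in covariates))
            if pool:  # no matching respondent: the value stays missing
                record[value_key] = pool[0] if pick_first else rng.choice(pool)
        completed.append(record)
    return completed


# Hypothetical patients; subsets are defined by sex and age group.
patients = [
    {"id": 1, "sex": "F", "age_group": "<65", "score": 12},
    {"id": 2, "sex": "F", "age_group": "<65", "score": None},
    {"id": 3, "sex": "F", "age_group": "<65", "score": 15},
    {"id": 4, "sex": "M", "age_group": ">=65", "score": None},
]
random_donor = hot_deck(patients, ["sex", "age_group"], "score", rng=random.Random(1))
first_donor = hot_deck(patients, ["sex", "age_group"], "score", pick_first=True)
print([p["score"] for p in random_donor])
print([p["score"] for p in first_donor])
```

The two calls correspond to the two options on the slide: a randomly selected respondent's value, or the first respondent's value in the subset.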

Multiple imputation
• Replaces each missing value in the dataset with several imputed values instead of just one (Rubin, 1970s).
• Steps:
  – Use the complete data to estimate the regression
  – Combine the estimators (i.e. regression coefficients) to compute predicted values
  – Randomly simulate a set of residuals to be added to the regression predictions to impute m values
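
The steps can be sketched in Python under simplifying assumptions: one predictor, normally distributed residuals, and regression coefficients treated as fixed (a full multiple imputation would also draw the coefficients from their distribution). The pooling step uses Rubin's rules, total variance = within + (1 + 1/m) × between, which the slide does not spell out. All names and data are illustrative.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Least squares fit; returns intercept, slope and residual standard deviation."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, math.sqrt(sse / (len(xs) - 2))


def multiply_impute(predictor, outcome, m=5, seed=0):
    """Create m completed datasets: prediction plus a simulated residual."""
    rng = random.Random(seed)
    complete = [(x, y) for x, y in zip(predictor, outcome) if y is not None]
    intercept, slope, resid_sd = fit_line([x for x, _ in complete],
                                          [y for _, y in complete])
    return [[intercept + slope * x + rng.gauss(0, resid_sd) if y is None else y
             for x, y in zip(predictor, outcome)]
            for _ in range(m)]


def pool_means(datasets):
    """Pool the mean across completed datasets using Rubin's rules."""
    m = len(datasets)
    estimates = [statistics.mean(d) for d in datasets]
    within = statistics.mean([statistics.variance(d) / len(d) for d in datasets])
    between = statistics.variance(estimates)
    return statistics.mean(estimates), within + (1 + 1 / m) * between


# Hypothetical data with three missing outcomes.
visit = list(range(1, 11))
score = [2.1, 3.9, 6.2, None, 9.8, 12.1, None, 16.3, 17.9, None]
imputed_sets = multiply_impute(visit, score, m=5, seed=42)
pooled_mean, total_variance = pool_means(imputed_sets)
print(f"pooled mean {pooled_mean:.2f}, total variance {total_variance:.3f}")
```

Unlike the single-imputation methods, the between-imputation component carries the uncertainty about the missing values into the final variance.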

MI: Assumptions (2)
• The data model:
  – Probability model on observed data
  – Multivariate normal, loglinear...
• Prediction of the missing data distribution
• Specification of the distribution for the parameters of the imputation models
  – Use likelihood / Bayesian techniques for analysis
  – Noninformative prior distribution
• The mechanism of nonresponse

Multiple imputation
• S-PLUS
• SOLAS
• Gary King: Amelia
• Joe Schafer: web, soft
• The multiple imputation FAQ page

Analysing data as incomplete
• Time-to-event variables
• Mixed models (random-fixed)

Other
• Gould 1980
  – Converts the variable into an ordinal score.
  – Imputes according to a pre-defined value (e.g. percentile) and the time and cause of drop-out (lack of efficacy, cure, adverse effects...)
• Miscellanea: missing data indicators, pairwise deletion...

Missing Data in Trials – A Regulatory Clinical View

ICH E3, E6, E9
• Key points:
  – Potential source of bias
  – Common in clinical trials
  – Avoiding MD
  – Importance of the methods of dealing with MD
  – Pre-specification, re-definition
  – Lack of a universally accepted method for handling MD
  – Sensitivity analysis
  – Identification and description of missingness

Points to Consider on Biostatistical / Methodological issues arising from recent CPMP discussion on licensing applications
Points to Consider on Missing Data

Structure
1. Introduction
2. The effect of MD on data analysis
3. Handling of MD
4. General recommendations

INTRODUCTION

Introduction
• Potential source of bias
• Many possible sources and different degrees of incompleteness
• MD violates the ITT principle:
  – Full set analysis requires imputation
• The strategy employed might in itself provide a source of bias

The effect of missing values on data analysis and interpretation

Effect on data analysis (1)
• Power:
  – Reduction of cases for analysis: reduction of power
• Variability:
  – Non-completers (greater likelihood of extreme values): their loss => underestimate of variability

Effect on data analysis (2)
• Bias:
  – Estimation of treatment effect
  – Comparability of treatment groups
  – Representativeness of the sample
• The reduction of the statistical power is mainly related to the number of missing values
• The risk of bias depends upon the relationship between:
  – Missingness
  – Treatment
  – Outcome

Effect on data analysis (3)
• Not expected to lead to bias:
  – if MD are related only to the treatment (an observation is more likely to be missing in one treatment arm than in another)
  – but not to the outcome, i.e. the real value of the unobserved measurement (poor outcomes are no more likely to be missing than good outcomes).

Effect on data analysis (4)
• Bias:
  – if MD (unmeasured observations) are related to the real value of the outcome (e.g. the unobserved measurements have a higher proportion of poor outcomes)
  – this will lead to bias even if the missing values are not related to treatment (i.e. missing values are equally likely in all treatment arms).

Effect on data analysis (5)
• Bias:
  – if MD are related to both the treatment and the unobserved outcome variable (e.g. missing values are more likely in one treatment arm because it is not as effective).

Effect on data analysis (6)
• Pragmatic approach:
  – In most cases it is difficult or impossible to elucidate whether the relationship between missing values and the unobserved outcome variable is completely absent.
  – Thus it is sensible to adopt a conservative approach, considering missing values as a potential source of bias.

Handling of MD

Handling of MD (1)
• Avoidance of missingness:
  – In the design and conduct of a clinical trial all efforts should be directed towards minimising the amount of missing data likely to occur.
  – Despite these efforts some missing values will generally be expected.
• The way these missing observations are handled may substantially affect the conclusions of the study.

Handling of MD (2)
• Complete case analysis:
  – Bias, power and variability
  – Not generally appropriate. Exceptions:
    • Exploratory studies, especially in the initial phases of drug development.
    • Secondary supportive analysis in confirmatory trials (robustness)
• Violates the ITT principle.
• It cannot be recommended as the primary analysis in a confirmatory trial.

Handling of MD (3)
• Imputation of missing data:
  – Scope of imputation:
    • Not restricted to main outcomes (secondary efficacy, safety, baseline covariates...)
  – Methods for imputation:
    • Many techniques
    • No gold standard for every situation

Handling of MD (4)
• Methods for imputation (cont):
  – Not a description of the different methods
  – All methods may be valid:
    • From simple methods to more complex ones: from LOCF to multiple imputation methods
    • But their appropriateness has to be justified
  – e.g. LOCF: acceptable if measurements are expected to be relatively constant over time.
    • In Alzheimer’s disease, where the patient’s condition is expected to deteriorate over time, the LOCF method is less acceptable

Handling of MD (5)
• Statistical approaches less sensitive to MD:
  – Mixed models
  – Survival models
• They assume no relationship between treatment and the missing outcome, and generally this cannot be assumed.

General recommendations

General recommendations (1)
• Avoidance of missing data
  – Try to reduce the number of MD:
    • Anticipate sources and try to avoid them in the design
    • Strategies to obtain measurements
    • If a large amount of MD is expected: relevance of blinding (assignment and evaluation)
  – Anticipation of the “acceptable amount of MD”:
    • Sample size

General recommendations (2)
• Avoidance of missing data (cont)
  – “Acceptable amount” of MD: no general rule, depends on:
    • Nature of the variable: mortality vs sophisticated methods of diagnosis
    • Length of the clinical trial
    • Condition under study: e.g. psychiatric disorders, with low adherence of patients to the study protocol

General recommendations (3)
• Avoidance of missing data (cont)
  – Continue data collection after patient withdrawal:
    • ITT based on real data
  – Alternatives:
    • Analysis on incomplete data, or
    • Analysis on imputed data

General recommendations (4)
• Design of the study. Relevance of predefinition
  – Pre-specify in the protocol:
    • Description and justification of the method
    • Anticipation of the expected amount of MD
  – Deviations documented and justified
  – Conservative, to avoid:
    • Minimisation of differences in non-inferiority trials
    • Overestimation in superiority trials

General recommendations (5)
• Design of the study. Relevance of predefinition (cont)
  – Update (unpredictability of some problems):
    • Statistical Analysis Plan
    • During the Blind Review
  – Deviations and amendments documented (traceability)
  – Identification of the blinding

General recommendations (6)
• Analysis of missing data
  – Pattern of MD: time and proportion
    • Investigate whether there is any indication of differences between the treatment groups.
  – Elucidate whether patients with and without missing values have different characteristics at baseline.
    • This might help to establish:
      – whether the missing values have led to baseline imbalance, and
      – whether the process generating missing values has differentially influenced the treatment groups.
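
A first look at the pattern of MD (time and proportion, by treatment group) can be sketched as below. The record layout, the arm labels and the visit names are hypothetical.

```python
from collections import defaultdict


def missing_proportions(records, visits):
    """Proportion of missing values for each (treatment arm, visit) pair."""
    counts = defaultdict(lambda: [0, 0])  # [missing, total]
    for record in records:
        for visit in visits:
            cell = counts[(record["arm"], visit)]
            cell[1] += 1
            if record[visit] is None:
                cell[0] += 1
    return {key: missing / total for key, (missing, total) in counts.items()}


# Hypothetical trial data: one record per subject, None marks a missing value.
trial = [
    {"arm": "test", "week4": 1.2, "week8": 1.0},
    {"arm": "test", "week4": 0.9, "week8": None},
    {"arm": "test", "week4": None, "week8": None},
    {"arm": "test", "week4": 1.5, "week8": 1.4},
    {"arm": "control", "week4": 1.1, "week8": 1.3},
    {"arm": "control", "week4": 1.0, "week8": None},
    {"arm": "control", "week4": 1.4, "week8": 1.2},
    {"arm": "control", "week4": 0.8, "week8": 0.9},
]
pattern = missing_proportions(trial, ["week4", "week8"])
for (arm, visit), proportion in sorted(pattern.items()):
    print(f"{arm:8s} {visit}: {proportion:.0%} missing")
```

A table like this only describes the pattern. Whether a difference between arms matters still depends on how missingness relates to the outcome.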

General recommendations (7)
• Sensitivity analysis: a set of analyses showing the influence of different methods of handling missing data on the study results
  – Some examples:
    • Imputation of best plausible vs worst plausible values
    • Best possible in control and worst possible in experimental, and inversely
    • Full set analysis vs complete case analysis
  – Pre-defined and designed to assess the repercussion on the results of the particular assumptions made in imputation
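
For a binary responder variable, such a set of analyses can be sketched as follows. The coding (1 = response, 0 = no response, `None` = missing), the function names and the counts are assumptions made for illustration.

```python
def response_rate(outcomes, fill=None):
    """Response rate; fill=None drops missing values, otherwise imputes `fill`."""
    if fill is None:
        values = [o for o in outcomes if o is not None]
    else:
        values = [fill if o is None else o for o in outcomes]
    return sum(values) / len(values)


def sensitivity_table(test, control):
    """Difference in response rates (test minus control) under each strategy."""
    return {
        "complete case": response_rate(test) - response_rate(control),
        # Worst response imputed to the test arm, best to the control arm.
        "worst case for test": response_rate(test, 0) - response_rate(control, 1),
        # The inverse: best response to the test arm, worst to the control arm.
        "best case for test": response_rate(test, 1) - response_rate(control, 0),
    }


# Hypothetical trial: 100 subjects per arm, 15 missing outcomes in each.
test_arm = [1] * 60 + [0] * 25 + [None] * 15
control_arm = [1] * 45 + [0] * 40 + [None] * 15
results = sensitivity_table(test_arm, control_arm)
for strategy, difference in results.items():
    print(f"{strategy:20s} {difference:+.3f}")
```

In this invented example the apparent benefit in the complete-case analysis disappears entirely in the worst case, so the conclusion would not be robust to the handling of the missing values.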

General recommendations (8)
• Final report
  – Detailed description of the planned methods and of any amendments to the predefined methods
  – Discussion of the MD:
    • Number, time and pattern
    • Possible implications for efficacy and safety
  – Imputed values must be listed and identified
  – A sensitivity analysis may give robustness to the conclusions