Selective cutoff reporting in studies of diagnostic test accuracy of depression screening tools: Comparing traditional meta-analysis to individual patient data meta-analysis

Brooke Levis, MSc, PhD Candidate
Jewish General Hospital and McGill University
Montreal, Quebec, Canada

Does Selective Reporting of Data-driven Cutoffs Exaggerate Accuracy? The Hockey Analogy

What is Screening?

  • Purpose: to identify otherwise unrecognisable disease by sorting out apparently well persons who probably have a condition from those who probably do not
  • Not diagnostic: positive tests require referral for diagnosis and, as appropriate, treatment
  • A program, of which a test is one component

Illustration: This information was originally developed by the UK National Screening Committee/NHS Screening Programmes (www.screening.nhs.uk) and is used under the Open Government Licence v1.0

The Patient Health Questionnaire (PHQ-9) depression screening tool

  • Patient Health Questionnaire (PHQ-9)
      • Depression screening tool
      • Scores range from 0 to 27
      • Higher scores = more severe symptoms

Selective Reporting of Results Using Data-Driven Cutoffs

  • Extreme scenarios:
      • Cutoff of ≥ 0: all subjects at or above cutoff, so sensitivity = 100%
      • Cutoff of ≥ 27: virtually all subjects below cutoff, so specificity ≈ 100%
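The extreme scenarios follow directly from how sensitivity and specificity are computed at a cutoff. The sketch below is a minimal illustration; the function name and the toy scores are hypothetical and are not taken from the studies discussed here.

```python
def sensitivity_specificity(scores, has_mdd, cutoff):
    """Return (sensitivity, specificity) when score >= cutoff counts as a positive screen."""
    pairs = list(zip(scores, has_mdd))
    tp = sum(1 for s, d in pairs if d and s >= cutoff)      # cases screened positive
    fn = sum(1 for s, d in pairs if d and s < cutoff)       # cases screened negative
    tn = sum(1 for s, d in pairs if not d and s < cutoff)   # non-cases screened negative
    fp = sum(1 for s, d in pairs if not d and s >= cutoff)  # non-cases screened positive
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: PHQ-9 totals (0-27) and reference-standard MDD status.
scores = [2, 5, 8, 9, 11, 12, 14, 17, 20, 24]
has_mdd = [False, False, False, True, False, True, True, False, True, True]

for cutoff in (0, 10, 27):
    sens, spec = sensitivity_specificity(scores, has_mdd, cutoff)
    print(f"cutoff >= {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy sample nobody scores 27, so the highest cutoff classifies every subject as negative; a non-case scoring exactly 27 would still count as a false positive.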

Does Selective Reporting of Data-driven Cutoffs Exaggerate Accuracy?

Manea et al., CMAJ, 2012

  • Sensitivity increases from cutoff of 8 to cutoff of 11 (within a single dataset, sensitivity can only stay the same or fall as the cutoff rises)
  • For standard cutoff of 10, missing 897 cases (13%)
  • For cutoffs of 7-9 and 11, missing 52-58% of data

Questions

  • Does selective cutoff reporting lead to exaggerated estimates of accuracy?
  • Can we identify predictable patterns of selective cutoff reporting?
  • Why does selective cutoff reporting appear to impact sensitivity, but not specificity?
  • Does selective cutoff reporting convert the high heterogeneity in sensitivity that arises from small numbers of cases into heterogeneity in reported cutoff scores, leaving accuracy estimates that look homogeneous?
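One way to build intuition for the first question is a small simulation. The sketch below rests on assumptions that are not in the slides: scores are drawn from a made-up distribution, each study has only 15 cases, and "data-driven" is taken to mean that a study reports whichever cutoff from 7 to 15 maximizes Youden's index (sensitivity + specificity - 1) in its own sample. All names and parameters are hypothetical.

```python
import random
from statistics import mean


def youden(scores, has_mdd, cutoff):
    """Youden's index (sensitivity + specificity - 1) at score >= cutoff."""
    cases = [s for s, d in zip(scores, has_mdd) if d]
    noncases = [s for s, d in zip(scores, has_mdd) if not d]
    sens = sum(s >= cutoff for s in cases) / len(cases)
    spec = sum(s < cutoff for s in noncases) / len(noncases)
    return sens + spec - 1


def simulate_study(rng, n_cases=15, n_noncases=85):
    """Hypothetical score model: rounded normal draws clipped to 0-27; not fitted to real data."""
    def draw(mu):
        return min(27, max(0, round(rng.gauss(mu, 5))))

    scores = [draw(15) for _ in range(n_cases)] + [draw(5) for _ in range(n_noncases)]
    has_mdd = [True] * n_cases + [False] * n_noncases
    return scores, has_mdd


rng = random.Random(0)  # fixed seed so the run is reproducible
standard, data_driven = [], []
for _ in range(200):
    scores, has_mdd = simulate_study(rng)
    standard.append(youden(scores, has_mdd, 10))  # always report cutoff 10
    data_driven.append(max(youden(scores, has_mdd, c) for c in range(7, 16)))  # report the best

print(f"mean Youden at cutoff 10:         {mean(standard):.3f}")
print(f"mean Youden at data-driven cutoff: {mean(data_driven):.3f}")
```

Because cutoff 10 is among the candidates, the data-driven value can never be lower than the standard-cutoff value within a study, so any gap between the two means reflects selection in small samples rather than a better test.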

Methods

  • Data source:
      • Studies included in published traditional meta-analysis on the diagnostic accuracy of the PHQ-9 (Manea et al., CMAJ 2012)
  • Inclusion criteria:
      • Unique patient sample
      • Published diagnostic accuracy for MDD for at least one PHQ-9 cutoff
  • Data transfer:
      • Invited authors of the eligible studies to contribute their original patient data (de-identified)
      • Received data from 13 of 16 eligible datasets (80% of patients, 94% of MDD cases)

Methods

  • Data preparation
      • For each dataset, extracted PHQ-9 scores and MDD diagnostic status for each patient, and information pertaining to weighting
  • Statistical analyses (2 sets performed)
      • Traditional meta-analysis: for each cutoff between 7 and 15, included data from the studies that reported accuracy results for the respective cutoff in the original publication
      • IPD meta-analysis: for each cutoff between 7 and 15, included data from all studies
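The difference between the two sets of analyses comes down to which studies contribute at each cutoff. A minimal sketch of that selection rule follows; the study IDs, the `published_cutoffs` field, and the function name are all hypothetical.

```python
def studies_for_cutoff(studies, cutoff, approach):
    """Return the IDs of studies that contribute data at `cutoff`.

    approach="traditional": only studies whose publication reported this cutoff.
    approach="ipd": every study, since accuracy can be recomputed from patient data.
    """
    if approach == "ipd":
        return [s["id"] for s in studies]
    if approach == "traditional":
        return [s["id"] for s in studies if cutoff in s["published_cutoffs"]]
    raise ValueError(f"unknown approach: {approach!r}")


# Hypothetical studies; IDs and published cutoffs are made up for illustration.
studies = [
    {"id": "A", "published_cutoffs": {10}},
    {"id": "B", "published_cutoffs": {8, 9, 10, 11}},
    {"id": "C", "published_cutoffs": {12}},
]

for cutoff in range(7, 16):
    traditional = studies_for_cutoff(studies, cutoff, "traditional")
    ipd = studies_for_cutoff(studies, cutoff, "ipd")
    print(f"cutoff {cutoff}: traditional={traditional}, ipd={ipd}")
```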

Comparison of data availability

        | Published data (traditional MA)      | All data (IPD MA)
Cutoff  | # studies  # patients  # MDD cases   | # studies  # patients  # MDD cases
7       | 4          2094        550           | 13         4589        1037
8       | 4          2094        550           | 13         4589        1037
9       | 4          1579        309           | 13         4589        1037
10      | 11         3794        723           | 13         4589        1037
11      | 5          1253        216           | 13         4589        1037
12      | 6          1388        261           | 13         4589        1037
13      | 4          1073        186           | 13         4589        1037
14      | 3          977         150           | 13         4589        1037
15      | 4          1075        193           | 13         4589        1037
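To see how much of the IPD dataset the published results cover at each cutoff, the counts from the data availability slide can be turned into proportions. This is a small sketch; the variable and function names are illustrative only.

```python
# (cutoff, # studies, # patients, # MDD cases) for the published data,
# copied from the data availability slide.
published = [
    (7, 4, 2094, 550),
    (8, 4, 2094, 550),
    (9, 4, 1579, 309),
    (10, 11, 3794, 723),
    (11, 5, 1253, 216),
    (12, 6, 1388, 261),
    (13, 4, 1073, 186),
    (14, 3, 977, 150),
    (15, 4, 1075, 193),
]
IPD_PATIENTS, IPD_CASES = 4589, 1037  # totals available at every cutoff in the IPD set


def coverage(published_rows, total_patients, total_cases):
    """Map cutoff -> (share of patients, share of MDD cases) covered by published data."""
    return {c: (p / total_patients, m / total_cases) for c, _, p, m in published_rows}


cov = coverage(published, IPD_PATIENTS, IPD_CASES)
for cutoff, (patients, cases) in cov.items():
    print(f"cutoff {cutoff}: {patients:.0%} of patients, {cases:.0%} of MDD cases")
```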

Methods

  • Model: Bivariate random-effects* meta-analysis models
      • Models sensitivity and specificity at the same time
      • Accounts for clustering by study
      • Provides an overall pooled sensitivity and specificity for each cutoff, for the 2 sets of analyses
          • Within each set of analyses, each cutoff requires its own model
      • Estimates between-study heterogeneity

Note: model accounts for correlation between sensitivity and specificity at each threshold, but not for correlation of parameters across thresholds

*Random effects model: sensitivity & specificity assumed to vary across primary studies
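For reference, one common way to write a bivariate random-effects model for a single cutoff is the binomial-normal formulation below. The slides do not give the exact parameterization that was fitted, so the notation here (study index $i$, numbers of cases $n_{1i}$ and non-cases $n_{0i}$, logit link) is an assumption for illustration.

```latex
\begin{align*}
\mathrm{TP}_i \mid \mathrm{Se}_i &\sim \mathrm{Binomial}(n_{1i}, \mathrm{Se}_i), \\
\mathrm{TN}_i \mid \mathrm{Sp}_i &\sim \mathrm{Binomial}(n_{0i}, \mathrm{Sp}_i), \\
\begin{pmatrix} \operatorname{logit}\mathrm{Se}_i \\ \operatorname{logit}\mathrm{Sp}_i \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{\mathrm{Se}} \\ \mu_{\mathrm{Sp}} \end{pmatrix},
\begin{pmatrix}
\sigma_{\mathrm{Se}}^2 & \rho\,\sigma_{\mathrm{Se}}\sigma_{\mathrm{Sp}} \\
\rho\,\sigma_{\mathrm{Se}}\sigma_{\mathrm{Sp}} & \sigma_{\mathrm{Sp}}^2
\end{pmatrix}
\right)
\end{align*}
```

Under this formulation the pooled sensitivity and specificity are $\operatorname{logit}^{-1}(\mu_{\mathrm{Se}})$ and $\operatorname{logit}^{-1}(\mu_{\mathrm{Sp}})$, the variances $\sigma_{\mathrm{Se}}^2$ and $\sigma_{\mathrm{Sp}}^2$ describe between-study heterogeneity, and $\rho$ is the correlation between sensitivity and specificity at that threshold. A separate model of this form would be fitted for each cutoff.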

Comparison of Diagnostic Accuracy

        | Published data (traditional MA) | All data (IPD MA)
Cutoff  | N studies  Sens  Spec           | N studies  Sens  Spec
7       | 4          0.85  0.73           | 13         0.97  0.73
8       | 4          0.79  0.78           | 13         0.93  0.78
9       | 4          0.78  0.82           | 13         0.89  0.83
10      | 11         0.85  0.88           | 13         0.87  0.88
11      | 5          0.92  0.90           | 13         0.83  0.90
12      | 6          0.82  0.92           | 13         0.77  0.92
13      | 4          0.82  0.94           | 13         0.67  0.94
14      | 3          0.71  0.97           | 13         0.59  0.96
15      | 4          0.61  0.98           | 13         0.52  0.97
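The pattern in the accuracy comparison is easier to see as differences between the two analyses. The short sketch below subtracts the IPD estimate from the published estimate at each cutoff, using the values from the slide; the names are illustrative only.

```python
# (cutoff, published sens, published spec, IPD sens, IPD spec), copied from the slide.
rows = [
    (7, 0.85, 0.73, 0.97, 0.73),
    (8, 0.79, 0.78, 0.93, 0.78),
    (9, 0.78, 0.82, 0.89, 0.83),
    (10, 0.85, 0.88, 0.87, 0.88),
    (11, 0.92, 0.90, 0.83, 0.90),
    (12, 0.82, 0.92, 0.77, 0.92),
    (13, 0.82, 0.94, 0.67, 0.94),
    (14, 0.71, 0.97, 0.59, 0.96),
    (15, 0.61, 0.98, 0.52, 0.97),
]


def differences(table):
    """Map cutoff -> (published minus IPD sensitivity, published minus IPD specificity)."""
    return {
        cutoff: (round(pub_sens - ipd_sens, 2), round(pub_spec - ipd_spec, 2))
        for cutoff, pub_sens, pub_spec, ipd_sens, ipd_spec in table
    }


diffs = differences(rows)
for cutoff, (d_sens, d_spec) in diffs.items():
    print(f"cutoff {cutoff}: sensitivity {d_sens:+.2f}, specificity {d_spec:+.2f}")
```

Published sensitivity is lower than the IPD estimate below the standard cutoff of 10 and higher above it, while specificity differs by at most 0.01 at any cutoff.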

Comparison of ROC Curves [figure]

Publishing trends by study [figure]

Comparison of Sensitivity by Cutoff [figure]

Why Sensitivity Changes with Moving Cutoffs, but Not Specificity [figure]

Heterogeneity [figure]

Summary

  • Selective cutoff reporting in depression screening tool DTA studies may distort accuracy across cutoffs.
  • It will lead to exaggerated estimates of accuracy.
  • These distortions were relatively minor for the PHQ-9, but would likely be much larger for other measures where standard cutoffs are less consistently reported and more data-driven reporting seems to occur (e.g., HADS).
  • IPD meta-analysis can address this and will allow subgroup-based accuracy evaluation.

Summary

  • STARD undergoing revision:
      • Needs to require precision-based sample size calculation to avoid very small samples (particularly small numbers of cases) and unstable estimates
      • Needs to require reporting of spectrum of cutoffs, which is easily done with online appendices
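The slides do not say which precision-based calculation is meant. As a minimal sketch, assuming the usual normal-approximation (Wald) confidence interval for a proportion, the number of cases needed to estimate sensitivity to within a chosen half-width can be computed as below. The function names and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def cases_needed(expected_sensitivity, half_width, confidence=0.95):
    """Cases needed so the Wald CI for sensitivity has roughly the given half-width.

    Uses n = z^2 * Se * (1 - Se) / d^2, rounded up to a whole case.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z**2 * expected_sensitivity * (1 - expected_sensitivity) / half_width**2
    return math.ceil(n)


def total_sample_needed(n_cases, prevalence):
    """Total participants to recruit so that about n_cases of them are cases."""
    return math.ceil(n_cases / prevalence)


# Hypothetical planning inputs: expected sensitivity 0.85, CI half-width 0.05,
# and an assumed MDD prevalence of 25% in the study population.
n_cases = cases_needed(0.85, 0.05)
print("cases needed:", n_cases)
print("total sample needed:", total_sample_needed(n_cases, 0.25))
```

Because the required number applies to cases, not to total participants, a low prevalence inflates the total sample considerably, which is consistent with the slide's emphasis on the number of cases.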

Acknowledgements

DEPRESSD Investigators
  • Brett Thombs
  • Andrea Benedetti
  • Roy Ziegelstein
  • Pim Cuijpers
  • Simon Gilbody
  • John Ioannidis
  • Scott Patten
  • Dean McMillan
  • Ian Shrier
  • Russell Steele
  • Lorie Kloda

Other Contributors
  • Alex Levis
  • Danielle Rice