Lecture 6 Diagnostic Tests and Screening Kevin Schwartzman MD June 16, 2006

Objectives
Students will be able to:
1. Define and calculate the following: sensitivity, specificity, positive and negative predictive values of diagnostic tests
2. Illustrate the influence of prevalence and/or pre-test probability on predictive values
3. Define pre- and post-test probabilities in terms of Bayes’ theorem and likelihood ratios
4. Identify key elements of screening programs and evaluations of their impact
5. Describe the impact of misclassification on results of clinical research studies

Diagnostic Tests and Screening Readings:
• Fletcher, chapters 3 (Diagnosis), 9 (Prevention)
• Barry MJ, Prostate-specific antigen testing for early diagnosis of prostate cancer, N Engl J Med 2001; 344: 1373-1377 [Clinical Practice]

Diagnostic Tests as diagnostic aids and screening tools - key element of clinical medicine and public health.
• Electrocardiogram, cardiac enzymes for diagnosis of myocardial infarction
• Murphy’s sign (right upper abdominal tenderness on inspiration) in diagnosis of acute cholecystitis
• Pap smear for detection of cervical cancer
Also essential in many epidemiologic studies where diagnostic criteria and/or tests are used to establish exposure or outcome status. The goal is to minimize misclassification; yet some misclassification may be inevitable for logistical reasons.

Diagnostic Tests and Screening-- Slide 2 Definitive diagnosis/classification may be difficult or impossible to obtain. “Gold standard” may be expensive, inappropriate (e.g. autopsy based) or unsuitable (e.g. clinical follow-up when immediate decision required). Tests may serve as surrogates but this requires that they be appropriately validated against a suitable gold standard - and that their properties be documented.

Diagnostic Tests and Screening-- Slide 3 We will focus largely on the situation where the diagnosis/outcome and the test result are both dichotomous, i.e.
Disease: Present vs. absent
Test: Positive vs. negative
We need to know how well the test separates those who have the disease of interest from those who do not.

Diagnostic Tests and Screening-- Slide 4
We can use a 2 x 2 table to describe the various possibilities:

             Test +        Test -
Disease +    True +        False -
Disease -    False +       True -

True positive rate = P(T+ | D+) = TP/(TP+FN) = Sensitivity: The probability that a diseased individual will be identified as such by the test

Diagnostic Tests and Screening-- Slide 5

             Test +        Test -
Disease +    True +        False -
Disease -    False +       True -

True negative rate = P(T- | D-) = TN/(TN+FP) = Specificity: The probability that an individual without the disease will be identified as such by the test

Diagnostic Tests and Screening-- Slide 6
Complementary probabilities:
False negative rate = FN/(TP+FN) = P(T- | D+) = 1 - sensitivity
False positive rate = FP/(TN+FP) = P(T+ | D-) = 1 - specificity

Diagnostic Tests and Screening-- Slide 7 Example: A researcher develops a new saliva pregnancy test. She collects samples from 100 women known to be pregnant by blood test (the gold standard) and 100 women known not to be pregnant, also based on the same blood test. The saliva test is “positive” in 95 of the pregnant women. It is also “positive” in 15 of the non-pregnant women. What are the sensitivity and specificity?

Diagnostic Tests and Screening-- Slide 8

                Saliva +    Saliva -    Totals
Pregnant           95           5         100
Non-pregnant       15          85         100
Totals            110          90         200

Sensitivity = TP/(TP+FN) = 95/100 = 95%
Specificity = TN/(TN+FP) = 85/100 = 85%
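The calculation above can be written as a small helper. This is only a sketch: the function name `sensitivity_specificity` is an illustrative choice, and the counts are the ones from the saliva test example.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from the four cells of a 2 x 2 table."""
    sensitivity = tp / (tp + fn)   # P(T+ | D+): diseased who test positive
    specificity = tn / (tn + fp)   # P(T- | D-): non-diseased who test negative
    return sensitivity, specificity


# Counts from the saliva pregnancy test validation example.
sens, spec = sensitivity_specificity(tp=95, fn=5, fp=15, tn=85)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
print(f"false negative rate = {1 - sens:.0%}, false positive rate = {1 - spec:.0%}")
```

With these counts the helper reproduces the 95% sensitivity and 85% specificity worked out on the slide.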

Diagnostic Tests and Screening-- Slide 9 Is it more important that a test be sensitive or specific? • It depends on its purpose. A cheap mass screening test should be sensitive (few cases missed). A test designed to confirm the presence of disease should be specific (few cases wrongly diagnosed). • Note that sensitivity and specificity are two distinct properties. Where classification is based on a cutpoint along a continuum, there is a tradeoff between the two.

Diagnostic Tests and Screening-- Slide 10 Example: The saliva pregnancy test detects progesterone. A refined version is developed. Suppose you add a drop of indicator solution to the saliva sample. It can stay clear (0 reaction) or turn green (1+), red (2+), or black (3+). (For purposes of discussion we will ignore overlapping colors)

Diagnostic Tests and Screening-- Slide 11
The researcher conducts a validation study and finds the following:

              Pregnant    Non-pregnant    Totals
Saliva 3+        85             5            90
Saliva 2+        10            10            20
Saliva 1+         3            17            20
Saliva 0          2            68            70
Totals          100           100           200

Diagnostic Tests and Screening-- Slide 12
The sensitivity and specificity of the saliva test will depend on the definition of “positive” and “negative” used.
• If “positive” = 1+ or greater, sensitivity = (85+10+3)/100 = 98%, specificity = 68/100 = 68%
• If “positive” = 2+ or greater, sensitivity = (85+10)/100 = 95%, specificity = (68+17)/100 = 85%
• If “positive” = 3+, sensitivity = 85/100 = 85%, specificity = (68+17+10)/100 = 95%

Diagnostic Tests and Screening-- Slide 13 The choice of cutpoint depends on the relative adverse consequences of false-negatives vs. false-positives. If it is most important not to miss anyone, choose a cutpoint with higher sensitivity and accept lower specificity. If it is most important that people not be erroneously labeled as having the condition, choose a cutpoint with higher specificity and accept lower sensitivity. A receiver operating characteristic (ROC) curve is often used to illustrate the use of different cutpoints for a test with continuous values.
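To make the tradeoff concrete, the sketch below recomputes sensitivity and specificity at each possible cutpoint of the graded saliva test, which are the points an ROC curve would plot. The function name `roc_points` and the list layout are illustrative assumptions; the counts come from the validation study above.

```python
# Validation counts by reaction grade, ordered from strongest to weakest.
grades = ["3+", "2+", "1+", "0"]
pregnant = [85, 10, 3, 2]
non_pregnant = [5, 10, 17, 68]


def roc_points(diseased, healthy):
    """Sensitivity and specificity when 'positive' means each grade or stronger."""
    n_diseased, n_healthy = sum(diseased), sum(healthy)
    points = []
    tp = fp = 0
    for d, h in zip(diseased, healthy):
        tp += d   # diseased counted as positive at this cutpoint
        fp += h   # non-diseased counted as positive at this cutpoint
        points.append((tp / n_diseased, 1 - fp / n_healthy))
    return points


for grade, (sens, spec) in zip(grades, roc_points(pregnant, non_pregnant)):
    print(f"positive = {grade} or stronger: "
          f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```

The last row (grade 0 or stronger) calls every sample positive, so it is the trivial end of the curve with 100% sensitivity and 0% specificity. Moving the cutpoint from 3+ toward 1+ raises sensitivity and lowers specificity at each step.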

Diagnostic Tests and Screening-- Slide 14 In practice, the clinician or researcher needs to know how to interpret test results without the simultaneous gold standard measurement. (If you already know the “gold standard” result, why would you obtain the other test?)

Diagnostic Tests and Screening – Slide 15
Hence we need to know:
1. How likely is a patient to have the condition of interest, given a “positive” test result? This is P(D+ | T+), or the positive predictive value of the test [= TP/(TP+FP)]
2. How likely is a patient not to have the condition of interest, given a “negative” test result? This is P(D- | T-), or the negative predictive value of the test [= TN/(TN+FN)]

Diagnostic Tests and Screening-- Slide 16 Key point: The positive and negative predictive values depend on the pretest probability of the condition of interest - in addition to the sensitivity and specificity of the test. This pretest probability is often the prevalence of the condition in the population of interest. But it can also reflect restriction of this population based on clinical features and/or other test results. For example, the pretest probability of pregnancy will be very different among young women using oral contraceptives from that among sexually active young women using no form of contraception.

Diagnostic Tests and Screening-- Slide 17 Example: The saliva pregnancy test is administered 30 days after the first day of the last menstrual period to two groups of women who have thus far “missed” a period. Group 1: 1000 sexually active young women using no contraception. Pretest probability of pregnancy 40% (hypothetical)

Diagnostic Tests and Screening-- Slide 18
Based on sensitivity of 95%, expected TP = 400 x 0.95 = 380, expected FN = 400 - 380 = 20
Based on specificity of 85%, expected TN = 600 x 0.85 = 510, expected FP = 600 - 510 = 90

                Test +    Test -    Totals
Pregnant          380        20       400
Non-pregnant       90       510       600
Totals            470       530      1000

Diagnostic Tests and Screening-- Slide 19
Positive predictive value = TP/(TP+FP) = 380/470 = 81%
In this context, a woman with a positive saliva test has an 81% chance of being pregnant.
Negative predictive value = TN/(TN+FN) = 510/530 = 96%
In this context, a woman with a negative saliva test has a 96% chance of not being pregnant (and a 4% chance of being pregnant)

Diagnostic Tests and Screening-- Slide 20
Group 2: 1000 oral contraceptive users - pretest probability of pregnancy = 10% (hypothetical)
Using sensitivity = 95%, expected TP = 0.95 x 100 = 95, expected FN = 100 - 95 = 5
Using specificity = 85%, expected TN = 0.85 x 900 = 765, expected FP = 900 - 765 = 135

                Test +    Test -    Totals
Pregnant           95         5       100
Non-pregnant      135       765       900
Totals            230       770      1000

Diagnostic Tests and Screening-- Slide 21 In this context, positive predictive value is only 95/230 = 41% [TP/(TP+FP)] Negative predictive value is [TN/(TN+FN)] = 765/770 = 99%

Diagnostic Tests and Screening-- Slide 22
In which situation is the saliva test more helpful?
Group 1 (pretest probability 40%): Test +: 81% probability of pregnancy; Test -: 4% probability of pregnancy
Group 2 (pretest probability 10%): Test +: 41% probability of pregnancy; Test -: 1% probability of pregnancy

Diagnostic Tests and Screening-- Slide 23 • Note that the same test would likely be used and interpreted very differently in these two contexts. • This does not imply any difference in the characteristics of the test itself, i.e. sensitivity and specificity are not altered by the pretest probability of the condition of interest. • Tests are most useful when the pretest probability is in a middle range. They are unlikely to be useful when the pretest probability is already very high or low.

Diagnostic Tests and Screening-- Slide 24
Deriving predictive values (post-test probabilities) using a 2 x 2 table:
1. Fill in totals with/without disease based on pretest probabilities. In general these depend on external information about the population of interest and cannot be extrapolated from a validation study.
2. Fill in the true positives and false negatives using sensitivity.
- TP = Number with disease x sensitivity
- FN = Number with disease x (1 - sensitivity)
3. Fill in true negatives and false positives using specificity.
- TN = Number free of disease x specificity
- FP = Number free of disease x (1 - specificity)
4. Calculate PPV = TP/(TP+FP) and NPV = TN/(TN+FN)
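The four steps above can be sketched in a few lines of code. The function name `predictive_values` and its parameters are illustrative assumptions; the inputs are the two hypothetical groups of 1000 women from the earlier slides.

```python
def predictive_values(n, pretest, sensitivity, specificity):
    """Build the expected 2 x 2 table and return (PPV, NPV)."""
    # Step 1: totals with and without disease from the pretest probability.
    diseased = n * pretest
    healthy = n - diseased
    # Step 2: true positives and false negatives from sensitivity.
    tp = diseased * sensitivity
    fn = diseased - tp
    # Step 3: true negatives and false positives from specificity.
    tn = healthy * specificity
    fp = healthy - tn
    # Step 4: predictive values.
    return tp / (tp + fp), tn / (tn + fn)


for label, pretest in [("Group 1 (no contraception)", 0.40),
                       ("Group 2 (oral contraceptive users)", 0.10)]:
    ppv, npv = predictive_values(1000, pretest, sensitivity=0.95, specificity=0.85)
    print(f"{label}: PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

With the slide inputs this gives roughly 81% and 96% for Group 1 and 41% and 99% for Group 2, matching the tables worked by hand. Only the pretest probability changes between the two calls; sensitivity and specificity stay fixed.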

Diagnostic Tests and Screening-- Slide 25
Bayes’ theorem: Allows us to calculate revised (“posterior” or post-test) probabilities, based on “prior” (pretest) probabilities and new information (here, test results).
General form:

$$P(B \mid A) = \frac{P(A \mid B) \times P(B)}{[P(A \mid B) \times P(B)] + [P(A \mid \bar{B}) \times P(\bar{B})]}$$

Note that $\bar{B}$ corresponds to “Not B”, so $P(\bar{B}) = 1 - P(B)$

Diagnostic Tests and Screening-- Slide 26
For positive predictive value,

$$P(D^{+} \mid T^{+}) = \frac{P(T^{+} \mid D^{+}) \times P(D^{+})}{[P(T^{+} \mid D^{+}) \times P(D^{+})] + [P(T^{+} \mid D^{-}) \times P(D^{-})]}$$

Note this is identical to TP/(TP+FP)

Diagnostic Tests and Screening - Slide 27
For negative predictive value,

$$P(D^{-} \mid T^{-}) = \frac{P(T^{-} \mid D^{-}) \times P(D^{-})}{[P(T^{-} \mid D^{-}) \times P(D^{-})] + [P(T^{-} \mid D^{+}) \times P(D^{+})]}$$

which is equal to TN/(TN+FN)

Diagnostic Tests and Screening-- Slide 28
Example: What would be the positive and negative predictive values for the saliva pregnancy test if the pretest probability of pregnancy is 20%? (sensitivity = 95%, specificity = 85%)

$$P(\text{pregnant} \mid T^{+}) = \frac{P(T^{+} \mid \text{pregnant}) \times P(\text{pregnant})}{[P(T^{+} \mid \text{pregnant}) \times P(\text{pregnant})] + [P(T^{+} \mid \text{not pregnant}) \times P(\text{not pregnant})]}$$

$$= \frac{0.95 \times 0.2}{(0.95 \times 0.2) + (0.15 \times 0.8)} = \frac{0.19}{0.19 + 0.12} = 0.61 \text{ or } 61\%$$

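Diagnostic Tests and Screening - Slide 29

$$P(\text{not pregnant} \mid T^{-}) = \frac{P(T^{-} \mid \text{not pregnant}) \times P(\text{not pregnant})}{[P(T^{-} \mid \text{not pregnant}) \times P(\text{not pregnant})] + [P(T^{-} \mid \text{pregnant}) \times P(\text{pregnant})]}$$

$$= \frac{0.85 \times 0.8}{(0.85 \times 0.8) + (0.05 \times 0.2)} = \frac{0.68}{0.68 + 0.01} = 0.99 \text{ or } 99\%$$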
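The same Bayes’ theorem calculation can be checked numerically. In this sketch the function name `posterior` and its argument names are illustrative assumptions; the inputs are the 20% pretest probability, 95% sensitivity and 85% specificity from the example.

```python
def posterior(prior, p_result_given_state, p_result_given_other_state):
    """Bayes' theorem for two mutually exclusive states.

    prior: probability of the state of interest before testing
    p_result_given_state: probability of the observed result in that state
    p_result_given_other_state: probability of the same result otherwise
    """
    numerator = p_result_given_state * prior
    denominator = numerator + p_result_given_other_state * (1 - prior)
    return numerator / denominator


SENSITIVITY, SPECIFICITY, PRETEST = 0.95, 0.85, 0.20

# PPV: state = pregnant, result = positive test.
ppv = posterior(PRETEST, SENSITIVITY, 1 - SPECIFICITY)
# NPV: state = not pregnant, result = negative test.
npv = posterior(1 - PRETEST, SPECIFICITY, 1 - SENSITIVITY)

print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

The two calls differ only in which state is treated as the one of interest, which is why a single function covers both predictive values.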

Diagnostic Tests and Screening - Slide 30
Likelihood Ratios
• An alternative way of developing post-test probabilities (predictive values)
• Relationship between pre- and post-test odds, where
• Odds = [probability of x]/[1 - probability of x]
– If pre-test probability of pregnancy is 20%, then odds of pregnancy = 0.2/(1 - 0.2) = 0.25
– Odds of no pregnancy = 0.8/(1 - 0.8) = 4 [the reciprocal]
• Probability = [odds of x]/[1 + odds of x]
– If prior odds of pregnancy = 0.25, then pre-test probability of pregnancy = 0.25/(1 + 0.25) = 0.2

Diagnostic Tests and Screening - Slide 31
Likelihood Ratios
• Post-test odds = pre-test odds x likelihood ratio, where
• Likelihood ratio = P(test result | condition of interest) / P(test result | no condition of interest)

Diagnostic Tests and Screening - Slide 32
Likelihood Ratios
• Pregnancy example, saliva test as before
– Prior odds 0.25 (20% pre-test probability)
– Sensitivity 95%, specificity 85%
• Post-test odds with positive test = 0.25 x (0.95/0.15) = 0.25 x 6.33 = 1.58
• Post-test probability = 1.58/(1 + 1.58) = 61%
• This approach can be particularly useful for tests with multiple categories, and for serial testing
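The odds-based route can be sketched as follows. The helper names are illustrative assumptions. The slide only works the positive-test case; the negative likelihood ratio, (1 - sensitivity)/specificity, is the standard counterpart and is included here for completeness.

```python
def to_odds(probability):
    return probability / (1 - probability)


def to_probability(odds):
    return odds / (1 + odds)


def post_test_probability(pretest, likelihood_ratio):
    """Post-test odds = pre-test odds x likelihood ratio, converted back."""
    return to_probability(to_odds(pretest) * likelihood_ratio)


SENSITIVITY, SPECIFICITY, PRETEST = 0.95, 0.85, 0.20

lr_positive = SENSITIVITY / (1 - SPECIFICITY)   # P(T+ | D+) / P(T+ | D-)
lr_negative = (1 - SENSITIVITY) / SPECIFICITY   # P(T- | D+) / P(T- | D-)

p_after_positive = post_test_probability(PRETEST, lr_positive)
p_after_negative = post_test_probability(PRETEST, lr_negative)

print(f"LR+ = {lr_positive:.2f}, P(pregnant | T+) = {p_after_positive:.0%}")
print(f"LR- = {lr_negative:.3f}, P(pregnant | T-) = {p_after_negative:.1%}")
```

The positive-test result agrees with the 61% obtained from Bayes’ theorem, and the probability of pregnancy after a negative test is the complement of the 99% negative predictive value.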

Diagnostic Tests and Screening-- Slide 33 Pitfalls in assessments of diagnostic test performance • Importance of pretest probability, as discussed. • Pretest probability (and predictive values) cannot ordinarily be extrapolated from a validation study, since the proportions with and without disease are determined by the investigator – unless there is truly random sampling that reflects the context in which the test will be applied.

Diagnostic Tests and Screening - Slide 34 Was the test applied in a consistent fashion to all members of the validation sample? e.g. was test interpretation properly blinded? (unrelated to “true” presence or absence of disease or clues to it) Was the gold standard applied in a consistent fashion to all members of the validation sample? (again, blinded application not related to results of test(s) being evaluated)

Diagnostic Tests and Screening-- Slide 35
Example: New diagnostic tests for pulmonary embolism
“Positive” results confirmed by pulmonary angiography (an invasive test with some risk)
“Negative” results confirmed by clinical follow-up, i.e. does the patient return with further symptoms or signs? - this condition can resolve spontaneously and not recur

Diagnostic Tests and Screening-- Slide 36
Result:
Good documentation of true and false positives
Overestimate true negatives, underestimate false negatives
→ sensitivity of test overestimated
→ specificity of test also overestimated

Diagnostic Tests and Screening-- Slide 37 Importance of the sample used for test validation: • What was the spectrum of the condition evaluated? • How similar is this to the situation in which the test will be used?

Diagnostic Tests and Screening-- Slide 38 Example: saliva pregnancy test Imagine that test hinges on ability to detect progesterone, a hormone where the level increases as pregnancy progresses • If the test is validated by comparing women who are 3 months pregnant with young, non-pregnant women, it will perform very well as progesterone levels are very high by 3 months.

Diagnostic Tests and Screening - Slide 39 • On the other hand, the sensitivity may be much lower if the pregnant group consists of women who are only 1 month after their last menstrual period. • Conversely, the estimated specificity of the test will be higher if the comparison group has very low progesterone levels (e.g. postmenopausal women).

Diagnostic Tests and Screening-- Slide 40 You would reject results of a validation study involving women who are 3 months pregnant, or women who are postmenopausal • By 3 months, pregnancy is usually relatively obvious by history and thus is unlikely to be the situation where the test will be used. • The test would never be administered to postmenopausal women!

Diagnostic Tests and Screening-- Slide 41 So: Sensitivity and specificity estimates do not depend on the prevalence of the condition in question. BUT their values and their validity depend on the context in which they were obtained, vis-a-vis the context in which they will be used. This in turn will affect positive and negative predictive values, quite apart from the prevalence/prior probability of the condition.

Diagnostic Tests and Screening - Slide 42 Misclassification The use of an imperfect diagnostic test leads to misclassification (assigning individuals to the wrong category). In research studies, it is most often non-differential. • That is, the probability of misclassification is not associated with the exposure or intervention under study.

Diagnostic Tests and Screening - Slide 43 • For example, the use of an imperfect cardiac enzyme assay to define myocardial infarction in a primary prevention study with a novel anti-platelet agent. • Another example: ascertaining the development of HIV infection based on a saliva test, comparing injection drug users who do vs. who do not clean their needles (in a cohort study).

Diagnostic Tests and Screening-- Slide 44 • The effect of nondifferential misclassification is to dilute any association which may be present, i.e. the effect measure is biased toward the null value. • Consider the extreme case where the cardiac enzyme assay is no better than flipping a coin. Then no effect of the antiplatelet drug will be detected, even if it is truly very beneficial. • If the degree of misclassification is known, then corrected 2 x 2 tables and parameter estimates can be derived.
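A short expected-value calculation illustrates the bias toward the null. All numbers here are hypothetical and chosen only for illustration: a true infarction risk of 10% on placebo and 5% on the drug (true risk ratio 0.5), and an assay whose sensitivity and specificity are the same in both arms.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Expected proportion classified as cases by an imperfect outcome test."""
    true_cases_detected = sensitivity * true_risk
    non_cases_mislabeled = (1 - specificity) * (1 - true_risk)
    return true_cases_detected + non_cases_mislabeled


def observed_risk_ratio(risk_treated, risk_control, sensitivity, specificity):
    """Risk ratio seen when both arms share the same misclassification."""
    return (observed_risk(risk_treated, sensitivity, specificity)
            / observed_risk(risk_control, sensitivity, specificity))


RISK_DRUG, RISK_PLACEBO = 0.05, 0.10   # hypothetical true risks

for label, sens, spec in [("perfect assay", 1.0, 1.0),
                          ("imperfect assay", 0.8, 0.9),
                          ("coin flip", 0.5, 0.5)]:
    rr = observed_risk_ratio(RISK_DRUG, RISK_PLACEBO, sens, spec)
    print(f"{label}: observed risk ratio = {rr:.2f}")
```

As the assay degrades, the observed risk ratio moves from the true value of 0.5 toward 1, and with the coin-flip assay the apparent benefit of the drug disappears entirely.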

Diagnostic Tests and Screening-- Slide 45 Differential misclassification implies that measurement error is associated with study group membership, i.e. it operates differentially between groups. Imagine that the antiplatelet drug directly interferes with the cardiac enzyme assay, leading to underestimation of enzyme levels. Here, the drug may appear to be protective even if in reality, it is no better than placebo. Hence depending on the specific circumstances, differential misclassification may lead to under- or overestimation of the true association between exposure and outcome.
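The assay-interference scenario can be sketched with hypothetical numbers. Here the true infarction risk is assumed to be 10% in both arms (the drug does nothing), the assay is assumed to be perfectly specific, and its sensitivity is assumed to drop from 90% on placebo to 60% on the drug because of the interference.

```python
def observed_risk(true_risk, sensitivity, specificity=1.0):
    """Expected proportion classified as cases by an imperfect outcome test."""
    return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)


TRUE_RISK = 0.10               # hypothetical, identical in both arms
SENSITIVITY_PLACEBO = 0.90     # hypothetical assay sensitivity on placebo
SENSITIVITY_DRUG = 0.60        # hypothetical: drug masks enzyme elevation

observed_placebo = observed_risk(TRUE_RISK, SENSITIVITY_PLACEBO)
observed_drug = observed_risk(TRUE_RISK, SENSITIVITY_DRUG)
observed_rr = observed_drug / observed_placebo

print(f"true risk ratio = 1.00, observed risk ratio = {observed_rr:.2f}")
```

Because the measurement error differs by study arm, the observed risk ratio falls below 1 even though the true effect is null. Reversing the two sensitivities would push the estimate above 1, which is why the direction of bias depends on the circumstances.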

Diagnostic Tests and Screening – Slide 46
Screening
• “The identification of an unrecognized disease or risk factor by…[a] procedure that can be applied reasonably rapidly to asymptomatic people.” (Fletcher, p. 149)
• Screening is relevant only if disease is relatively common, testing is sensitive, specific, and cost-effective, and early treatment improves outcomes

Diagnostic Tests and Screening – Slide 47
Sensitivity may be calculated by
• Detection method: [Cases found by screening] / [Cases found by screening + those identified during follow-up of screened persons (interval cases)]
• Incidence method: [Incidence among unscreened - interval incidence among screened] / [Incidence among unscreened]
Incidence method accounts for “overdiagnosis” of abnormalities that are not clinically important, e.g. prostate cancer

Diagnostic Tests and Screening-- Slide 48
Biases in performance of screening tests (Does screening lead to better survival?)
1. Lead time bias
The earlier in its natural history an ultimately fatal disease is detected, the longer will be the survival from the time of diagnosis, even if there is no difference in treatment effect.
e.g. Disease develops → (2 years) → Detectable by screening → (3 years) → Clinical symptoms → (5 years) → Death
If 2 persons A+B develop the same disease at the same age but person A is diagnosed by screening, person A will live 3 more years than person B from time of diagnosis, even if neither is treated, though the chronological survival is equivalent
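The arithmetic behind the lead time example can be sketched as follows. The constant and function names are illustrative, and the sketch assumes, as the example does, that person A is diagnosed as soon as the disease becomes detectable by screening and person B when symptoms appear.

```python
# Intervals in years, taken from the timeline in the example.
ONSET_TO_DETECTABLE = 2
DETECTABLE_TO_SYMPTOMS = 3
SYMPTOMS_TO_DEATH = 5


def survival_from_diagnosis(diagnosed_by_screening):
    """Years lived after diagnosis by an untreated person."""
    if diagnosed_by_screening:
        return DETECTABLE_TO_SYMPTOMS + SYMPTOMS_TO_DEATH
    return SYMPTOMS_TO_DEATH


def survival_from_onset():
    """Years from disease onset to death; identical for both persons."""
    return ONSET_TO_DETECTABLE + DETECTABLE_TO_SYMPTOMS + SYMPTOMS_TO_DEATH


person_a = survival_from_diagnosis(diagnosed_by_screening=True)
person_b = survival_from_diagnosis(diagnosed_by_screening=False)
lead_time = person_a - person_b

print(f"A: {person_a} years after diagnosis, B: {person_b} years after diagnosis")
print(f"lead time = {lead_time} years; both die {survival_from_onset()} years after onset")
```

Survival measured from diagnosis differs by the lead time alone, while the time from disease onset to death is the same for both persons.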

Diagnostic Tests and Screening-- Slide 49 2. Length bias The probability of detecting a disease during its preclinical period is proportional to the length of that period, which is inversely proportional to the rate of disease progression. Hence cases diagnosed by screening may be “destined” for a more favourable evolution, regardless of treatment.
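A small deterministic sketch shows how length bias shifts the case mix. The numbers are hypothetical: slowly and rapidly progressing disease are assumed to arise equally often, with preclinical periods of 4 years and 1 year, and the chance of being caught by a screen is taken as proportional to the preclinical period, as stated above.

```python
def share_among_screen_detected(incidence_share, preclinical_years):
    """Case mix among screen-detected disease.

    Each type is weighted by its share of new cases times the length of its
    preclinical (screen-detectable) period, then the weights are normalized.
    """
    weights = {kind: incidence_share[kind] * preclinical_years[kind]
               for kind in incidence_share}
    total = sum(weights.values())
    return {kind: weight / total for kind, weight in weights.items()}


incidence_share = {"slow": 0.5, "fast": 0.5}       # hypothetical
preclinical_years = {"slow": 4.0, "fast": 1.0}     # hypothetical

detected = share_among_screen_detected(incidence_share, preclinical_years)
for kind in incidence_share:
    print(f"{kind}: {incidence_share[kind]:.0%} of all cases, "
          f"{detected[kind]:.0%} of screen-detected cases")
```

Under these assumptions slowly progressing disease makes up half of all cases but four fifths of screen-detected cases, so the screen-detected group would show better outcomes even with no effective treatment.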

Diagnostic Tests and Screening-- Slide 50 3. Overdiagnosis bias (a variant of length bias) Screening may detect disease that would never have become clinically detectable, e.g. that remains stable and asymptomatic, or regresses spontaneously. It may also detect disease that would not have contributed to the patient’s death, e.g. competing mortality risks among smokers with early-stage lung cancer, or men with early-stage prostate cancer detected by PSA screening.

Diagnostic Tests and Screening-- Slide 51
4. Compliance bias
• Persons who comply with a screening intervention may be healthier, on average, and have healthier behaviours than non-compliers.
• Also likely to be healthier than an unscreened “control group,” which implicitly includes a mixture of persons who would and would not have complied, had they been offered screening.
• Leads to biases in observational (non-randomized) studies, and with analyses limited to “compliers” within randomized trials.
• Relevance of “intent to screen” analyses.