TESTING A TEST
Ian McDowell, Department of Epidemiology & Community Medicine, November 2004

A Lab Report (Montfort Hospital Biochem Lab)
[Figure: an example lab report]

The Challenge of Clinical Measurement
• Diagnoses are based on information, whether from formal measurements or from your clinical judgment
• This information is seldom perfectly accurate:
– Random errors can occur
– Biases in judgment or measurement can occur
– Due to biological variability, this patient may not fit the general rule
– Diagnosis (e.g., hypertension) involves a categorical judgment; this often requires dividing a continuous score (blood pressure) into categories, and the choice of cutting-point may be arbitrary

Therefore…
• You need to be aware…
– That diagnosis is a matter of probabilities
– That using a quantitative approach is better than just guessing!
– That you will ultimately become familiar with the typical accuracy of measurements in your chosen clinical field
– Of some of the ways to describe the accuracy of a measurement
– That the principles apply to both diagnostic and screening tests

Attributes of Tests or Measures
• Cost, safety, acceptability, etc.
• Reliability: reproducibility; this considers chance or random errors
• Validity: does it measure what it is supposed to measure? By extension, what diagnostic conclusion can I draw from a particular score on the test? Validity may be affected by bias, or systematic errors

Reliability and Validity
[Figure: a 2 x 2 grid of dot patterns contrasting low and high reliability (scatter) with low and high validity (accuracy)]

Ways of Assessing Validity
• Face and content validity: does it make clinical or biological sense? Does it include the relevant symptoms?
• Criterion validity: comparison to a “gold standard” definitive measure, expressed as sensitivity and specificity
• Construct validity: used with abstract themes, such as “quality of life,” for which there is no definitive standard

“Gold Standards”
Sensitivity and specificity are judged against:
• More definitive (but expensive or invasive) tests, such as a complete work-up, or against
• Eventual outcome (for screening tests, when work-up of well patients is unethical)

2 x 2 Table for Testing a Test

                       Gold standard
                  Disease present   Disease absent
  Positive test   a (TP)            b (FP)
  Negative test   c (FN)            d (TN)

Sensitivity = a/(a+c); Specificity = d/(b+d)
(TP = true positive; FP = false positive; FN = false negative; TN = true negative)
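
To make the formulas concrete, here is a minimal Python sketch (the function names are mine, and the cell counts are borrowed from the referral-hospital example later in the deck):

```python
def sensitivity(a: int, c: int) -> float:
    """a/(a+c) = TP/(TP+FN): the proportion of diseased patients
    the test correctly calls positive."""
    return a / (a + c)

def specificity(b: int, d: int) -> float:
    """d/(b+d) = TN/(FP+TN): the proportion of disease-free patients
    the test correctly calls negative."""
    return d / (b + d)

# Cell counts: a = TP, b = FP, c = FN, d = TN
a, b, c, d = 50, 10, 5, 100
print(f"Sensitivity = {sensitivity(a, c):.0%}")  # 91%
print(f"Specificity = {specificity(b, d):.0%}")  # 91%
```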

A Bit More on Sensitivity
Sensitivity = the ability to detect disease when it is present
• a/(a+c) = TP/(TP+FN)
• Mnemonics:
– a sensitive person is one who can detect your feelings
– (1 - seNsitivity) = false Negative rate (i.e., how many cases are missed by the screening test?)
• Cf. the power of a statistical test (1 - β)

…and More on Specificity
Specificity = the ability to detect the absence of disease when it is truly absent (can it detect non-disease?)
• d/(b+d) = TN/(FP+TN)
• Mnemonics:
– a specific test would identify only that type of disease: “Nothing else looks like this”
– (1 - sPecificity) = false Positive rate (how many are falsely classified as having the disease?)

Clinical Applications
• A specific test can be useful to rule a disease in. If the result on a specific test is positive, you can be sure the patient has the condition: “SpPin”
• A sensitive test can be useful for ruling a disease out. A negative result on a very sensitive test reassures you that the patient does not have the disease: “SnNout”

The Selection of a Cutting Point
[Figure: two overlapping score distributions, a well population with healthy scores and a sick population with pathological scores; sliding the cut-point toward one tail increases sensitivity, toward the other increases specificity]
Crucial issue: changing the cut-point can improve sensitivity or specificity, but only at the expense of the other

Problems with Wrong Results
• False positives can arise due to other factors (such as taking other medications, diet, etc.). They entail the cost and danger of investigations, labeling, and worry
– This is similar to Type I or alpha error in a test of statistical significance: the possibility of falsely concluding that there is an effect of an intervention
• False negatives imply missed cases, and so potentially bad outcomes if untreated
– Cf. Type II or beta error: the chance of missing a true difference

The Crucial Point: Predictive Values
• Sensitivity & specificity are characteristics of the test
• But the clinician, of course, gets the test result and does not know whether this person is a true positive or a false positive (or a true or false negative). Hmmm…
• How do we assess the predictive value of a positive or negative result?

Predictive Values
• Based on rows, not columns:

        D+   D-
  T+    a    b
  T-    c    d

• PPV = a/(a+b); interprets a positive test
• NPV = d/(c+d); interprets a negative test
• Immediately useful to the clinician: they tell us about the population and thus about the patient
• They depend upon the prevalence of disease, so must be determined for each clinical setting
• As prevalence goes down, PPV goes down and NPV rises
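
A small sketch of the two row-wise formulas (the function name is mine):

```python
def predictive_values(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Row-wise reading of the 2 x 2 table: PPV = a/(a+b) interprets
    a positive result, NPV = d/(c+d) interprets a negative one."""
    return a / (a + b), d / (c + d)
```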

Same Test, Two Clinical Situations

A. Referral hospital: prevalence = 55/165 = 33%

        D+    D-
  T+    50    10
  T-     5   100

Sensitivity = 50/55 = 91%; Specificity = 100/110 = 91%
PPV = 50/60 = 83%; NPV = 100/105 = 95%

B. Primary care: prevalence = 55/1155 ≈ 5%

        D+     D-
  T+    50    100
  T-     5   1000

Sensitivity = 50/55 = 91%; Specificity = 1000/1100 = 91%
PPV = 50/150 = 33%; NPV = 1000/1005 = 99.5%
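
Reusing predictive_values from the sketch above, the two situations can be checked directly; the test itself is unchanged, yet the PPV falls from 83% to 33% as prevalence drops:

```python
# A. Referral hospital (prevalence about 33%)
ppv, npv = predictive_values(a=50, b=10, c=5, d=100)
print(f"Referral:     PPV = {ppv:.0%}, NPV = {npv:.1%}")  # PPV = 83%, NPV = 95.2%

# B. Primary care (prevalence about 5%): same test, many more disease-free patients
ppv, npv = predictive_values(a=50, b=100, c=5, d=1000)
print(f"Primary care: PPV = {ppv:.0%}, NPV = {npv:.1%}")  # PPV = 33%, NPV = 99.5%
```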

Practical Question: “Doctor, what’s my likelihood of having the disease?”
To answer this question:
• You need to have a general idea of the sensitivity & specificity of the test
• To interpret the result, you also need to know roughly the prevalence of the condition in your practice. You can then work out the PPV and answer the patient’s question
“Give me a break, dude… Surely there is an easier way to bring all this together?”

Prevalence of Disease
• We have seen how this influences the interpretation of a test score
• Before you do the test, prevalence gives your best guess about the probability that the patient has the disease
• Also known as the pretest probability of disease: (a+c)/N in the 2 x 2 table
• Alternatively, it can be expressed as the odds of disease: (a+c)/(b+d)
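
The two forms are interchangeable, as this short check shows (counts again from the referral-hospital example):

```python
a, b, c, d = 50, 10, 5, 100                 # a = TP, b = FP, c = FN, d = TN
n = a + b + c + d                           # N = 165

pretest_prob = (a + c) / n                  # (a+c)/N = 55/165, about 0.33
pretest_odds = (a + c) / (b + d)            # (a+c)/(b+d) = 55/110 = 0.5

# odds = p / (1 - p), so the two expressions agree
assert abs(pretest_odds - pretest_prob / (1 - pretest_prob)) < 1e-9
```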

Estimating predictive values for a specific setting is called ‘calibrating’ the test. You could:
– Apply the test and a definitive test to a consecutive series of patients (rarely feasible)
– Calculate from Bayes’s Theorem (ouch!)
– Draw a hypothetical table (maybe?)
– Use a nomogram (tell me how)

Calibration by Hypothetical Table
Fill the cells in the following order:

                       “Truth”
            Disease present   Disease absent   Total   PV
  Test Pos  4th               7th              8th     10th
  Test Neg  5th               6th              9th     11th
  Total     2nd               3rd              1st

1st: choose a grand total N. 2nd and 3rd: column totals from the prevalence. 4th: from the sensitivity; 6th: from the specificity; 5th and 7th by subtraction. 8th and 9th: row totals. 10th and 11th: the predictive values, read across the rows.
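
The same procedure in a short Python sketch; calibrate is my own name for it, and the prevalence, sensitivity, and specificity are the assumed inputs:

```python
def calibrate(prevalence: float, sens: float, spec: float, n: float = 1000):
    """Fill a hypothetical 2 x 2 table for n patients in the slide's
    order, then read the predictive values off the rows."""
    diseased = prevalence * n   # 2nd: left column total, from prevalence
    healthy = n - diseased      # 3rd: right column total
    tp = sens * diseased        # 4th: from sensitivity
    fn = diseased - tp          # 5th
    tn = spec * healthy         # 6th: from specificity
    fp = healthy - tn           # 7th
    ppv = tp / (tp + fp)        # 10th (tp + fp is the 8th entry, the T+ row total)
    npv = tn / (tn + fn)        # 11th (tn + fn is the 9th entry, the T- row total)
    return ppv, npv

# The same 91%/91% test at two prevalences (cf. the earlier two-situation example):
print(calibrate(prevalence=55/165, sens=50/55, spec=100/110))   # about (0.83, 0.95)
print(calibrate(prevalence=55/1155, sens=50/55, spec=100/110))  # about (0.33, 0.995)
```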

Combining Sensitivity and Specificity: Receiver Operating Characteristic Curves
Work out the sensitivity and specificity at every possible cut-point, then plot these. The area under the curve indicates the information provided by the test.
[Figure: ROC curve with sensitivity on the vertical axis and 1 - specificity (the false positive rate) on the horizontal axis]
Note: the theme of sensitivity & (1 - specificity) will appear again!
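
A rough sketch of the computation, assuming higher scores are more pathological; the score lists are made up, and roc_points/auc are my own helpers (the trapezoidal rule gives the area):

```python
def roc_points(scores_diseased, scores_healthy):
    """(1 - specificity, sensitivity) at every possible cut-point,
    sweeping from the strictest cut to the most lenient."""
    cuts = sorted(set(scores_diseased) | set(scores_healthy))
    points = [(0.0, 0.0)]
    for cut in reversed(cuts):
        sens = sum(s >= cut for s in scores_diseased) / len(scores_diseased)
        fpr = sum(s >= cut for s in scores_healthy) / len(scores_healthy)
        points.append((fpr, sens))
    points.append((1.0, 1.0))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

well = [4, 5, 5, 6, 7, 8]     # hypothetical scores, well population
sick = [6, 7, 8, 8, 9, 10]    # hypothetical scores, sick population
print(f"AUC = {auc(roc_points(sick, well)):.2f}")  # AUC = 0.86
```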

Likelihood Ratios
• Defined as the odds that a given level of a diagnostic test result would be expected in a patient with the disease, as opposed to a patient without: the true positive rate over the false positive rate
• Advantages:
– Express sensitivity and specificity in one number
– Can be calculated for many levels of the test
– Can be turned into predictive values
• LR for a positive test = Sensitivity / (1 - Specificity)
• LR for a negative test = (1 - Sensitivity) / Specificity
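
In code these are one-liners (a sketch; the 91%/91% figures are the running example):

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """LR+ = sens/(1 - spec); LR- = (1 - sens)/spec."""
    return sens / (1 - spec), (1 - sens) / spec

lr_pos, lr_neg = likelihood_ratios(sens=0.91, spec=0.91)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")  # LR+ = 10.1, LR- = 0.10
```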

Calibration with a Nomogram
1) You need the LR
2) Select the pretest probability (prevalence) on the left axis
3) Select the likelihood ratio on the center axis
4) Draw a line through the right axis to read off the post-test probability of disease
Example: prevalence = 30%, LR+ = 20; post-test probability ≈ 91%
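
The nomogram is a graphical shortcut for the odds arithmetic below; the exact calculation gives about 90%, consistent with the slide’s graphical reading of roughly 91%:

```python
def posttest_probability(pretest_prob: float, lr: float) -> float:
    """The arithmetic behind the nomogram: convert the pretest probability
    to odds, multiply by the likelihood ratio, convert back."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)

print(f"{posttest_probability(0.30, 20):.0%}")  # 90%
```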

Chaining LRs Together
• Example: a 45-year-old woman with a 1-month history of intermittent chest pain
– Pretest probability is about 1% for CAD
– The history is suggestive of angina (substernal pain; radiating down the arm; induced by effort; relieved by rest…)
– The LR of this history for angina is about 100

The previous example, step 1, from the history:
[Nomogram: she’s young, so the pretest probability is about 1%; based on the history, the post-test probability rises to 50%]

Chaining LRs Together
• The 45-year-old woman with a 1-month history of intermittent chest pain… After the history, the post-test probability is now about 50%. What will you do? Record an ECG
– Result: 2.2 mm of ST-segment depression. The LR for 2.2 mm on the ECG is 10
– The overall post-test probability is now >90% for coronary artery disease (see the next slide and the sketch that follows it)

The previous example, step 2, the ECG result:
[Nomogram: now start the pretest probability, i.e., prior to the ECG, at 50% based on the history; the post-test probability rises to about 90%]
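
Reusing posttest_probability from the nomogram sketch, the whole chain can be verified numerically (the LR values are those assumed in the slides):

```python
p = 0.01                             # pretest probability: about 1% for CAD
p = posttest_probability(p, lr=100)  # history suggestive of angina, LR about 100
print(f"After history: {p:.0%}")     # 50%
p = posttest_probability(p, lr=10)   # 2.2 mm ST-segment depression, LR = 10
print(f"After ECG:     {p:.0%}")     # 91%, i.e. >90%
```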