Diagnosis Articles Much Thanks to: Rob Hayward & Tanya Voth, CCHE

Outline • Philosophy of Diagnosis: – Probability of disease – Test and treatment thresholds • ANALYZING STUDIES • Validity: – Gold (reference) standard • Numbers: – Sensitivity, Specificity, Likelihood ratio • Applicability: – Observer agreement, Kappa

Philosophy of Diagnosis? • Pre-test Probability – The probability that a disease is present before doing a test. – A clinical best guess • Post-test Probability – The probability that a disease is present after doing a test – a combination of clinical best guess & test result.

Philosophy of Diagnosis? • When tests are good: [Figure: test results on an axis from very normal to very abnormal, showing target-negative (normal) and target-positive (severely ill) groups, with points A and B marked.]

Philosophy of Diagnosis? • When tests aren't so good: [Figure: target-positive and target-negative groups on a test-result axis from very normal to very abnormal, with example test results marked at LR = 1 and LR = 4.]

EBM TP: Diagnostic Tests • How good are: – Phalen's test – Shifting dullness – Patient report of fever – Interstitial edema on chest X-ray – Ottawa Ankle Rules – Canadian C-Spine Rules vs NEXUS

Users Guides: Diagnosis

Are the results valid? • Did clinicians face diagnostic uncertainty? – Were subjects drawn from a common group in which it is not known whether the condition of interest is present or absent? – E.g., the first CEA studies used patients already known to have bowel cancer. [1] [1] Proc Natl Acad Sci USA 1969; 64: 161-7

Are the results valid? • Was an acceptable gold standard used? – Imagine a study investigating WBC for appendicitis that used U/S as the gold standard.

Are the results valid? • The test being studied and the gold standard should be completely separate. 1) Were the test and gold standard independent? – A study of serum amylase for pancreatitis used a gold standard made up of a combination of tests that included serum amylase. [1] 2) Were the test and gold standard results assessed blindly? – Imagine a study investigating the Ottawa Ankle Rules in which the radiologist was told the results of the ankle rules before reading the films. [1] NEJM 1997; 336: 1788-93

Are the results valid? • Did the result of the test being studied affect whether the gold standard was performed? – Was a different gold standard applied to subjects testing negative? – E.g., when evaluating VQ scans for PE, those with normal scans often did not go on to the gold standard (pulmonary angiography). [1] – In these (frequent) cases we need to be assured of a reasonable back-up gold standard. [1] JAMA 1990; 263: 2753-59

EBM Tool for Diagnostic Tests Should: • Tell if a symptom, sign or test is useful • Useful in which way: – Screening (Ruling out) – Making a Diagnosis (Ruling in) • Help us determine the probability of a disease

EBM Diagnostic Test Standards • Sensitivity – SNOUT: when a highly Sensitive test is Negative, it rules OUT disease. • Specificity – SPIN: when a highly Specific test is Positive, it rules IN disease. • Helpful for sorting out whether a test is good for screening (ruling out) or diagnosis (ruling in)

LR Advantage • LRs: – Take into account all elements (false positives/negatives and true positives/negatives) – Have criteria for the usefulness of each test – Can be used over a range of test results (e.g., WBC) – Can be used to calculate the actual likelihood of a disease

Key Concept • Likelihood ratio: determines the usefulness of a test. • Positive likelihood ratio (> 1): – The higher the LR (from 1 toward ∞), the greater the likelihood of disease – Used to make the diagnosis (rule in disease) • Negative likelihood ratio (< 1): – The lower the LR (from 1 toward 0), the lower the likelihood of disease – Used to exclude the diagnosis (rule out disease)
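
For a test with only two results (positive or negative), the two ratios are usually written in terms of sensitivity and specificity. These are the standard textbook definitions, given here as a reminder; they are not spelled out on the slide itself.

```latex
\[
\mathrm{LR}^{+} = \frac{\text{sensitivity}}{1 - \text{specificity}},
\qquad
\mathrm{LR}^{-} = \frac{1 - \text{sensitivity}}{\text{specificity}}
\]
```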

What does the LR mean? (Criteria for usefulness)
Usefulness        LR that increases probability    LR that decreases probability
Excellent         > 10                             < 0.1
Good              5 – 10                           0.1 – 0.2
Moderate/Small    2 – 5                            0.2 – 0.5
Poor              1 – 2                            0.5 – 1
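
The criteria can be expressed as a small lookup. This is a minimal sketch, not part of the original slides: the function name `lr_usefulness` is hypothetical, and because the bands share their end points, the choice to put a value that sits exactly on a cut-off (such as 5 or 0.2) in the stronger band is an assumption.

```python
def lr_usefulness(lr: float) -> str:
    """Grade a likelihood ratio using the cut-offs from the criteria table.

    Assumption: a value exactly on a shared cut-off (e.g. 5 or 0.2)
    is placed in the stronger category.
    """
    if lr <= 0:
        raise ValueError("a likelihood ratio must be positive")
    if lr > 10 or lr < 0.1:
        return "excellent"
    if lr >= 5 or lr <= 0.2:
        return "good"
    if lr >= 2 or lr <= 0.5:
        return "moderate/small"
    return "poor"


for value in (42.5, 0.13, 3.15, 1.3):
    print(value, lr_usefulness(value))
```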

How do I use the LR? • Nomogram • LR calculator
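
Both tools perform the same arithmetic: convert the pre-test probability to odds, multiply by the LR, and convert back. The relationship below is the standard odds form of Bayes' theorem; the symbol names are chosen here for illustration.

```latex
\[
\text{odds}_{\text{pre}} = \frac{p_{\text{pre}}}{1 - p_{\text{pre}}},
\qquad
\text{odds}_{\text{post}} = \text{odds}_{\text{pre}} \times \mathrm{LR},
\qquad
p_{\text{post}} = \frac{\text{odds}_{\text{post}}}{1 + \text{odds}_{\text{post}}}
\]
```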

What are the results? • What range of likelihood ratios was associated with the range of possible test results? – Ferritin to detect Fe deficiency (GS = bone marrow)
Serum ferritin     Iron deficient    Not iron deficient
Positive (< 45)    70                15
Negative (> 45)    15                135
Sensitivity = 82%, Specificity = 90%, LR+ = 8.2, LR- = 0.2
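
The four summary figures follow directly from the 2×2 counts. The sketch below recomputes them; the function name `two_by_two_metrics` and its argument names are illustrative choices, and the formulas are the standard definitions rather than anything specific to this study.

```python
def two_by_two_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity and both likelihood ratios from a 2x2 table.

    tp/fn: diseased patients with a positive/negative test.
    fp/tn: non-diseased patients with a positive/negative test.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Ferritin cut-off of 45, counts taken from the slide
m = two_by_two_metrics(tp=70, fp=15, fn=15, tn=135)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Rounded, this gives the slide's 82%, 90%, 8.2 and 0.2.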

What are the results? • What range of likelihood ratios was associated with the range of possible test results? – Ferritin to detect Fe deficiency (GS = bone marrow)
Serum ferritin    Iron deficient    Not iron deficient
< 18              47                2
19 – 45           23                13
46 – 100          7                 27
> 100             8                 108
Total patients    85                150

What are the results? • What range of likelihood ratios was associated with the range of possible test results? – Ferritin to detect Fe deficiency (GS = bone marrow)
Serum ferritin    Iron deficient    L1                Not iron deficient    L2                 LR = L1/L2
< 18              47                47/85 = 0.553     2                     2/150 = 0.013      42.5
19 – 45           23                23/85 = 0.271     13                    13/150 = 0.086     3.15
46 – 100          7                 7/85 = 0.082      27                    27/150 = 0.180     0.46
> 100             8                 8/85 = 0.094      108                   108/150 = 0.720    0.13
Total patients    85                                  150
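
The same calculation can be run for every ferritin band at once. This is a sketch using the counts from the slide; the names `strata` and `stratum_lrs` are illustrative. The code divides unrounded proportions, so the first two bands come out at about 41.5 and 3.12 rather than the 42.5 and 3.15 obtained by dividing the rounded values of L1 and L2.

```python
# (iron deficient, not iron deficient) counts per serum ferritin band
strata = {
    "< 18": (47, 2),
    "19-45": (23, 13),
    "46-100": (7, 27),
    "> 100": (8, 108),
}


def stratum_lrs(strata: dict) -> dict:
    """LR for each band = share of diseased in band / share of non-diseased in band."""
    total_diseased = sum(d for d, _ in strata.values())
    total_not_diseased = sum(nd for _, nd in strata.values())
    return {
        band: (d / total_diseased) / (nd / total_not_diseased)
        for band, (d, nd) in strata.items()
    }


lrs = stratum_lrs(strata)
for band, lr in lrs.items():
    print(f"{band:>7}: LR = {lr:.2f}")
```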

Applying LR: Examples • A 30-year-old woman complaining of fatigue and vague MDD symptoms (normal periods). – Estimated pre-test probability of anemia: 20% – Ferritin = 12 (LR = 42.5) → post-test probability of anemia ≈ 90% • Same woman: – Ferritin = 108 (LR = 0.13) → post-test probability of anemia ≈ 3%
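
These post-test figures can be checked without a nomogram by converting probability to odds, multiplying by the LR, and converting back. A minimal sketch follows; the function name `post_test_probability` is an illustrative choice. The exact arithmetic gives about 91% and about 3%, close to what one would read off a nomogram.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pre-test probability via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Pre-test estimate of 20%, with the two ferritin results from the example
low_ferritin = post_test_probability(0.20, 42.5)   # ferritin = 12
high_ferritin = post_test_probability(0.20, 0.13)  # ferritin = 108
print(f"ferritin 12:  {low_ferritin:.1%}")
print(f"ferritin 108: {high_ferritin:.1%}")
```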

LR Examples • Phalen test (carpal tunnel syndrome): LR = 1.3 • Shifting dullness (ascites): LR = 2.3 • Patient reporting fever (temperature > 38): LR = 4.9 • Interstitial edema on chest X-ray (CHF): LR = 12.7 • Ottawa Ankle Rules (ankle fracture): negative LR = 0.08 • Canadian C-Spine Rules (C-spine fracture): negative LR = 0.013 (vs NEXUS negative LR = 0.25) References: JAMA 2000; 283: 3110-7. J Gen Intern Med 1988: 423-8. Ann Emerg Med 1996; 27: 693-5. Am J Med 2004; 116: 363-8. BMJ 2003; 326: 417. NEJM 2003; 349: 2510-8.

Math Diagnostic Tests: Summary • Likelihood Ratios are the best we have • Tell if a symptom, sign or test is useful • Help us determine the probability of a diagnosis

Apply to patient care? • Is the test and its interpretation reproducible (kappa)? – Is the test result the same when reapplied by the same observer (intra-observer variability)? – Do different observers agree about the test result (inter-observer variability)? • Example kappa values: – Specialist assessing JVP: 0.42 – Specialist assessing DM retinopathy from a photograph: 0.55 – Interpreting a mammogram: 0.67. Greenhalgh T. How to Read a Paper (The basics of evidence based medicine). 2001
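
Kappa measures agreement beyond what chance alone would produce: (observed agreement − expected agreement) / (1 − expected agreement). The sketch below computes Cohen's kappa for two observers and a positive/negative finding. The function name `cohen_kappa` and the counts are hypothetical, invented only to show the arithmetic; they are not taken from any of the studies cited.

```python
def cohen_kappa(both_pos: int, a_pos_b_neg: int, a_neg_b_pos: int, both_neg: int) -> float:
    """Cohen's kappa for two observers rating the same patients as positive/negative."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    observed = (both_pos + both_neg) / n
    # Proportion of patients each observer calls positive
    a_pos = (both_pos + a_pos_b_neg) / n
    b_pos = (both_pos + a_neg_b_pos) / n
    # Agreement expected if the two observers rated independently
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)


# Hypothetical data: 100 patients, observers agree on 80 of them
kappa = cohen_kappa(both_pos=40, a_pos_b_neg=10, a_neg_b_pos=10, both_neg=40)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the observers agree on 80% of patients, but half of that agreement is expected by chance, so kappa is lower than the raw agreement.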

Apply to patient care? • Are the results applicable to the patients in my practice? – Are the patients in the study like mine?

Apply to patient care? • Will the results change my management strategy? – Are the test LRs high or low enough to shift post-test probability across a test or treatment threshold?
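
The threshold question can be pictured as a three-way decision. The sketch below is illustrative only: the function name `next_step` and the threshold values 0.05 and 0.80 are hypothetical and not from the slides, and real thresholds depend on the disease, the test, and the risks of treatment.

```python
def next_step(probability: float, test_threshold: float, treatment_threshold: float) -> str:
    """Place a disease probability relative to the test and treatment thresholds."""
    if not 0 <= test_threshold < treatment_threshold <= 1:
        raise ValueError("need 0 <= test_threshold < treatment_threshold <= 1")
    if probability < test_threshold:
        return "ignore"        # disease unlikely enough to stop testing
    if probability >= treatment_threshold:
        return "act"           # disease likely enough to treat without more tests
    return "test further"      # still in the zone of uncertainty


# Hypothetical thresholds, chosen only for illustration
TEST_THRESHOLD = 0.05
TREATMENT_THRESHOLD = 0.80

for p in (0.20, 0.91, 0.03):
    print(p, next_step(p, TEST_THRESHOLD, TREATMENT_THRESHOLD))
```

A test is worth ordering when a plausible result would move the patient out of the middle zone.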

Apply to patient care? • Will patients be better off as a result of the test? – Will the anticipated changes do more good than harm? – Effect of clinically insignificant disease

Summary • Key concepts: Reference standard – You cannot decide if a test works unless you have a “gold standard”. Likelihood ratio – To determine the utility of a test, find how much a given result will shift the likelihood of a diagnosis. Who cares? – Think about the “ignore” and “act” thresholds and whether the test moves you from uncertainty into either zone.
