Diagnosis with continuous and multiple predictors ROC curves

HW: Screening kids for cholesterol and “treating” those that are high
• Treatment: give parents diet and exercise tips
• Given: cost of screening Cs and cost of treatment Ct
  - treatment costs are mainly costs of instruction
• Distribution of log cholesterol values (c); benefits of treatment b(c)
• How to spend the budget for this program?
  - if there are not many kids in the jurisdiction
  - if there are many more kids who could benefit than can be treated
RAND 2 © 2008 Emmett Keeler

Setting Treatment cutoffs
• Let A(x) = E(b(c) | c > x)
• b(x): discounted years saved by treatment
• Let Strategy I be: treat one more
• Let Strategy II be: screen more, treat if above x
• Let Φ(x) be the cdf of c: Φ(x) = P(c < x)
  - the group below x contains Φ(x) of kids
  - the group above x contains 1 - Φ(x) of kids

Key Points
• Odds make Bayes revision easy
• ROC curves can help find the threshold for a many-valued test result to be considered positive, but why bother?
  - Just use the likelihood ratio of the observed result
• Can combine multiple tests into predictions using regression, and develop a useful index

Odds and Probabilities
• The odds of an event is the ratio of the probability of the event happening to the probability of the event not happening:
  - Odds = Pr(Y=1)/Pr(Y=0) = P/(1 - P)
• “3 to 1 in favor” = odds of 3 = probability of 0.75
• “2 to 1 in favor” = odds of 2 = probability of 0.67
• “1 to 1” = odds of 1 = probability of 0.50
• “2 to 1 against” = odds of 1/2 = probability of 0.33
• “3 to 1 against” = odds of 1/3 = probability of 0.25
• See Hunink, page 145

Odds and Probability - 2
• Odds of E = x:y <--> p(E) = x/(x + y)
• p(E) = p <--> Odds of E = p/(1 - p)
• Odds are not unique: 1:2 is the same as 2:4
• As the probability of an event approaches 1, the odds approach infinity
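The two conversions above can be sketched in a few lines of Python. This is only an illustration; the function names `prob_to_odds` and `odds_to_prob` are made up for this sketch and do not come from the lecture.

```python
from fractions import Fraction

def prob_to_odds(p):
    """Odds in favor of an event with probability p (requires p < 1)."""
    return p / (1 - p)

def odds_to_prob(x, y=1):
    """Probability of an event whose odds are x:y in favor."""
    return x / (x + y)

# The examples from the slide; exact fractions avoid rounding noise.
for x, y in [(3, 1), (2, 1), (1, 1), (1, 2), (1, 3)]:
    p = odds_to_prob(Fraction(x), Fraction(y))
    print(f"{x} to {y}: odds = {Fraction(x, y)}, probability = {float(p):.2f}")
```

Because only the ratio x:y matters, `odds_to_prob(1, 2)` and `odds_to_prob(2, 4)` give the same probability.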

Likelihood ratio and Bayes
• The result-specific likelihood ratio for any result is the probability of that result conditional on disease divided by its probability conditional on no disease.
• LR(Test +) = P(T+ | D) / P(T+ | no D) = sensitivity / (1 - specificity)
• Theorem: posterior odds = prior odds x LR(observed result)
  - Proof by example to follow

Proof by example of odds revision

Conditional probabilities:
              Disease    No Disease
  test +        .8          .2
  test -        .2          .8

Worksheet (one column each for Disease and No Disease):
  prior odds
  LR of test
  posterior odds

• Convert prior probability to odds, e.g. 20% --> 20/80 or 1/4
• Multiply prior odds by LR of test result
• Convert posterior odds to probability
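A minimal sketch of the three steps, using the slide's conditional probabilities (.8 and .2) and its example prior of 20%. The helper name `revise` is an assumption of this sketch, and working the example for both a positive and a negative result is an illustration rather than something stated on the slide.

```python
from fractions import Fraction

def revise(prior_prob, lr):
    """Posterior probability from a prior probability and a likelihood ratio."""
    prior_odds = prior_prob / (1 - prior_prob)   # step 1: probability -> odds
    post_odds = prior_odds * lr                  # step 2: multiply by the LR
    return post_odds / (1 + post_odds)           # step 3: odds -> probability

prior = Fraction(20, 100)                        # 20% -> odds of 1/4
lr_pos = Fraction(8, 10) / Fraction(2, 10)       # P(T+ | D) / P(T+ | no D) = 4
lr_neg = Fraction(2, 10) / Fraction(8, 10)       # P(T- | D) / P(T- | no D) = 1/4

post_pos = revise(prior, lr_pos)   # odds 1/4 x 4 = 1, so probability 1/2
post_neg = revise(prior, lr_neg)   # odds 1/4 x 1/4 = 1/16, so probability 1/17
print(post_pos, post_neg)
```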

More refined test results

                 Disease in Patient      Cumulated
Test Result      App      No App         App      No App     ROC point
Diseased         .3       .05            .3       .05        A
Probable         .4       .15            .7       .2         B
???              .2       .2             .9       .4         C
Prob not         .1       .3             1        .7         D
NSAP             0        .3             1        1.0

Where to draw the line for disease?

                 Cumulated
Test             App      NSAP      ROC point
Disease          .3       .05       A
Probable         .7       .2        B
???              .9       .4        C
Prob not         1        .7        D
NSAP             1        1

[Figure: ROC curve through points A, B, C, D; vertical axis True Pos., horizontal axis 1 - True Neg.; annotation “treat lots”]
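The cumulated columns are running sums of the per-result probabilities, and each running pair is one point on the ROC curve. A short sketch, with the values read from the refined-results table and variable names chosen for this illustration:

```python
from itertools import accumulate

results  = ["Diseased", "Probable", "???", "Prob not", "NSAP"]
p_app    = [0.30, 0.40, 0.20, 0.10, 0.00]   # P(result | App)
p_no_app = [0.05, 0.15, 0.20, 0.30, 0.30]   # P(result | No App)

# Calling a result "positive" also calls every more suspicious result positive,
# so the true- and false-positive rates are cumulative sums down the table.
true_pos  = list(accumulate(p_app))
false_pos = list(accumulate(p_no_app))      # this is 1 - True Neg.

for name, x, y in zip(results, false_pos, true_pos):
    print(f"cutoff at {name:9s} 1 - true neg = {x:.2f}   true pos = {y:.2f}")
```

Moving the cutoff down the table moves up and to the right along the curve: more true positives, at the price of more false positives.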

Bayes with combined test results

Test Performance on Known Cases
                            Disease    No Disease
  ??? or worse                .3          .8
These are the conditional probabilities P(T obs. | D)

Current Patient
                            Disease    No Disease
  (after test)                15          760
  probable prior patients     50          950

1. Split 1000 prior people on the bottom row of the 2nd box
2. Compute the people in each after-test cell
3. Compute the “posterior” P(Disease | Test result) = 15 / (15 + .8 x 950) = 15 / (15 + 760) = 0.019 < .024, so we don’t operate; the test has helped.

Bayes with combined test results (odds version)

Test Performance on Known Cases
                  Disease    No Disease
  ??? or worse      .3          .8
These are the conditional probabilities P(T obs. | D)

Current Patient
                  Disease    No Disease
  prior odds         1          19
  LR                 3           8
  post odds          3         152

1. p = .05, so odds = p/(1 - p) = 1/19
2. Compute posterior odds: 3/152
3. Convert to probability: 3/(3 + 152) = 0.019 < .024, so we don’t operate; the test has helped.
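The counting version and the odds version should agree. A sketch that runs both with the slide's numbers (prior .05, P(result | D) = .3, P(result | no D) = .8, threshold .024); the variable names are assumptions of this illustration:

```python
from fractions import Fraction

prior       = Fraction(5, 100)
p_result_d  = Fraction(3, 10)     # P("??? or worse" | Disease)
p_result_nd = Fraction(8, 10)     # P("??? or worse" | No Disease)
threshold   = Fraction(24, 1000)  # operate only above this probability

# Counting version: split 1000 prior patients, then apply the test.
diseased, healthy = prior * 1000, (1 - prior) * 1000        # 50 and 950
d_after, h_after = diseased * p_result_d, healthy * p_result_nd  # 15 and 760
post_counts = d_after / (d_after + h_after)                 # 15/775

# Odds version: prior odds x LR, then convert back to a probability.
prior_odds = prior / (1 - prior)                            # 1/19
post_odds = prior_odds * (p_result_d / p_result_nd)         # 3/152
post_from_odds = post_odds / (1 + post_odds)                # 3/155

print(float(post_counts), float(post_from_odds), post_counts < threshold)
```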

But combined rules may not be best (odds version)

Test Performance on Known Cases
          Disease    No Disease
  ???       .2          .2
These are the conditional probabilities P(T obs. | D)

• The LR of test result ??? is 1 to 1 (.2/.2), so post-test prob = prior prob = .05, which remains > .024, so we operate
• Only the observed test result is relevant!

But combined rules may not be best

Test Performance on Known Cases
          Disease    No Disease
  ???       .2          .2
These are the conditional probabilities P(T obs. | D)

Current Patient
                            Disease    No Disease
  probable prior patients     50          950

• Only the observed test result is relevant!
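To see why the observed result matters, the sketch below applies the result-specific LR of each row of the refined-results table to the same prior (.05) and threshold (.024). The table values are read from the earlier slide; the dictionary layout and names are assumptions of this illustration.

```python
from fractions import Fraction as F

# (P(result | App), P(result | No App)) for each observed result.
table = {
    "Diseased": (F(30, 100), F(5, 100)),
    "Probable": (F(40, 100), F(15, 100)),
    "???":      (F(20, 100), F(20, 100)),
    "Prob not": (F(10, 100), F(30, 100)),
    "NSAP":     (F(0), F(30, 100)),
}
prior_odds = F(1, 19)      # prior probability .05
threshold = F(24, 1000)    # operate if P(disease) exceeds .024

def posterior(p_d, p_nd):
    """Posterior probability after observing a result with LR = p_d / p_nd."""
    post_odds = prior_odds * (p_d / p_nd)
    return post_odds / (1 + post_odds)

posteriors = {name: posterior(*cells) for name, cells in table.items()}
for name, p in posteriors.items():
    action = "operate" if p > threshold else "don't operate"
    print(f"{name:9s} P(disease) = {float(p):.3f}  {action}")
```

A patient whose result is exactly ??? stays at .05, above the threshold, while the combined “??? or worse” category gave 0.019 and pointed the other way.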

Problems with using ROC Curves
• To use tests, ROC is not needed
  - Use the likelihood ratio of the observed result instead
• Often, the dependent variable is not 0, 1
  - Hypertension, expensive next year...
  - Each level has a different ROC curve
  - Regression on test results is more informative
• Easy to include multiple tests
• Test performance given by standard statistics

Multiple “tests” and logistic regression
• We want to predict Y, a 0, 1 variable, from many “tests” (characteristics).
  - E.g., Y = who will benefit from cataract surgery, based on age, self-reported visual functioning, a clinical history and exam. (Mangione developed a CSI this way.)
• Fit log(pj/(1 - pj)) = ∑i xij bi, where pj = E(Yj) and xij is the value of person j on test i.
• Then each person has an index Ij = ∑i xij bi. The index can be used in a decision rule.
• For each T, we can calculate the LR P(I = T | Y=1) / P(I = T | Y=0), and use it for decisions.
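A sketch of how a fitted model turns one person's characteristics into an index and a predicted probability. The coefficients and the patient below are hypothetical placeholders, not values from the lecture or from the CSI.

```python
import math

# Hypothetical coefficients b_i: intercept, age in years, a 0/1 exam finding.
b = [-2.0, 0.03, 0.8]

def index(x):
    """I_j = sum_i x_ij * b_i, the person's log odds of Y = 1."""
    return sum(b_i * x_i for b_i, x_i in zip(b, x))

def probability(x):
    """Invert log(p / (1 - p)) = I_j to recover p_j."""
    return 1 / (1 + math.exp(-index(x)))

person = [1.0, 70.0, 1.0]   # constant term, age 70, finding present (hypothetical)
print(f"index = {index(person):.2f}, p = {probability(person):.3f}")
```

A decision rule can then compare the index, or the probability it implies, with a cutoff.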

Odds Ratios in logistic regression
• What do the coefficients bi mean? The change in log(p/(1 - p)) as Xi goes from 0 to 1:
  - log(odds if Xi = 1) - log(odds if Xi = 0)
• For a dichotomous Xi, let P1 and P0 be Pr(Y=1 | Xi=1) and Pr(Y=1 | Xi=0).
• The odds ratio (OR) of an event (here Y=1) for two groups (here split by Xi) is defined as
  - [Odds in group 1]/[Odds in group 0] = [P1/(1 - P1)]/[P0/(1 - P0)]
• So bi = log(OR)
• Warning: the OR ≠ risk ratio (RR) = P1/P0

Logit Coefficients and ORs
• The logit coefficient for a factor X is the natural logarithm of the OR
  - Reversing the coding of Y (e.g., making death = 1 instead of survive = 1) changes bi to -bi
  - X has no association with P(Y=1) <-> bX = 0 (OR = 1)
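A small numeric check of these points, using hypothetical group probabilities P1 = 0.2 and P0 = 0.1 (chosen for illustration only):

```python
import math
from fractions import Fraction as F

def odds(p):
    return p / (1 - p)

p1, p0 = F(2, 10), F(1, 10)          # hypothetical Pr(Y=1 | X=1), Pr(Y=1 | X=0)

odds_ratio = odds(p1) / odds(p0)     # (1/4) / (1/9) = 9/4
risk_ratio = p1 / p0                 # 2, not the same as the OR
b = math.log(float(odds_ratio))      # the logit coefficient, log(OR)

# Reverse the coding of Y: the event of interest now has probability 1 - p.
b_reversed = math.log(float(odds(1 - p1) / odds(1 - p0)))

print(f"OR = {float(odds_ratio):.2f}, RR = {float(risk_ratio):.2f}")
print(f"b = {b:.3f}, b after recoding Y = {b_reversed:.3f}")
```

The OR and RR differ even for these modest probabilities, and the gap widens as the event becomes more common.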

A Diagnostic Index for CVD
• Uses the Framingham Study starting in its 22nd year, when HDL was first measured.
  - mother of all health panels
  - gave first estimates of risk of HBP, cholesterol …
• The D’Agostino paper’s goal is to develop a predictor of CVD to help clinicians manage and motivate patients without CVD to improve risk factors.
• This 2008 paper is one of a long string of such studies, starting with Cornfield, 1962.

Methods
• Organized data on 8491 eligible patients
• Got the usual suspects for risk factors
• Men and women done separately
• Some preliminary runs to get to final list
  - dropped family history, obesity, ECG LVH …
• Cox regression: similar to logistic but takes time to event into account.
  - interpretation based on survival at the mean of all risk factors x Cox index adjustments
• Note risk factors entered as logs in regression
  - so what does a 1% change in X do to log odds?
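One way to work the closing question: when a risk factor enters as log(X), raising X by 1% adds log(1.01) to the regressor, so the linear predictor (the log odds in a logistic model, the log hazard in a Cox model) moves by b x log(1.01), which is close to b/100. The coefficient below is a hypothetical value chosen for illustration.

```python
import math

b = 2.0                            # hypothetical coefficient on log(X)

delta = b * math.log(1.01)         # change in the linear predictor
multiplier = math.exp(delta)       # odds (or hazard) multiplier, equal to 1.01 ** b

print(f"change in linear predictor: {delta:.4f} (about b/100 = {b / 100:.4f})")
print(f"odds or hazard multiplied by {multiplier:.4f}")
```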

Modifications for clinical tool
• Lab and non-lab versions of variables
  - BMI substitutes for cholesterol
• Make integer scale and tables
  - pick a small factor as the unit
  - divide other coefficients by its coefficient and round
• Translate integer scale into 10-year risk in %.
• Use “Heart Age” to interpret that result
• Sometimes we calibrate to a different population by regressing their events on the FRS.
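The integer-scale step can be sketched as follows. The factor names and coefficients are hypothetical, not taken from the D’Agostino paper, and taking the smallest coefficient as the unit is just one way to “pick a small factor”.

```python
# Hypothetical regression coefficients for three risk factors.
coefficients = {"factor_a": 0.12, "factor_b": 0.35, "factor_c": 0.61}

# Pick a small factor as the unit (here: the smallest coefficient in absolute value).
unit = min(abs(c) for c in coefficients.values())

# Divide every coefficient by the unit and round to get integer points.
points = {name: round(c / unit) for name, c in coefficients.items()}
print(points)

# A patient's score is the sum of the points for the factors they have.
patient_factors = ["factor_a", "factor_c"]      # hypothetical patient
score = sum(points[name] for name in patient_factors)
print("score:", score)
```

A lookup table would then translate each integer score into a 10-year risk in %; that table has to come from the fitted model and is not reproduced here.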

Such Indices often used to calculate value of risk factor reduction
• http://www.thehealthierpeoplenetwork.org/id4.html
• Typically collect data from the patient, stick it in a program, and give feedback on current risk and on the results of lifestyle modification
• These functions might be built into EMRs

Advanced topics I did not cover
• Area under the ROC curve = c-statistic
• Formal derivation of where to draw the line for diseased for a differentiable risk factor.
• Both are discussed in Hunink.
• Next time we will look at a CEA of diagnostic tools in the developing world.