Lecture 3 Validity of screening and diagnostic tests

  • Slides: 28

• Reliability: kappa coefficient
• Criterion validity:
  – “Gold” or criterion/reference standard
  – Sensitivity, specificity, predictive value
  – Relationship to prevalence
  – Likelihood ratio
  – ROC curve
  – Diagnostic odds ratio

Clinical/public health applications
• Screening:
  – for asymptomatic disease (e.g., Pap test, mammography)
  – for risk (e.g., family history of breast cancer)
• Case-finding: testing of patients for diseases unrelated to their complaint
• Diagnostic: to help make a diagnosis in symptomatic disease or to follow up on a screening test

Evaluation of screening and diagnostic tests
• Performance characteristics:
  – test alone
• Effectiveness (on outcomes of disease):
  – test + intervention

Criteria for test selection
• Reliability
• Validity
• Feasibility
• Simplicity
• Cost
• Acceptability

Measures of inter- and intra-rater reliability: categorical data
• Percent agreement
  – limitation: value is affected by prevalence (higher if prevalence is very low or very high)
• Kappa statistic
  – takes chance agreement into account
  – defined as the fraction of observed agreement not due to chance

Kappa statistic

$$\kappa = \frac{p(\text{obs}) - p(\text{exp})}{1 - p(\text{exp})}$$

• p(obs): proportion of observed agreement
• p(exp): proportion of agreement expected by chance
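The kappa formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name `cohen_kappa` and the counts in `ratings` are hypothetical, and p(exp) is assumed to be computed in the usual way from the two raters' marginal proportions.

```python
def cohen_kappa(table):
    """Kappa for a square agreement table (rows: rater A, columns: rater B)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    # p(obs): proportion of subjects on the diagonal (both raters agree)
    p_obs = sum(table[i][i] for i in range(k)) / n
    # p(exp): agreement expected by chance, from the marginal proportions
    row_prop = [sum(row) / n for row in table]
    col_prop = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_exp = sum(r * c for r, c in zip(row_prop, col_prop))
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical counts: two raters classifying 100 subjects as positive/negative
ratings = [[40, 10],
           [5, 45]]
kappa = cohen_kappa(ratings)  # p(obs) = 0.85, p(exp) = 0.50
print(f"kappa = {kappa:.2f}")
```

With these made-up counts, percent agreement is 85% but kappa is lower, because half of the agreement would be expected by chance alone.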

Interpretation of kappa
• Various suggested interpretations
• Example: Landis & Koch, Fleiss
  – excellent: over 0.75
  – fair to good: 0.40 - 0.75
  – poor: less than 0.40

Validity (accuracy) of screening/diagnostic tests
• Face validity, content validity: judgement of the appropriateness of the content of a measurement
• Criterion validity
  – concurrent
  – predictive

Normal vs abnormal
• Statistical definition
  – “Gaussian” or “normal” distribution
• Clinical definition
  – using criterion

Selection of criterion (“gold” or criterion standard)
• Concurrent
  – salivary screening test for HIV
  – history of cough for more than 2 weeks (for TB)
• Predictive
  – APACHE (Acute Physiology and Chronic Health Evaluation) instrument for ICU patients
  – blood lipid level
  – maternal height

Sensitivity and specificity
Assess correct classification of:
• People with the disease (sensitivity)
• People without the disease (specificity)
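As a rough illustration, both measures can be computed from the four cells of a 2 x 2 table of test result against criterion standard. The function names and the counts below are hypothetical and do not come from the lecture.

```python
def sensitivity(tp, fn):
    """Proportion of people with the disease who test positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of people without the disease who test negative."""
    return tn / (tn + fp)

# Hypothetical counts: 100 people with the disease, 900 without
tp, fn = 80, 20    # diseased: test positive / test negative
fp, tn = 90, 810   # non-diseased: test positive / test negative

sens = sensitivity(tp, fn)  # 80 / 100
spec = specificity(tn, fp)  # 810 / 900
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Each measure uses only one column of the table (diseased or non-diseased), which is why neither depends on how common the disease is.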

Predictive value
• More relevant to clinicians and patients
• Affected by prevalence
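The dependence on prevalence can be seen by holding sensitivity and specificity fixed and varying prevalence. The sketch below assumes a sensitivity of 80% and a specificity of 90% purely for illustration; the function name `predictive_values` and the list of prevalences are hypothetical.

```python
def predictive_values(sens, spec, prev):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sens * prev              # true positives, as a share of the population
    fn = (1 - sens) * prev        # false negatives
    fp = (1 - spec) * (1 - prev)  # false positives
    tn = spec * (1 - prev)        # true negatives
    ppv = tp / (tp + fp)          # P(disease | positive test)
    npv = tn / (tn + fn)          # P(no disease | negative test)
    return ppv, npv

# Same test, different populations (illustrative prevalences)
for prev in (0.01, 0.10, 0.25, 0.50):
    ppv, npv = predictive_values(0.80, 0.90, prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

The predictive value of a positive test rises with prevalence and that of a negative test falls, even though the test itself is unchanged.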

Choice of cut-point
If a higher score increases the probability of disease:
• Lower cut-point:
  – increases sensitivity, reduces specificity
• Higher cut-point:
  – reduces sensitivity, increases specificity
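The trade-off can be illustrated with a small sketch. The scores below are made up, and the rule "score at or above the cut-point counts as test-positive" is an assumption of this example.

```python
def sens_spec_at(cut, diseased, healthy):
    """Sensitivity and specificity when score >= cut is called test-positive."""
    sens = sum(s >= cut for s in diseased) / len(diseased)
    spec = sum(s < cut for s in healthy) / len(healthy)
    return sens, spec

# Hypothetical test scores; higher scores are more common in the diseased
diseased = [4, 5, 6, 7, 8, 9]
healthy = [1, 2, 3, 4, 5, 6]

low = sens_spec_at(4, diseased, healthy)   # low cut-point
high = sens_spec_at(7, diseased, healthy)  # high cut-point
print("cut-point 4: sensitivity=%.2f specificity=%.2f" % low)
print("cut-point 7: sensitivity=%.2f specificity=%.2f" % high)
```

With these scores the low cut-point catches every diseased person but mislabels half of the healthy, while the high cut-point does the reverse.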

Considerations in selection of cut-point
Implications of false positive results:
• burden on follow-up services
• labelling effect
Implications of false negative results:
• failure to intervene

Receiver operating characteristic (ROC) curve
• Evaluates test over a range of cut-points
• Plot of sensitivity against 1 - specificity
• Area under curve (AUC) summarizes performance:
  – AUC of 0.5 = no better than chance
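One way to make the construction concrete is to compute the ROC points at every observed cut-point and estimate the area by the trapezoidal rule. This is a minimal sketch under stated assumptions: the scores are hypothetical, a score at or above the cut-point counts as test-positive, and `roc_points` and `auc` are illustrative helper names rather than functions from any library.

```python
def roc_points(diseased, healthy):
    """(1 - specificity, sensitivity) pairs, from the highest cut-point down."""
    cuts = sorted(set(diseased) | set(healthy))
    points = [(0.0, 0.0)]  # cut-point above every score: nobody tests positive
    for cut in reversed(cuts):
        tpr = sum(s >= cut for s in diseased) / len(diseased)  # sensitivity
        fpr = sum(s >= cut for s in healthy) / len(healthy)    # 1 - specificity
        points.append((fpr, tpr))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical test scores
diseased = [4, 5, 6, 7, 8, 9]
healthy = [1, 2, 3, 4, 5, 6]

area = auc(roc_points(diseased, healthy))
print(f"AUC = {area:.3f}")
```

If the two groups had identical score distributions, the points would fall on the diagonal and the area would be 0.5.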

Likelihood ratio
• Likelihood ratio (LR) of a positive test: $\text{LR} = \dfrac{\text{sensitivity}}{1 - \text{specificity}}$
• Used to compute post-test odds of disease from pre-test odds:
  post-test odds = pre-test odds × LR
• Pre-test odds are derived from prevalence
• Post-test odds can be converted to the predictive value of a positive test

Example of LR
• prevalence of disease in a population is 25%
• sensitivity is 80%
• specificity is 90%
• pre-test odds = $\dfrac{0.25}{1 - 0.25} = \dfrac{1}{3}$
• likelihood ratio = $\dfrac{0.80}{1 - 0.90} = 8$

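Example of LR (cont)
• If prevalence of disease in a population is 25%
• pre-test odds = $\dfrac{0.25}{1 - 0.25} = \dfrac{1}{3}$
• post-test odds = $\dfrac{1}{3} \times 8 = \dfrac{8}{3}$
• predictive value of positive result = $\dfrac{\text{post-test odds}}{1 + \text{post-test odds}} = \dfrac{8}{3 + 8} = \dfrac{8}{11} \approx 73\%$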
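The worked example can be checked with a short script. The inputs (25% prevalence, 80% sensitivity, 90% specificity) are the lecture's; the function name `post_test_probability` is a hypothetical helper introduced here.

```python
def post_test_probability(prevalence, sens, spec):
    """Probability of disease after a positive test, via the likelihood ratio."""
    pre_odds = prevalence / (1 - prevalence)  # 0.25 / 0.75 = 1/3
    lr_pos = sens / (1 - spec)                # 0.80 / 0.10 = 8
    post_odds = pre_odds * lr_pos             # 8/3
    return post_odds / (1 + post_odds)        # odds converted back to probability

ppv = post_test_probability(0.25, 0.80, 0.90)
print(f"predictive value of positive test = {ppv:.2f}")  # prints 0.73
```

The last step, odds / (1 + odds), is what turns the post-test odds of 8/3 into the 8/11 shown in the example.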

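Diagnostic odds ratio
• Ratio of the odds of a positive test in the diseased to the odds of a positive test in the non-diseased: $\text{DOR} = \dfrac{a \cdot d}{b \cdot c}$, where a = true positives, b = false positives, c = false negatives, d = true negatives
• From previous example (per 40 people: 8 true positives, 2 false negatives, 3 false positives, 27 true negatives): $\text{OR} = \dfrac{8 \times 27}{2 \times 3} = 36$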
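A minimal sketch of the calculation follows. The counts 8, 27, 2 and 3 are the lecture's; their assignment to true/false positives and negatives is inferred from the earlier example (25% prevalence, 80% sensitivity, 90% specificity, applied to 40 people), and the function name is hypothetical.

```python
def diagnostic_odds_ratio(tp, fp, fn, tn):
    """Odds of a positive test in the diseased / odds in the non-diseased."""
    return (tp * tn) / (fp * fn)

# 40 people at 25% prevalence: 10 diseased (8 TP, 2 FN), 30 non-diseased (3 FP, 27 TN)
dor = diagnostic_odds_ratio(tp=8, fp=3, fn=2, tn=27)

# Equivalent route: ratio of the positive and negative likelihood ratios
sens, spec = 0.80, 0.90
lr_pos = sens / (1 - spec)
lr_neg = (1 - sens) / spec
print(dor, round(lr_pos / lr_neg, 6))
```

Both routes give the same value, which shows that the diagnostic odds ratio summarizes sensitivity and specificity in a single number.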

Summary: LR and DOR
• Values:
  – 1 indicates that test performs no better than chance
  – >1 indicates better than chance
  – <1 indicates worse than chance
• Relationship to prevalence?

Applications of LR and DOR
• Likelihood ratio: primarily in clinical context, when interest is in how much the likelihood of disease is increased by use of a particular test
• Diagnostic odds ratio: primarily in research, when interest is in factors that are associated with test performance (e.g., using logistic regression)