Reliability and Validity Designs 1 © 2006
Accurate and consistent measures are needed • It is very important in research and clinical practice to be able to measure patient characteristics accurately and consistently • Needed in clinical trials to effectively assess differences between groups • Needed in practice to help make clinical decisions and to track patients’ progress Evidence-based Chiropractic 2 © 2006
Reliability • The ability of a test to provide consistent results when repeated – By the same examiner – Or by more than one examiner testing the same attribute on the same group of subjects • Specific research designs are used to determine the degree to which tests are reliable
Validity • The degree to which a test truly measures what it was intended to measure • In valid tests, when the characteristic being measured changes, corresponding changes occur in the test measurement • In contrast, tests with reduced validity do not reflect patient changes very well
Measurement error • All measurements have some degree of error • Thus, any given test score will consist of a true score plus an error component: Observed score = True score + Error • True score is a theoretical concept: the measurement that would be obtained from a perfect instrument in an ideal environment
True score theory • In a group of subjects, scores vary because of – Individual differences between the subjects – Plus an error component • Consequently, group scores will always be variable, and the resulting distribution of true scores plus error conforms to a normal curve when the sample size is large enough
Random errors • Errors that are attributable to the examiner, the subject, or the measuring instrument • Have little effect on the group’s mean score because the errors are just as likely to be high as low • For example, blood pressure readings, which vary depending on a number of factors
Systematic errors • Errors that cause scores to move in only one direction in response to a factor that has a constant effect on the measurement system • Considered to be a form of bias • For example, a sphygmomanometer that is out of calibration and always generates high BP readings
Error components [figure]
Estimating reliability • Reliability is estimated as true score variance divided by observed score variance • True score variance – Real differences between subjects’ scores due to biologically different people • Observed score variance – The total variability in scores: true score variance plus error variance, which is the portion of variability due to faults in measurement
Observed score variance [figure]
The reliability coefficient • Reliability coefficient = True score variance / (True score variance + Error variance) • Becomes larger (increased reliability) as error variance gets smaller – Equals 1.0 when error variance is 0.0 • Becomes smaller (decreased reliability) as error variance gets larger
Interpretation of the reliability coefficient • A reliability coefficient of 0.75 means that 75% of the variance in the scores is due to the true variance of the trait being measured and 25% is due to the error variance
Interpretation of the reliability coefficient (cont.) • Ranges from 0.0 to 1.0 – 0.0 represents no reliability and 1.0 perfect reliability • Implications – 0.75 or greater indicates good reliability – 0.5 to 0.75 indicates moderate reliability – <0.5 indicates poor reliability
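The ratio behind the reliability coefficient can be sketched in a few lines of Python. This is only an illustration: the function name and the variance figures are made up for the example, and in practice the true and error variances have to be estimated from a reliability study rather than known in advance.

```python
def reliability_coefficient(true_variance, error_variance):
    """True score variance as a share of total (true + error) variance."""
    total_variance = true_variance + error_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance


# Hypothetical variances: 75 units of true variance, 25 units of error variance.
print(reliability_coefficient(75.0, 25.0))  # 0.75
# With no error variance the coefficient reaches its maximum.
print(reliability_coefficient(40.0, 0.0))   # 1.0
```

The first call reproduces the 0.75 case from the interpretation slide: 75% of score variance comes from the trait and 25% from error.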
Inter-examiner reliability • When 2 or more examiners test the same subjects for the same characteristic using the same measure, scores should match • Inter-examiner reliability is the degree to which their findings agree
Intra-examiner reliability • Scores should also match when the same examiner tests the same subjects on two or more occasions • Intra-examiner reliability is the degree to which the examiner agrees with himself or herself
Quantifying inter-examiner and intra-examiner reliability • Correlation – There should be a high degree of correlation between the scores of 2 examiners testing the same group of subjects, or of 1 examiner testing the same group on 2 occasions – However, it is possible to have good correlation together with poor agreement • This occurs when 1 examiner consistently scores subjects higher or lower than the other examiner
Graphing reliability [Figure: scatter plot of Examiner 2 scores (vertical axis, 10 to 50) against Examiner 1 scores (horizontal axis, 10 to 50), labeled “Very good correlation”]
Good correlation and concurrent poor agreement [Figure: scatter plot of Examiner 2 scores against Examiner 1 scores, labeled “Good correlation, but no agreement”. Plotted pairs: Examiner 1 = 10, Examiner 2 = 20; Examiner 1 = 20, Examiner 2 = 30; Examiner 1 = 30, Examiner 2 = 40; Examiner 1 = 40, Examiner 2 = 50]
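The four score pairs from this slide make a convenient worked example of why correlation alone can overstate reliability. The sketch below computes Pearson’s r and the proportion of exact agreements for those pairs; the helper names are illustrative, not from the slides.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def exact_agreement(x, y):
    """Proportion of subjects given identical scores by both examiners."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Score pairs shown on the slide: Examiner 2 is always 10 points higher.
examiner_1 = [10, 20, 30, 40]
examiner_2 = [20, 30, 40, 50]

print(round(pearson_r(examiner_1, examiner_2), 3))  # 1.0
print(exact_agreement(examiner_1, examiner_2))      # 0.0
```

The correlation is perfect because the two examiners rank and space the subjects identically, yet they never agree on a single score, which is the systematic offset the slide describes.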
Test-retest reliability • A test is administered to the same group of subjects on more than one occasion – Test scores should be consistent when repeated – Test scores should correlate well • Test-retest reliability is used to assess self-administered questionnaires, which are not directly controlled by the examiner
Test-retest reliability (cont.) • It is assumed that the condition being considered has not changed between tests • Conditions that noticeably change over time are not good candidates for test-retest reliability studies – e.g., pain and disability status
Test-retest reliability (cont.) [Figure: the same 10-item questionnaire completed at Time 1 and at Time 2; are the two sets of responses equal?]
Parallel forms reliability a.k.a. alternate forms reliability • Two versions of a questionnaire or test that measure the same construct are compared • Both versions are administered to the same subjects • Scores are compared to determine the level of correlation
Parallel forms reliability (cont.) [Figure: a 10-item questionnaire in Version 1 and Version 2; are the two sets of responses equal?]
Internal consistency reliability • The degree to which each of the items in a questionnaire measures the targeted construct • All questions should measure various characteristics of the construct and nothing else
Internal consistency reliability (cont.) • A questionnaire is administered to 1 group of subjects on 1 occasion • The results are examined to see how well the questions correlate • If reliable, each question contributes in a similar way to the questionnaire’s overall score
Internal consistency reliability (cont.) [Figure: a 10-item questionnaire with a total score] • Does Q1 correlate well with Q8, Q1 with Q9, Q2 with Q7? • Also, do Q1, Q7, Q9, etc. correlate well with the total score?
Cronbach’s coefficient alpha • A measure of internal consistency that evaluates items in a questionnaire to determine the degree to which they measure the same construct • Is essentially the mean correlation between each of a set of items
Cronbach’s alpha (cont.) • Values range from 1, representing perfect internal consistency, to less than zero when a questionnaire includes many negatively correlating items • Alpha values ≥0.70 are generally considered to be acceptable
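A minimal sketch of how alpha can be computed follows. It uses the standard variance-based formula, alpha = k/(k-1) × (1 - sum of item variances / variance of total scores), which is not spelled out in the slides and is assumed here. The function name and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha from per-item score lists.

    items: one list per question, each holding that question's scores
    for the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total questionnaire score for each respondent.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses: 3 questions answered by 5 respondents.
responses = [
    [3, 4, 5, 2, 4],
    [2, 4, 5, 1, 3],
    [3, 5, 4, 2, 4],
]
print(round(cronbach_alpha(responses), 2))
```

Items that move together inflate the variance of the total score relative to the summed item variances, which pushes alpha toward 1; items that move in opposite directions can push it below zero, as the slide notes.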
2 X 2 contingency table to compare results of examiners • Useful to visualize the results of two examiners who are evaluating the same group of patients • Inter-examiner reliability articles often present their findings in the form of a 2 X 2 contingency table – If not, they are fairly easy to create from the data presented in the article
2 X 2 contingency table (cont.)
                 Rater 2
Rater 1          Test +    Test -    Row total
Test +           a         b         a+b
Test -           c         d         c+d
Column total     a+c       b+d       a+b+c+d (grand total)
Agreements: a and d. Disagreements: b and c.
The kappa statistic (κ) • Agreement between examiners evaluating the same patients can be represented by the percentage of agreement of paired ratings • However, percentage of agreement does not account for agreement that would be expected to occur by chance
The kappa statistic (cont.) – Even using unreliable measures, a few agreements are expected to occur just by chance • Only agreement that occurs beyond chance levels represents true agreement • This is what is represented by the kappa statistic – It is appropriate for use with dichotomous or nominal data
The kappa statistic (cont.) • Kappa = (observed agreement - chance agreement) / (1 - chance agreement) – Where observed agreement (PO) is the total proportion of observations where there is agreement • PO = (number of exact agreements) / (number of possible agreements), or (a + d) / (a + b + c + d)
The kappa statistic (cont.) – Chance agreement (PC) is the proportion of agreements that would be expected by chance • PC = (number of expected agreements) / (number of possible agreements), or (a_expected + d_expected) / (a + b + c + d) – a_expected and d_expected can be found using the same procedure used to calculate expected cell values in the chi-square test – (For each of cells a and d, multiply the row total by the column total and then divide by the grand total)
The kappa statistic (cont.) – The values of PO and PC are then used in the following formula to calculate the kappa statistic: Kappa = (PO - PC) / (1 - PC) – When the amount of observed agreement exceeds chance agreement, kappa will be positive – The strength of agreement is determined by the magnitude of kappa – If negative, agreements are less than chance
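The kappa calculation described on these slides can be sketched directly from the four cells of the 2 X 2 table. The function name and the cell counts are hypothetical; the steps follow the slides (observed agreement, expected cell values as in the chi-square test, then the kappa formula).

```python
def cohen_kappa(a, b, c, d):
    """Kappa for two raters from the cells of a 2 x 2 agreement table.

    a, d: agreements (both positive, both negative)
    b, c: disagreements
    """
    n = a + b + c + d
    observed = (a + d) / n                     # PO
    a_expected = (a + b) * (a + c) / n         # row total x column total / grand total
    d_expected = (c + d) * (b + d) / n
    chance = (a_expected + d_expected) / n     # PC
    return (observed - chance) / (1 - chance)


# Hypothetical counts: 50 patients rated by two examiners.
kappa = cohen_kappa(a=20, b=5, c=10, d=15)
print(round(kappa, 2))  # 0.4
```

Here the examiners agree on 35 of 50 patients (PO = 0.70), but 25 agreements would be expected by chance (PC = 0.50), so only 0.20 of the possible 0.50 beyond-chance agreement is achieved.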
Interpretation of kappa values
Kappa value    Agreement beyond chance
0              None
0–0.2          Slight
0.2–0.4        Fair
0.4–0.6        Moderate
0.6–0.8        Substantial
0.8–1.0        Almost perfect
Kappa example • Reliability of McKenzie classification of patients with cervical or lumbar pain – 50 spinal pain patients (25 lumbar and 25 cervical) were simultaneously assessed by 2 physical therapists (14 therapists in total) to classify patients into syndromes and subsyndromes – κ = 0.84 for syndrome classification – κ = 0.87 for subsyndrome classification
Intraclass Correlation Coefficient (ICC) • Another measure of inter-examiner reliability that is for use with continuous variables • Can be used to evaluate 2 or more raters • Pearson’s r can be used – But ICC is preferred when sample size is small (<15) or more than two tests are involved
ICC (cont.) • There are three models of ICC that may utilize one of two different forms – Thus, 6 possible types of ICC depending on how raters are chosen and how subjects are assigned • The type of ICC used should always be presented in research papers – The first number represents the ICC model – The second represents the form used
ICC (cont.) • For example – Clare et al reported on the reliability of detection of lumbar lateral shift and found it to be moderate – ICC [2,1] values ranging from 0.48 to 0.64 (in ICC [2,1], 2 is the model and 1 is the form)
ICC is an index of reliability • Can range from below 0.0 to +1.0 – With ≈0.0 indicating weak reliability and ≈1.0 strong reliability • Suggested interpretation (some clinical measures require ≥0.90)
ICC value       Degree of reliability
>0.75           Excellent
0.40 to 0.75    Fair to good
<0.40           Poor
ICC is based on variance • ICC is the ratio of between-groups variance to total variance, where – Between-groups variance is due to different subjects having test scores that truly differ – Total variance is the between-groups variance plus the variance due to score differences resulting from inter-rater unreliability of two or more examiners rating the same person • Two-way ANOVA is used to calculate ICC
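As one concrete instance of the variance-ratio idea, the sketch below computes ICC [2,1] from the mean squares of a two-way ANOVA without replication, using the Shrout and Fleiss formulation. The slides do not give this formula, so treat it as an assumed, commonly used definition; the function name and the ratings are made up for illustration.

```python
def icc_2_1(ratings):
    """ICC [2,1]: two-way random effects, single rater, absolute agreement.

    ratings: one row per subject, each row holding one score per rater.
    """
    n = len(ratings)        # subjects
    k = len(ratings[0])     # raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(ratings), 2))  # 0.29
```

The low value in this example comes mostly from systematic differences between raters (the second rater scores everyone much lower), which ICC [2,1] counts against reliability even though the raters rank the subjects similarly.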
Validity • The ability of tests and measurements to in fact evaluate the traits that they were intended to evaluate – Vital in research, as well as in clinical practice • The extent of a test’s validity depends on the degree to which systematic error has been controlled for
Validity (cont.) • The greater the validity, the more likely test results will reflect true differences between scores and not systematic error • It’s a matter of degrees, not black-and-white – Technically incorrect to say a test is “valid” or “invalid” – Better to use categories like highly valid, moderately valid, etc.
Validity (cont.) • Test validity depends on its intended purpose – For example, a hand-grip dynamometer is valid to measure grip strength, but it is not valid to measure the qualities of hand tremor
Validity (cont.) • An invalid test can still be reliable – For example, a test that used skull circumference to predict intelligence – Reliability would probably be excellent, but it would not be a valid predictor of intelligence • But an unreliable test can never be considered valid
Methods to estimate the extent of test validity • Can be divided into 3 major categories – Self-evident • Does the test appear to measure what it is supposed to measure? – Pragmatic • Does the test actually work as hypothesized? – Construct validity • Does the test adequately measure the theoretical construct involved?
Self-evident methods • Face validity – Simply deciding whether a test appears to have merit based on “face value” • e.g., if a headache questionnaire asked about the location of head pain it would have face validity • If it asked about hair color, it probably would not – The lowest level of test validation – Often assessed when researchers are first exploring a topic
Self-evident methods (cont.) • Content validity – The ability of a test to include or represent all of the content of a construct • Another definition for content validity – The content of a test is compared to the literature that is already available on the topic – The test is said to have good content validity if it accurately reflects what is in the literature
Pragmatic methods • Criterion-related validity – The degree to which a test corresponds with an external criterion that is an independent measure of the characteristic being tested • A criterion is the standard by which a measure is judged – A valid test should correlate well with or predict some relevant criterion – Concurrent and predictive validity are subgroups of criterion-related validity
Pragmatic methods (cont.) • Concurrent validity – The results of a new test are compared with an established test (gold standard) to see if they are well correlated – Both tests are given at the same time – For example, a study that compares a clinical test to detect spondylolisthesis with x-ray findings
Pragmatic methods (cont.) • Gold standard test – a.k.a. reference standard – A test that is generally acknowledged to be the best available – The value of a concurrent validity trial depends greatly on the quality of the gold standard that is used
Pragmatic methods (cont.) • Construct validity – The extent to which a test effectively measures a theoretical construct • Like pain or disability – The characteristic is not observed directly – Rather, an abstraction of the characteristic that corresponds to the construct under consideration is observed • e.g., a pain scale or disability questionnaire
Pragmatic methods (cont.) • Construct validity can be thought of as the accumulation of evidence that points to the ability of a test to actually measure what it claims to measure • It involves the accumulation of evidence by establishing some of the other types of validity – The validity of a test is supported if the results of these studies agree with one another
Pragmatic methods (cont.) • Construct validity is determined by comparing a new test with other tests that measure a similar construct • Another way to evaluate construct validity is to compare the new test with other tests that are different, but related, which should not correlate well
Pragmatic methods (cont.) • Convergent validity – Has to do with the degree of correlation that exists between a new test and another measure of the same or similar constructs – A test that has good convergent validity correlates well with another measure of the same construct
Pragmatic methods (cont.) • Discriminant validity – The opposite of convergent validity, where the new test is weakly related to or unrelated to another measure that it should in fact be different from – A test with good discriminant validity should be able to separate patients into different groups • e.g., normal vs. abnormal
Types of validity for tests or measures
Self-evident
• Face validity – Whether the test appears to measure what it was intended to measure.
• Content validity – The test fully measures the construct of what it is supposed to measure.
Pragmatic
• Construct validity – The ability of a test to measure concepts or ideas that cannot be observed directly.
• Criterion-related validity – The results of the test correlate well with the results of another test designed to measure the same thing.
  – Concurrent validity – The test correlates well with an established test that measures the same phenomenon.
  – Predictive validity – The test is capable of measuring a trait and then predicting an outcome.
• Convergent validity – The extent to which a test correlates well with another measure of the same construct.
• Discriminant validity – The extent to which a test does not correlate well with another test that it should not be related to.
The concept of validity and reliability • Can be compared with scores on a target • Scores may be systematically off center – Results from bias – The test environment is faulty, causing all scores to be inaccurate – Scores miss the bull’s eye in one direction • Scores may be randomly off center – Scores miss the bull’s eye in any direction
The concept of validity and reliability (cont.) – When test scores miss the bull’s eye in any direction, it is caused by random error – Some subjects are affected while others are not • Accurate tests – Are free from bias • Precise tests – Are free from random error
Accuracy and precision [Figure: three targets] • An accurate and precise test hits the bull’s eye and is tightly grouped • An inaccurate test systematically misses the bull’s eye in one direction • An imprecise test misses the bull’s eye randomly
Cutoff points • Test results involving ordinal or continuous measures are often converted to a dichotomous scale (dichotomized) • Achieved by establishing a cutoff point at a specified value – Scores above the specified value are considered positive – Scores below the value are negative
The ideal diagnostic test • Would always correctly discriminate between those with and those without the condition – Always positive for those with the condition – Always negative for those without it
The ideal test [Figure] • Always negative for those without the condition • Always positive for those with the condition
Real-world test [Figure: regions labeled “False negatives” and “False positives”]
Sensitivity and Specificity • Commonly used to assess the validity of tests • Sensitivity – The ability of a test to correctly identify people who have the target disorder • Specificity – The ability of a test to correctly identify people who do not have the target disorder
Sensitivity and Specificity (cont.) • Expressed as a percentage – 0% represents no sensitivity or specificity – 100% is perfect sensitivity or specificity • A 2 X 2 contingency table can be used to calculate these indices
2 X 2 contingency table
                 Condition (per “gold standard”)
Test result      Present        Absent         Row total
Positive         a (True +)     b (False +)    a+b
Negative         c (False -)    d (True -)     c+d
Column total     a+c            b+d            a+b+c+d (grand total)
Sensitivity and Specificity (cont.) • Sensitivity = a/(a+c) = true positives / (true positives + false negatives) • Specificity = d/(b+d) = true negatives / (false positives + true negatives)
SnOUT (Sensitivity rules OUT) • In tests that have very high sensitivity – A negative test will rule out the condition under consideration – This is because there are very few false negatives in tests with very high sensitivity – If a test with very high sensitivity is negative, it is very likely a true negative
SpIN (SPecificity rules IN) • In tests that have very high specificity – A positive test will rule in the condition under consideration – This is because there are very few false positives in tests with very high specificity – If a test with very high specificity is positive, it is very likely a true positive
The cutoff point influences a test’s sensitivity & specificity [Figure: raised cutoff point, with regions labeled “False negatives” and “False positives”] • Higher scores point to a worsening condition • If the cutoff point is raised, specificity increases, but there are more false negatives
The cutoff point and sensitivity & specificity (cont.) [Figure: lowered cutoff point, with regions labeled “False negatives” and “False positives”] • If the cutoff point is lowered, sensitivity increases, but there are more false positives
The cutoff point and sensitivity & specificity (cont.) • Because increasing sensitivity will decrease specificity, and increasing specificity will decrease sensitivity, the cutoff point that is set depends on – Whether it is best to maximize sensitivity at the expense of specificity, or – Whether it is best to maximize specificity at the expense of sensitivity
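The tradeoff can be seen by dichotomizing the same set of scores at different cutoff points. The sketch below is illustrative only: the scores are invented, the function name is hypothetical, and it assumes that a score exactly equal to the cutoff counts as negative, a detail the slides leave open.

```python
def sensitivity_specificity(with_condition, without_condition, cutoff):
    """Dichotomize scores at a cutoff and return (sensitivity, specificity).

    Assumption: scores above the cutoff are positive; scores at or
    below it are negative.
    """
    true_positives = sum(score > cutoff for score in with_condition)
    true_negatives = sum(score <= cutoff for score in without_condition)
    return (
        true_positives / len(with_condition),
        true_negatives / len(without_condition),
    )


# Hypothetical test scores (higher scores point to a worsening condition).
with_condition = [55, 60, 65, 70, 80]
without_condition = [30, 40, 45, 58, 62]

for cutoff in (50, 57, 64):
    sens, spec = sensitivity_specificity(with_condition, without_condition, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these made-up scores, the lowest cutoff catches every patient with the condition but mislabels two healthy people, while the highest cutoff clears every healthy person but misses two patients.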
Receiver Operating Characteristic (ROC) curves • Graphically depict the tradeoff between sensitivity and specificity • In accurate tests – The curve closely follows the left-hand border and the top border of the ROC space • In less accurate tests – The curve is closer to the 45-degree diagonal of the ROC space
ROC curves (cont.) [figure]
ROC curves (cont.) [Figure: ROC curve] • Cut-off low = high sensitivity, but more false positives • Cut-off high = low sensitivity, but fewer false positives
Implications of sensitivity & specificity • In tests with low sensitivity – People with the target disorder will be missed (false negatives) • In tests with low specificity – People who do not actually have the target disorder will be identified as having it (false positives)
Implications of sensitivity & specificity (cont.) • Tests with high sensitivity may be suitable when the consequences of reporting false positive findings to a patient are minor – e.g., incorrectly reporting to a patient that their triglycerides are elevated, which results in them shifting to a healthier lifestyle
Implications of sensitivity & specificity (cont.) • Tests with high specificity are better when false positive findings lead to painful or expensive treatment – e.g., a test that leads to surgical intervention – In this case false positives must be minimized
Implications of sensitivity & specificity (cont.) • Screening for rare conditions – Many false positives may result since very few cases have the potential to be detected, even when highly specific tests are used – Not a serious problem if positive screening leads to confirmatory testing • Screening for common conditions – Many cases may be overlooked, even when a highly sensitive test is used
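The effect of prevalence on screening results can be illustrated numerically. The sketch below applies Bayes’ theorem to a test with fixed sensitivity and specificity; the formula is a standard relationship rather than something stated in the slides, and the function name and the input figures are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value).

    Works on expected proportions of the screened population.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test (95% sensitive, 95% specific) in two populations.
for prevalence in (0.01, 0.50):
    ppv, npv = predictive_values(0.95, 0.95, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

At 1% prevalence only about 16% of positive results are true positives, even though the test itself is unchanged, which is why positive screens for rare conditions usually call for confirmatory testing.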
What is an acceptable level of sensitivity and specificity? • There is no general agreement; it depends on the clinical situation • The acceptable level changes when – The intent of the test or the setting changes – The prevalence of the condition is different in the group being tested – Alternate methods of testing are available
Predictive value of a test • Positive predictive value – The probability that a positive test will correctly identify people who have the target disorder – a/(a+b)
                 Condition
Test result      Present    Absent
Positive         a          b
Negative         c          d
Predictive value of a test (cont.) • Negative predictive value – The probability that a negative test will correctly identify people who do not have the target disorder – d/(c+d)
                 Condition
Test result      Present    Absent
Positive         a          b
Negative         c          d
Summary of indices from the 2 X 2 table
                 Condition (per “gold standard”)
Test result      Present    Absent    Row total
Positive         a          b         a+b
Negative         c          d         c+d
Column total     a+c        b+d       a+b+c+d (grand total)
Sensitivity = a/(a+c) • Specificity = d/(b+d) • Positive predictive value = a/(a+b) • Negative predictive value = d/(c+d)
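The four indices summarized above all come from the same 2 X 2 table, so they can be computed together. This is a small sketch with a hypothetical function name and invented cell counts.

```python
def diagnostic_indices(a, b, c, d):
    """Indices from a 2 x 2 table of test result versus gold standard.

    a: true positives   b: false positives
    c: false negatives  d: true negatives
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "positive_predictive_value": a / (a + b),
        "negative_predictive_value": d / (c + d),
    }


# Hypothetical study: 100 people with the condition, 200 without.
indices = diagnostic_indices(a=90, b=30, c=10, d=170)
for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are read down the columns of the table, while the predictive values are read across the rows, which is why only the predictive values shift when the mix of people with and without the condition changes.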
Likelihood ratio (LR) • The probability that the results of a diagnostic test would be expected in a patient with the condition of interest (sensitivity) compared to the expected results of the same test in a patient without the condition (specificity) • Applies to positive as well as negative tests
Likelihood ratio (cont.) • LR of a positive test (LR+) – A ratio of the probability of a positive test in a person with the condition compared to the probability of a positive test in a person without the condition – LR+ = [a/(a+c)] / [1 - d/(b+d)], or sensitivity / (1 - specificity)
Likelihood ratio (cont.) • In a positive test – LR >1, the probability that the condition is present is increased – LR <1, the probability that the condition is present is decreased – LR =1, the probability that the condition is present versus not being present is the same
Likelihood ratio (cont.) • LR of a negative test (LR-) – A ratio of the probability of a negative test in a person with the condition compared to the probability of a negative test in a person without the condition – LR- = [1 - a/(a+c)] / [d/(b+d)], or (1 - sensitivity) / specificity
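Both likelihood ratios follow directly from sensitivity and specificity, as this short sketch shows. The function name and the example values are illustrative; returning infinity for a perfectly specific test is a choice made here to avoid division by zero.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    if specificity == 1:
        lr_positive = float("inf")   # no false positives at all
    else:
        lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical test: 90% sensitive, 85% specific.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.85)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")  # LR+ = 6.0, LR- = 0.12
```

In this example a positive result is about six times more likely in a person with the condition than in a person without it, and a negative result is roughly one eighth as likely.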
Likelihood ratio (cont.) • LRs have been referred to as the most useful single indicator of a test’s diagnostic strength • They can be used to help make decisions about the need for further testing • They also help in choosing the appropriate time to begin treatment
Meaning of LRs • LR >10 or <0.1 – Generates large and conclusive changes in the probability of a given diagnosis • LR in the range of 5 to 10 or 0.1 to 0.2 – Generates a moderate and usually important change in the probability of a given diagnosis • LR in the range of 2 to 5 or 0.2 to 0.5 – Generates a small but sometimes important change in the probability of a given diagnosis • LR in the range of 1 to 2 or 0.5 to 1 – Changes the probability of a given diagnosis to a small and rarely important degree
Meaning of LRs (cont.) • LRs >10 indicate that the test can be used to rule the condition in • LRs ~1 provide no useful information for ruling the condition in or out • LRs <0.1 indicate that the test can be used to rule the condition out
Pre-test probability • The probability that a patient has a condition before the test is carried out • Is based on the clinician’s experience, the prevalence of the condition, and published literature • May be modified up or down if the patient has risk factors
Post-test probability • Is generated by combining a patient’s pre-test probability of having the condition with the test’s LR – A high pre-test probability coupled with a high LR produces a very high post-test probability – A low pre-test probability coupled with a low LR produces a very low post-test probability
Using LRs with Pre-test & Post-test probabilities • A practitioner’s confidence about a correct diagnosis would be higher after positive results of a test with a high LR • Especially if the pre-test probability was high • Thus, clinicians can use them in making decisions about the need for further testing and when to begin treatment
Using LRs with Pre-test & Post-test probabilities (cont.) • When the post-test probability is very high, the condition is very likely present and treatment should be initiated • When it is very low, the condition can be ruled out and no further diagnostic or therapeutic action is necessary
Using a nomogram [Figure: nomogram] • First, the pre-test probability is estimated • Next, the test’s LR is obtained from an article • Draw a line between the pre-test probability and the LR, extending to the post-test probability
Using LRs with Pre-test & Post-test probabilities (cont.) • LRs and post-test probabilities can be used serially – The post-test probability resulting from one test can be used as a pre-test probability for the next one
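The nomogram performs graphically what can also be done with a short calculation: convert the pre-test probability to odds, multiply by the LR, and convert back to a probability. This odds-based method is the standard arithmetic behind such nomograms rather than something spelled out in the slides, and the function name and input values below are hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Combine a pre-test probability with an LR via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must be between 0 and 1")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical patient: 30% pre-test probability, positive test with LR+ = 6.
after_first_test = post_test_probability(0.30, 6)
print(f"{after_first_test:.2f}")   # 0.72

# Serial use: the first post-test probability becomes the next pre-test probability.
after_second_test = post_test_probability(after_first_test, 4)
print(f"{after_second_test:.2f}")  # 0.91
```

Chaining tests this way assumes the tests provide independent information; when two tests tend to be positive for the same reason, the combined result overstates the certainty.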
Clinical disagreement • Practitioners can still disagree about clinical findings, even when valid and reliable tests are used • 3 sources of clinical disagreement – The examiner (practitioner) – The examined (patient) – The examination Evidence-based Chiropractic 100 © 2006
Clinical disagreement due to the examiner 1. Biological variations of senses – Many tests rely on the examiners abilities – Some people have better hearing, sight, more skill at palpation, etc. 2. Tendency to record inferences rather than evidence – Examiners may “pre-diagnose” patients based on visible cues before actual examination Evidence-based Chiropractic 101 © 2006
Clinical disagreement due to the examiner (cont. ) 3. Ensnarement by diagnostic classification schemes – Vague diagnostic criteria and the tendency to pigeon-hole patients 4. Entrapment by prior expectation – Tendency for examiners to find what they hope to find (e. g. , chiropractors find back problems, urologists find kidney problems) 5. Examiner incompetence Evidence-based Chiropractic 102 © 2006
Clinical disagreement due to the examined 1. Biological variation – Many conditions vary from day-to-day 2. Effects of illness and medications – A patient with severe pain is very difficult to examine – Pain medications may mask the true findings Evidence-based Chiropractic 103 © 2006
Clinical disagreement due to the examined (cont.) 3. Memory and rumination – Chronic patients may report everything under the sun, or only what they think caused the problem (selective memory) – Recall bias 4. Toss-ups – Deals with conflicting ways to manage a patient’s condition
Clinical disagreement due to the examination 1. Disruptive environment – e.g., an athletic field or a child crying during a parent’s examination 2. Disruptive interactions between examiner and patient – Patients won’t confide in a doctor they don’t like or trust 3. Dysfunctional or incorrectly used diagnostic tools
Appraising reliability and validity articles • First decide whether the purpose of the study is to assess the test’s reliability or validity (or both) – Reliability studies assess the consistency of tests within or between examiners or questionnaires – Validity studies compare test results with established tests, or assess how accurately the test predicts a future outcome
Appraising reliability and validity articles (cont.) • Was the test adequately described? – The article should mention how patients prepared for the test (e.g., fasting prior to a blood test) – What patients had to endure (e.g., drugs given for routine colonoscopy) • Patient inconvenience, cost, and harm must be weighed against the need for information – How the results were analyzed and interpreted
Appraising reliability and validity articles (cont.) • Did the study sample include a full range of subjects with and without the condition? – All types of patients should be included, like one would see in everyday clinical practice – If too many severely ill patients are included, there is a greater chance that those with the disease will test positive • Such tests may be able to identify obviously ill patients, but not those who are only mildly ill
Appraising reliability and validity articles (cont.) • If the study utilized a gold standard for comparison, was it an acceptable one? – The credibility of a validity study depends on the soundness of the gold standard – It is often difficult to find an ideal gold standard since most tests do not have both high sensitivity and high specificity – Especially complex for spinal function tests
Appraising reliability and validity articles (cont.) • Were the test results and the gold standard assessed independently in a blinded fashion? – Raters should be unaware of the results of previous testing, because this knowledge can greatly affect the interpretation of tests – Expectation bias • When raters are influenced by knowledge of certain features of the case
Appraising reliability and validity articles (cont.) – Verification bias • When the decision to carry out the gold standard test is influenced by the results of the test that is being evaluated – Be wary of studies that use more than one type of gold standard test • e.g., some patients are biopsied, while others wait to see if the condition develops
Appraising reliability and validity articles (cont.) • Do the results of this study apply to the patient before me? – The study’s population should be comparable to the patient on factors such as age, gender, and condition severity – Prevalence or severity of the condition may be higher in an academic environment • As a result, the test’s sensitivity may be higher than if it were studied in the general population
Appraising reliability and validity articles (cont.) • Will patients benefit as a result of being tested? – Is the new test really preferable to the old one? • It may be less convenient, more expensive, and provide little or no added information • Beware of studies on diagnostic tests that have commercial ties – Test results should benefit the patient and actually result in a change in the way their condition is managed
Appraising reliability and validity articles (cont.) – One must also consider the consequences of not performing the test • For instance, when a test is designed to detect a condition that is potentially very harmful if left undiagnosed • e.g., arterial dissection or abdominal aneurysm – The risk associated with the test should be proportional to the importance of the information to be gained
Appraising reliability and validity articles (cont.) • Is the test reliable? – Coefficients of agreement should be within acceptable ranges – P values or confidence intervals should point to statistically significant findings
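As one illustration of a coefficient of agreement, the sketch below computes Cohen's kappa for two examiners rating the same patients. Kappa is chosen here as an assumed example; a given article may report a different coefficient (such as an ICC for continuous measures). The function name and the ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b) -> float:
    """Chance-corrected agreement between two raters on categorical findings."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    n = len(rater_a)
    # Proportion of subjects on which the two raters gave the same rating
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of 10 patients by two examiners
examiner_1 = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg"]
examiner_2 = ["pos", "pos", "pos", "neg", "pos", "neg", "neg", "neg", "neg", "neg"]
print(round(cohens_kappa(examiner_1, examiner_2), 2))  # 0.58
```

In this made-up sample the examiners agree on 8 of 10 patients (80%), yet kappa is only about 0.58 because roughly half of that agreement would be expected by chance alone.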
Appraising reliability and validity articles (cont.) • Is the test valid? – P values or confidence intervals should be reported and should be significant – The gold standard should be a valid marker for what is being tested – Sensitivity and specificity should be sufficiently high • Depends on the planned use of the test
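When an article reports a 2×2 table of test results against the gold standard, sensitivity, specificity, and the LRs can be checked directly. The following is a minimal sketch with hypothetical counts; the function name and the numbers are illustrative only.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, and likelihood ratios from a 2x2 table.

    tp/fn: gold-standard positives who tested positive/negative
    fp/tn: gold-standard negatives who tested positive/negative
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one subject with and one without the condition")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)
    false_negative_rate = fn / (tp + fn)
    # LR+ = sensitivity / (1 - specificity); infinite if there are no false positives
    lr_positive = sensitivity / false_positive_rate if false_positive_rate > 0 else float("inf")
    # LR- = (1 - sensitivity) / specificity; infinite if specificity is zero
    lr_negative = false_negative_rate / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }


# Hypothetical study: 100 patients with the condition, 100 without
results = diagnostic_accuracy(tp=90, fp=20, fn=10, tn=80)
print(results)
```

With these made-up counts, sensitivity is 0.90 and specificity is 0.80, giving an LR+ of about 4.5 and an LR− of about 0.125. Whether such values are "sufficiently high" depends, as the slide notes, on the planned use of the test.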