Evaluating Self-Report Data Using Psychometric Methods
Ron D. Hays, Ph.D. (hays@rand.org)
February 5, 2003 (3:00-6:00 pm)
1 9/17/2020

Four Types of Data Collection Errors
• Coverage error: Does each person in the population have an equal chance of selection?
• Sampling error: Are only some members of the population sampled?
• Nonresponse error: Do people in the sample who respond differ from those who do not?
• Measurement error: Are inaccurate answers given to survey questions?

What’s a Good Measure?
• The same person gets the same score (reliability)
• Different people get different scores (validity)
• People get the scores you expect (validity)
• It is practical to use (feasibility)

How Are Good Measures Developed?
• Review the literature
• Obtain expert input (patients and clinicians)
• Define the constructs you are interested in
• Draft items (item generation)
• Pretest
  – Cognitive interviews
  – Field and pilot testing
• Revise and test again
• Translate/harmonize across languages

Scales of Measurement and Their Properties

Property of Numbers   Nominal   Ordinal   Interval   Ratio
Rank order            No        Yes       Yes        Yes
Equal interval        No        No        Yes        Yes
Absolute 0            No        No        No         Yes

Measurement Range for Health Outcome Measures
[Figure: range of measurement levels for health outcome measures, from Nominal through Ordinal and Interval to Ratio]

Indicators of Acceptability
• Response rate
• Administration time
• Missing data (item, scale)

Variability
• All scale levels are represented
• Distribution approximates a bell-shaped "normal" curve
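
Both variability checks can be screened numerically. The sketch below is a hypothetical helper (the name `variability_summary` and the use of the population moment skewness coefficient are assumptions, not part of the original material): it lists unused response levels, the percentage of responses at the floor and ceiling, and a skewness value that sits near 0 for a roughly symmetric distribution.

```python
from collections import Counter
from statistics import mean, pstdev


def variability_summary(responses, levels):
    """Screen responses for unused levels, floor/ceiling piling, and skew.

    responses: observed scores; levels: every possible response level.
    Skewness is the population moment coefficient (an assumed choice);
    it is undefined if every response is identical.
    """
    counts = Counter(responses)
    n = len(responses)
    m, s = mean(responses), pstdev(responses)
    skewness = sum((x - m) ** 3 for x in responses) / (n * s ** 3)
    return {
        "unused_levels": [lvl for lvl in levels if counts[lvl] == 0],
        "pct_floor": 100 * counts[min(levels)] / n,
        "pct_ceiling": 100 * counts[max(levels)] / n,
        "skewness": skewness,
    }


# Hypothetical 1-5 ratings that pile up at the low end.
print(variability_summary([1, 1, 1, 2, 2, 3, 5], levels=range(1, 6)))
```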

Measurement Error
$$\text{observed score} = \text{true score} + \text{systematic error (bias)} + \text{random error}$$

Flavors of Reliability
• Test-retest (administrations)
• Inter-rater (raters)
• Internal consistency (items)

Test-Retest Reliability of MMPI 317-362 (r = 0.75)
Item: "I am more sensitive than most other people."

                  MMPI 317
MMPI 362       True   False   Total
True            169      15     184
False            21      95     116
Total           190     110     300
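
Because both administrations are scored True/False, the r = 0.75 on this slide can be reproduced as a phi coefficient, i.e., a Pearson correlation computed on a 2 × 2 table. A minimal sketch; the function name is illustrative, and the cell layout (rows = MMPI 362, columns = MMPI 317) follows the counts on the slide.

```python
import math


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi for a 2 x 2 table laid out as

                 col True   col False
    row True        a           b
    row False       c           d
    """
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Counts from the MMPI item 317/362 table.
print(f"phi = {phi_coefficient(169, 15, 21, 95):.2f}")  # phi = 0.75
```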

Kappa Coefficient of Agreement (Corrects for Chance)
$$\kappa = \frac{\text{observed} - \text{chance}}{1 - \text{chance}}$$

Example of Computing Kappa

                  Rater B
Rater A        1   2   3   4   5   Row Sum
1              1   0   0   0   0      1
2              1   2   0   0   0      3
3              0   0   2   0   0      2
4              0   0   0   2   0      2
5              0   0   0   0   2      2
Column Sum     2   2   2   2   2     10

Example of Computing Kappa (Continued)
$$P_c = \frac{(1 \times 2) + (3 \times 2) + (2 \times 2) + (2 \times 2) + (2 \times 2)}{10 \times 10} = 0.20$$
$$P_{obs} = \frac{9}{10} = 0.90$$
$$\kappa = \frac{0.90 - 0.20}{1 - 0.20} = \frac{0.70}{0.80} = 0.875$$
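
The same arithmetic generalizes to any square agreement table. A minimal sketch (the helper name is hypothetical); the cell counts below are one table consistent with the example's marginals (row sums 1, 3, 2, 2, 2; column sums 2, 2, 2, 2, 2) and its 9 agreements out of 10 ratings.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table (rows: rater A, columns: rater B)."""
    k = len(table)
    n = sum(map(sum, table))
    p_obs = sum(table[i][i] for i in range(k)) / n          # proportion on the diagonal
    row_sums = [sum(row) for row in table]
    col_sums = [sum(row[j] for row in table) for j in range(k)]
    p_chance = sum(r * c for r, c in zip(row_sums, col_sums)) / (n * n)
    return (p_obs - p_chance) / (1 - p_chance)


example = [
    [1, 0, 0, 0, 0],
    [1, 2, 0, 0, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 0, 2, 0],
    [0, 0, 0, 0, 2],
]
print(round(cohens_kappa(example), 3))  # 0.875
```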

Guidelines for Interpreting Kappa

Fleiss (1981)
Conclusion        Kappa
Poor              < .40
Fair              .40-.59
Good              .60-.74
Excellent         > .74

Landis and Koch (1977)
Conclusion        Kappa
Poor              < 0.00
Slight            .00-.20
Fair              .21-.40
Moderate          .41-.60
Substantial       .61-.80
Almost perfect    .81-1.00

Ratings of Height of Houseplants

Plant   Baseline R1   Baseline R2   Follow-up R1   Follow-up R2   Experimental Condition
A1      120           118           121            120            1
A2      084           096           085            088            2
B1      107           105           108            104            2
B2      094           097           100            104            1
C1      085           091           088            096            2

R1, R2 = rater 1, rater 2

Ratings of Height of Houseplants (Cont.)

Plant   Baseline R1   Baseline R2   Follow-up R1   Follow-up R2   Experimental Condition
C2      079           078           086            092            1
D1      070           072           076            080            1
D2      054           056           060            ?              2
E1      085           097           101            108            1
E2      090           092           084            096            2

Reliability of Baseline Houseplant Ratings
Ratings of height of plants: 10 plants, 2 raters

Source              DF   SS       MS        F
Plants               9   5658     628.667   35.52
Within              10   177       17.700
  Raters             1   57.8      57.800
  Raters x Plants    9   119.2     13.244
Total               19   5835

Sources of Variance in Baseline Houseplant Height

Source               df   MS
Plants (N)            9   628.67 (BMS)
Within               10    17.70 (WMS)
  Raters (K)          1    57.80 (JMS)
  Raters x Plants     9    13.24 (EMS)
Total                19

Intraclass Correlation and Reliability
• One-way: reliability $= \frac{BMS - WMS}{BMS}$; intraclass correlation $= \frac{BMS - WMS}{BMS + (K-1)\,WMS}$
• Two-way fixed: reliability $= \frac{BMS - EMS}{BMS}$; intraclass correlation $= \frac{BMS - EMS}{BMS + (K-1)\,EMS}$
• Two-way random: reliability $= \frac{N(BMS - EMS)}{N \cdot BMS + JMS - EMS}$; intraclass correlation $= \frac{BMS - EMS}{BMS + (K-1)\,EMS + K(JMS - EMS)/N}$

Summary of Reliability of Plant Ratings

                          Baseline        Follow-up
Model                     RTT    RII      RTT    RII
One-way ANOVA             0.97   0.95     0.97   0.94
Two-way random effects    0.97   0.95     0.97   0.94
Two-way fixed effects     0.98   0.96     0.98   0.97

RTT = reliability of the mean of the K ratings; RII = intraclass correlation for a single rating.

Source             Label   Baseline MS
Plants             BMS     628.667
Within             WMS      17.700
Raters             JMS      57.800
Raters x Plants    EMS      13.244

$ICC(1,1) = \frac{BMS - WMS}{BMS + (K-1)\,WMS}$
$ICC(2,1) = \frac{BMS - EMS}{BMS + (K-1)\,EMS + K(JMS - EMS)/N}$
$ICC(3,1) = \frac{BMS - EMS}{BMS + (K-1)\,EMS}$
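
The three models can be checked against the baseline mean squares. A minimal sketch (the function name is hypothetical) that applies the one-way, two-way random, and two-way fixed formulas to BMS, WMS, JMS, and EMS with N = 10 plants and K = 2 raters; it reproduces the baseline RTT and RII columns.

```python
def reliability_and_icc(bms, wms, jms, ems, n, k):
    """Return {model: (reliability of the mean of k ratings, single-rating ICC)}.

    bms, wms, jms, ems: between-objects, within, raters, and residual mean squares;
    n: number of objects rated (plants); k: number of raters.
    """
    return {
        "one-way": (
            (bms - wms) / bms,
            (bms - wms) / (bms + (k - 1) * wms),
        ),
        "two-way random": (
            n * (bms - ems) / (n * bms + jms - ems),
            (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        ),
        "two-way fixed": (
            (bms - ems) / bms,
            (bms - ems) / (bms + (k - 1) * ems),
        ),
    }


baseline = reliability_and_icc(bms=628.667, wms=17.700, jms=57.800, ems=13.244, n=10, k=2)
for model, (rtt, rii) in baseline.items():
    print(f"{model:<15} RTT = {rtt:.2f}  RII = {rii:.2f}")
```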

Cronbach’s Alpha

Source                  df   SS     MS
Respondents (BMS)        4   11.6   2.9
Items (JMS)              1    0.1   0.1
Resp. x Items (EMS)      4    4.4   1.1
Total                    9   16.1

$$\alpha = \frac{BMS - EMS}{BMS} = \frac{2.9 - 1.1}{2.9} = \frac{1.8}{2.9} = 0.62$$
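
Alpha computed from the ANOVA mean squares equals the familiar variance-based formula. The sketch below shows both routes (function names are illustrative). The 5 × 2 score matrix is hypothetical: it was constructed only so that its sums of squares (11.6, 0.1, 4.4) match the table above, and it is not the original data.

```python
from statistics import variance


def alpha_from_anova(data):
    """Cronbach's alpha as (BMS - EMS) / BMS from a respondents x items ANOVA."""
    n, k = len(data), len(data[0])  # respondents, items
    correction = sum(map(sum, data)) ** 2 / (n * k)
    ss_total = sum(x * x for row in data for x in row) - correction
    ss_resp = sum(sum(row) ** 2 for row in data) / k - correction
    ss_items = sum(sum(row[j] for row in data) ** 2 for j in range(k)) / n - correction
    ss_error = ss_total - ss_resp - ss_items  # respondents x items interaction
    bms = ss_resp / (n - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / bms


def alpha_from_variances(data):
    """Classical form: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(data[0])
    item_vars = [variance([row[j] for row in data]) for j in range(k)]
    total_var = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores reproducing SS = 11.6 (respondents), 0.1 (items), 4.4 (residual).
scores = [[5, 3], [2, 4], [2, 1], [1, 1], [2, 2]]
print(round(alpha_from_anova(scores), 2), round(alpha_from_variances(scores), 2))
```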

Alpha by Number of Items and Inter-item Correlations
$$\alpha_{st} = \frac{K\bar{r}}{1 + (K - 1)\,\bar{r}}$$
K = number of items in the scale; $\bar{r}$ = average inter-item correlation

Alpha for Different Numbers of Items and Homogeneity

Average Inter-item       Number of Items (K)
Correlation (r̄)        2       4       6       8
.0                    .000    .000    .000    .000
.2                    .333    .500    .600    .667
.4                    .571    .727    .800    .842
.6                    .750    .857    .900    .923
.8                    .889    .941    .960    .970
1.0                  1.000   1.000   1.000   1.000
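
The table follows directly from the standardized-alpha formula. A short sketch that regenerates it (the helper name is illustrative):

```python
def standardized_alpha(k: int, r_bar: float) -> float:
    """Standardized alpha for k items with average inter-item correlation r_bar."""
    return k * r_bar / (1 + (k - 1) * r_bar)


item_counts = [2, 4, 6, 8]
print(f"{'r':>4}" + "".join(f"{'K=' + str(k):>8}" for k in item_counts))
for r_bar in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    cells = "".join(f"{standardized_alpha(k, r_bar):>8.3f}" for k in item_counts)
    print(f"{r_bar:>4.1f}{cells}")
```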

Number of Items and Reliability for Three Versions of the Mental Health Inventory (MHI)

Spearman-Brown Prophecy Formula
$$\alpha_y = \frac{N \cdot \alpha_x}{1 + (N - 1)\,\alpha_x}$$
N = how many times longer scale y is than scale x (e.g., N = 2 for a scale twice as long)
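
A minimal sketch of the prophecy formula plus the same equation solved for N, which answers "how much longer must the scale be to reach a target reliability?" Function names and the 0.70/0.90 inputs are illustrative choices, not values from the slides.

```python
def spearman_brown(alpha_x: float, n: float) -> float:
    """Projected reliability of scale y, which is n times as long as scale x."""
    return n * alpha_x / (1 + (n - 1) * alpha_x)


def lengthening_factor(alpha_x: float, alpha_y: float) -> float:
    """Formula solved for n: how many times longer scale x must be to reach alpha_y."""
    return alpha_y * (1 - alpha_x) / (alpha_x * (1 - alpha_y))


print(round(spearman_brown(0.70, 2), 3))         # 0.824: doubling a 0.70 scale
print(round(lengthening_factor(0.70, 0.90), 2))  # 3.86: length multiple needed for 0.90
```

With alpha_x set to the average inter-item correlation and n = K, the formula is algebraically identical to standardized alpha.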

Reliability Minimum Standards
• 0.70 or above (for group comparisons)
• 0.90 or higher (for individual assessment)
Standard error of measurement: $SEM = SD\,(1 - \text{reliability})^{1/2}$
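
To see why individual assessment calls for higher reliability, the sketch below converts reliability into a standard error of measurement. It assumes a hypothetical scale with SD = 10 and uses the conventional ±1.96 SEM band for an approximate 95% interval around an observed score; neither value comes from the slide.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * (1 - reliability) ** 0.5."""
    return sd * math.sqrt(1 - reliability)


# Hypothetical scale with SD = 10; 1.96 * SEM gives an approximate 95% band.
for reliability in (0.70, 0.90):
    e = sem(10, reliability)
    print(f"reliability {reliability:.2f}: SEM = {e:.2f}, 95% band = +/- {1.96 * e:.2f}")
```

At reliability 0.70 the band is roughly ±10.7 points; at 0.90 it narrows to about ±6.2.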

Reliability of a Composite Score

Hypothetical Multitrait/Multi-Item Correlation Matrix

Multitrait/Multi-Item Correlation Matrix for Patient Satisfaction Ratings
[Table: correlations of each Technical and Interpersonal item with the Technical, Interpersonal, Communication, and Financial scales; item-scale correlations for the hypothesized scales (*) range from 0.48 to 0.68]
Note: Standard error of correlation is 0.03. Technical = satisfaction with technical quality. Interpersonal = satisfaction with the interpersonal aspects. Communication = satisfaction with communication. Financial = satisfaction with financial arrangements.
*Item-scale correlations for hypothesized scales (corrected for item overlap).
†Correlation within two standard errors of the correlation of the item with its hypothesized scale.

Item Response Theory (IRT)

What Are IRT Models?
Mathematical equations that relate observed survey responses to a person's location on an unobservable latent trait (e.g., intelligence, patient satisfaction).
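
Latent Trait and Item Responses
[Figure: one latent trait underlying three item responses. Items 1 and 2 are dichotomous (responses 1 and 0, with probabilities P(X1=1), P(X1=0) and P(X2=1), P(X2=0)); Item 3 has three categories (0, 1, 2, with probabilities P(X3=0), P(X3=1), P(X3=2))]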

IRT Model Assumptions
• Unidimensionality: one construct is measured by the items in the scale.
• Local independence: items are uncorrelated once the latent trait(s) have been controlled for.

Types of IRT Models
• Unidimensional and multidimensional
• Dichotomous and polytomous
• Parameterization
  – One parameter: difficulty (location)
  – Two parameter: difficulty and slope (discrimination)

Item Difficulty
Transform the proportion of people endorsing the item (p) to the corresponding (1 - p)th percentile of the z distribution:
$$Z = \frac{\ln\left(\frac{1-p}{p}\right)}{1.7} = \frac{\ln(1-p) - \ln(p)}{1.7}$$
For p = .772:
$$Z = \frac{\ln(.228) - \ln(.772)}{1.7} = \frac{-1.47840965 + 0.258770729}{1.7} = \frac{-1.21963892}{1.7} = -0.72$$
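
Dividing the logit by 1.7 rescales it so that it approximates a normal deviate. A minimal sketch comparing the slide's formula with the exact normal percentile from Python's `statistics.NormalDist` for the p = .772 example (function names are illustrative):

```python
import math
from statistics import NormalDist


def logit_difficulty(p: float) -> float:
    """Slide formula: ln((1 - p) / p) / 1.7, where p is the proportion endorsing."""
    return math.log((1 - p) / p) / 1.7


def normal_difficulty(p: float) -> float:
    """Exact z at the (1 - p)th percentile of the standard normal, for comparison."""
    return NormalDist().inv_cdf(1 - p)


p = 0.772
print(round(logit_difficulty(p), 2))  # -0.72
print(round(normal_difficulty(p), 2))
```

An item that most people endorse, as here, gets a negative (easy) difficulty.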

Item Discrimination
• Item-scale correlation, corrected for item overlap, transformed to Fisher's z': $z' = \tfrac{1}{2}\left[\ln(1 + r) - \ln(1 - r)\right]$
  – If r = 0.30, z' = 0.31
  – If r = 0.80, z' = 1.10
  – If r = 0.95, z' = 1.83
• 0.5 to 2 is the typical range
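
The transformation stretches correlations near 1, which is why r = 0.95 maps to 1.83. A one-function sketch (the name is illustrative) that reproduces the three values on the slide:

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z' = 0.5 * [ln(1 + r) - ln(1 - r)] (equivalently math.atanh(r))."""
    return 0.5 * (math.log(1 + r) - math.log(1 - r))


for r in (0.30, 0.80, 0.95):
    print(f"r = {r:.2f} -> z' = {fisher_z(r):.2f}")
```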

1-Parameter Logistic Model (Dichotomous Outcomes)

Item Characteristic Curves (1-Parameter Model)
[Figure: y-axis = proportion of “Yes” responses]

2-Parameter Logistic Model (Dichotomous Outcomes)

Item Characteristic Curves (2-Parameter Model)
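
The model equations and curves on these slides are images. As a hedged stand-in, the sketch below evaluates the standard logistic item characteristic curve, P(θ) = 1 / (1 + exp(-D·a·(θ - b))), for hypothetical items: the 1-parameter model holds the slope a equal across items and varies only the difficulty b, while the 2-parameter model also lets a vary. The scaling constant D (1.0 here; 1.7 is a common choice that approximates the normal ogive) is an assumption about parameterization.

```python
import math


def p_endorse(theta: float, b: float, a: float = 1.0, d: float = 1.0) -> float:
    """Logistic item characteristic curve.

    theta: person's trait level; b: item difficulty (location);
    a: discrimination (slope), held equal across items in the 1-parameter model;
    d: optional scaling constant (1.7 approximates the normal ogive).
    """
    return 1.0 / (1.0 + math.exp(-d * a * (theta - b)))


# Hypothetical items as (difficulty b, slope a).
one_pl_items = [(-1.0, 1.0), (0.0, 1.0), (1.0, 1.0)]  # common slope, locations differ
two_pl_items = [(0.0, 0.5), (0.0, 1.0), (0.0, 2.0)]   # same location, slopes differ

for theta in (-2, -1, 0, 1, 2):
    ones = " ".join(f"{p_endorse(theta, b, a):.2f}" for b, a in one_pl_items)
    twos = " ".join(f"{p_endorse(theta, b, a):.2f}" for b, a in two_pl_items)
    print(f"theta={theta:+d}  1PL: {ones}   2PL: {twos}")
```

Every curve passes through 0.50 at θ = b; a larger slope makes the curve steeper around that point.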

Item Responses and Trait Levels
[Figure: Persons 1-3 and Items 1-3 located along the trait continuum]

Information Conditional on Trait Level
• Information is inversely related to the standard error of the trait estimate: $SE(\theta) = 1/\sqrt{I(\theta)}$
• Scale/test information is the sum of the item information: $I(\theta) = \sum_i I_i(\theta)$

Item Information (2-Parameter Model)
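
For the 2-parameter logistic model written without the 1.7 scaling constant (an assumption here), item information is a²·P(θ)·(1 - P(θ)), which peaks at θ = b. A sketch with hypothetical item parameters that sums item information into scale information and converts it into a standard error; all names are illustrative.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of endorsing the item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """2PL item information: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def scale_information(theta, items):
    """Scale information is the sum of item information over (a, b) pairs."""
    return sum(item_information(theta, a, b) for a, b in items)


items = [(1.0, -1.0), (1.5, 0.0), (2.0, 1.0)]  # hypothetical (a, b) pairs
for theta in (-2, -1, 0, 1, 2):
    info = scale_information(theta, items)
    print(f"theta={theta:+d}  information={info:.2f}  SE={1 / math.sqrt(info):.2f}")
```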

Linking Item Content to Trait Estimates

Dichotomous Items Showing DIF (2-Parameter Model)
[Figure: item characteristic curves for Hispanics and Whites illustrating differential item functioning (DIF) in location (Item 1) and in slope (Item 2)]

Forms of Validity
• Content
• Criterion
• Construct

Construct Validity
• Does the measure relate to other measures in ways consistent with hypotheses?
• Responsiveness to change

Relative Validity Analyses
• A form of "known groups" validity
• Relative sensitivity of a measure to an important clinical difference
• One-way between-groups ANOVA

Relative Validity Example

             Severity of Heart Disease
             None   Mild   Severe   F-ratio   Relative Validity
Scale #1      87     90      91        2         ---
Scale #2      74     78      88       10          5
Scale #3      77     87      95       20         10
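
In this example each scale's relative validity is its F-ratio divided by the F-ratio of Scale #1 (10/2 = 5, 20/2 = 10); the slide leaves Scale #1 as "---", which corresponds to 1.0 by construction. A minimal sketch (function names hypothetical) of the one-way between-groups F and that ratio:

```python
from statistics import mean


def one_way_f(groups):
    """F ratio from a one-way between-groups ANOVA (groups = lists of scores)."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def relative_validity(f_ratios, reference=0):
    """Each scale's F ratio divided by the F ratio of the reference scale."""
    return [f / f_ratios[reference] for f in f_ratios]


# F ratios for Scales #1-#3 from the example; Scale #1 is the reference.
print(relative_validity([2, 10, 20]))  # [1.0, 5.0, 10.0]
```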

Responsiveness to Change and Minimally Important Difference
• HRQOL (health-related quality of life) measures should be responsive to interventions that change HRQOL
• Evaluating responsiveness requires assessment of HRQOL
  – pre- and post-intervention of known efficacy
  – at two times, in tandem with a gold standard

Two Essential Elements
• External indicator of change (anchors)
  – Mean change in HRQOL scores among people who have a “minimal” change in HRQOL
• Amount of HRQOL change

External Indicator of Change (A)
"Overall, has there been any change in your asthma since the beginning of the study?"
Response options: Much improved; Moderately improved; Minimally improved; No change; Much worse; Moderately worse; Minimally worse

External Indicator of Change (B)
"Rate your overall condition. This rating should encompass factors such as social activities, performance at work or school, seizures, alertness, and functional capacity; that is, your overall quality of life."
7 response categories, ranging from no impairment to extremely severe impairment

External Indicator of Change (C)
• “Changed” group = seizure free (100% reduction in seizure frequency)
• “Unchanged” group = < 50% change in seizure frequency

Responsiveness Indices
(1) Effect size (ES) = D/SD
(2) Standardized response mean (SRM) = D/SD†
(3) Guyatt responsiveness statistic (RS) = D/SD‡
D = raw score change in the “changed” group
SD = baseline SD
SD† = SD of D
SD‡ = SD of D among the “unchanged”

Effect Size Benchmarks
• Small: 0.20-0.49
• Moderate: 0.50-0.79
• Large: 0.80 or above
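
A minimal sketch of the three responsiveness indices and the effect size benchmarks, on hypothetical scores. Assumptions: the baseline SD for the effect size is taken from the "changed" group, SD† is the SD of the individual change scores in that group, sample (n - 1) standard deviations are used, and "below small" is a hypothetical label for values under 0.20. Function names are illustrative.

```python
from statistics import mean, stdev


def responsiveness(baseline, followup, unchanged_change):
    """ES, SRM and Guyatt RS for the 'changed' group.

    baseline, followup: paired scores for people the anchor classifies as changed.
    unchanged_change: change scores for people classified as unchanged.
    """
    change = [post - pre for pre, post in zip(baseline, followup)]
    d = mean(change)
    return {
        "ES": d / stdev(baseline),          # D / baseline SD
        "SRM": d / stdev(change),           # D / SD of D
        "RS": d / stdev(unchanged_change),  # D / SD of D among unchanged
    }


def effect_size_label(es):
    """Benchmarks: 0.20-0.49 small, 0.50-0.79 moderate, 0.80 or above large."""
    size = abs(es)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "below small"  # hypothetical label, not from the slide


# Hypothetical data, for illustration only.
indices = responsiveness([40, 50, 60], [44, 52, 66], [-5, 0, 5])
print(f"ES = {indices['ES']:.2f} ({effect_size_label(indices['ES'])})")
print(f"SRM = {indices['SRM']:.2f}, RS = {indices['RS']:.2f}")
```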

Treatment Impact on PCS

Treatment Impact on MCS