Evaluating Self-Report Data Using Psychometric Methods
Ron D. Hays, Ph.D. (hays@rand.org)
February 5, 2003 (3:00-6:00 pm)
9/17/2020

Four Types of Data Collection Errors
• Coverage Error: Does each person in population have an equal chance of selection?
• Sampling Error: Are only some members of the population sampled?
• Nonresponse Error: Do people in the sample who respond differ from those who do not?
• Measurement Error: Are inaccurate answers given to survey questions?

What’s a Good Measure?
• Same person gets same score (reliability)
• Different people get different scores (validity)
• People get scores you expect (validity)
• It is practical to use (feasibility)

How Are Good Measures Developed?
• Review literature
• Expert input (patients and clinicians)
• Define constructs you are interested in
• Draft items (item generation)
• Pretest
  - Cognitive interviews
  - Field and pilot testing
• Revise and test again
• Translate/harmonize across languages

Scales of Measurement and Their Properties

                        Type of Scale
Property of Numbers     Nominal   Ordinal   Interval   Ratio
Rank Order              No        Yes       Yes        Yes
Equal Interval          No        No        Yes        Yes
Absolute 0              No        No        No         Yes

Measurement Range for Health Outcome Measures
[Figure: range spanning Nominal, Ordinal, Interval, Ratio]

Indicators of Acceptability
• Response rate
• Administration time
• Missing data (item, scale)

Variability
• All scale levels are represented
• Distribution approximates bell-shaped "normal"

Measurement Error
observed score = true score + systematic error (bias) + random error

Flavors of Reliability
• Test-retest (administrations)
• Intra-rater (raters)
• Internal consistency (items)

Test-retest Reliability of MMPI 317-362
r = 0.75
Item: "I am more sensitive than most other people."

                  MMPI 317
MMPI 362      True    False    Total
True          169     15       184
False         21      95       116
Total         190     110
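The r = 0.75 on this slide appears to be the Pearson correlation between the two dichotomous responses, which for a 2 x 2 table is the phi coefficient. A minimal sketch, assuming the four cell counts are 169, 15, 21, and 95 as read from the slide (the function name is illustrative):

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table laid out as [[a, b], [c, d]]."""
    numerator = a * d - b * c
    # Product of the two row totals and the two column totals.
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# True/True, True/False, False/True, False/False counts from the slide.
r = phi_coefficient(169, 15, 21, 95)
print(round(r, 2))
```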

Kappa Coefficient of Agreement (Corrects for Chance)
kappa = (observed - chance) / (1 - chance)

Example of Computing KAPPA

                      Rater A
Rater B       1   2   3   4   5   Row Sum
1             1   1               2
2                 2               2
3                     2           2
4                         2       2
5                             2   2
Column Sum    1   3   2   2   2   10

Example of Computing KAPPA (Continued)
Pc = [(1 x 2) + (3 x 2) + (2 x 2) + (2 x 2) + (2 x 2)] / (10 x 10) = 0.20
Pobs. = 9 / 10 = 0.90
Kappa = (0.90 - 0.20) / (1 - 0.20) = 0.875 (0.87 truncated to two decimals)
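The kappa arithmetic can be checked with a short sketch. The 5 x 5 table below is an assumption: it is one table consistent with the marginal totals (1, 3, 2, 2, 2 and 2, 2, 2, 2, 2) and the 9-of-10 observed agreement used in the example, with the single disagreement placed in the first row, second column.

```python
def cohen_kappa(table):
    """Cohen's kappa from a square table of counts (rows: one rater, columns: the other)."""
    n = sum(sum(row) for row in table)
    size = len(table)
    p_obs = sum(table[i][i] for i in range(size)) / n
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(size)) for j in range(size)]
    # Chance agreement: products of matching marginals over n squared.
    p_chance = sum(r * c for r, c in zip(row_sums, col_sums)) / n ** 2
    return (p_obs - p_chance) / (1 - p_chance)

# Assumed layout of the ten rated objects (see the note above).
ratings = [
    [1, 1, 0, 0, 0],
    [0, 2, 0, 0, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 0, 2, 0],
    [0, 0, 0, 0, 2],
]
kappa = cohen_kappa(ratings)
print(round(kappa, 3))
```

Kappa is symmetric in the two raters, so swapping rows and columns does not change the result.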

Guidelines for Interpreting Kappa

Fleiss (1981)
Conclusion        Kappa
Poor              < .40
Fair              .40 - .59
Good              .60 - .74
Excellent         > .74

Landis and Koch (1977)
Conclusion        Kappa
Poor              < 0.0
Slight            .00 - .20
Fair              .21 - .40
Moderate          .41 - .60
Substantial       .61 - .80
Almost perfect    .81 - 1.00

Ratings of Height of Houseplants

Plant   Rater   Baseline Height   Follow-up Height   Experimental Condition
A1      R1      120               121                1
        R2      118               120
A2      R1      84                85                 2
        R2      96                88
B1      R1      107               108                2
        R2      105               104
B2      R1      94                100                1
        R2      97                104
C1      R1      85                88                 2
        R2      91                96

Ratings of Height of Houseplants (Cont.)

Plant   Rater   Baseline Height   Follow-up Height   Experimental Condition
C2      R1      79                86                 1
        R2      78                92
D1      R1      70                76                 1
        R2      72                80
D2      R1      54                60                 2
        R2      56
E1      R1      85                101                1
        R2      97                108
E2      R1      90                84                 2
        R2      92                96

Reliability of Baseline Houseplant Ratings
Ratings of Height of Plants: 10 plants, 2 raters

Baseline Results
Source               DF    SS      MS        F
Plants               9     5658    628.667   35.52
Within               10    177     17.700
  Raters             1     57.8    57.800
  Raters x Plants    9     119.2   13.244
Total                19    5835
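The baseline mean squares can be reproduced from the raw ratings. A minimal sketch, assuming the baseline heights for the ten plants (A1 through E2) pair up by rater as listed below; the function and variable names are illustrative.

```python
# Baseline heights as (rater 1, rater 2) for plants A1, A2, B1, B2, C1, C2, D1, D2, E1, E2.
baseline = [(120, 118), (84, 96), (107, 105), (94, 97), (85, 91),
            (79, 78), (70, 72), (54, 56), (85, 97), (90, 92)]

def mean_squares(ratings):
    """Mean squares for a targets-by-raters table with one rating per cell."""
    n = len(ratings)       # number of targets (plants)
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_within = ss_total - ss_between      # raters + raters x plants
    ss_error = ss_within - ss_raters       # raters x plants
    return {
        "BMS": ss_between / (n - 1),
        "WMS": ss_within / (n * (k - 1)),
        "JMS": ss_raters / (k - 1),
        "EMS": ss_error / ((n - 1) * (k - 1)),
    }

ms = mean_squares(baseline)
print({name: round(value, 3) for name, value in ms.items()})
```

The ratio BMS / WMS gives the F statistic for plants in the one-way layout.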

Sources of Variance in Baseline Houseplant Height

Source               dfs   MS
Plants (N)           9     628.67 (BMS)
Within               10    17.70 (WMS)
  Raters (K)         1     57.80 (JMS)
  Raters x Plants    9     13.24 (EMS)
Total                19

Intraclass Correlation and Reliability
(BMS, WMS, JMS, EMS = mean squares for plants, within, raters, and raters x plants; N = number of plants; K = number of raters)

One-Way
  Reliability:             (BMS - WMS) / BMS
  Intraclass Correlation:  (BMS - WMS) / (BMS + (K-1) WMS)

Two-Way Fixed
  Reliability:             (BMS - EMS) / BMS
  Intraclass Correlation:  (BMS - EMS) / (BMS + (K-1) EMS)

Two-Way Random
  Reliability:             N (BMS - EMS) / (N BMS + JMS - EMS)
  Intraclass Correlation:  (BMS - EMS) / (BMS + (K-1) EMS + K (JMS - EMS) / N)

Summary of Reliability of Plant Ratings

                           Baseline        Follow-up
Model                      RTT     RII     RTT     RII
One-Way Anova              0.97    0.95    0.97    0.94
Two-Way Random Effects     0.97    0.95    0.97    0.94
Two-Way Fixed Effects      0.98    0.96    0.98    0.97

Source             Label   Baseline MS
Plants             BMS     628.667
Within             WMS     17.700
Raters             JMS     57.800
Raters X Plants    EMS     13.244

ICC (1,1) = (BMS - WMS) / (BMS + (K - 1) * WMS)
ICC (2,1) = (BMS - EMS) / (BMS + (K - 1) * EMS + K * (JMS - EMS) / N)
ICC (3,1) = (BMS - EMS) / (BMS + (K - 1) * EMS)
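The baseline column of the summary can be recomputed from the four mean squares. A minimal sketch, assuming RTT is the reliability of the averaged ratings and RII is the single-rating intraclass correlation, with N = 10 plants and K = 2 raters; the function name and dictionary keys are illustrative.

```python
def reliability_summary(bms, wms, jms, ems, n, k):
    """Return (RTT, RII) pairs for the three ANOVA models."""
    return {
        "one_way": (
            (bms - wms) / bms,
            (bms - wms) / (bms + (k - 1) * wms),
        ),
        "two_way_random": (
            n * (bms - ems) / (n * bms + jms - ems),
            (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        ),
        "two_way_fixed": (
            (bms - ems) / bms,
            (bms - ems) / (bms + (k - 1) * ems),
        ),
    }

# Baseline mean squares from the slide.
summary = reliability_summary(bms=628.667, wms=17.700, jms=57.800, ems=13.244, n=10, k=2)
for model, (rtt, rii) in summary.items():
    print(model, round(rtt, 2), round(rii, 2))
```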

Cronbach’s Alpha

Source                 df   SS      MS
Respondents (BMS)      4    11.6    2.9
Items (JMS)            1    0.1     0.1
Resp. x Items (EMS)    4    4.4     1.1
Total                  9    16.1

Alpha = (2.9 - 1.1) / 2.9 = 1.8 / 2.9 = 0.62
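The alpha calculation is the two-way fixed reliability, (BMS - EMS) / BMS, applied to respondents by items. A minimal sketch, assuming 5 respondents and 2 items as implied by the degrees of freedom (4 and 1); the function name is illustrative.

```python
def alpha_from_anova(ss_respondents, ss_interaction, n_respondents, n_items):
    """Cronbach's alpha from respondent and respondent-by-item sums of squares."""
    bms = ss_respondents / (n_respondents - 1)
    ems = ss_interaction / ((n_respondents - 1) * (n_items - 1))
    return (bms - ems) / bms

alpha = alpha_from_anova(ss_respondents=11.6, ss_interaction=4.4,
                         n_respondents=5, n_items=2)
print(round(alpha, 2))
```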

Alpha by Number of Items and Inter-item Correlations
alpha_st = (K * r-bar) / (1 + (K - 1) * r-bar)
K = number of items in scale
r-bar = average inter-item correlation

Alpha for Different Numbers of Items and Homogeneity

                        Average Inter-item Correlation ( r )
Number of Items (K)     .0      .2      .4      .6      .8      1.0
2                       .000    .333    .572    .750    .889    1.000
4                       .000    .500    .727    .857    .941    1.000
6                       .000    .600    .800    .900    .960    1.000
8                       .000    .666    .842    .924    .970    1.000
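The table follows from the standardized alpha formula, K * r / (1 + (K - 1) * r), where r is the average inter-item correlation. A minimal sketch that regenerates the grid (the function name is illustrative):

```python
def standardized_alpha(k, r_bar):
    """Standardized alpha for k items with average inter-item correlation r_bar."""
    return k * r_bar / (1 + (k - 1) * r_bar)

correlations = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
for k in (2, 4, 6, 8):
    row = [round(standardized_alpha(k, r), 3) for r in correlations]
    print(k, row)
```

A few third-decimal values may differ slightly from a printed table because of rounding.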

Number of Items and Reliability for Three Versions of the Mental Health Inventory (MHI)

Spearman-Brown Prophecy Formula
alpha_y = (N * alpha_x) / (1 + (N - 1) * alpha_x)
N = how much longer scale y is than scale x
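A short sketch of the prophecy formula, together with its algebraic inverse (solving for the lengthening factor N needed to reach a target reliability). The function names and the example values 0.70 and 0.90 are illustrative choices, not taken from the slide.

```python
def spearman_brown(alpha_x, n):
    """Projected reliability when a scale is made n times as long."""
    return n * alpha_x / (1 + (n - 1) * alpha_x)

def lengthening_factor(alpha_x, target):
    """Lengthening factor n needed to move from alpha_x to the target reliability."""
    return target * (1 - alpha_x) / (alpha_x * (1 - target))

print(round(spearman_brown(0.70, 2), 3))         # doubling a scale with alpha = 0.70
print(round(lengthening_factor(0.70, 0.90), 2))  # factor needed to reach 0.90
```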

Reliability Minimum Standards
• 0.70 or above (for group comparisons)
• 0.90 or higher (for individual assessment)
  - SEM = SD (1 - reliability)^(1/2)
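The standard error of measurement (SEM) shows why the two thresholds differ. A minimal sketch, assuming a hypothetical scale with SD = 10 (the SD value is an illustration, not from the slide):

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical SD of 10 points, evaluated at the two minimum standards.
for reliability in (0.70, 0.90):
    print(reliability, round(standard_error_of_measurement(10, reliability), 2))
```

At the same SD, the higher reliability gives a noticeably smaller SEM, which matters most when interpreting a single person's score.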

Reliability of a Composite Score

Hypothetical Multitrait/Multi-Item Correlation Matrix

Multitrait/Multi-Item Correlation Matrix for Patient Satisfaction Ratings
Technical 0.67† 0.54† 0.41 0.53 0.60† 0.58† Interpersonal 0.28 0.50† 0.44† 0.56† 0.57† 0.68* 0.58* 0.65* 0.57* 0.62* 0.48* 0.63† 0.61† 0.67† 0.60† 0.58† 0.46† Interpersonal 1 0.25 0.26 0.16 0.23 0.24 0.18 0.19 0.32 0.18 0.24 Communication 0.66* 2 Financial 0.63† 0.55* 3 0.48* 4 0.59* 5 0.55* 6 0.59* 1 0.58 2 0.59† 3 0.62† 4 0.53† 5 0.54 6 0.48†

Note: Standard error of correlation is 0.03.
Technical = satisfaction with technical quality.
Interpersonal = satisfaction with the interpersonal aspects.
Communication = satisfaction with communication.
Financial = satisfaction with financial arrangements.
*Item-scale correlations for hypothesized scales (corrected for item overlap).
†Correlation within two standard errors of the correlation of the item with its hypothesized scale.


IRT

What are IRT Models?
Mathematical equations that relate observed survey responses to a person's location on an unobservable latent trait (e.g., intelligence, patient satisfaction).

Latent Trait and Item Responses
[Figure: a latent trait drives the responses to three items. Item 1 response: P(X1=1), P(X1=0). Item 2 response: P(X2=1), P(X2=0). Item 3 response: P(X3=0), P(X3=1), P(X3=2).]

IRT Model Assumptions
• Unidimensionality - One construct measured by items in scale.
• Local Independence - Items uncorrelated when latent trait(s) have been controlled for.

Types of IRT Models
• Unidimensional and multidimensional
• Dichotomous and polytomous
• Parameterization
  - One parameter: difficulty (location)
  - Two parameter: difficulty and slope (discrimination)

Item difficulty
Transform proportion of people endorsing the item (p) to correspond to (1 - p)th percentile from z distribution
Z = ln((1 - p)/p)/1.7
  = (ln(1 - p) - ln(p))/1.7
  = (ln(.228) - ln(.772))/1.7
  = (-1.47840965 + .258770729)/1.7
  = -1.21963892/1.7
  = -0.72
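The worked example (p = .772) can be reproduced directly. A minimal sketch; the function name and the `scaling` parameter name are illustrative, with 1.7 taken from the slide.

```python
import math

def item_difficulty(p, scaling=1.7):
    """Logit of (1 - p) / p divided by 1.7, approximating a z-score for difficulty."""
    return math.log((1 - p) / p) / scaling

z = item_difficulty(0.772)
print(round(z, 2))
```

Items endorsed by more than half of respondents get negative values (easier), and items endorsed by fewer than half get positive values (harder).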

Item Discrimination
• Item-scale correlation, corrected for item overlap
  - z' = ½ [ln(1 + r) - ln(1 - r)]
  - if r = 0.30, z' = 0.31
  - if r = 0.80, z' = 1.10
  - if r = 0.95, z' = 1.83
  (0.5 to 2 is typical range)
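The transformation above is Fisher's z, which equals the inverse hyperbolic tangent of r. A minimal sketch reproducing the three examples (the function name is illustrative):

```python
import math

def fisher_z(r):
    """Fisher z transformation of a correlation r."""
    return 0.5 * (math.log(1 + r) - math.log(1 - r))

for r in (0.30, 0.80, 0.95):
    print(r, round(fisher_z(r), 2))
```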

1-Parameter Logistic Model (Dichotomous Outcomes)

Item Characteristic Curves (1-Parameter Model)
[Figure: y-axis is proportion of "Yes" responses]

2-Parameter Logistic Model (Dichotomous Outcomes)
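The model equation did not survive on this slide, so the sketch below assumes the standard logistic form: the probability of endorsing an item depends on the trait level theta, the item difficulty b, and the item slope a. Fixing a to the same value for every item gives the 1-parameter model. Some presentations multiply the exponent by a constant of 1.7; that constant is omitted here. All names are illustrative.

```python
import math

def prob_endorse(theta, b, a=1.0):
    """Probability of a positive response under a logistic IRT model.

    theta: person trait level; b: item difficulty (location); a: item slope.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# One item (b = 0) evaluated across trait levels with two different slopes.
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(prob_endorse(theta, b=0.0, a=1.0), 3),
          round(prob_endorse(theta, b=0.0, a=2.0), 3))
```

When theta equals b the probability is 0.5 regardless of slope; a larger slope makes the curve steeper around that point.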

Item Characteristic Curves (2-Parameter Model)

Item Responses and Trait Levels
[Figure: Persons 1-3 and Items 1-3 located along the trait continuum]

Information Conditional on Trait Level
• Item information proportional to inverse of standard error
• Scale/Test information is the sum over item information
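The formulas for these two bullets did not survive on the slide. A minimal sketch under stated assumptions: it uses the standard 2-parameter logistic results, where item information is a squared times P times (1 - P), scale information is the sum across items, and the standard error of the trait estimate is 1 / sqrt(information). The item parameters below are hypothetical.

```python
import math

def prob_endorse(theta, b, a=1.0):
    """Logistic probability of endorsing an item with difficulty b and slope a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, b, a=1.0):
    """Item information under the 2-parameter logistic model."""
    p = prob_endorse(theta, b, a)
    return a * a * p * (1.0 - p)

def scale_information(theta, items):
    """Sum of item information over (b, a) pairs."""
    return sum(item_information(theta, b, a) for b, a in items)

def standard_error(theta, items):
    """Standard error of the trait estimate at theta."""
    return 1.0 / math.sqrt(scale_information(theta, items))

# Hypothetical three-item scale: (difficulty, slope) pairs.
items = [(-1.0, 1.0), (0.0, 2.0), (1.0, 1.5)]
for theta in (-2, 0, 2):
    print(theta, round(scale_information(theta, items), 3),
          round(standard_error(theta, items), 3))
```

Information peaks near the item difficulties, so the standard error is smallest where the items are targeted.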

Item Information (2-parameter model)

Linking Item Content to Trait Estimates

Dichotomous Items Showing DIF (2-Parameter Model)
[Figure panels: DIF - Location (Item 1); DIF - Slope (Item 2); curves shown for Hispanics and Whites]

Forms of Validity
• Content
• Criterion
• Construct Validity

Construct Validity
• Does measure relate to other measures in ways consistent with hypotheses?
• Responsiveness to change

Relative Validity Analyses
• Form of "known groups" validity
• Relative sensitivity of measure to important clinical difference
• One-way between group ANOVA

Relative Validity Example

             Severity of Heart Disease
             None   Mild   Severe   F-ratio   Relative Validity
Scale #1     87     90     91       2         ---
Scale #2     74     78     88       10        5
Scale #3     77     87     95       20        10
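In this example the relative validity values match each scale's F-ratio divided by the F-ratio of Scale #1, which serves as the reference. A minimal sketch of that ratio (function and argument names are illustrative):

```python
def relative_validity(f_ratios, reference):
    """Divide each scale's F-ratio by the F-ratio of the reference scale."""
    ref = f_ratios[reference]
    return {name: f / ref for name, f in f_ratios.items()}

# F-ratios from the one-way ANOVA across the three severity groups.
f_ratios = {"Scale #1": 2, "Scale #2": 10, "Scale #3": 20}
rv = relative_validity(f_ratios, reference="Scale #1")
print(rv)
```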

Responsiveness to Change and Minimally Important Difference
• HRQOL measures should be responsive to interventions that change HRQOL
• Evaluating responsiveness requires assessment of HRQOL
  - pre-post intervention of known efficacy
  - at two times in tandem with gold standard

Two Essential Elements
• External indicator of change (Anchors)
  - mean change in HRQOL scores among people who have a “minimal” change in HRQOL.
• Amount of HRQOL change

External Indicator of Change (A)
Overall has there been any change in your asthma since the beginning of the study?
• Much improved; Moderately improved; Minimally improved
• No change
• Much worse; Moderately worse; Minimally worse

External Indicator of Change (B)
Rate your overall condition. This rating should encompass factors such as social activities, performance at work or school, seizures, alertness, and functional capacity; that is, your overall quality of life.
7 response categories, ranging from no impairment to extremely severe impairment

External Indicator of Change (C)
• “changed” group = seizure free (100% reduction in seizure frequency)
• “unchanged” group = < 50% change in seizure frequency

Responsiveness Indices
(1) Effect size (ES) = D/SD
(2) Standardized Response Mean (SRM) = D/SD†
(3) Guyatt responsiveness statistic (RS) = D/SD‡
D = raw score change in “changed” group
SD = baseline SD
SD† = SD of D
SD‡ = SD of D among “unchanged”
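The three indices share a numerator (mean change in the "changed" group) and differ only in the denominator. A minimal sketch with made-up scores; it assumes the baseline SD is taken from the "changed" group, and all data values and names are hypothetical.

```python
from statistics import mean, stdev

def responsiveness_indices(baseline_changed, change_changed, change_unchanged):
    """Effect size, standardized response mean, and Guyatt responsiveness statistic."""
    d = mean(change_changed)  # mean raw score change in the "changed" group
    return {
        "ES": d / stdev(baseline_changed),    # denominator: baseline SD
        "SRM": d / stdev(change_changed),     # denominator: SD of change scores
        "RS": d / stdev(change_unchanged),    # denominator: SD of change among "unchanged"
    }

# Hypothetical data for illustration only.
baseline_changed = [40, 50, 60, 50, 50]
change_changed = [4, 6, 8, 6, 6]
change_unchanged = [-2, 0, 2, 0, 0]

indices = responsiveness_indices(baseline_changed, change_changed, change_unchanged)
print({name: round(value, 2) for name, value in indices.items()})
```

Because the denominators differ, the same mean change can look small on one index and large on another, so the index should be named whenever a value is reported.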

Effect Size Benchmarks
• Small: 0.20 to 0.49
• Moderate: 0.50 to 0.79
• Large: 0.80 or above

Treatment Impact on PCS

Treatment Impact on MCS
