Evaluating Self-Report Data Using Psychometric Methods
Ron D. Hays, Ph.D. (hays@rand.org)
February 8, 2006 (3:00-6:00 pm), HS 249 F

Individual Change
• Interest in knowing how many patients benefit from group intervention or
• Tracking progress on individual patients
• Sample
  – 54 patients
  – Average age = 56; 84% white; 58% female
• Method
  – Self-administered SF-36 version 2 at baseline and at the end of therapy (about 6 weeks later)

Physical Functioning and Emotional Well-Being at Baseline for 54 Patients at UCLA Center for East-West Medicine
Hays et al. (2000), American Journal of Medicine

Change in SF-36 Scores Over Time
[Figure: change in SF-36 scale scores over time, labeled with effect sizes 0.13, 0.35, 0.21, 0.53, 0.36, 0.11, 0.41, 0.24, 0.30]

t-test for within group change
• $t = \bar{X}_D / (SD_d / \sqrt{n})$
• $\bar{X}_D$ = mean difference; $SD_d$ = standard deviation of the difference
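The within-group t statistic can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code behind the slides: the function name `paired_t` and the five pairs of scores are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed.

```python
import math
import statistics


def paired_t(baseline, followup):
    """Return (mean difference, t statistic) for within-group change."""
    diffs = [post - pre for pre, post in zip(baseline, followup)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences
    return mean_diff, mean_diff / (sd_diff / math.sqrt(n))


# Hypothetical scores for five patients (not data from the study).
baseline = [40.0, 45.0, 50.0, 55.0, 60.0]
followup = [42.0, 49.0, 51.0, 60.0, 63.0]
mean_diff, t = paired_t(baseline, followup)
print(f"mean difference = {mean_diff:.2f}, t = {t:.2f}")
```

The t value is referred to a t distribution with n - 1 degrees of freedom to obtain the probability.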

Significance of Group Change
Scale    Delta   t-test   prob.
PF-10    1.7     2.38     .0208
RP-4     4.1     3.81     .0004
BP-2     3.6     2.59     .0125
GH-5     2.4     2.86     .0061
EN-4     5.1     4.33     .0001
SF-2     4.7     3.51     .0009
RE-3     1.5     0.96     .3400  <-
EWB-5    4.3     3.20     .0023
PCS      2.8     3.23     .0021
MCS      3.9     2.82     .0067

Reliable Change Index
• $RCI = (X_2 - X_1) / (SEM \cdot \sqrt{2})$
• $SEM = SD_b \cdot (1 - \text{reliability})^{1/2}$
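A small sketch of the reliable change index follows. The function names and the example values (baseline SD of 10, reliability of 0.90) are hypothetical, and the 1.96 cutoff used for the minimum significant change is the conventional two-sided 5% threshold, assumed here rather than taken from the slide.

```python
import math


def sem(sd_baseline, reliability):
    """Standard error of measurement: SD_b * sqrt(1 - reliability)."""
    return sd_baseline * math.sqrt(1.0 - reliability)


def reliable_change_index(x1, x2, sd_baseline, reliability):
    """RCI = (X2 - X1) / (SEM * sqrt(2))."""
    return (x2 - x1) / (sem(sd_baseline, reliability) * math.sqrt(2.0))


# Hypothetical scale: baseline SD = 10, reliability = 0.90.
rci = reliable_change_index(50.0, 60.0, 10.0, 0.90)

# Smallest observed change that reaches |RCI| = 1.96 (assumed cutoff).
min_change = 1.96 * sem(10.0, 0.90) * math.sqrt(2.0)
print(f"RCI = {rci:.2f}, change needed = {min_change:.2f}")
```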

Amount of Change in Observed Score Needed for Significant Change
Scale    RCI     Effect size
PF-10    8.4     0.67
RP-4     8.4     0.72
BP-2     10.4    1.01
GH-5     13.0    1.13
EN-4     12.8    1.33
SF-2     13.8    1.07
RE-3     9.7     0.71
EWB-5    13.4    1.26
PCS      7.1     0.62
MCS      9.7     0.73

Change for 54 Cases
Scale    % Improving   % Declining   Difference
PF-10    13%           2%            11%
RP-4     31%           2%            29%
BP-2     22%           7%            15%
GH-5     7%            0%            7%
EN-4     9%            2%            7%
SF-2     17%           4%            13%
RE-3     15%           0%
EWB-5    19%           4%            15%
PCS      24%           7%            17%
MCS      22%           11%

How Are Good Measures Developed?
• Review literature
• Expert input (patients and clinicians)
• Define constructs you are interested in
• Draft items (item generation)
• Pretest
  – Cognitive interviews
  – Field and pilot testing
• Revise and test again
• Translate/harmonize across languages

What’s a Good Measure?
• Same person gets same score (reliability)
• Different people get different scores (validity)
• People get scores you expect (validity)
• It is practical to use (feasibility)

Scales of Measurement and Their Properties
                 Property of Numbers
Type of Scale    Rank Order   Equal Interval   Absolute 0
Nominal          No           No               No
Ordinal          Yes          No               No
Interval         Yes          Yes              No
Ratio            Yes          Yes              Yes

Measurement Range for Health Outcome Measures
[Figure: range of measurement levels: Nominal, Ordinal, Interval, Ratio]

Indicators of Acceptability
• Response rate
• Administration time
• Missing data (item, scale)

Variability
• All scale levels are represented
• Distribution approximates bell-shaped "normal"

Measurement Error
observed score = true score + systematic error (bias) + random error

Four Types of Data Collection Errors
• Coverage Error: Does each person in population have an equal chance of selection?
• Sampling Error: Are only some members of the population sampled?
• Nonresponse Error: Do people in the sample who respond differ from those who do not?
• Measurement Error: Are inaccurate answers given to survey questions?

Flavors of Reliability
• Test-retest (administrations)
• Inter-rater (raters)
• Internal consistency (items)

Test-retest Reliability of MMPI 317-362
Item: "I am more sensitive than most other people."
r = 0.75
                     MMPI 317
MMPI 362     True    False   Total
True         169     15      184
False        21      95      116
Total        190     110
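For a 2 x 2 table, the Pearson correlation between two True/False items reduces to the phi coefficient. The sketch below assumes that the slide's r = 0.75 is this coefficient and recomputes it from the four cell counts; the function name is hypothetical.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi (Pearson r for two binary items) for the 2 x 2 table [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Cell counts from the MMPI 317 by MMPI 362 table.
r = phi_coefficient(169, 15, 21, 95)
print(f"r = {r:.2f}")
```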

Kappa Coefficient of Agreement (Corrects for Chance)
$\kappa = (\text{observed} - \text{chance}) / (1 - \text{chance})$

Example of Computing KAPPA
[Table: 5 x 5 cross-classification of ratings by Rater A and Rater B (categories 1 to 5), with row sums, column sums, and a total of 10 rated cases]

Example of Computing KAPPA (Continued)
$P_c = [(1 \times 2) + (3 \times 2) + (2 \times 2) + (2 \times 2) + (2 \times 2)] / (10 \times 10) = 0.20$
$P_{obs} = 9 / 10 = 0.90$
$\kappa = (0.90 - 0.20) / (1 - 0.20) = 0.87$
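The kappa arithmetic can be checked with a short sketch. The 5 x 5 table below is an assumption: it is one layout with 9 of 10 agreements whose marginal sums give chance agreement of 0.20, and which rater forms the rows is not known from the slide. The function name is hypothetical. The unrounded result is 0.875, which the slide reports as 0.87.

```python
def cohen_kappa(table):
    """Return (observed agreement, chance agreement, kappa) for a square table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_obs = sum(table[i][i] for i in range(k)) / n
    p_chance = sum(row_sums[i] * col_sums[i] for i in range(k)) / (n * n)
    return p_obs, p_chance, (p_obs - p_chance) / (1.0 - p_chance)


# Assumed layout: rows are one rater, columns the other (orientation unknown).
table = [
    [1, 1, 0, 0, 0],
    [0, 2, 0, 0, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 0, 2, 0],
    [0, 0, 0, 0, 2],
]
p_obs, p_chance, kappa = cohen_kappa(table)
print(f"P_obs = {p_obs:.2f}, P_c = {p_chance:.2f}, kappa = {kappa:.3f}")
```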

Guidelines for Interpreting Kappa
Fleiss (1981)
Conclusion        Kappa
Poor              < .40
Fair              .40-.59
Good              .60-.74
Excellent         > .74

Landis and Koch (1977)
Conclusion        Kappa
Poor              < 0.0
Slight            .00-.20
Fair              .21-.40
Moderate          .41-.60
Substantial       .61-.80
Almost perfect    .81-1.00

Ratings of Height of Houseplants
         Baseline Height   Follow-up Height   Experimental
Plant    R1      R2        R1      R2         Condition
A1       120     118       121     120        1
A2       84      96        85      88         2
B1       107     105       108     104        2
B2       94      97        100     104        1
C1       85      91        88      96         2

Ratings of Height of Houseplants (Cont.)
         Baseline Height   Follow-up Height   Experimental
Plant    R1      R2        R1      R2         Condition
C2       79      78        86      92         1
D1       70      72        76      80         1
D2       54      56        60                 2
E1       85      97        101     108        1
E2       90      92        84      96         2

Reliability of Baseline Houseplant Ratings
Ratings of Height of Plants: 10 plants, 2 raters
Baseline Results
Source             DF    SS       MS        F
Plants             9     5658     628.667   35.52
Within             10    177      17.700
Raters             1     57.8     57.800
Raters x Plants    9     119.2    13.244
Total              19    5835
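The baseline ANOVA can be reproduced from the 20 baseline ratings. The sketch below is one way to do the sums-of-squares decomposition by hand; the function name and the dictionary layout are hypothetical, and the ratings are read from the houseplant slides as (rater 1, rater 2) pairs.

```python
# Baseline heights per plant as (rater 1, rater 2).
baseline = {
    "A1": (120, 118), "A2": (84, 96), "B1": (107, 105), "B2": (94, 97),
    "C1": (85, 91), "C2": (79, 78), "D1": (70, 72), "D2": (54, 56),
    "E1": (85, 97), "E2": (90, 92),
}


def two_way_anova(ratings):
    """Decompose total variation into plants, raters, and raters x plants."""
    rows = list(ratings.values())
    n, k = len(rows), len(rows[0])  # n plants, k raters
    grand_mean = sum(sum(r) for r in rows) / (n * k)
    ss_total = sum((x - grand_mean) ** 2 for r in rows for x in r)
    ss_plants = k * sum((sum(r) / k - grand_mean) ** 2 for r in rows)
    rater_means = [sum(r[j] for r in rows) / n for j in range(k)]
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_within = ss_total - ss_plants   # raters + raters x plants
    ss_error = ss_within - ss_raters   # raters x plants
    return {
        "SS": {"plants": ss_plants, "within": ss_within,
               "raters": ss_raters, "error": ss_error, "total": ss_total},
        "MS": {"BMS": ss_plants / (n - 1),
               "WMS": ss_within / (n * (k - 1)),
               "JMS": ss_raters / (k - 1),
               "EMS": ss_error / ((n - 1) * (k - 1))},
    }


result = two_way_anova(baseline)
f_ratio = result["MS"]["BMS"] / result["MS"]["WMS"]
print(result["MS"], round(f_ratio, 2))
```

The F ratio in the table is the plants mean square divided by the within mean square.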

Sources of Variance in Baseline Houseplant Height
Source             dfs   MS
Plants (N)         9     628.67 (BMS)
Within             10    17.70 (WMS)
Raters (K)         1     57.80 (JMS)
Raters x Plants    9     13.24 (EMS)
Total              19

Intraclass Correlation and Reliability
One-Way
  – Reliability: $(MS_{BMS} - MS_{WMS}) / MS_{BMS}$
  – Intraclass correlation: $(MS_{BMS} - MS_{WMS}) / (MS_{BMS} + (K-1) MS_{WMS})$
Two-Way Fixed
  – Reliability: $(MS_{BMS} - MS_{EMS}) / MS_{BMS}$
  – Intraclass correlation: $(MS_{BMS} - MS_{EMS}) / (MS_{BMS} + (K-1) MS_{EMS})$
Two-Way Random
  – Reliability: $N (MS_{BMS} - MS_{EMS}) / (N \cdot MS_{BMS} + MS_{JMS} - MS_{EMS})$
  – Intraclass correlation: $(MS_{BMS} - MS_{EMS}) / (MS_{BMS} + (K-1) MS_{EMS} + K (MS_{JMS} - MS_{EMS}) / N)$
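As a rough check on these formulas, the sketch below plugs in the baseline houseplant mean squares (N = 10 plants, K = 2 raters). The function name and the dictionary keys are hypothetical; each entry is a (reliability, intraclass correlation) pair.

```python
def reliability_estimates(bms, wms, jms, ems, n, k):
    """Return {model: (reliability, intraclass correlation)} from mean squares."""
    return {
        "one_way": (
            (bms - wms) / bms,
            (bms - wms) / (bms + (k - 1) * wms),
        ),
        "two_way_fixed": (
            (bms - ems) / bms,
            (bms - ems) / (bms + (k - 1) * ems),
        ),
        "two_way_random": (
            n * (bms - ems) / (n * bms + jms - ems),
            (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        ),
    }


# Baseline mean squares from the houseplant ANOVA.
estimates = reliability_estimates(
    bms=628.667, wms=17.700, jms=57.800, ems=13.244, n=10, k=2
)
for model, (rel, icc) in estimates.items():
    print(f"{model}: reliability = {rel:.2f}, ICC = {icc:.2f}")
```

With these inputs the reliabilities round to 0.97, 0.98, and 0.97, and the intraclass correlations to 0.95, 0.96, and 0.95, for the one-way, two-way fixed, and two-way random models respectively.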

Summary of Reliability of Plant Ratings
                          Baseline
Model                     RTT     RII
One-Way Anova             0.97    0.95
Two-Way Random Effects    0.97    0.95
Two-Way Fixed Effects     0.98    0.96
Follow-up RTT and RII values: 0.97, 0.94, 0.98, 0.97

Source             Label   Baseline MS
Plants             BMS     628.667
Within             WMS     17.700
Raters             JMS     57.800
Raters x Plants    EMS     13.244

Cronbach’s Alpha
Source                  df    SS      MS
Respondents (BMS)       4     11.6    2.9
Items (JMS)             1     0.1     0.1
Resp. x Items (EMS)     4     4.4     1.1
Total                   9     16.1
$\alpha = (2.9 - 1.1) / 2.9 = 1.8 / 2.9 = 0.62$
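The alpha calculation is the two-way fixed reliability formula applied to respondents and items. A minimal sketch, with a hypothetical function name and the sums of squares and degrees of freedom as read from the slide:

```python
def cronbach_alpha_from_ms(bms, ems):
    """Alpha = (BMS - EMS) / BMS, from a respondents x items ANOVA."""
    return (bms - ems) / bms


# Mean squares = SS / df for respondents and for respondents x items.
bms = 11.6 / 4
ems = 4.4 / 4
alpha = cronbach_alpha_from_ms(bms, ems)
print(f"alpha = {alpha:.2f}")
```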

Alpha by Number of Items and Inter-item Correlations
$\alpha_{st} = K \bar{r} / (1 + (K - 1) \bar{r})$
K = number of items in scale; $\bar{r}$ = average inter-item correlation

Alpha for Different Numbers of Items and Homogeneity
                       Average Inter-item Correlation ($\bar{r}$)
Number of Items (K)    .0      .2      .4      .6      .8      1.0
2                      .000    .333    .572    .750    .889    1.000
4                      .000    .500    .727    .857    .941    1.000
6                      .000    .600    .800    .900    .960    1.000
8                      .000    .666    .842    .924    .970    1.000
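The table follows directly from the standardized alpha formula, alpha = K r / (1 + (K - 1) r). The sketch below regenerates it; the function name is hypothetical, and a few cells differ from the slide in the third decimal because of rounding (for example .571 rather than .572).

```python
def standardized_alpha(k, r_bar):
    """Standardized alpha for k items with average inter-item correlation r_bar."""
    return k * r_bar / (1.0 + (k - 1) * r_bar)


correlations = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
alpha_table = {
    k: [standardized_alpha(k, r) for r in correlations] for k in (2, 4, 6, 8)
}
for k, row in alpha_table.items():
    print(k, " ".join(f"{a:.3f}" for a in row))
```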

Spearman-Brown Prophecy Formula
$\alpha_y = N \cdot \alpha_x / (1 + (N - 1) \cdot \alpha_x)$
N = how much longer scale y is than scale x

Number of Items and Reliability for Three Versions of the Mental Health Inventory (MHI)

Example Spearman-Brown Calculations
MHI-18: $(18/32)(0.98) / (1 + (18/32 - 1)(0.98)) = 0.55125 / 0.57125 = 0.96$
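The MHI-18 example can be reproduced with a one-line function. This is a sketch with a hypothetical function name; the inputs (a 32-item scale with reliability 0.98, shortened to 18 items) are the ones used in the calculation above.

```python
def spearman_brown(alpha_x, length_ratio):
    """Predicted reliability when a scale is lengthened (or shortened) by length_ratio."""
    return length_ratio * alpha_x / (1.0 + (length_ratio - 1.0) * alpha_x)


# Shortening a 32-item scale with reliability 0.98 to 18 items.
mhi18 = spearman_brown(0.98, 18 / 32)
print(f"predicted reliability = {mhi18:.3f}")
```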

Reliability Minimum Standards
• 0.70 or above (for group comparisons)
• 0.90 or higher (for individual assessment)
  – $SEM = SD \cdot (1 - \text{reliability})^{1/2}$

Reliability of a Composite Score

Hypothetical Multitrait/Multi-Item Correlation Matrix

Multitrait/Multi-Item Correlation Matrix for Patient Satisfaction Ratings
[Table: correlations of the items in each hypothesized scale with the Technical, Interpersonal, Communication, and Financial scales]
Note – Standard error of correlation is 0.03. Technical = satisfaction with technical quality. Interpersonal = satisfaction with the interpersonal aspects. Communication = satisfaction with communication. Financial = satisfaction with financial arrangements.
*Item-scale correlations for hypothesized scales (corrected for item overlap).
†Correlation within two standard errors of the correlation of the item with its hypothesized scale.

Construct Validity
• Does measure relate to other measures in ways consistent with hypotheses?
• Responsiveness to change including minimally important difference

Construct Validity for Scales Measuring Physical Functioning
             Severity of Heart Disease
Scale        None    Mild    Severe    F-ratio    Relative Validity
Scale #1     91      90      87        2          ---
Scale #2     88      78      74        10         5
Scale #3     95      87      77        20         10
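The relative validity column appears to be each scale's F-ratio divided by the F-ratio of the reference scale (Scale #1 here), which matches 10/2 = 5 and 20/2 = 10. A minimal sketch under that assumption, with a hypothetical function name:

```python
def relative_validity(f_ratios, reference):
    """Divide each scale's F-ratio by the F-ratio of the reference scale."""
    ref = f_ratios[reference]
    return {name: f / ref for name, f in f_ratios.items()}


# F-ratios from the table; Scale #1 is assumed to be the reference.
f_ratios = {"Scale #1": 2.0, "Scale #2": 10.0, "Scale #3": 20.0}
rv = relative_validity(f_ratios, "Scale #1")
print(rv)
```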

Responsiveness to Change and Minimally Important Difference (MID)
• HRQOL measures should be responsive to interventions that change HRQOL
• Need external indicators of change (anchors)
  – Mean change in HRQOL scores among people who have changed (“minimal” change for MID)

Self-Report Indicator of Change
• Overall, has there been any change in your asthma since the beginning of the study?
  – Much improved; Moderately improved; Minimally improved
  – No change
  – Much worse; Moderately worse; Minimally worse

Clinical Indicator of Change
  – “changed” group = seizure free (100% reduction in seizure frequency)
  – “unchanged” group = <50% change in seizure frequency

Responsiveness Indices
(1) Effect size (ES) = D/SD
(2) Standardized Response Mean (SRM) = D/SD†
(3) Guyatt responsiveness statistic (RS) = D/SD‡
D = raw score change in “changed” group; SD = baseline SD; SD† = SD of D; SD‡ = SD of D among “unchanged”
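The three indices share a numerator and differ only in the denominator. The sketch below computes them for hypothetical data; the function name and scores are illustrative, and two assumptions are made: the baseline SD is taken from the "changed" group, and SD† is the SD of change scores in the "changed" group.

```python
import statistics


def responsiveness(changed_pre, changed_post, unchanged_pre, unchanged_post):
    """Return ES, SRM, and Guyatt RS for a 'changed' and an 'unchanged' group."""
    d_changed = [post - pre for pre, post in zip(changed_pre, changed_post)]
    d_unchanged = [post - pre for pre, post in zip(unchanged_pre, unchanged_post)]
    d = statistics.mean(d_changed)  # mean raw score change in the changed group
    return {
        "ES": d / statistics.stdev(changed_pre),   # baseline SD (assumed: changed group)
        "SRM": d / statistics.stdev(d_changed),    # SD of change, changed group
        "RS": d / statistics.stdev(d_unchanged),   # SD of change, unchanged group
    }


# Hypothetical scores (not data from any study).
indices = responsiveness(
    changed_pre=[40, 50, 60, 70], changed_post=[50, 58, 72, 80],
    unchanged_pre=[45, 55, 65, 75], unchanged_post=[47, 53, 66, 74],
)
print({name: round(value, 2) for name, value in indices.items()})
```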

Effect Size Benchmarks
• Small: 0.20 to 0.49
• Moderate: 0.50 to 0.79
• Large: 0.80 or above

Treatment Impact on PCS

Treatment Impact on MCS

IRT

Latent Trait and Item Responses
[Diagram: a latent trait underlies the responses to three items. Item 1 response: 0 or 1, with P(X1=0) and P(X1=1). Item 2 response: 0 or 1, with P(X2=0) and P(X2=1). Item 3 response: 0, 1, or 2, with P(X3=0), P(X3=1), and P(X3=2).]

Item Responses and Trait Levels
[Diagram: Persons 1 to 3 and Items 1 to 3 located along the same trait continuum]

Item Characteristic Curves (1-Parameter Model)
[Figure: probability of “Yes” plotted against trait level]

Item Characteristic Curves (2-Parameter Model)

Dichotomous Items Showing DIF (2-Parameter Model)
[Figure: item characteristic curves for Hispanics and Whites. DIF – Location (Item 1); DIF – Slope (Item 2)]
