Applied Psychometric Strategies Lab
Applied Quantitative and Psychometric Series
Guiding Steps for the Evaluation or Creation of a Scale: A Starter Kit
Abbey Love, MS, & Dani Rosenkrantz, MS, Ed.S.
March 20, 2018

What is a “bad” scale?
“I saw a newly published scale in someone’s dissertation…”
It depends!

But will bad measurement really hurt my study?

Yes! That’s why we are here.
• Big picture: Poor measurement is an ethical concern because, if the measurement is problematic, the reliability of our findings is compromised.
– In other words, our degree of trust in our results is in question, which means our statistical conclusion validity is in question!

Sample-specific measurement challenges that may occur
• Low reliability, or large measurement error around scores
• Uncertainty about whether to use a total score or subscale scores for each person
• Not all response options are being used, or low item response variability
• Poor factor structure solutions due to cross-loadings on multiple factors, low loadings on factors, and/or influences such as item phrasing
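One way to screen for unused response options and low item variability is simply to tabulate each item's responses. The sketch below is a minimal illustration, assuming Likert responses coded as integers and stored as a dict from item name to a list of responses; the function name `response_option_report` and the toy data are hypothetical, not from any particular package.

```python
from collections import Counter
from statistics import pvariance


def response_option_report(responses, options):
    """Summarize how each item uses the offered response options.

    responses: dict mapping item name -> list of integer responses (hypothetical format)
    options: every response option offered, e.g. range(1, 6) for a 5-point scale
    """
    report = {}
    for item, values in responses.items():
        counts = Counter(values)
        # Options that nobody in this sample chose for this item
        unused = [opt for opt in options if counts[opt] == 0]
        report[item] = {
            "unused_options": unused,
            # Population variance of the item; values near 0 signal little variability
            "variance": pvariance(values),
        }
    return report


# Hypothetical toy data on a 5-point scale (1 = strongly disagree, 5 = strongly agree)
toy = {
    "item1": [1, 2, 3, 4, 5, 3, 2],
    "item2": [4, 5, 5, 4, 5, 5, 4],  # responses pile up at the top of the scale
}
for item, info in response_option_report(toy, range(1, 6)).items():
    print(item, info)
```

A pattern like `item2`, where only the top two options are ever chosen, is the kind of item worth flagging before running any factor analysis.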

Examples From Applied Work
• Cognitive Flexibility Inventory for my dissertation
– Only the original paper explored the factor structure
– Cross-loading issues in the original paper
– Poor recovery of the factor structure in my sample
– Had to reduce items for better fit, a controversial decision
• Internal structure assessment of the Objectified Body Consciousness Scale
– Poor fit with trans women, indicating that the scale is inappropriate to use with this group without further study

Why does good measurement matter?

If you use a well-established measure, you will likely find the following:
• High reliability, which results in low measurement error and
– more accurate effect size estimates (d, R²)
– better capture of effects of interest (beta weights, path coefficients)
– improved inferential techniques (more accurate SEs and, ultimately, statistical decisions)
• Confidence in how to score the scale (total and/or subscales)
• All response options are being used
• Strong recovery of the factor structure solution, with near-zero cross-loadings across factors, high loadings on intended factors, and/or minimal influence from method factors
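The link between reliability and effect sizes can be made concrete with two classical test theory formulas: the attenuation formula, under which an observed correlation equals the true-score correlation times the square root of the product of the two measures' reliabilities, and the standard error of measurement, SD × √(1 − reliability). The sketch below assumes classical test theory with random, uncorrelated errors; the function names and all numbers are illustrative only.

```python
import math


def attenuated_r(true_r, rel_x, rel_y):
    """Expected observed correlation given the true-score correlation and
    the reliabilities of both measures (classical test theory attenuation)."""
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuated_r(observed_r, rel_x, rel_y):
    """Spearman's correction for attenuation: estimated true-score correlation."""
    return observed_r / math.sqrt(rel_x * rel_y)


def standard_error_of_measurement(sd, reliability):
    """Typical spread of a person's observed scores around their true score."""
    return sd * math.sqrt(1 - reliability)


# Hypothetical true correlation of .50 between two constructs
for rel in (0.90, 0.60):
    r_obs = attenuated_r(0.50, rel, rel)
    print(f"reliability {rel:.2f} on both measures -> observed r = {r_obs:.2f}")

# Hypothetical scale with SD = 15 and reliability = .91
print(f"standard error of measurement = {standard_error_of_measurement(15, 0.91):.1f}")
```

With the same hypothetical true correlation of .50, reliabilities of .90 on both measures leave an observed correlation of about .45, while reliabilities of .60 shrink it to about .30, which is why low reliability understates effects of interest.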

Looking for a Scale?
Guiding Steps for the Evaluation or Creation of a Scale
1. Evaluation of psychological scales: Should I use a scale I found?
2. Scale development: I can’t find a scale. What steps do I need to take to develop a scale to measure a psychological construct?

How do I know if my scale is “good”?

Good scales… have ongoing and multiple sources of evidence that can be used to evaluate the validity of the interpretation of the scale for a particular use.

Sources of Validity in Instrument Development
Evidence based on...
• Test content: Am I measuring what I planned to measure?
• Response processes: Do my participants understand the items on my scale in the expected way?
• Relations to other variables: Do the items I have chosen to represent my construct relate to other variables in the expected way? This can include convergent and discriminant evidence.
• Internal structure: To what degree do the items on my scale conform to the construct and to how I intend to interpret the scale?

Sources of Validity in Instrument Development
Evidence based on...
• Test content: literature review, content specification, expert judges
• Response processes: cognitive interviews
• Relations to other variables: analysis of the relationship of the scale scores to variables external to the scale (correlational evidence)
• Internal structure: factor analysis, measurement invariance
**See the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014)

Determining If You Should Use an Instrument
• Length and content
– Does the scale represent the breadth of the construct?
• Reliability
– Is the score reliability from your scale reasonable, using similar samples?*
*Depends on the seriousness/specificity of the reliability issue…
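Score reliability is most often reported as Cronbach's alpha, α = k/(k − 1) × (1 − Σσ²ᵢ / σ²ₓ), where k is the number of items, σ²ᵢ are the item variances, and σ²ₓ is the variance of the total scores. A minimal sketch follows, assuming complete data and items stored as equal-length lists in the same respondent order; the responses are hypothetical. Alpha is only one reliability estimate and assumes, among other things, a unidimensional set of items, so it is a starting point rather than the whole reliability question.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a set of items.

    items: one list of responses per item, all in the same respondent order,
    with no missing values (an assumption of this sketch).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Each respondent's total score is the sum of their responses across items
    totals = [sum(person) for person in zip(*items)]
    # Sample variances (n - 1 denominator) for the items and for the totals
    sum_item_var = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))


# Hypothetical responses from five people to a three-item scale (1 to 5 Likert)
item1 = [4, 5, 3, 2, 4]
item2 = [4, 4, 3, 2, 5]
item3 = [5, 5, 2, 3, 4]
print(f"alpha = {cronbach_alpha([item1, item2, item3]):.3f}")  # prints alpha = 0.886
```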

Determining If You Should Use an Instrument
• Previous samples
– Has the scale been used with samples similar to your sample of interest?
• Intended performance
– Has the scale previously performed as intended, based on review of past psychometric analyses (e.g., EFA, CFA, correlational analyses, SEM)?

Determining If You Should Use an Instrument
• Scoring
– How has the scale been scored in the past?
– Was sufficient testing done to evaluate the appropriateness of using a total score, if needed?
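Writing scoring rules out explicitly makes them easy to report and check. The sketch below assumes sum scoring on a 1 to 5 response scale and a hypothetical six-item scale with two three-item subscales and one reverse-keyed item (`q3`); the item names, subscale mapping, and `score_scale` function are all illustrative. Whether a total score is defensible at all is the evidence question raised above, which code cannot settle.

```python
def score_scale(responses, subscales, reverse_items, min_opt=1, max_opt=5):
    """Return subscale and total sum scores for one respondent.

    responses: dict item name -> raw response
    subscales: dict subscale name -> list of item names (hypothetical mapping)
    reverse_items: set of negatively worded items to reverse-key
    Assumes every item belongs to exactly one subscale, so the total is the sum of all items.
    """
    # Reverse-key negatively worded items: on a 1 to 5 scale, 1 <-> 5, 2 <-> 4, 3 stays 3
    keyed = {
        item: (min_opt + max_opt - value) if item in reverse_items else value
        for item, value in responses.items()
    }
    scores = {name: sum(keyed[i] for i in items) for name, items in subscales.items()}
    scores["total"] = sum(keyed.values())
    return scores


# Hypothetical 6-item scale with two 3-item subscales; q3 is reverse-keyed
subscales = {"A": ["q1", "q2", "q3"], "B": ["q4", "q5", "q6"]}
person = {"q1": 4, "q2": 5, "q3": 2, "q4": 3, "q5": 3, "q6": 4}
print(score_scale(person, subscales, reverse_items={"q3"}))
# prints {'A': 13, 'B': 10, 'total': 23}
```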

Is one EFA okay?
• When is it enough?
– Consider whether there is psychometric evidence for your specific sample

Where can I find good scales?

Where To Find Instruments: Literature
• Review the literature on your construct and on scales that measure it, paying close attention to:
– Definitions of the construct
– Reliability
– Factor structure
  • Subscales vs. total scores
  • Exploratory factor analysis
  • Confirmatory factor analysis
– Validation sample
  • Measurement invariance

Where To Find Instruments: Reviews
• Mental Measurements Yearbook (MMY)
– A tool to locate information about commercial tests and measures
– Issues from 1938 to 2017
– Provides factual information on published tests
– Critical test reviews written by professionals and psychometricians in education, psychology, speech/language/hearing, law, health care, and other related fields

Developing an Instrument If Needed

Best Practices in Instrument Development
• Recognize instrument development as an ongoing process, not a one-time event

Helpful References for Scale Construction
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
DeVellis, R. F. (2012). Scale development: Theory and applications. Los Angeles, CA: Sage.
Kline, P. (1986). Making tests reliable II: Personality inventories. In P. Kline (Ed.), A Handbook of Test Construction: Introduction to psychometric design (pp. 59-76). London, United Kingdom: Methuen.
Thorndike, R. M., & Thorndike-Christ, T. (2010). Measurement and evaluation in psychology and education. Boston, MA: Prentice Hall.
Willis, G. B., & Artino, A. R. (2013). What do our respondents think we’re asking? Using cognitive interviewing to improve medical education surveys. Journal of Graduate Medical Education, 5, 353-356. doi:10.4300/JGME-D-13-00154.1

What did we want to get into but did not have time for?
• Best practices in CFA and EFA
• Bifactor analyses
• SEM
• Using IRT
• Measurement invariance or DIF
• Cognitive diagnostic models
• Multilevel SEM and IRT

What did I learn?
• Think twice before using a scale to measure a construct of interest
• Good measurement matters
• Instrument development is an ongoing process
• Consider gathering psychometric evidence to support the intended use of the scale within your study