Software for IRT and Plausible Value Imputations Matthias von Davier

Software for IRT models
• This is a vast and growing field!
  – Special purpose software (free or $$$)
    • Faster, well tested, used in operational analyses
  – General purpose software, IRT (R) packages
    • Slow(ish); some are well tested, some were just developed and then abandoned
  – Tweaking existing packages to estimate IRT and IRT model extensions, or DIY programs
    • e.g., WinBUGS, JAGS, Stan, or Python scripts
• This topic alone could fill a day or more

My IRT Software Development Path (since 1990):
• Lacord, Polyra (1990–1993, Fortran 77)
• Winmira (1994, GFA Basic), Winmira 2001 (1997–2004, Delphi)
• mgroup/ygroup/mcmcgroup/saemgroup (2000–2010, Fortran)
• mdltm (since 2005, ANSI C)
• RPCM (2017, Python 2.7)
• Extended Ising Models (2018, Python 3.6)

Special Purpose IRT Software

mdltm: Used in PISA and PIAAC

• IRT: Rasch, polytomous Rasch, 2PL, GPCM
• Mixture IRT
• Latent Class Models, located latent class models
• Cognitive Diagnostic Models (CDMs)
• Multidimensional IRT (MIRT) Models
• Multilevel and Mixture MIRT and CDMs
• Multiple-population models
• Global model fit, item fit, person fit
• EAP, MLE, WLE for person ability estimates
• …
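
To make the model names in the list above concrete, here is a minimal sketch of the item response functions, assuming the standard parameterizations. The function names gpcm_probs and p_2pl are hypothetical and not part of mdltm. The GPCM reduces to the partial credit form of the polytomous Rasch model when the discrimination a = 1, and an item with a single step is a dichotomous 2PL item (Rasch if a = 1).

```python
import math

def gpcm_probs(theta, a, steps):
    """P(X = k | theta) for k = 0..len(steps) under the GPCM.

    a = 1 gives the partial credit (polytomous Rasch) model;
    a single step gives a dichotomous 2PL item (Rasch if a = 1).
    """
    # Category k has logit sum_{j<=k} a * (theta - b_j); category 0 has logit 0.
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    # Softmax with a max shift for numerical stability.
    top = max(logits)
    expo = [math.exp(z - top) for z in logits]
    total = sum(expo)
    return [e / total for e in expo]

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL (Rasch if a = 1)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Example: a three-category item with step difficulties -0.5 and 0.8.
print(gpcm_probs(0.0, 1.0, [-0.5, 0.8]))
```

Writing the dichotomous case as a one-step GPCM makes the nesting of these models explicit, which is one reason a single program can cover all of them.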

• von Davier, M. (2016). High-Performance Psychometrics: The Parallel-E Parallel-M Algorithm for Generalized Latent Variable Models. ETS Research Report Series, 2016: 1–11. https://doi.org/10.1002/ets2.12120
• von Davier, M. (2017). New Results on an Improved Parallel EM Algorithm for Estimating Generalized Latent Variable Models. In: van der Ark, L., Wiberg, M., Culpepper, S., Douglas, J., & Wang, W.-C. (eds.), Quantitative Psychology. IMPS 2016. Springer Proceedings in Mathematics & Statistics, vol. 196. Springer.
• von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, Vol. 61, No. 2 (November), pp. 287–307. https://doi.org/10.1348/000711007X193957
• von Davier, M., Yamamoto, K., Shin, H.-J., Chen, H., Khorramdel, L., Weeks, J., Davis, S., Kong, N., & Kandathil, M. (2019). Evaluating item response theory linking and model fit for data from PISA 2000–2012. Assessment in Education: Principles, Policy & Practice. https://doi.org/10.1080/0969594X.2019.1586642
• Shin, H. J., Khorramdel, L., & von Davier, M. (2019). GDM Software MDLTM Including Parallel EM Algorithm. Chapter 30 in: von Davier, M., & Lee, Y.-S. (eds.), Handbook of Diagnostic Classification Models. New York: Springer.
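
The Parallel-E Parallel-M papers listed above build on the fact that the E-step of marginal maximum likelihood estimation is a sum over respondents, so respondents can be split across workers and the partial expected counts added up afterwards. The sketch below illustrates only that decomposition, for a Rasch model on a fixed quadrature grid; it is not the mdltm implementation, and the names e_step_chunk and parallel_e_step, the grid, and the N(0, 1) prior weights are assumptions. Python threads are used for portability; because of the global interpreter lock they show the structure rather than a real speed-up, which a compiled implementation would typically obtain from native threads or processes.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Fixed quadrature grid with normalized N(0, 1) weights (assumed for this sketch).
NODES = [-4.0 + 0.5 * k for k in range(17)]
_DENS = [math.exp(-0.5 * t * t) for t in NODES]
_TOTAL = sum(_DENS)
WEIGHTS = [d / _TOTAL for d in _DENS]

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def e_step_chunk(responses, b):
    """Expected counts for one chunk of respondents (0/1 data, Rasch model).

    n[k]    = expected number of respondents at node k
    r[i][k] = expected number of correct responses to item i at node k
    """
    n = [0.0] * len(NODES)
    r = [[0.0] * len(NODES) for _ in b]
    for x in responses:
        # Unnormalized posterior over nodes: prior weight times likelihood.
        # (For long tests, accumulate log-likelihoods instead to avoid underflow.)
        post = []
        for w, t in zip(WEIGHTS, NODES):
            lik = w
            for xi, bi in zip(x, b):
                p = rasch_p(t, bi)
                lik *= p if xi == 1 else 1.0 - p
            post.append(lik)
        s = sum(post)
        for k, v in enumerate(post):
            n[k] += v / s
            for i, xi in enumerate(x):
                r[i][k] += xi * v / s
    return n, r

def parallel_e_step(responses, b, n_workers=4):
    """Split respondents into chunks, run the E-step per chunk, then add up the counts."""
    size = max(1, math.ceil(len(responses) / n_workers))
    chunks = [responses[j:j + size] for j in range(0, len(responses), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda chunk: e_step_chunk(chunk, b), chunks))
    n = [sum(part[0][k] for part in parts) for k in range(len(NODES))]
    r = [[sum(part[1][i][k] for part in parts) for k in range(len(NODES))]
         for i in range(len(b))]
    return n, r
```

Adding the per-chunk counts gives the same result as a single pass over all respondents (up to rounding), so the M-step can use them unchanged.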

Some R packages for IRT
• eRm (Mair, Hatzinger)
• mRm: mixture Rasch (Preinerstorfer)
• TAM (Wu, Ping, Robitzsch, et al.)
• MLIRT (Fox)
• LNIRT (Fox)
• mirt (Chalmers)
• lme4 (Bates, Maechler, et al.)
• …

Stata support for IRT
• Stata IRT module
  – https://www.stata.com/manuals/irt.pdf
• gllamm (Skrondal, Rabe-Hesketh)
  – http://www.gllamm.org/faqs/models/irtfitb.html
• raschtest (Hardouin)
  – https://www.statajournal.com/article.html?article=st0119
• StataStan for IRT
  – https://arxiv.org/pdf/1601.03443.pdf

Software for Extended IRT
• lme4
  – Explanatory IRT models (De Boeck & Wilson, etc.)
• PyStan / RStan
  – Several papers and the Stan manual give IRT examples
  – Uses Hamiltonian Monte Carlo and the NUTS sampler (Gelman et al.)
  – Can be used to estimate IRT as well as extended IRT models that include response speed, e.g.:
    • Engagement, speed & ability model (Ulitzsch, von Davier & Pohl ‘19)
  – Needs to be ‘programmed’, but standard IRT Stan scripts exist
    • https://mc-stan.org/docs/2_19/stan-users-guide/item-response-models-section.html
• Slow(ish): fully Bayesian approach that explores the posterior
• Very flexible: new models can be developed “easily”
• Later implementations using machine learning frameworks seem promising
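
Stan derives the gradients of the log posterior automatically and tunes NUTS during warm-up. To illustrate what such a sampler works with, here is a minimal plain HMC sketch for a Rasch model: fixed step size and number of leapfrog steps, identity mass matrix, no NUTS and no adaptation. The priors θ ~ N(0, 1) and b ~ N(0, 2²), the step size, and the function names are illustrative assumptions, not Stan defaults.

```python
import math
import random

def log1pexp(z):
    # Numerically stable log(1 + exp(z)).
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def log_post_and_grad(q, x, n_persons, prior_sd_b=2.0):
    """Rasch log posterior and its gradient; q = [theta_1..theta_P, b_1..b_I]."""
    theta, b = q[:n_persons], q[n_persons:]
    lp = -0.5 * sum(t * t for t in theta) - 0.5 * sum(v * v for v in b) / prior_sd_b ** 2
    g_theta = [-t for t in theta]
    g_b = [-v / prior_sd_b ** 2 for v in b]
    for p, row in enumerate(x):
        for i, xpi in enumerate(row):
            eta = theta[p] - b[i]
            lp += xpi * eta - log1pexp(eta)
            resid = xpi - sigmoid(eta)
            g_theta[p] += resid
            g_b[i] -= resid
    return lp, g_theta + g_b

def hmc_step(q, x, n_persons, eps=0.05, n_leapfrog=20, rng=random):
    """One HMC transition; returns (new state, accepted flag)."""
    r0 = [rng.gauss(0.0, 1.0) for _ in q]
    lp0, g = log_post_and_grad(q, x, n_persons)
    q_new = list(q)
    # Leapfrog: half momentum step, then full steps, ending with a half step.
    r = [ri + 0.5 * eps * gi for ri, gi in zip(r0, g)]
    for step in range(n_leapfrog):
        q_new = [qi + eps * ri for qi, ri in zip(q_new, r)]
        lp1, g = log_post_and_grad(q_new, x, n_persons)
        scale = eps if step < n_leapfrog - 1 else 0.5 * eps
        r = [ri + scale * gi for ri, gi in zip(r, g)]
    # Metropolis correction on the Hamiltonian H = -log p(q) + |r|^2 / 2.
    h0 = -lp0 + 0.5 * sum(v * v for v in r0)
    h1 = -lp1 + 0.5 * sum(v * v for v in r)
    accept = rng.random() < math.exp(min(0.0, h0 - h1))
    return (q_new if accept else list(q)), accept
```

In Stan the same model takes a few lines of model code, and NUTS chooses the path length that is fixed here; that convenience is traded against the run time of a fully Bayesian approach.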

Generating Plausible Values
• Most IRT software packages produce point estimates
  – JMLE (only good for very long tests)
  – MML, followed by EAP, WLE, or MLE
  – CML (Rasch or OPLM only), followed by EAP…
• Plausible values are a different animal
  – Not ideal(!) for anything, but good for many things, unless ‘outside’ variables (not included in the imputation model) are used
  – PVs are imputations from the posterior distribution of proficiency, given responses and covariates
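
The point estimates listed above can be sketched for a 2PL with known item parameters. The MLE solves S(θ) = Σ a_i (x_i − P_i(θ)) = 0 and does not exist for zero or perfect scores; Warm's WLE solves S(θ) + I'(θ) / (2 I(θ)) = 0, where I(θ) is the test information, and stays finite for perfect scores; the EAP is the posterior mean under a prior, here N(0, 1) on a quadrature grid. The function names, the bisection solver, and the search interval [−6, 6] are assumptions of this sketch.

```python
import math

def p2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def score_info(theta, x, a, b):
    """Score S(theta), test information I(theta), and I'(theta) for the 2PL."""
    s = info = dinfo = 0.0
    for xi, ai, bi in zip(x, a, b):
        p = p2pl(theta, ai, bi)
        s += ai * (xi - p)
        info += ai ** 2 * p * (1 - p)
        dinfo += ai ** 3 * p * (1 - p) * (1 - 2 * p)
    return s, info, dinfo

def bisect_root(f, lo=-6.0, hi=6.0, tol=1e-10):
    # Assumes f(lo) > 0 > f(hi), i.e. the root lies inside [lo, hi].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mle(x, a, b):
    """MLE of theta; returns None for zero or perfect scores, where it is infinite."""
    if sum(x) in (0, len(x)):
        return None
    return bisect_root(lambda t: score_info(t, x, a, b)[0])

def wle(x, a, b):
    """Warm's WLE: root of S(theta) + I'(theta) / (2 I(theta)); finite for perfect scores."""
    def f(t):
        s, info, dinfo = score_info(t, x, a, b)
        return s + dinfo / (2 * info)
    return bisect_root(f)

def eap(x, a, b, step=0.05):
    """EAP estimate and posterior SD under a N(0, 1) prior on a rectangular grid."""
    grid = [-6.0 + step * k for k in range(round(12 / step) + 1)]
    post = []
    for t in grid:
        w = math.exp(-0.5 * t * t)
        for xi, ai, bi in zip(x, a, b):
            p = p2pl(t, ai, bi)
            w *= p if xi == 1 else 1 - p
        post.append(w)
    total = sum(post)
    mean = sum(t * w for t, w in zip(grid, post)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, post)) / total
    return mean, math.sqrt(var)
```

For the pattern (1, 1, 0) on three Rasch items with difficulties −1, 0, and 1, both the WLE and the EAP fall between 0 and the MLE: the WLE through its bias correction, the EAP through shrinkage toward the prior mean.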

• Occasionally, the software above can also generate PVs, but typically not with the same complexity of background information.
• Some functionality exists in:
  – TAM
  – dexter
  – Mplus
  – miceadds (add-on to mice)
• Several R packages can use PVs, however…
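
Using PVs typically means running the analysis once per PV and combining the results with Rubin's rules for multiple imputation: the point estimate is the average over the M analyses, and the total variance is the average sampling variance plus (1 + 1/M) times the variance of the estimates across PVs. A minimal sketch follows; the function name combine_pvs is hypothetical, and how the per-PV sampling variances are obtained (e.g., with replicate weights) is left outside the sketch.

```python
import math
import statistics

def combine_pvs(estimates, sampling_vars):
    """Combine one estimate per plausible value with Rubin's rules.

    estimates     : the statistic computed separately with each of the M PVs
    sampling_vars : its sampling variance for each PV (computed elsewhere)
    Returns (point estimate, total variance, standard error).
    """
    m = len(estimates)
    point = statistics.fmean(estimates)       # average over the M analyses
    within = statistics.fmean(sampling_vars)  # average sampling variance
    between = statistics.variance(estimates)  # variance across PVs (n - 1 denominator)
    total = within + (1 + 1 / m) * between
    return point, total, math.sqrt(total)

# Hypothetical example: a mean score computed with five PVs.
point, total, se = combine_pvs([500.0, 502.0, 498.0, 501.0, 499.0], [4.0] * 5)
```

The between-PV term is what carries measurement uncertainty into the standard error; analyzing a single PV, or the average of the PVs, drops it.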

• The selection of software is much smaller if the criterion is operational use or the ability to include large numbers of background variables:
• MGROUP (Mislevy & Sheehan, 1992)
  – CGROUP (Thomas, 1993)
  – YGROUP (von Davier, 2004)
  – MCEMGROUP (von Davier & Sinharay, 2007)
  – SAEMGROUP (von Davier & Sinharay, 2010)
• ConQuest (Adams, Wu, …)
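
MGROUP-type programs draw PVs from a posterior whose prior for θ is a latent regression on background variables, θ ~ N(y′Γ, σ²). The sketch below shows only the final drawing step for one person: it treats Rasch item difficulties, the regression coefficients gamma, and the residual SD sigma as already estimated, and samples from a discretized posterior by inverse CDF. Operational programs also estimate Γ and the residual (co)variance, often for several dimensions, and typically account for their estimation uncertainty; the function names and grid settings here are assumptions.

```python
import math
import random
from bisect import bisect_left

def rasch_lik(theta, x, b):
    """Likelihood of a 0/1 response vector x under the Rasch model."""
    lik = 1.0
    for xi, bi in zip(x, b):
        p = 1.0 / (1.0 + math.exp(-(theta - bi)))
        lik *= p if xi == 1 else 1.0 - p
    return lik

def draw_pvs(x, y, b, gamma, sigma, n_pv=5, rng=random, step=0.02):
    """Draw PVs from p(theta | x, y), proportional to P(x | theta) * N(theta; y'gamma, sigma^2).

    y should include a leading 1 for the intercept. Item difficulties b,
    coefficients gamma, and residual SD sigma are treated as known here.
    """
    mu = sum(yj * gj for yj, gj in zip(y, gamma))  # conditional prior mean
    grid = [mu - 6 * sigma + step * k for k in range(round(12 * sigma / step) + 1)]
    dens = [math.exp(-0.5 * ((t - mu) / sigma) ** 2) * rasch_lik(t, x, b) for t in grid]
    total = sum(dens)
    cdf, acc = [], 0.0
    for d in dens:
        acc += d / total
        cdf.append(acc)
    # Inverse-CDF sampling on the discretized posterior.
    return [grid[min(bisect_left(cdf, rng.random()), len(grid) - 1)] for _ in range(n_pv)]
```

Two respondents with identical answers but different covariate values receive different PVs, which is why analyses with variables left out of the conditioning model can be biased.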

IRT and Generating PVs
• Summary:
  – Complex IRT calibrations need very thorough QC (just like all statistical modeling and estimation)
  – PVs are the products of a complex imputation model; no single model is ‘right’ for all purposes
  – Point estimates (and posterior variance or measurement error) can be generated by most IRT software packages
  – It is best to build a custom model containing all needed variables, which is challenging for practitioners
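
The contrast between point estimates and PVs can be checked with a small simulation: under a Rasch model with θ ~ N(0, 1), the variance of the EAP estimates shrinks below the true variance, while one PV per person drawn from the same posterior approximately reproduces it. The item difficulties, sample size, grid, and seed below are arbitrary choices for illustration.

```python
import math
import random
import statistics

def simulate_eap_vs_pv(n_persons=1000, b=(-1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5), seed=11):
    """Simulate Rasch data with theta ~ N(0, 1) and return the sample variances of
    the true thetas, the EAP estimates, and one PV per person."""
    rng = random.Random(seed)
    grid = [-5.0 + 0.1 * k for k in range(101)]
    prior = [math.exp(-0.5 * t * t) for t in grid]
    # P(correct | node) for every item and grid node, computed once.
    p = [[1.0 / (1.0 + math.exp(-(t - bi))) for t in grid] for bi in b]
    thetas, eaps, pvs = [], [], []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)
        x = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(theta - bi))) else 0 for bi in b]
        post = list(prior)
        for i, xi in enumerate(x):
            for k in range(len(grid)):
                post[k] *= p[i][k] if xi == 1 else 1.0 - p[i][k]
        total = sum(post)
        eaps.append(sum(t * w for t, w in zip(grid, post)) / total)
        # A plausible value is a random draw from the same (discretized) posterior.
        pvs.append(rng.choices(grid, weights=post, k=1)[0])
        thetas.append(theta)
    return statistics.variance(thetas), statistics.variance(eaps), statistics.variance(pvs)
```

With these settings the EAP variance comes out well below 1 and the PV variance close to 1; the exact values depend on the seed and the test design.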