A bootstrap method for estimators based on combined administrative and survey data
Sander Scholtus (Statistics Netherlands)
NTTS Conference, 13 March 2019
Dutch Virtual Census
[Diagram: number of cases by data source (2011 Census)]
• 0–14 year olds: ± 2.9 million
• admin. data: ± 6.5 million
• LFS: ± 0.34 million
• ?: ± 6.9 million
• total: ± 16.7 million
Dutch Virtual Census
[Diagram: cross-table of other variables (1 … h … H) by level (level 1 … level c … level C)]
General set-up
Bootstrap
• Classical bootstrap does not account for
  • Finite-population sampling
  • Complex survey design
• Different extensions of the bootstrap available
  • Overview: Mashreghi, Haziza and Léger (2016)
• Here: extension based on pseudo-populations
  • Theory: Booth, Butler and Hall (1994), Chauvet (2007)
  • Previous application: Kuijvenhoven and Scholtus (2011)
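The pseudo-population idea can be illustrated with a minimal sketch. It assumes the simplest possible setting: a simple random sample without replacement and a survey-only estimator of a total, in the spirit of Booth, Butler and Hall (1994). It is not the estimator from this talk, which combines administrative and survey data. All function names and the toy data are hypothetical.

```python
import random
import statistics


def pseudo_population(sample, pop_size, rng):
    """Build a pseudo-population of size pop_size from a simple random sample."""
    n = len(sample)
    copies, remainder = divmod(pop_size, n)
    # Every sampled unit is replicated `copies` times ...
    pseudo = list(sample) * copies
    # ... and the remaining slots are filled by a random subsample.
    pseudo.extend(rng.sample(list(sample), remainder))
    return pseudo


def bootstrap_sd(sample, pop_size, estimator, n_boot=500, seed=1):
    """Bootstrap standard deviation of `estimator` under SRS without replacement."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_boot):
        pseudo = pseudo_population(sample, pop_size, rng)
        # Resample with the same design as the original sample.
        resample = rng.sample(pseudo, n)
        replicates.append(estimator(resample, pop_size))
    return statistics.stdev(replicates)


def estimated_total(sample, pop_size):
    """Expansion estimator of a population total (assumed for illustration)."""
    return pop_size * sum(sample) / len(sample)


# Toy data: 0/1 indicator for a hypothetical category, n = 50, N = 1000.
toy_rng = random.Random(42)
toy_sample = [1 if toy_rng.random() < 0.3 else 0 for _ in range(50)]
sd = bootstrap_sd(toy_sample, 1000, estimated_total)
print(f"bootstrap st. dev. of estimated total: {sd:.1f}")
```

Because each bootstrap sample is drawn without replacement from a pseudo-population of the original size, the replicates reflect the finite-population correction that the classical with-replacement bootstrap ignores. A complex design would require replicating units according to their inclusion weights instead.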
Simulation study
True counts, by age (years) and educational attainment:
age (years)       low   medium   high
young (15–35)     330      795    400
middle (36–55)    115      560    480
old (56+)         120      525    400
True standard deviations, by age (years) and educational attainment:
age (years)       low   medium   high
young (15–35)    34.5     42.2   36.8
middle (36–55)   22.3     36.8   36.1
old (56+)        22.8     35.6   34.5
Estimated analytical st. dev., by age (years) and educational attainment:
age (years)       low   medium   high
young (15–35)    32.2     39.5   34.5
middle (36–55)   20.6     34.0   33.3
old (56+)        21.1     32.8   31.8
Estimated bootstrap st. dev., by age (years) and educational attainment:
age (years)       low   medium   high
young (15–35)    34.1     41.9   36.4
middle (36–55)   22.7     36.6   36.0
old (56+)        22.5     35.2   34.5
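One way to read the simulation results is to divide each estimated standard deviation by the corresponding true value. The sketch below does this with the figures from the slide. The row and column layout (rows are age groups, columns are low, medium, high) is assumed from the order in which the numbers appear, and the variable names are illustrative.

```python
AGE_GROUPS = ["young (15-35)", "middle (36-55)", "old (56+)"]

# Standard deviations copied from the slide; columns: low, medium, high.
sd_true = [[34.5, 42.2, 36.8], [22.3, 36.8, 36.1], [22.8, 35.6, 34.5]]
sd_analytical = [[32.2, 39.5, 34.5], [20.6, 34.0, 33.3], [21.1, 32.8, 31.8]]
sd_bootstrap = [[34.1, 41.9, 36.4], [22.7, 36.6, 36.0], [22.5, 35.2, 34.5]]


def ratios(estimated, truth):
    """Cell-wise ratio of estimated to true standard deviation."""
    return [[e / t for e, t in zip(e_row, t_row)]
            for e_row, t_row in zip(estimated, truth)]


ratio_analytical = ratios(sd_analytical, sd_true)
ratio_bootstrap = ratios(sd_bootstrap, sd_true)

for age, a_row, b_row in zip(AGE_GROUPS, ratio_analytical, ratio_bootstrap):
    a_txt = "  ".join(f"{r:.3f}" for r in a_row)
    b_txt = "  ".join(f"{r:.3f}" for r in b_row)
    print(f"{age:<15} analytical: {a_txt}   bootstrap: {b_txt}")
```

In every cell the analytical estimate lies roughly 6 to 8 percent below the true standard deviation, while the bootstrap estimate is within about 2 percent of it.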
Conclusion
• Bootstrap method for estimating accuracy of statistics based on combined administrative and survey data
• Advantage over analytical variance estimation: flexibility
• Possible disadvantage: computational workload
• Future work:
  • Simulation study with real Dutch Census data (in progress)
  • Extending method to account for additional sources of uncertainty:
    • Micro-integration of survey and admin. data in overlapping part
    • Measurement error
    • …
References
J. G. Booth, R. W. Butler, and P. Hall (1994), Bootstrap Methods for Finite Populations. Journal of the American Statistical Association 89, 1282–1289.
G. Chauvet (2007), Méthodes de Bootstrap en Population Finie. PhD Thesis (in French), L’Université de Rennes.
T. de Waal and J. Daalmans (2018), Mass Imputation for Census Estimation: Methodology. Report, Statistics Netherlands.
B. Efron (1979), Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7, 1–26.
L. Kuijvenhoven and S. Scholtus (2011), Bootstrapping Combined Estimators based on Register and Sample Survey Data. Discussion Paper, Statistics Netherlands.
Z. Mashreghi, D. Haziza, and C. Léger (2016), A Survey of Bootstrap Methods in Finite Population Sampling. Statistics Surveys 10, 1–52.
S. Scholtus (2018), Variances of Census Tables after Mass Imputation. Discussion Paper, Statistics Netherlands.