Sampling strategy for the dual-system correction of the under-coverage in the Register Supported 2011 Italian Population Census

Loredana Di Consiglio, Marco Fortini, Stefano Falorsi
ISTAT

Q2010 European Conference on Quality in Official Statistics, Helsinki, 4-6 May 2010

Outline
• Purpose: to plan a sampling strategy that accounts for municipal under-coverage in the next Italian census round
• Sketch of the 2011 Italian census
• Sources of data useful in planning the Post Enumeration Survey (PES)
• Sampling strategies considered for comparison
• Construction of a fictitious but plausible population for simulating the sampling universe
• Results of the simulation study

Key innovations of the 2011 Italian census
• From the traditional enumeration method…
  - Search for households and people in the field
• …to a register-supported census
  - Municipal population registers used to mail out questionnaires to people
  - Data collection based on web, mail-back and municipal data collection centres
  - Reduction in the number of enumerators
  - Data collection from late respondents
  - Coverage evaluation activities

Coverage evaluation program
• Requested by the Eurostat quality report, it is in any case crucial given the extensive innovations in process and methods
• Over-coverage: people no longer living in the municipality who are still listed in the population registers
  - Checked by interviewers when contacting late respondents
• Under-coverage: people living in the municipality who are not yet listed in the population registers
  - Supplemental lists of people
  - Extensive search in the field
  - Statistical estimation based on capture-recapture techniques

Overview of Italian census undercount
• Gross under-coverage of population registers
  - Estimated by Fortini and Gallo (2009) at about 400,000 people (up to 560,000), using administrative data and a mixture model analysis to account for under-reporting in the source
• Gross under-coverage of the 2001 Census (enumeration based)
  - The 2001 Post Enumeration Survey estimated that about 800,000 people were missed
• Both estimates rest on strong assumptions
• However, this evidence makes it reasonable to use municipal population registers as the main source for household enumeration

Capture-Recapture Approach
• Correction for population register undercount through a second source based on independent field enumeration
  - $x_{1+}$: people listed in the municipal register
  - $\hat{x}_{+1}$: estimate of the municipal population based on a field enumeration survey in a sample of enumeration areas (EAs)
  - $\hat{x}_{11}$: estimate of the people who would have been counted by both sources if field enumeration had been carried out over the whole municipal area
• The Petersen estimator of the hidden population is (Wolter, 1986): $\hat{N} = x_{1+}\,\hat{x}_{+1} / \hat{x}_{11}$
• Main goal: municipality estimates of population counts
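As a rough illustration of the capture-recapture correction, the Python sketch below computes a Petersen (dual-system) estimate from the register count, the field enumeration count and the count found in both sources. The function name is hypothetical, and the figures are the municipal totals shown in the pseudo-population example of this deck (register 746, survey 752, both 743); this is a sketch under those assumptions, not the authors' implementation.

```python
def petersen_estimate(x_register: float, x_survey: float, x_both: float) -> float:
    """Dual-system (Petersen) estimate of the total population.

    x_register: people listed in the register (x_{1+})
    x_survey:   people counted by the field enumeration (x_{+1})
    x_both:     people counted by both sources (x_{11})
    """
    if x_both <= 0:
        raise ValueError("x_both must be positive for the estimator to be defined")
    return x_register * x_survey / x_both


# Municipal totals taken from the worked pseudo-population example (illustrative use).
n_hat = petersen_estimate(746, 752, 743)
print(round(n_hat, 2))
```

With these inputs the estimate lands very close to the true count of 755 assigned to that municipality in the example.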

Sampling design for the 2011 Post-Enumeration Survey
• About 1,300 municipalities and 1,200,000 people will be sampled
• Two alternative two-stage sampling designs, with municipalities and enumeration areas as primary and secondary sampling units
  - Design A: region by class of population size (less than 5,000; 5,000-20,000; 20,000-50,000; more than 50,000)
  - Design B: aggregation of provinces within region by the 4 classes of population size (helps reduce the bias of small area estimation, SAE)
• Stratification and selection of municipalities according to their population size are considered for both designs
• It is necessary to sample among municipalities in order to control costs

Estimators
• Direct estimates of census counts are available only at planned domain level
  - Small area estimation methods are needed, at least for municipalities not included in the sample
• Possible predictors available for area-level modelling
  - Population counts from the register
  - Demographic indicators (e.g. dependency ratios)
  - Socio-economic indicators
• In what follows we consider
  - Direct estimation at regional level (planned domains)
  - Synthetic estimator at municipality level, assuming that municipal under-coverage rates are invariant within a planned domain

Direct Estimators
• Simple expansion estimators
  - Weight: inverse of the selection probability
• Calibrated expansion estimators
  - Weight: final (calibrated) weight
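The two kinds of weights can be illustrated with a small Python sketch. The simple expansion total weights each sampled unit by the inverse of its selection probability. The slides do not say which calibration is used, so the second function assumes a plain ratio calibration to a known auxiliary total (for instance the register count of the domain); the function names and the sample figures are hypothetical.

```python
def expansion_total(values, probs):
    """Simple expansion estimate: each sampled value is weighted by
    the inverse of its selection probability."""
    return sum(y / p for y, p in zip(values, probs))


def ratio_calibrated_total(values, aux, probs, aux_total):
    """Assumed ratio calibration: rescale the base weights by a common
    factor so that the weighted auxiliary variable reproduces aux_total."""
    g = aux_total / expansion_total(aux, probs)
    return g * expansion_total(values, probs)


# Hypothetical sampled units: survey counts, register counts, selection probabilities.
survey_counts = [536, 58, 13]
register_counts = [535, 53, 13]
probs = [0.5, 0.25, 0.25]

simple = expansion_total(survey_counts, probs)
calibrated = ratio_calibrated_total(survey_counts, register_counts, probs, aux_total=1300)
print(simple, round(calibrated, 2))
```

By construction, applying the calibrated weights to the auxiliary variable itself returns exactly the known total, which is the defining property of a calibrated weight system.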

Synthetic Estimator
• Based on the assumption that under-coverage rates are invariant across municipalities belonging to the same planned domain
• For each system of weights, the coverage ratio is computed at domain level
• From these ratios, simple and calibrated synthetic estimators are obtained for municipalities
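A minimal Python sketch of the synthetic idea follows. It assumes the domain-level ratio is the direct domain estimate divided by the domain register count, and that each municipality's register count is multiplied by that common ratio; the exact form used in the study is not given in the slides, and the function name and figures are hypothetical.

```python
def synthetic_estimates(domain_estimate, register_by_municipality):
    """Apply one domain-level ratio to every municipality's register count.

    domain_estimate: direct estimate of the domain population
    register_by_municipality: dict of municipality id -> register count
    """
    domain_register = sum(register_by_municipality.values())
    ratio = domain_estimate / domain_register  # assumed constant within the domain
    return {m: ratio * x for m, x in register_by_municipality.items()}


# Hypothetical domain made of two municipalities.
registers = {"A": 746, "B": 1254}
estimates = synthetic_estimates(2030.0, registers)
print({m: round(v, 2) for m, v in estimates.items()})
```

Because the same ratio is used everywhere, the municipal estimates add up to the domain estimate, and any municipality whose true coverage differs from the domain average inherits a bias.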

Empirical study
• It is based on a simulation study
• Two pseudo-populations of 335,643 Italian EAs were considered
• Sources of information
  - 2001 Italian Post Enumeration Census
  - Administrative data on changes of residence that occurred after the 2001 census (from November 2002 to December 2005)
• For every non-empty EA belonging to the 8,101 Italian municipalities, the following counts were generated
  - Observed count from the population register ($X_{1+}$)
  - True population count ($N$)
  - Field enumeration count ($X_{+1}$)
  - Count of people enumerated by both sources ($X_{11}$)

Assemble the pseudo-population
For each municipality:

Munic.  EA  True N  P. Reg  Survey  Both
1015     1             535
1015     2              37
1015     3              53
1015     4              40
1015     5               4
1015     6              64
1015     7              13
Tot.                   746

• EA population register counts come from the 2001 Census counts

Assign true population counts to the municipality
For each municipality:

Munic.  EA  True N  P. Reg  Survey  Both
1015     1             535
1015     2              37
1015     3              53
1015     4              40
1015     5               4
1015     6              64
1015     7              13
Tot.           755     746

• EA population register counts come from the 2001 Census counts
• True municipal population counts: obtained by inflating P. Reg by 1/r, where the coverage rate ‘r’ is estimated by the model in Fortini and Gallo (2009) (2 different populations)

Assign true population counts to EAs
For each municipality:

Munic.  EA  True N  P. Reg  Survey  Both
1015     1     538     535
1015     2      37      37
1015     3      58      53
1015     4      40      40
1015     5       4       4
1015     6      65      64
1015     7      13      13
Tot.           755     746

• EA population register counts come from the 2001 Census counts
• True municipal population counts: obtained by inflating P. Reg by 1/r, where the coverage rate ‘r’ is estimated by the model in Fortini and Gallo (2009) (2 different populations)
• True N is allocated among EAs by a hierarchical Dirichlet/Multinomial model whose parameter vector p is given by the distribution of the P. Reg population among EAs
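One possible reading of the Dirichlet/Multinomial allocation step can be sketched in Python as follows. The register shares of the EAs give the mean of a Dirichlet draw, and the municipal true count is then split among EAs by a multinomial draw. The `concentration` parameter, the function name and the seed are assumptions made for illustration; the slides do not specify how the hierarchy is parameterised.

```python
import random


def allocate_true_counts(true_total, register_counts, concentration=1000.0, rng=None):
    """Split a municipal true count among EAs.

    Step 1: draw EA shares from a Dirichlet centred on the register shares
            (via normalised gamma variates; requires non-empty EAs).
    Step 2: draw the EA counts from a multinomial with those shares.
    """
    rng = rng or random.Random()
    reg_total = sum(register_counts)
    shares = [x / reg_total for x in register_counts]
    gammas = [rng.gammavariate(concentration * s, 1.0) for s in shares]
    g_total = sum(gammas)
    probs = [g / g_total for g in gammas]
    counts = [0] * len(register_counts)
    for idx in rng.choices(range(len(probs)), weights=probs, k=true_total):
        counts[idx] += 1
    return counts


register = [535, 37, 53, 40, 4, 64, 13]  # P. Reg counts of municipality 1015
true_n = allocate_true_counts(755, register, rng=random.Random(0))
print(true_n, sum(true_n))
```

Note that this sketch does not force the true count of an EA to be at least its register count; a fuller simulation would need to decide how to handle such cases.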

Assign survey counts to EAs
For each municipality:

Munic.  EA  True N  P. Reg  Survey  Both
1015     1     538     535     536
1015     2      37      37
1015     3      58      53
1015     4      40      40
1015     5       4       4
1015     6      65      64
1015     7      13      13
Tot.           755     746

• Survey counts: True N multiplied by a coverage rate ‘rs’ drawn from a beta-binomial distribution
• “alpha” and “beta” are chosen so that the mean and variance of the 2001 PES coverage rates are reproduced (5 macro-regions by 4 classes of municipal population size)
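The moment-matching step can be sketched in Python. Given a target mean and variance for the coverage rate, the method of moments gives the beta parameters; a survey count is then drawn as a binomial with a beta-distributed rate. The mean and variance used below are hypothetical placeholders, not the 2001 PES values, and the function names are illustrative.

```python
import random


def beta_params(mean, variance):
    """Method-of-moments beta parameters for a rate with the given mean and variance."""
    if not 0 < mean < 1 or not 0 < variance < mean * (1 - mean):
        raise ValueError("need 0 < mean < 1 and 0 < variance < mean * (1 - mean)")
    k = mean * (1 - mean) / variance - 1
    return mean * k, (1 - mean) * k


def draw_survey_count(true_n, alpha, beta, rng):
    """Beta-binomial draw: a coverage rate from Beta(alpha, beta),
    then one Bernoulli trial per person in the EA."""
    rate = rng.betavariate(alpha, beta)
    return sum(1 for _ in range(true_n) if rng.random() < rate)


# Hypothetical target moments for one stratum (not taken from the 2001 PES).
alpha, beta = beta_params(mean=0.98, variance=0.0004)
survey = draw_survey_count(538, alpha, beta, random.Random(1))
print(round(alpha, 2), round(beta, 2), survey)
```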

Assign survey counts to the municipality
For each municipality:

Munic.  EA  True N  P. Reg  Survey  Both
1015     1     538     535     536
1015     2      37      37      37
1015     3      58      53      58
1015     4      40      40      39
1015     5       4       4       4
1015     6      65      64      65
1015     7      13      13      13
Tot.           755     746     752

• The municipal count is obtained by summing the values of the EAs

Assign the number of people enumerated by both lists
For each municipality:

Munic.  EA  True N  P. Reg  Survey  Both
1015     1     538     535     536   533
1015     2      37      37      37
1015     3      58      53      58
1015     4      40      40      39
1015     5       4       4       4
1015     6      65      64      65
1015     7      13      13      13
Tot.           755     746     752

• People enumerated by both lists: hypergeometric distribution at EA level with parameters True N, P. Reg, Survey
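The hypergeometric step can be sketched in Python with the standard library only: out of the true EA population, a known number are listed in the register, and the survey count is drawn without replacement; the overlap is the number of register-listed people among those drawn. The function name and the seed are assumptions for illustration.

```python
import random


def draw_both_count(true_n, register, survey, rng):
    """Hypergeometric draw of the number of people found in both lists.

    true_n:   true EA population
    register: people in the EA who are listed in the register
    survey:   people in the EA who are counted by the survey
    """
    if not (0 <= register <= true_n and 0 <= survey <= true_n):
        raise ValueError("register and survey counts must lie between 0 and true_n")
    listed = [True] * register + [False] * (true_n - register)
    # Sampling without replacement; True values count as 1 in the sum.
    return sum(rng.sample(listed, survey))


# EA 1 of municipality 1015 in the worked example: True N 538, P. Reg 535, Survey 536.
both = draw_both_count(538, 535, 536, random.Random(2))
print(both)
```

For this EA the overlap can only fall between 533 and 535, since at most three people are missing from the register and at most two from the survey; the value 533 shown in the example table lies in that range.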

Assign the number of people enumerated by both lists
For each municipality:

Munic.  EA  True N  P. Reg  Survey  Both
1015     1     538     535     536   533
1015     2      37      37
1015     3      58      53
1015     4      40      40      39    39
1015     5       4       4
1015     6      65      64
1015     7      13      13
Tot.           755     746     752   743

• The municipal count is obtained by summing the EAs

St. dev. of coverage rates among municipalities
• About 400,000 and 900,000 missing people were generated for the pseudo-register and the pseudo-survey respectively
• Population register variability is larger for POP 2 than for POP 1
• Survey variability is larger than the corresponding population register variability (because of its lower coverage rate)
• Survey variability is not very close to PES variability, although their order of magnitude is the same

Variability of coverage rates among EAs – Population registers
[Figure: pseudo-coverage of the register vs size of EAs (left), compared with the distribution of EA coverage rates in the 2001 Italian PES (1,098 EAs); annotation on the plot: "Too many points here"]
• Simulated EAs show too many large units with very small coverage rates, which does not seem realistic in our context

Variability of coverage rates among EAs – Control survey
[Figure: pseudo-coverage of the survey vs size of EAs (left), compared with the distribution of EA coverage rates in the 2001 Italian PES (1,098 EAs); annotation on the plot: "Too few points here"]
• In this case, simulated EAs show too few small units with small coverage rates

Simulation of the sampling space
• Four tests: designs A and B for populations 1 and 2
• Each simulation is based on 500 sample replications
• Municipalities are sampled with probability proportional to their population size
• EAs are selected by simple random sampling within municipalities
• Simple and weighted direct estimation at domain level
• Synthetic estimation at municipality level
• Population counts from the population registers are used here as the benchmark for comparisons
  - They are biased downwards but available at no cost
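A two-stage draw of this kind might be sketched in Python as below. The slides only state that municipalities are drawn with probability proportional to size and EAs by simple random sampling; the use of systematic PPS selection, the function names and the toy population are assumptions made for illustration.

```python
import random


def pps_systematic(sizes, n, rng):
    """Systematic selection of n unit indices with probability proportional to size.
    A unit larger than the sampling step can be hit more than once."""
    step = sum(sizes) / n
    start = rng.random() * step
    picks, cum, i = [], 0.0, 0
    for k in range(n):
        target = start + k * step
        # Advance to the unit whose cumulative size interval contains the target.
        while i < len(sizes) - 1 and cum + sizes[i] <= target:
            cum += sizes[i]
            i += 1
        picks.append(i)
    return picks


def two_stage_sample(municipalities, n_mun, n_ea, rng):
    """municipalities: dict of id -> list of (ea_id, population) pairs.
    Stage 1: PPS selection of municipalities; stage 2: simple random sample of EAs."""
    ids = list(municipalities)
    sizes = [sum(pop for _, pop in municipalities[m]) for m in ids]
    sample = {}
    for idx in pps_systematic(sizes, n_mun, rng):
        eas = municipalities[ids[idx]]
        sample[ids[idx]] = rng.sample(eas, min(n_ea, len(eas)))
    return sample


# Toy population (hypothetical identifiers and sizes).
toy = {
    "M1": [("EA1", 120), ("EA2", 80)],
    "M2": [("EA1", 300), ("EA2", 250), ("EA3", 150)],
    "M3": [("EA1", 60)],
    "M4": [("EA1", 400), ("EA2", 200)],
}
sample = two_stage_sample(toy, n_mun=2, n_ea=2, rng=random.Random(3))
print(sample)
```

Repeating such a draw 500 times and recomputing the estimators on each replicate gives the empirical distributions from which bias and MSE are evaluated.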

Results – Bias of registers vs. synthetic estimates
• Main results
  - Direct estimates perform well in terms of bias and MSE at domain level
  - Calibrated estimates outperform the simple ones in terms of MSE, for both direct and synthetic estimators
  - The less aggregated design B does not significantly improve the estimates, so only design A is shown here
• In terms of bias, the synthetic estimator improves on the registers. The improvement decreases for larger municipalities. These results are more evident for population 1 than for population 2
• In terms of maximum bias, the improvement is less noticeable

Bias of synthetic estimator vs register counts
Population 1 - design A by class of municipality size
[Figure: four panels by municipality size (less than 5,000; 5,000 – 19,000; 20,000 – 49,000; 50,000 and more). The bisectors delimit the zone where synthetic estimates are better than simple register counts in terms of bias]
• The synthetic estimator almost always improves on the registers in terms of bias
• However, the improvement does not seem very prominent

Bias of synthetic estimator vs register counts
Population 2 - design A by class of municipality size
[Figure: panels by class of municipality size]
• Same conclusion for POP 2, with worse results for larger municipalities

Results – MSE of synthetic and direct estimators
• The direct estimator can be applied to self-representative municipalities
  - It is reported in the table for the two classes of larger municipalities
• On average, the synthetic estimator outperforms the direct one, which does not seem useful even in sampled municipalities
• The MSE of the synthetic estimates is much larger than the bias (in Table 2)
  - Since this does not happen in real cases, it may be evidence of excessive variability of the pseudo-populations at EA level
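For reference, bias and MSE over simulation replicates can be computed as in the short Python sketch below; since MSE equals variance plus squared bias, a large MSE alongside a small bias points to high variance across replicates. The function name and the replicate values are hypothetical.

```python
from statistics import fmean


def bias_and_mse(estimates, true_value):
    """Empirical bias and mean squared error of replicated estimates."""
    errors = [e - true_value for e in estimates]
    bias = fmean(errors)
    mse = fmean(err * err for err in errors)
    return bias, mse


# Hypothetical replicated estimates for a municipality whose true count is 755.
bias, mse = bias_and_mse([750, 760, 757, 753], 755)
variance = mse - bias ** 2
print(bias, mse, variance)
```

In this toy case the estimates are unbiased on average, so the whole MSE comes from their spread around the true value.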

Difference between synthetic and direct estimator in terms of MSE – municipalities larger than 50,000 inh.
• Most municipalities larger than 50,000 inh. show a better synthetic MSE (negative values)
• Direct and synthetic estimates are equivalent for larger municipalities (>250,000 inh.), but only in POP 1

Concluding Remarks
• The sampling strategy of the next Italian Census PES is evaluated here through pseudo-populations and simulated experiments
• Synthetic estimates yield a slight improvement over census counts from registers
• Although the Census PES is required by EU regulation for evaluation purposes, our present results do not endorse using the PES to correct census counts
• Although not discussed here, direct estimation with calibration achieved suitable results at domain level in terms of both bias and variance

Further developments
• Better definition of pseudo-populations with respect to coverage ratios between EAs
• Model-based estimation (EBLUP) proved promising in our previous studies carried out in a simplified framework