An Overview of Editing and Imputation Methods for the next Italian Censuses
Gianpiero Bianchi, Antonia Manzari, Alessandra Reale
UNECE-Eurostat Meeting on Population and Housing Censuses
Geneva, 13-15 May 2008

Outline
• Features of 2001 E&I strategy
• E&I strategy for 2011 Census
• Likely innovations for 2011 Census
• Impact on editing and validation procedures
• Conclusions

Features of 2001 E&I strategy
• Main E&I purpose: provide a complete and consistent set of data by performing plausible imputations while preserving the maximum amount of collected information
• E&I strategy: divide the E&I problem into simpler subproblems and find an appropriate solution for each of them
• Overall E&I process composed of several connected procedures, each addressing a specific problem and implementing a suitable method
• Development and use of new techniques and software tools

E&I strategy for 2011 Census
• Built on the experience of the 2001 Census, taking account of:
  - The innovations in the survey design
  - Eurostat timeliness constraints
In particular:
• Census variables split into topics processed in a predetermined order (first demographic, then socio-economic) by appropriate procedures
• Adaptation of the 2001 procedures to the innovations, and development of new procedures based on highly efficient algorithms
• Proper planning, implementation and management of the E&I procedures

Main elements of the 2011 strategy
• Use of the DIESIS* system, developed in 2001 by ISTAT and academic researchers (Department of Computer and Systems Science of the University of Roma “La Sapienza”). Based on optimization techniques, it allows:
  - Treatment of qualitative and quantitative variables
  - Between-unit and within-unit edit rules
  - Joint use of data driven and minimum change approaches
• DIESIS will process the 2011 demographic variables and, likely, some socio-economic variables
* Data Imputation and Edit System - Italian Software
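To make the distinction between within-unit and between-unit edit rules concrete, here is a minimal sketch. The rules, thresholds and field names (`age`, `marital_status`, `relationship`) are hypothetical illustrations and are not taken from DIESIS or from the census edit set.

```python
# Hypothetical edit rules for illustration only; not taken from DIESIS.

def within_unit_failures(person):
    """Return the names of within-unit edits failed by one person record."""
    failures = []
    # Illustrative rule: a person younger than 15 cannot be married.
    if person["age"] < 15 and person["marital_status"] == "married":
        failures.append("young_married")
    return failures


def between_unit_failures(household):
    """Return the names of between-unit edits failed by a household."""
    failures = []
    head = household[0]  # assumption: Person 1 is listed first
    for person in household[1:]:
        # Illustrative rule: a child of Person 1 must be at least
        # 12 years younger than Person 1.
        if person["relationship"] == "child" and head["age"] - person["age"] < 12:
            failures.append("child_too_old")
    return failures


household = [
    {"age": 40, "marital_status": "married", "relationship": "head"},
    {"age": 10, "marital_status": "married", "relationship": "child"},
    {"age": 35, "marital_status": "single", "relationship": "child"},
]
```

A within-unit rule looks at one person record at a time, while a between-unit rule compares records of different persons in the same household.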

Main elements of the 2011 strategy
• Joint use of data driven and minimum change approaches by the DIESIS system
  - With a reduced pool of donors, the data driven approach can require imputing too many values
  - The minimum change approach is used to minimize the number of values to be changed
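The interplay of the two approaches can be illustrated with a toy sketch: values are taken from donors (data driven), but for each donor only the smallest set of fields that makes the record pass the edits is copied (minimum change). This is a brute-force illustration under assumed field names and a single hypothetical edit; DIESIS is described above as relying on optimization techniques, which this sketch does not reproduce.

```python
from itertools import combinations


def passes(record, edits):
    """A record is consistent when every edit function returns True."""
    return all(edit(record) for edit in edits)


def min_change_from_donor(record, donor, edits):
    """Smallest set of fields to copy from the donor so the record passes."""
    fields = [f for f in record if record[f] != donor[f]]
    for k in range(len(fields) + 1):
        for subset in combinations(fields, k):
            candidate = dict(record)
            for f in subset:
                candidate[f] = donor[f]
            if passes(candidate, edits):
                return subset, candidate
    return None  # even a full copy of this donor fails the edits


def impute(record, donors, edits):
    """Keep the donor-based solution that changes the fewest fields."""
    best = None
    for donor in donors:
        result = min_change_from_donor(record, donor, edits)
        if result is not None and (best is None or len(result[0]) < len(best[0])):
            best = result
    return best


# Hypothetical edit: a person younger than 15 cannot be married.
edits = [lambda r: not (r["age"] < 15 and r["marital_status"] == "married")]
record = {"age": 10, "marital_status": "married", "sex": "F"}
donors = [
    {"age": 9, "marital_status": "single", "sex": "F"},
    {"age": 45, "marital_status": "married", "sex": "M"},
]
changed, fixed = impute(record, donors, edits)
```

In this toy case a single field is changed rather than the whole record being overwritten by a donor. Exhaustive search over subsets grows exponentially with the number of fields, so it is only practical for tiny examples.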

Main elements of the 2011 strategy
• Identification of the respondent path
  - Respondent paths used to:
    - Compute the Subset of Admissible Values (SAV) of Year of birth, a strata variable for the imputation of demographic variables (connection between demographic and socio-economic steps)
    - Define strata for the imputation of socio-economic variables
  - Missing responses or errors can make the identification of the right respondent path uncertain
  - Automatic procedure for identifying the most likely path, based on the analysis of the responses given to filter and dependent questions
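One simple way to picture the identification of the most likely path is to score each candidate path by how well its expected pattern of answered and skipped questions matches what the respondent actually filled in. The sketch below assumes this scoring rule and uses hypothetical question and path names; the actual procedure is not specified in these slides.

```python
def most_likely_path(responses, paths):
    """Pick the path whose answered/skipped pattern best matches the responses.

    `paths` maps a path name to {question: True if it should be answered}.
    """
    def score(expected):
        # One point per question whose answered/blank status matches the path.
        return sum(
            (responses.get(question) is not None) == should_answer
            for question, should_answer in expected.items()
        )

    return max(paths, key=lambda name: score(paths[name]))


# Hypothetical paths driven by a filter question on employment status.
paths = {
    "employed": {"occupation": True, "hours_worked": True, "job_search": False},
    "unemployed": {"occupation": False, "hours_worked": False, "job_search": True},
}

# The filter question is missing, but the dependent questions point one way.
responses = {
    "employment_status": None,
    "occupation": "teacher",
    "hours_worked": 36,
    "job_search": None,
}
path = most_likely_path(responses, paths)
```

Here the missing filter answer is recovered indirectly, because the dependent questions that were answered only fit one of the two paths.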

Main elements of the 2011 strategy
• Validation of Person 1 in the household
  - Based on optimization techniques implemented in the DIESIS system
  - The minimum change algorithm assigns the role of Person 1 to the person that minimizes the number of changes needed for the record to be consistent
• Identification of potential couples
  - Components of couples having a non-unique relationship to Person 1 identified prior to editing
  - Score based on the responses provided to the demographic variables
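The selection of Person 1 by minimum change can be sketched as an argmin over household members of a cost that counts the values that would have to be changed. The cost function below is a toy assumption (relationship code `head`, minimum age of 18), not the DIESIS formulation.

```python
def changes_needed(household, candidate):
    """Toy cost: values to change if `candidate` is taken as Person 1."""
    person = household[candidate]
    cost = 0
    if person["relationship"] != "head":
        cost += 1  # the candidate's relationship code would be changed
    if person["age"] < 18:
        cost += 1  # hypothetical rule: Person 1 must be an adult
    for i, other in enumerate(household):
        if i != candidate and other["relationship"] == "head":
            cost += 1  # any other "head" would have to be recoded
    return cost


def choose_person1(household):
    """Index of the member that minimizes the number of changes."""
    return min(range(len(household)), key=lambda i: changes_needed(household, i))


household = [
    {"relationship": "head", "age": 8},
    {"relationship": "spouse", "age": 40},
    {"relationship": "head", "age": 42},
]
person1 = choose_person1(household)
```

In this household two members are coded as head; taking the 42-year-old as Person 1 requires only one change (recoding the child), so that member is selected.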

Main elements of the 2011 strategy
• Special care in E&I of small but important groups in the population, e.g. validation of centenarians
  - 2001 procedure:
    - Automatic match of individuals enumerated in 2001 with the same individuals enumerated in 1991
    - Automatic check for internal consistency of unlinked records
    - Manual check of some ambiguous cases for consistency with the questionnaire images
  - New procedure supported by the availability of local population registers

Likely innovations for 2011
• Short-long form questionnaires
  - Short: (mainly) demographic variables
  - Long: demographic and socio-economic variables
• Availability of registers
  - Local population registers (residing individuals)
  - Integrative registers from auxiliary sources
  - Residential address lists
• Use of multi-mode data collection
  - Enumerators, CATI, mail, web

Impact on E&I and validation
• Socio-economic characteristics collected on a sample basis (by long form)
  - Two procedures for computing the SAV of Year of birth (one for the short form, one for the long form)
  - The reduced pool of donors for imputation of long-form variables requires careful management of the data collection and donor pool selection phases
  - Sampling weights required for data validation after E&I of long-form variables
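As an illustration of why sampling weights matter at the validation stage, the sketch below computes a weighted distribution of an edited long-form variable, where each record counts for the number of population units it represents. The field names and weights are hypothetical.

```python
from collections import defaultdict


def weighted_distribution(records, variable, weight="weight"):
    """Share of each category, with every record counted by its sampling weight."""
    totals = defaultdict(float)
    for record in records:
        totals[record[variable]] += record[weight]
    grand_total = sum(totals.values())
    return {value: total / grand_total for value, total in totals.items()}


# Hypothetical long-form sample after E&I.
sample = [
    {"status": "employed", "weight": 10.0},
    {"status": "unemployed", "weight": 10.0},
    {"status": "employed", "weight": 20.0},
]
shares = weighted_distribution(sample, "status")
```

Unweighted, two of three records are employed; weighted, the employed share is 0.75, which is the figure that would be compared with external benchmarks during validation.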

Impact on E&I and validation
• Availability of registers:
  - Improvement of the quantitative control of the forms
  - Imputation of missing or inconsistent census values by matching census data and register data (record linkage procedure), requiring:
    - availability of unique record identifiers
    - same time reference as the census data
    - good quality of register data
  - Imputation of missing or inconsistent census values by adding register data to census data (enlarging the donor pool)
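When unique record identifiers are available, the register-based imputation can be pictured as an exact-key lookup. The sketch below handles only missing values (not inconsistent ones) and assumes an `id` field shared by census and register records; both the field names and the data are hypothetical.

```python
def impute_from_register(census, register, fields):
    """Fill missing (None) census values from the register record with the same id.

    `register` maps a unique identifier to a register record.
    """
    imputed = []
    for record in census:
        new = dict(record)  # leave the original census record untouched
        match = register.get(record["id"])
        if match is not None:
            for field in fields:
                if new.get(field) is None and match.get(field) is not None:
                    new[field] = match[field]
        imputed.append(new)
    return imputed


census = [
    {"id": "A1", "year_of_birth": None, "sex": "F"},
    {"id": "B2", "year_of_birth": 1950, "sex": None},
    {"id": "C3", "year_of_birth": None, "sex": "M"},  # not in the register
]
register = {
    "A1": {"year_of_birth": 1972, "sex": "F"},
    "B2": {"year_of_birth": 1951, "sex": "M"},
}
result = impute_from_register(census, register, ["year_of_birth", "sex"])
```

Reported census values are kept even when the register disagrees (record B2 keeps 1950); deciding which source prevails in case of inconsistency is a separate design choice that depends on register quality and time reference.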

Impact on E&I and validation
• Use of multi-mode data collection
  - Improvement of the quality of collected data due to editing performed at data capture (CATI, web)
  - A procedure for detecting duplicate questionnaires is required
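A duplicate check across collection modes can be sketched as grouping questionnaires by an identifying key and flagging keys that occur more than once. The key field `household_id` and the sample data are assumptions for illustration; a production procedure would likely need fuzzier matching.

```python
from collections import defaultdict


def find_duplicates(questionnaires, key_fields):
    """Group questionnaires by key and keep the groups with more than one form."""
    groups = defaultdict(list)
    for form in questionnaires:
        key = tuple(form[field] for field in key_fields)
        groups[key].append(form)
    return {key: forms for key, forms in groups.items() if len(forms) > 1}


# Hypothetical returns: household H1 answered both on the web and by mail.
questionnaires = [
    {"household_id": "H1", "mode": "web"},
    {"household_id": "H2", "mode": "CATI"},
    {"household_id": "H1", "mode": "mail"},
]
duplicates = find_duplicates(questionnaires, ["household_id"])
```

The flagged groups would then be resolved by some rule, for example keeping the most complete form, which is outside the scope of this sketch.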

Conclusions
• E&I strategy for 2011 Census based on 2001 experiences
• The new survey design aims to reduce the respondent burden but requires careful monitoring during production and a more complex E&I process
• Highly efficient procedures need to be developed in order to meet the timeliness requirement
E&I is an achievable but hard task