World Bank Documenting the Household Survey Sampling process
Outline § Documenting sampling elements of a survey § Metadata Editor § Microdata library
Elements of a household sample survey
§ What is it you want to measure/understand?
  § Policy effectiveness
  § Policy design
  § Budgeting and planning
§ Design questionnaire
  § Science + method to this
  § Preparation of interviewer manuals/training of field staff
§ Design sample (probabilistic, size)
§ List and draw sample (n size)
§ Interview
  § Telephone, web, face-to-face, paper, tablet (CAPI)
§ Collect, validate, clean, anonymize, document, disseminate, analyze
Why Document? http://dilbert.com/strip/2010-05-28
Why? Documentation, or metadata, helps the researcher to:
§ Find the data they are interested in.
§ Understand what the data are measuring and how the data have been created.
§ Assess the quality of the data.
It also helps the data producer:
§ It increases the credibility of the data. Users appreciate transparency in data collection and processing methods.
§ Rich metadata reduces the burden on the data producer, because users need less regular support.
Documentation Provide detailed metadata
§ For making data usable
  § Users need to fully understand the data: why, by whom, when, and how they were collected and processed
§ For making data discoverable in catalogs
  § How will users know about the availability of your data? By providing searchable data catalogs.
International metadata standards (in particular the DDI) and specialized software are available to help document and catalog microdata.
Documentation International Standard: DDI § DDI is an XML metadata standard § Standard checklist of what you need to know about a survey and its dataset(s) § Documents the full survey life-cycle § Developed by academic data centers § Now used in most countries in the world
Documentation International Standard: DDI § Quick Reference Guide for Data Archivists (Especially from Page 11 onwards)
Documentation Sampling documentation In the Metadata Editor, one section covers sampling. It contains four sub-sections: sampling procedure, deviations from the sample design, response rate, and weighting.
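As a rough illustration of where these four pieces of sampling metadata end up, the sketch below builds a small XML fragment with Python's standard library. The element names (`method`, `dataColl`, `sampProc`, `deviat`, `weight`, `anlyInfo`, `respRate`) are written from memory of the DDI Codebook 2.x layout and should be checked against the schema before use; the function name and the placeholder texts are hypothetical.

```python
import xml.etree.ElementTree as ET


def sampling_metadata(procedure, deviations, response_rate, weighting):
    """Build a DDI-style <method> fragment holding the four sampling fields.

    Element names are assumed to match DDI Codebook 2.x; verify against
    the official schema before relying on them.
    """
    method = ET.Element("method")
    data_coll = ET.SubElement(method, "dataColl")
    ET.SubElement(data_coll, "sampProc").text = procedure
    ET.SubElement(data_coll, "deviat").text = deviations
    ET.SubElement(data_coll, "weight").text = weighting
    analysis = ET.SubElement(method, "anlyInfo")
    ET.SubElement(analysis, "respRate").text = response_rate
    return method


# Placeholder texts only; a real entry would describe the actual survey.
fragment = sampling_metadata(
    procedure="(placeholder) description of the sampling procedure",
    deviations="(placeholder) deviations from the sample design",
    response_rate="(placeholder) response rate",
    weighting="(placeholder) description of the weight variables",
)
xml_text = ET.tostring(fragment, encoding="unicode")
print(xml_text)
```

In practice the Metadata Editor writes this XML for you; the sketch only shows that each sub-section corresponds to one free-text element.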
Documentation: Sampling Checklist
1. Sampling design report with
   • Allocation of the sample into strata
   • Excluded strata, if any
   • Estimation formulas (selection probabilities and weights)
2. Household listing forms
3. Sample frames
   • For the first sampling stage/s: list of all sampling units (typically in Excel)
   • For the last sampling stage: list of all households in each sample point
4. Non-response rates, by sample point
5. On the survey datasets
   • Sampling weights
   • Identification of the sample points
6. In the survey reports
   • Standard errors, confidence intervals and design effects for key variables
Documentation Steps
• Organizing your files
• Gathering and preparing the data set
• Gathering and preparing the documentation
• Importing data and establishing relationships
• Importing external resources
• Adding metadata
• Running diagnostics
• Generating the standard survey documentation using the PDF generator
• Quality assessment
• Producing the output for publication
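To make "estimation formulas" in the checklist concrete, here is what they often look like for one common design. This is an assumed example, not the design of any particular survey: a two-stage sample in which $n_h$ primary sampling units (PSUs) are drawn in stratum $h$ with probability proportional to their size measure $M_{hi}$ (stratum total $M_h$), and then $m_{hi}$ households are drawn from the $L_{hi}$ households listed in PSU $i$. The last line is the usual definition of the design effect, the ratio of the variance under the actual design to the variance under simple random sampling with the same sample size.

```latex
% Assumed two-stage design: PPS selection of PSUs within stratum h,
% then equal-probability selection of households from the PSU listing.
\[
  \pi_{hij}
    = \underbrace{\frac{n_h \, M_{hi}}{M_h}}_{\text{stage 1: PSU } i}
      \times
      \underbrace{\frac{m_{hi}}{L_{hi}}}_{\text{stage 2: household } j},
  \qquad
  w_{hij} = \frac{1}{\pi_{hij}}
\]
\[
  \mathrm{deff}(\hat{\theta})
    = \frac{\mathrm{Var}_{\text{design}}(\hat{\theta})}
           {\mathrm{Var}_{\text{SRS}}(\hat{\theta})}
\]
```

A sampling design report would state the equivalent formulas for the design actually used, with the symbols tied to columns in the sample frame files.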
Documentation Sampling This item should document the design and definition of the sample, including: the sampling frame, the sampling type and the final sample size, as well as sample loss, the estimation method and the accurate calculation of the results. This item is only applicable to sample surveys. This element provides information on the sampling frame and the methods and procedures used to select respondents. The desired sample size should also be mentioned.
Documentation Sampling Procedure
This field only applies to sample surveys. Information on the sampling procedure is crucial. This section should include summary information that includes, but is not limited to:
- Sample size
- Selection process (e.g., probability proportional to size or oversampling)
- Stratification (implicit and explicit)
- Level of representation
- Strategy for absent respondents/not found/refusals (replacement or not)
- Stages of sample selection
- Design omissions in the sample
- Sample frame used, and listing exercise conducted to update it
It is also useful to indicate here which variables in the data files identify the various levels of stratification and the primary sampling unit. These are crucial to data users who want to properly account for the sampling design in their analyses and calculations of sampling errors.
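The sketch below shows why users need the stratum and primary sampling unit identifiers. It estimates a weighted total and its standard error with the common "ultimate cluster" (with-replacement) approximation, which needs nothing beyond the stratum, the PSU, and the weight for each record. The function name, the tuple layout, and the toy records are assumptions made for illustration; statistical packages offer more complete estimators.

```python
from collections import defaultdict


def total_and_se(records):
    """Estimate a weighted total and its standard error.

    records: iterable of (stratum, psu, weight, value) tuples.
    Variance uses the with-replacement ultimate-cluster approximation:
    sum over strata of n_h / (n_h - 1) * sum_i (z_hi - mean_h)^2,
    where z_hi is the weighted total of PSU i in stratum h.
    """
    psu_totals = defaultdict(float)
    for stratum, psu, weight, value in records:
        psu_totals[(stratum, psu)] += weight * value

    by_stratum = defaultdict(list)
    for (stratum, _psu), z in psu_totals.items():
        by_stratum[stratum].append(z)

    total = sum(psu_totals.values())
    variance = 0.0
    for z_list in by_stratum.values():
        n_h = len(z_list)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean_h = sum(z_list) / n_h
        variance += n_h / (n_h - 1) * sum((z - mean_h) ** 2 for z in z_list)
    return total, variance ** 0.5


# Toy records (made up): two strata, two PSUs each.
toy = [
    ("A", 1, 10.0, 1.0), ("A", 1, 10.0, 2.0), ("A", 2, 10.0, 5.0),
    ("B", 1, 20.0, 1.0), ("B", 2, 20.0, 2.0),
]
est_total, est_se = total_and_se(toy)
```

Without the stratum and PSU variables in the dataset, a user can only compute the simple-random-sample variance, which typically understates sampling error for clustered designs.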
Documentation Sampling Sometimes the reality of the field requires a deviation from the sampling design (for example, because zones are difficult to access due to weather problems, political instability, etc.). If, for any reason, the survey has deviated from the sample design, this should be reported here. Major deviations from the sample design: this element is used to describe the correspondence between the units that were successfully surveyed and the planned sample. Any significant deviation should be mentioned here.
Documentation Sampling The response rate is the percentage of households (or other sample units) that participated in the survey, based on the original sample size. Omissions may occur because of refusal to participate, inability to locate the respondent, or other reasons.
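A minimal sketch of the calculation, reported by sample point as the sampling checklist asks. The status labels, the enumeration-area codes, and the function name are hypothetical; real surveys usually distinguish more result codes (for example vacant dwellings, which are often excluded from the denominator).

```python
from collections import defaultdict


def response_rates(households):
    """Return the response rate (%) for each sample point.

    households: iterable of (sample_point, status) tuples, where the
    status "completed" counts as a response and anything else does not.
    """
    selected = defaultdict(int)
    completed = defaultdict(int)
    for point, status in households:
        selected[point] += 1
        if status == "completed":
            completed[point] += 1
    return {p: 100.0 * completed[p] / selected[p] for p in selected}


# Made-up interview results for two sample points.
results = [
    ("EA01", "completed"), ("EA01", "refused"),
    ("EA01", "completed"), ("EA01", "completed"),
    ("EA02", "completed"), ("EA02", "not_found"),
]
rates = response_rates(results)
print(rates)
```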
Documentation Sampling Weighting Provide here the list of variables used as weighting coefficients. If there is more than one weighting variable, describe how these variables differ from each other and what the purpose of each one is. Example: Sample weights were calculated for each of the data files. Sample weights for the household data were computed as the inverse of the probability of selection of the household, computed at the sampling domain level (urban/rural within each region). The household weights were adjusted for nonresponse at the domain level, and were then normalized by a constant factor so that the total weighted number of households equals the total unweighted number of households.
Documentation The Metadata Editor and Pacific Data Library Nesstar Publisher – Live Demo Pacific Data Library – pdl.spc.int
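The three steps in the example (inverse selection probability, nonresponse adjustment by domain, normalization) can be sketched as follows. This is one simple reading of the description, not the method of any specific survey: the nonresponse adjustment here is the ratio of selected to responding households in the domain, and the field names and figures are invented for illustration.

```python
def household_weights(households):
    """Compute normalized household weights for responding households.

    households: list of dicts with keys
      "domain"    - sampling domain label (e.g. urban/rural within region)
      "prob"      - selection probability of the household
      "responded" - True if the household was interviewed
    Returns one weight per responding household, in input order.
    """
    selected, responded = {}, {}
    for h in households:
        d = h["domain"]
        selected[d] = selected.get(d, 0) + 1
        if h["responded"]:
            responded[d] = responded.get(d, 0) + 1

    raw = []
    for h in households:
        if not h["responded"]:
            continue
        base = 1.0 / h["prob"]  # inverse of the selection probability
        adjustment = selected[h["domain"]] / responded[h["domain"]]
        raw.append(base * adjustment)

    # Normalize so the weights sum to the unweighted number of households.
    factor = len(raw) / sum(raw)
    return [w * factor for w in raw]


# Made-up sample: one urban nonrespondent, everyone responds in rural.
sample = [
    {"domain": "urban", "prob": 0.10, "responded": True},
    {"domain": "urban", "prob": 0.10, "responded": True},
    {"domain": "urban", "prob": 0.10, "responded": False},
    {"domain": "rural", "prob": 0.05, "responded": True},
    {"domain": "rural", "prob": 0.05, "responded": True},
]
weights = household_weights(sample)
```

Normalized weights preserve proportions but not population totals, so the documentation should say which kind of weight each variable holds.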