List frames, area frames and administrative data: are they complementary or in competition?

Elisabetta Carfagna
University of Bologna, Department of Statistics
via Belle Arti 41 - 40126 Bologna
carfagna@stat.unibo.it

Many different data on agriculture available in the various countries in the world

• Administrative data are common almost everywhere
• In some countries, a specific data collection, based on list frames, area frames or both, is performed for producing agricultural statistics
• Rationalization is felt as a strong need
  – Various non-comparable data
  – Maintaining different data acquisition systems is very expensive
• Analysis of risks, advantages, disadvantages and requirements of the use of administrative data for statistical purposes
• Proposal of some methods to combine list frames, area frames and administrative data for producing accurate agricultural statistics

Administrative data

• Definitions, coverage and quality depend on administrative requirements
• Acquisition is regulated by law: the data have to be collected whatever their cost
• Very difficult to calculate costs
• Administrative data relevant for agricultural statistics: taxation, social insurance and subsidies
• Traditionally used for updating a list for sample surveys
• Increased ability to handle large sets of data
• Capacity of some administrative departments to collect data through the web
• Budget constraints
• Together these suggest using administrative data more extensively, and even producing statistics through direct tabulation

Administrative data versus sample surveys

Register: complete list of objects belonging to a defined object set, with identification variables that allow the register itself to be updated
• Huge amount of data collected
• Sometimes a purposive sample is controlled in order to apply sanctions

A statistical system based on a register allows:
• saving money
• reducing response burden
• producing figures for very detailed domains
• estimating transition over time

Sample survey:
• Population identified; decisions about parameters and levels of accuracy, taking into account budget constraints
• Much care devoted to data collection and quality control
• Efficient sample designs for reducing sampling errors

Disadvantages of direct use of administrative data

• The data are already collected: the information acquired is not exactly the information needed
• Collected for purposes relevant for the respondent
• Coverage problems
  – often the objects in the registers are partly statistical units of the population and partly something else
  – study in Sweden: only 79% of farms have a one-to-one match with the IACS register (created for European agricultural subsidies), 6.4% have a one-to-many or many-to-many match and 14.6% of farms have no match
  – incompleteness of data inflates the risk of bias for some crops (in Sweden about 20%)
• Unclear dynamics can be generated by controls
• Comparability over time is influenced by changes in coverage level

Errors in administrative data

• Direct tabulation is suggested if the sum of the values presented by all objects in the register is an unbiased estimator of the total of a variable
• The estimator is applied to data affected by errors
• E.g. IACS declarations for a crop c are affected by:
  – commission errors (some parcels declared as covered by crop c are covered by another crop, or their surface is inflated)
  – omission errors (some parcels covered by crop c are not included in IACS declarations, or their declared surface is less than the true one)
• If commission and omission errors compensate, the sum of declarations for crop c is an unbiased estimator of the total surface
• IACS purposive sampling for detecting irregularities, 2003, Italian level: durum wheat error 3.5% of controlled surface
• Commission errors: 7.8% of the sum of declarations in Puglia and 8.4% in Sicily
• Omission errors: 13.9% of the ITA Consorzio estimate in Puglia and 23.3% in Sicily
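
The compensation condition can be illustrated with a minimal Python sketch. It is a deterministic simplification (unbiasedness properly concerns expectations), the function name declaration_bias is hypothetical, and the hectare figures are invented for illustration rather than taken from IACS.

```python
def declaration_bias(true_total, commission, omission):
    """Return (declared_total, bias) for one crop.

    commission: area declared as crop c that is not crop c (or inflated area)
    omission:   area of crop c missing from declarations (or under-declared)
    """
    declared_total = true_total + commission - omission
    bias = declared_total - true_total
    return declared_total, bias


# Hypothetical figures in hectares (not from the source).
# Case 1: commission and omission errors compensate, so the sum of
# declarations equals the true total.
print(declaration_bias(true_total=1000.0, commission=80.0, omission=80.0))

# Case 2: omission dominates, so direct tabulation under-estimates the total.
print(declaration_bias(true_total=1000.0, commission=80.0, omission=200.0))
```

The second case mirrors the situation reported for Puglia and Sicily, where omission errors are larger than commission errors, although the two percentages there are expressed on different bases and cannot be subtracted directly.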

Alternatives to direct tabulation

One procedure for:
• reducing the risk of bias due to under-coverage of registers
• avoiding double data acquisition
is the following:
• Sampling farms from a complete and updated list and performing record linkage with the register, to capture the data corresponding to the farms selected from the list
• If the register is unreliable for some variables, the related data have to be collected through interviews, as do data not found in the register because of record linkage difficulties

Combined use of various registers
• improves the coverage of the population and data quality
• allows the socio-economic situation of rural households to be described
• does not solve all problems due to under-coverage and incorrect declaration
• the statistical methodological work to be done is very heavy
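
The sampling-plus-linkage procedure can be sketched as follows. This is only an illustration under strong assumptions: simple random sampling, and exact matching on a shared farm identifier, whereas real record linkage usually has to cope with imperfect identifiers. The function name link_sample_to_register and all data are hypothetical.

```python
import random


def link_sample_to_register(list_frame, register, n, seed=0):
    """Draw a simple random sample of farm ids from the list frame and
    look each sampled farm up in the register.

    Returns (linked, to_interview):
      linked       - register records found for sampled farms
      to_interview - sampled farm ids with no register record, whose data
                     must be collected through interviews instead
    """
    rng = random.Random(seed)
    sample = rng.sample(sorted(list_frame), n)
    linked = {fid: register[fid] for fid in sample if fid in register}
    to_interview = [fid for fid in sample if fid not in register]
    return linked, to_interview


# Hypothetical data: 10 farms on the list; the register misses farms 3 and 7.
list_frame = set(range(10))
register = {fid: {"wheat_ha": 5.0 * fid} for fid in list_frame if fid not in (3, 7)}

linked, to_interview = link_sample_to_register(list_frame, register, n=5)
# Every sampled farm ends up either linked or scheduled for an interview.
print(len(linked) + len(to_interview))
```

Because the sample is drawn from the complete list rather than from the register, farms missing from the register still have a chance of selection, which is what limits the under-coverage bias.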

Calibration estimators

• Probabilistic sample survey whose efficiency is improved by using register data as an auxiliary variable in calibration estimators
• Improved efficiency allows the same precision to be reached with a smaller sample size, lower survey costs and lower response burden
• AGRIT 2000, IACS data as auxiliary variable in a regression estimator: CV reduced from 4.8% to 1.3% in Puglia and from 5.9% to 3.0% in Sicily (Landsat TM data reduced the CVs to 2.7% and 5.6%)

Advantages:
• register data included in the estimation procedure
• reduction of sample size, survey costs and respondent burden
• if the frame is complete and without duplications, no under- or over-coverage
• data are collected for purely statistical purposes

Disadvantages:
• costs and respondent burden higher than in direct tabulation
• difficulty in producing reliable estimates for small domains
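
As an illustration of how register data enter the estimation, here is a minimal sketch of the textbook regression estimator of a total, assuming simple random sampling. The slides do not describe the AGRIT design, so this is not the AGRIT procedure; the function name and the numbers are hypothetical.

```python
def regression_estimate_total(y, x, x_pop_total, N):
    """Regression estimator of the population total of y.

    y           - survey values observed on the sample
    x           - register (auxiliary) values for the same sample units
    x_pop_total - register total of x over the whole population
    N           - population size
    Assumes simple random sampling (an assumption of this sketch).
    """
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx  # least-squares slope of y on x
    # Correct the sample mean of y by how far the sample mean of x
    # lies from the known population mean of x.
    return N * (y_bar + b * (x_pop_total / N - x_bar))


# Hypothetical example: surveyed area is exactly twice the declared area.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(regression_estimate_total(y, x, x_pop_total=300.0, N=100))
```

The stronger the relationship between the register variable and the survey variable, the larger the variance reduction, which is the mechanism behind the CV reductions quoted for Puglia and Sicily.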

Combined use of different frames

When the various registers are incomplete and the information in their records is not sufficiently reliable to be used directly for statistics, a sample survey has to be designed to collect information through interviews.

Multiple frames approach
• Treats these registers as multiple incomplete lists from which separate samples can be selected
• A two-stage estimator combines estimates calculated on non-overlapping sample units belonging to the different frames with estimates calculated on overlapping sample units
• Does not require record matching of the listing units of the different lists
• Some two-stage estimators need identical units to be identified only in the overlap samples; others have been developed for cases in which these units cannot be identified
• A completeness assumption has to be made: every unit in the population of interest should belong to at least one of the frames
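
The slides do not name a specific estimator. As one well-known instance of combining non-overlap and overlap estimates, the sketch below uses Hartley's dual-frame form for two frames A and B; it is swapped in for illustration, and the domain totals and the mixing weight theta are hypothetical inputs.

```python
def hartley_dual_frame_total(y_a, y_ab_from_A, y_ab_from_B, y_b, theta=0.5):
    """Dual-frame estimate of a total from two samples.

    y_a         - estimated total for units only in frame A (from sample A)
    y_ab_from_A - estimated total for the overlap domain, from sample A
    y_ab_from_B - estimated total for the overlap domain, from sample B
    y_b         - estimated total for units only in frame B (from sample B)
    theta       - weight in [0, 1] given to sample A's overlap estimate
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must be in [0, 1]")
    return y_a + theta * y_ab_from_A + (1.0 - theta) * y_ab_from_B + y_b


# Hypothetical domain estimates (not from the source).
print(hartley_dual_frame_total(400.0, 300.0, 320.0, 150.0, theta=0.5))
```

Only the sampled units need to be classified into domains, which is why no record matching between the full lists is required. The estimate covers the whole population only under the completeness assumption stated above.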

Area frames

When completeness is not guaranteed by the combined use of different registers, an area frame should be adopted to avoid bias, since an area frame is always complete and remains useful for a long time.

The completeness of area frames suggests their use in many cases:
• no other complete frame is available
• the existing list of sampling units changes very rapidly
• an existing frame is out of date
• an existing frame was obtained from a census with low coverage
• a multiple-purpose frame is needed for estimating many different variables (agricultural, environmental, etc.)

• Area frames allow objective estimates of characteristics that can be observed on the ground, without interviews
• The materials used for the survey and the information collected help to reduce non-sampling errors in interviews and are a good basis for data imputation for non-respondents
• Area sample survey materials are becoming cheaper and more accurate

Combining a list and an area frame

Disadvantages of area frames:
• cost of implementing the survey program
• need for many cartographic materials
• sensitivity to outliers and instability of estimates
• if the survey is conducted through interviews and respondents live far from the selected area unit, identifying them may be difficult and expensive, and missing data tend to be relevant

Multiple frame sample survey design for avoiding instability of estimates and improving their precision:
• A list of very large operators and of operators that produce rare items
• If this list is short, it is generally easy to construct and update
• The area sample units included in the list frame need to be identified to avoid upward bias of the estimates
• Sample units belonging to the list and not to the area frame do not exist, so the intersection domain has the size of the list
• The approach is convenient if the list contains units with large values and the survey cost in the list is much lower than in the area frame
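
The role of identifying list units in the area sample can be shown with a small sketch of a screening-type estimate: the list part is taken from the list survey, and area-sample units found on the list are excluded from the area expansion. The function name, the expansion weights and the values are all hypothetical, and the list total is treated as known for simplicity.

```python
def screening_estimate_total(list_total, area_sample):
    """Combine a list-frame total with an area-frame expansion estimate.

    list_total  - total for the list of large or rare operators
    area_sample - iterable of (weight, value, on_list) tuples, where weight
                  is the area-frame expansion factor and on_list flags
                  units that also belong to the list frame
    """
    # Units on the list are already counted in list_total, so only the
    # non-overlap part of the area sample is expanded.
    non_overlap = sum(w * y for w, y, on_list in area_sample if not on_list)
    return list_total + non_overlap


# Hypothetical data: one very large operator (value 900) falls in the area sample.
area_sample = [(10.0, 20.0, False), (10.0, 900.0, True), (10.0, 30.0, False)]

print(screening_estimate_total(5000.0, area_sample))  # overlap screened out

# Ignoring the overlap counts the large operator twice (upward bias).
naive = 5000.0 + sum(w * y for w, y, _ in area_sample)
print(naive)
```

The example also shows why the design stabilises estimates: the large operator no longer enters through an expansion factor, so a single outlier in the area sample cannot dominate the total.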

Conclusions 1

• Increased ability to handle large sets of data
• Capacity of some administrative departments to collect data through the web
• Budget constraints
• Together these suggest using administrative data more extensively, and even producing statistics through direct tabulation:
  – reducing response burden
  – producing figures for very detailed domains
  – allowing estimation of transition over time
• However, there are problems for producing statistics:
  – definitions, coverage, information acquired, aims of data collection and quality controls
• Combined use of registers improves coverage and data quality and allows socio-economic conditions to be described
  – good identification variables and a sophisticated record linkage system are needed, and heavy statistical methodological work has to be done
  – effect of imperfect matching

Conclusions 2

• Sampling farms from a complete and updated list and performing record linkage with the register to capture data
• Probabilistic sample survey whose efficiency is improved by using register data as an auxiliary variable in calibration estimators
  – improved efficiency allows the same precision to be reached with a smaller sample size, lower survey costs and lower response burden
  – register data included in the estimation process
  – reduction of sample size, survey costs and respondent burden
  – if the frame is complete and without duplications, no under- or over-coverage
  – data are collected for purely statistical purposes
• Multiple frame approach
  – does not require record matching of the listing units of the different lists
  – when completeness is not guaranteed by the different registers, an area frame allows bias to be avoided
  – a multiple frame sample survey design avoids the instability of estimates based on an area frame and improves their precision