IMPUTING MISSING ADMINISTRATIVE DATA FOR SHORT-TERM ENTERPRISE STATISTICS

Pieter Vlag, Statistics Netherlands
Joint work with DESTATIS, Statistics Estonia, Statistics Finland, ISTAT, Statistics Lithuania, ONS

Outline of the presentation
• Scope of the project: use of admin data for STS
• Two situations:
  a. VAT fairly complete and representative (VAT representative)
  b. VAT not complete and not representative (VAT not representative)
• VAT representative:
  a. imputing missing values
• Imputing missing values:
  a. methods for imputation
  b. which units to impute
• Conclusions and implications for other projects

Scope of the project
• Final situation (after a year): all admin data are available to NSIs, and the data cover the population.
• Monthly and quarterly estimates: part of the admin data is still "missing" (diagram: L.E. (large enterprises, survey) | admin data | missing).
• Assumption: if admin data are complete, they can be used for statistics.
• Challenge: how to estimate for the "missing" admin data in monthly and quarterly estimates.
• Scope: turnover (VAT registration), wages and employees ("social security data").

Additional value of the ESSnet Admin Data
• VAT = Value Added Tax
• The European Union value added tax (EU VAT) is a value added tax covering the member states in the European Union VAT area. Participation is compulsory for member states of the European Union.
• Each Member State's national VAT legislation must comply with the provisions of EU VAT law as set out in Directive 2006/112/EC.
TRANSLATION TO STATISTICS
• INPUT: the available VAT information is quite similar across Europe.
• OUTPUT: the obligations are also similar across Europe (STS, SBS, ESR regulations).
• CONCLUSION OF THE ESSNET: the methodological challenges in using admin data are identical; solutions may differ, but only to a limited extent.

Two situations

                      Situation A              Situation B
L.E.                  100 % sample             100 % sample
VAT                   Almost complete          Not available or very limited
General situation     Quarterly, t+45 days     Monthly, t+30 days

Whether situation A or B applies to other estimates (quarterly flash; monthly at t+45/50 days) differs per country.

Methods
• Quality of STS estimates: revision compared to the final estimate, which is based on the 100 % sample of admin data available in the final situation; summarised as the average bias and average error of the admin-data-based STS estimates.
• SITUATION A: admin data coverage almost complete. Estimation is based only on admin data (L.E. survey plus available VAT; missing VAT is imputed). Established techniques.
• SITUATION B: admin data coverage incomplete. Admin data serve as auxiliary information (sample survey of SMEs, VAT from period t-x). Experimental methods.
• Topics: level estimates; imputation of missing data (with available VAT). Situation B is not discussed further.
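The slide measures quality as the revision against the final estimate, summarised as average bias and average error, but does not define these measures. The sketch below assumes the common definitions: revision = provisional minus final, average bias = mean revision, average error = mean absolute revision. The function name `revision_quality` is hypothetical; the inputs could be, for example, monthly growth rates in percent.

```python
from statistics import mean


def revision_quality(provisional, final):
    """Return (average bias, average error) of provisional vs final estimates.

    Assumed definitions (not stated on the slide):
      revision      = provisional - final
      average bias  = mean of the revisions
      average error = mean of the absolute revisions
    """
    if len(provisional) != len(final) or not final:
        raise ValueError("need two non-empty series of equal length")
    revisions = [p - f for p, f in zip(provisional, final)]
    return mean(revisions), mean(abs(r) for r in revisions)


# Usage: revisions are -0.5, +0.5 and 0.0, so they cancel in the bias
# but not in the error.
bias, error = revision_quality([1.0, 2.0, 3.0], [1.5, 1.5, 3.0])
```

Keeping bias and error separate matters here: a method can be unbiased on average while still producing large revisions in individual periods.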

Methods for imputation
• Analysed several production systems, e.g. DE, F, "Nordic countries", NL, I
• Imputation of "missing VAT" based on the ratios O_t/O_{t-1} or O_t/O_{t-12} of the available VAT (O = turnover), or similar approaches
• The stratification levels used to calculate the stratum imputation ratios range from NACE 2-digit x 2 size classes to NACE 4-digit x 9 size classes
KEY QUESTION: do these different approaches lead to different output, given that the methods are generally applied when the coverage of the L.E. survey plus available VAT exceeds 90 % of the target variable?
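The ratio rules above can be sketched as stratum-level ratio imputation: within each stratum, the ratio of summed current turnover to summed reference-period turnover of the units with available VAT is applied to each missing unit's reference-period turnover. Passing O_{t-1} as the reference gives the O_t/O_{t-1} rule; passing O_{t-12} gives O_t/O_{t-12}. This is a minimal sketch, not any NSI's production code; the function name, the tuple layout and the stratum keys (NACE code x size class) are hypothetical.

```python
from collections import defaultdict


def impute_stratum_ratio(units):
    """Ratio imputation of missing current turnover within strata.

    units: list of (stratum, o_ref, o_t) tuples, where o_ref is the unit's
    turnover in the reference period (t-1 or t-12) and o_t is None when the
    current VAT return is missing.
    """
    num = defaultdict(float)  # sum of O_t over units with available VAT
    den = defaultdict(float)  # sum of O_ref over the same units
    for stratum, o_ref, o_t in units:
        if o_t is not None:
            num[stratum] += o_t
            den[stratum] += o_ref

    result = []
    for stratum, o_ref, o_t in units:
        if o_t is None:
            if den[stratum] == 0:
                raise ValueError(f"no usable reporters in stratum {stratum!r}")
            # Apply the reporters' stratum ratio O_t / O_ref to the missing unit.
            o_t = o_ref * num[stratum] / den[stratum]
        result.append((stratum, o_ref, o_t))
    return result


# Usage with hypothetical strata (NACE code, size class).
units = [
    (("47", "small"), 100.0, 110.0),
    (("47", "small"), 200.0, 210.0),
    (("47", "small"), 50.0, None),    # imputed as 50 * 320 / 300
    (("47", "large"), 1000.0, 950.0),
    (("47", "large"), 400.0, None),   # imputed as 400 * 950 / 1000
]
imputed = impute_stratum_ratio(units)
```

Finer strata (e.g. NACE 4-digit x 9 size classes) make the ratio more specific but leave fewer reporters per stratum, which is why the sketch raises an error rather than silently dividing by zero.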

Methods for imputation: testing of different methodologies (example Estonia)
Conclusion: the imputation methods provide similar results if the population is fixed and VAT covers more than 80 % of the population.

Comparing imputations with realisations (approach Statistics Finland)
• Five imputation rules for the current period at micro level:
  - Mean annual change
  - Geometric mean of monthly changes
  - Previous turnover
  - Mean turnover
  - Turnover of comparison month
• The imputation rules are automatically evaluated and compared by calculating maximum proportional forecast errors using data for the five latest months. The selection rules are:
  - an imputation rule with a maximum proportional forecast error below 20 % and the same direction of change as in the last two months is automatically admissible;
  - the model with the smallest maximum error is considered best.
Main differences from the other practices found:
• No assumption that the available VAT is representative
• Not all missing data are imputed (in practice 20 to 50 %)
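The Finnish selection procedure can be sketched for one unit's monthly turnover history. The five rule names come from the slide, but their exact formulas are not given, so the forms below are assumptions: the window length `WINDOW` for the "mean" rules is hypothetical, the direction check is read as "in each of the last two months the forecast change had the same sign as the realised change", and a unit with no admissible rule is assumed to be left unimputed. All function names are hypothetical.

```python
import math
from statistics import mean

# Hypothetical window for the "mean" rules; not stated on the slide.
WINDOW = 3


def previous_turnover(h, t):
    return h[t - 1]  # carry forward the most recent month


def comparison_month(h, t):
    return h[t - 12]  # same month one year earlier


def mean_turnover(h, t):
    return mean(h[t - WINDOW:t])  # average of the last WINDOW months


def mean_annual_change(h, t):
    # Comparison month scaled by the average year-on-year change of recent months.
    changes = [h[s] / h[s - 12] for s in range(t - WINDOW, t)]
    return h[t - 12] * mean(changes)


def geometric_monthly_change(h, t):
    # Previous month scaled by the geometric mean of recent month-on-month changes.
    changes = [h[s] / h[s - 1] for s in range(t - WINDOW, t)]
    return h[t - 1] * math.prod(changes) ** (1 / len(changes))


RULES = {
    "mean annual change": mean_annual_change,
    "geometric mean of monthly changes": geometric_monthly_change,
    "previous turnover": previous_turnover,
    "mean turnover": mean_turnover,
    "turnover of comparison month": comparison_month,
}


def evaluate(h, n_eval=5):
    """Backtest every rule on the n_eval latest observed months of h."""
    n = len(h)  # needs n >= n_eval + WINDOW + 12
    results = {}
    for name, rule in RULES.items():
        # Maximum proportional forecast error over the evaluation months.
        max_err = max(abs(rule(h, s) - h[s]) / h[s] for s in range(n - n_eval, n))
        # Assumed direction check over the last two observed months.
        same_direction = all(
            (rule(h, s) - h[s - 1]) * (h[s] - h[s - 1]) > 0 for s in (n - 2, n - 1)
        )
        results[name] = (max_err, same_direction)
    return results


def select_rule(h, threshold=0.20):
    """Return (rule name, imputed value for month len(h)) or (None, None)."""
    admissible = {
        name: err
        for name, (err, same_direction) in evaluate(h).items()
        if err < threshold and same_direction
    }
    if not admissible:
        return None, None  # assumption: unit is not imputed
    best = min(admissible, key=admissible.get)
    return best, RULES[best](h, len(h))


# Usage: 24 months of steadily growing turnover; a growth-based rule is chosen.
history = [100 * 1.01 ** i for i in range(24)]
rule_name, value = select_rule(history)
```

Because rules can fail the admissibility test, some units end up without an imputation, which is consistent with the slide's remark that not all missing data are imputed.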

Comparing imputations with realisations (more precise conclusions)
Explanations:
- Outliers affect the calculated O_t/O_{t-1} or O_t/O_{t-12} values.
- Late VAT reporters are likely a selective group in countries with automatic fining systems for late VAT reporting.
The impact of this selectivity on the output is generally negligible because of the high coverage of the available data.
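A short derivation shows why high coverage keeps the impact of selective late reporters small. Assume, as a simplification, that the missing units are imputed with the growth rate of the available units (the O_t/O_{t-1} rule). Let A_{t-1}, A_t be the turnover of the available units, M_{t-1} the previous-period turnover of the missing units with true growth factor g_M, and g_A = A_t/A_{t-1}. The error in the estimated total, relative to the previous-period total T_{t-1}, is then the missing share m times the growth difference:

```latex
\begin{aligned}
\hat{M}_t &= g_A\, M_{t-1}, \qquad M_t = g_M\, M_{t-1},\\
\hat{T}_t - T_t &= \hat{M}_t - M_t = M_{t-1}\,(g_A - g_M),\\
\frac{\hat{T}_t - T_t}{T_{t-1}} &= m\,(g_A - g_M),
  \qquad m = \frac{M_{t-1}}{A_{t-1} + M_{t-1}}.
\end{aligned}
```

For example, a missing share of m = 0.10 and a growth difference of 5 percentage points give 0.10 x 0.05 = 0.005, i.e. an error of 0.5 percentage points in the growth rate of the total.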

Which units to impute

Impact on results (example Italy)
Chart series: imputation technique; uncertainty of the provisional population.
Conclusion: the effect on revisions caused by uncertainty about which units to impute is larger than the effect of the imputation technique itself.

Conclusions
• When admin data are used for STS, missing data are imputed.
• The most widely used imputation rules are O_t/O_{t-1} and O_t/O_{t-12}.
• Given the large coverage of the available data, the exact choice of imputation technique has only a limited impact on the outcome, despite indications that the main assumption behind these techniques ("available VAT = representative") may not hold exactly.
• More important than the imputation technique is the estimate of the provisional population.