Data validation practical use case National Accounts IPA

Data validation practical use case: National Accounts IPA
Course: Data Validation in the ESS, 18-19 May 2017
Daniel SURANYI, Eurostat Directorate C

Presentation Outline
• Background and milestones
• ESA 2010 Validation Task Force
• Structural and content validation service
• Target implementation dates
• Questions and discussion

Background and milestones
• 2012: Agreed ESTAT-OECD-ECB validation checks reflected in the ESTAT-OECD Protocol
• 2014: Implementation of validation checks in internal systems for ESA 2010
• 2014: NAWG/DMES agree to set up the ESA 2010 validation Task Force
• Recurrent validation problems progressively shift from coding to content-related checks

ESA 2010 TF on data validation: Mandate
• Review validation checks performed in Eurostat
• Clarification of methodological or practical aspects
• Validation rules for an internal or external pre-validation tool
• Collection and dissemination of associated metadata

TF Participants
• Participants represented seven NSIs (AT, DE, DK, GR, IE, SE, UK), three Central Banks (BE, FR, IT) and two main data users (ECB and OECD)
• IT, SI and PL participated via correspondence
• Participated in the workshop for ESTAT grants on validation (DK, GR, NL, SI)
• ESTAT links with representatives to cover ESA 2010 TP

Current validation process
• Sources (surveys, admin) feed NSIs & Central Banks
• Tool: pre-transmission checks
• STEP 1: SDMX structural & basic data checks (eDAMIS)
• STEP 2: Structural & basic data checks
• Structural & basic checks cover: format, codes, embargo + "F" flag, hole in series
• STEP 3: Statistical and economic plausibility checks (Eurostat)
• Plausibility checks cover: revisions, A vs Q, horizontal checks, balances, CLVs for ref year, zeros & negative values
• EU aggregates & dissemination
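
Two of the checks named on this slide, "hole in series" and "A vs Q", can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the rules Eurostat applies: the function names, the period labels such as "2015-Q1", the tolerance, and the premise that quarters should sum to the annual figure (which holds only for unadjusted flow series in current prices) are all hypothetical.

```python
from math import isclose


def find_holes(series):
    """Return periods with a missing value between the first and last observed periods.

    `series` is a chronologically ordered list of (period, value) pairs;
    None marks a missing value. Leading and trailing gaps are not holes.
    """
    observed = [i for i, (_, value) in enumerate(series) if value is not None]
    if not observed:
        return []
    first, last = observed[0], observed[-1]
    return [series[i][0] for i in range(first, last + 1) if series[i][1] is None]


def annual_vs_quarterly(annual, quarterly, rel_tol=1e-6):
    """Return the years whose four quarters do not sum to the annual value.

    Assumption: a flow series for which Q1 + Q2 + Q3 + Q4 equals the annual figure.
    Years with an incomplete set of quarters are skipped, not flagged.
    """
    mismatches = []
    for year, annual_value in annual.items():
        quarters = [quarterly.get(f"{year}-Q{q}") for q in range(1, 5)]
        if None in quarters:
            continue
        if not isclose(sum(quarters), annual_value, rel_tol=rel_tol):
            mismatches.append(year)
    return mismatches


# Hypothetical sample data
series = [("2015-Q1", 100.0), ("2015-Q2", None), ("2015-Q3", 102.0), ("2015-Q4", None)]
annual = {2015: 410.0, 2016: 430.0}
quarterly = {
    "2015-Q1": 100.0, "2015-Q2": 101.0, "2015-Q3": 104.0, "2015-Q4": 105.0,
    "2016-Q1": 106.0, "2016-Q2": 107.0, "2016-Q3": 108.0, "2016-Q4": 110.0,
}
print(find_holes(series))
print(annual_vs_quarterly(annual, quarterly))
```

In a real setting the tolerance would depend on rounding conventions agreed for the table in question.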

NAPS-S: Objective of the project
Re-design of the National Accounts statistical Production System using Eurostat corporate (CSPA compliant) services for managing and validating incoming data files.

Official statistics: the challenge…
• Official reference
• Cross-domain usage
• More timely policies
• Commercial providers
• GDP T+30
• Shrinking resources

Stovepipe production: the reality…
[Diagram: Survey A and Survey B, each with its own Collect, Process, Analyse, Disseminate steps]

Stovepipe production: the reality…
• Customised for a specific domain
• Conventions used within domains / surveys
• Hampering cross-domain usage
• Leading to low level of transparency
• Not possible to share IT tools efficiently
• Difficult to share data across domains / organisations
• Difficult to measure quality

Example: National Accounts production in Eurostat
Business process

Example: National Accounts production in Eurostat
Implementation
• FAME
• Oracle RDBMS
• Oracle OLAP

Target: flexible use of statistical services

Intermediate step: partial orchestration

Target: flexible use of statistical services
• Customised for a specific domain
• Conventions used within domains / surveys
• Hampering cross-domain usage
• Leading to low level of transparency
• Not possible to share IT tools efficiently
• Difficult to share data across domains / organisations
• Difficult to measure quality

Target: flexible use of statistical services
• Architecture for cross-domain usage
• Standards used across domains / surveys
• Enabling cross-domain usage
• Leading to transparency
• Encouraged to share IT tools efficiently
• Facilitates sharing data across domains / organisations
• Possible to measure quality (process, data)

The big picture: using standards
Statistical production, supported by:
• GSIM: reference information model
• GSBPM: process step categories
• CSPA: service specification
• VTL: validation expressions
• SDMX: data modelling

Connections from data providers
1) Connect to the repository
• Components shown: national software, National Structural Validation, National Content Validation, VTL Repository, SDMX Registry, Common Repository, Statistical Service A, Statistical Service B

Connections from data providers
2a) Use ESS service: replicated
• Components shown: ESS Structural Validation, ESS Content Validation, Structural Validation, Content Validation, Statistical Service A, Statistical Service B, VTL Repository, SDMX Registry, Common Repository

Connections from data providers
2a) Use ESS service: shared
• Components shown: Structural Validation, Content Validation, Statistical Service A, Statistical Service B, VTL Repository, SDMX Registry, Common Repository

Connections from data providers
3) Connect to the process
• Components shown: Structural Validation, Content Validation, Statistical Service A, Statistical Service B, SDMX Registry, VTL Repository, Common Repository

Structural Validation (SDMX Registry)
SDMX compliance
• Valid SDMX-ML file
• Coded according to the DSD
• Mandatory fields present
• Correct data types
Basic logical checks
• Dataflow definition
• Sender ID and REF_AREA
• Table ID is present
• Value "NaN" and OBS_STATUS
• EMBARGO_DATE and CONF_STATUS
• PRICES and REF_YEAR_PRICE

Content Validation (VTL Repository)
Basic content checks
• Missing or unexpected series
• Hole in series
• Zero values
• Negative values?
General plausibility and consistency (within file)
• Additivity of breakdowns
• Outliers
• Consistency between prices
• Unadjusted and adjusted series
Advanced plausibility and consistency (across files)
• Revisions
• Quarterly versus Annual
• Same series across tables
Cross-domain or source checks
• Balance of Payments
• Trade statistics
• Labour market statistics
• Data published by NSI or IO
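
To make two of the listed checks concrete, here is a minimal Python sketch of one basic logical check ('Value "NaN" and OBS_STATUS') and one within-file content check ("Additivity of breakdowns"). The slide does not spell out the rules themselves, so the logic is an assumption: a missing value is taken to require an OBS_STATUS flag, and components are taken to add up to their total within a rounding tolerance. The function names, the dictionary layout, the OBS_VALUE and TIME_PERIOD keys, and the tolerance are hypothetical.

```python
import math


def check_nan_has_status(observations):
    """Return the periods where the value is missing but no OBS_STATUS flag is set.

    Assumed rule: an observation reported as NaN (or with no value)
    must carry an OBS_STATUS code explaining why.
    """
    problems = []
    for obs in observations:
        value = obs.get("OBS_VALUE")
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        if missing and not obs.get("OBS_STATUS"):
            problems.append(obs["TIME_PERIOD"])
    return problems


def check_additivity(total, components, abs_tol=0.5):
    """Return True if the components add up to the total within `abs_tol`.

    Assumed rule: the tolerance absorbs rounding of published figures.
    """
    return abs(sum(components.values()) - total) <= abs_tol


# Hypothetical sample data
observations = [
    {"TIME_PERIOD": "2016", "OBS_VALUE": float("nan"), "OBS_STATUS": "M"},
    {"TIME_PERIOD": "2017", "OBS_VALUE": float("nan")},
    {"TIME_PERIOD": "2018", "OBS_VALUE": 1.0},
]
print(check_nan_has_status(observations))
print(check_additivity(100.0, {"A": 60.0, "B": 39.8}))
```

In the service described here such rules would be expressed in VTL and stored in the VTL Repository; Python is used only to show the shape of the logic.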

Validation Roadmap: NAPS-S

What                       Structural                          Content
Development of service     Q3-4/2015                           Q1/2017
Pilot with countries       Q1/2016                             03-04/2017
Corrections & deployment   Q2/2016                             Q2-3/2017
Go-live                    Q3-4/2016                           Q4/2017
Comments                   Based on modified version of EDIT   Looping in of VTL language

Phase 1 Setup
• All countries: EDAMIS, regular production
• Selected TF members: Eurostat Process Manager, Structural Validation Svc PoC
• Step 1: Call Structural Validation Svc
• If OK: deliver to "reduced" production system
• If Not OK: deliver report to EDAMIS feedback channel

Conval Workflow?
• All countries: EDAMIS, regular production
• Selected TF members: Eurostat Process Manager, Structural Validation, Content Validation Svc PoC
• Step 2: Call Content Validation Svc
• If OK: deliver to "reduced" production system
• If Not OK: deliver report to EDAMIS feedback channel
• Warnings: ???
  o Supporting metadata / footnotes inside the SDMX message
  o Advanced validation, e.g. visualisation
  o Judgement call: error or warning
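
The OK / Not OK routing and the "error or warning" judgement call can be sketched as follows. The slide leaves the handling of warnings open, so the policy below (only errors block delivery, warnings pass through) is purely an assumption for illustration. `Severity`, `Finding`, `route` and the destination labels are hypothetical names, not part of EDAMIS or of the Eurostat process manager.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"


@dataclass
class Finding:
    rule: str           # identifier of the validation rule that fired
    severity: Severity  # the "judgement call": error or warning
    message: str


def route(findings):
    """Return the destination of a validated file.

    Assumed policy: any error sends a report to the feedback channel;
    a file with no findings, or with warnings only, goes to production.
    """
    if any(f.severity is Severity.ERROR for f in findings):
        return "edamis_feedback"
    return "reduced_production"


# Hypothetical findings for one incoming file
findings = [
    Finding("HOLE_IN_SERIES", Severity.WARNING, "2015-Q2 missing"),
    Finding("ADDITIVITY", Severity.ERROR, "components do not sum to total"),
]
print(route(findings))
```

A stricter policy, for example blocking on unexplained warnings, would change only the condition inside `route`.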

Demonstration: Scenario 3
• Use EDAMIS to transmit data
• Data provider perspective
• Use Eurostat process manager
• Eurostat workflow defined
• IS4STAT Input Hall: Eurostat process monitor

More information
• National Accounts Conval Webinar
  o Scope of the content validation service
  o High level setup
  o Full validation workflow
  o Sample report and implementation timeline
https://circabc.europa.eu/d/a/workspace/SpacesStore/605d7bcd-59754835-ab3d-7e54a54215f4/ESA%20VALTF%20Conval%20Webinar_0.mp4

Daniel SURANYI
Department: ESTAT.C
Email: daniel.suranyi@ec.europa.eu