Data validation practical use case National Accounts IPA
- Slides: 29
Data validation practical use case: National Accounts IPA Course: Data Validation in the ESS, 18-19 May 2017
Daniel SURANYI – Eurostat Directorate C
Presentation Outline
• Background and milestones
• ESA 2010 Validation Task Force
• Structural and content validation service
• Target implementation dates
• Questions and discussion
Background and milestones
• 2012: agreed ESTAT-OECD-ECB validation checks reflected in the ESTAT-OECD Protocol
• 2014: implementation of validation checks in internal systems for ESA 2010
• 2014: NAWG/DMES agree to set up the ESA 2010 validation Task Force
• Recurrent validation problems progressively shift from coding to content-related checks
ESA 2010 TF on data validation: Mandate
• Review validation checks performed in Eurostat
• Clarification of methodological or practical aspects
• Validation rules for an internal or external prevalidation tool
• Collection and dissemination of associated metadata
TF Participants
• Participants represented seven NSIs (AT, DE, DK, GR, IE, SE, UK), three Central Banks (BE, FR, IT) and two main data users (ECB and OECD)
• IT, SI and PL participated via correspondence
• Participated in the workshop for ESTAT grants on validation (DK, GR, NL, SI)
• ESTAT links with representatives to cover ESA 2010 TP
Current validation process
[Diagram labels: Sources (Surveys, Admin); NSIs & Central Banks; Tool: Pre-transmission checks; eDAMIS; Eurostat; EU aggregates & dissemination]
STEP 1: SDMX structural & basic data checks
• Format
• Codes
• Embargo + "F" flag
• Hole in series
STEP 2: Structural & basic data checks
STEP 3: Statistical and economic plausibility checks
• Revisions
• A vs Q
• Horizontal checks
• Balances
• CLVs for ref year
• Zeros & negative values
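Two of the basic data checks listed above ("Hole in series" and "Zeros & negative values") can be illustrated with a minimal sketch. It assumes an annual series held as a plain year-to-value mapping; the function names and the data layout are hypothetical and do not reflect how the Eurostat tools are implemented.

```python
from __future__ import annotations

from typing import Optional


def find_holes(series: dict[int, Optional[float]]) -> list[int]:
    """Return the years with no value between the first and last reported year."""
    reported = sorted(year for year, value in series.items() if value is not None)
    if not reported:
        return []
    return [
        year
        for year in range(reported[0], reported[-1] + 1)
        if series.get(year) is None
    ]


def flag_zeros_and_negatives(
    series: dict[int, Optional[float]],
) -> tuple[list[int], list[int]]:
    """Return (years with a zero value, years with a negative value)."""
    zeros = sorted(year for year, value in series.items() if value == 0)
    negatives = sorted(
        year for year, value in series.items() if value is not None and value < 0
    )
    return zeros, negatives
```

Whether a zero or a negative value is an actual error depends on the series (for example, balancing items can legitimately be negative), so such a check would typically only flag observations for review.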
NAPS-S: Objective of the project
Re-design of the National Accounts statistical Production System using Eurostat corporate (CSPA-compliant) services for managing and validating incoming data files.
Official statistics: the challenge…
• Official reference
• Cross-domain usage
• More timely policies
• Commercial providers
• GDP T+30
• Shrinking resources (€)
Stovepipe production: the reality…
[Diagram labels: Survey A, Survey B; Collect, Process, Analyse, Disseminate]
Stovepipe production: the reality…
• Customised for a specific domain
• Conventions used within domains / surveys
• Hampering cross-domain usage
• Leading to low level of transparency
• Not possible to share IT tools efficiently
• Difficult to share data across domains / organisations
• Difficult to measure quality
Example: National Accounts production in Eurostat
Business process
Example: National Accounts production in Eurostat
Implementation
• FAME
• Oracle RDBMS
• Oracle OLAP
Target: flexible use of statistical services
Intermediate step: partial orchestration
Target: flexible use of statistical services
• Architecture for cross-domain usage
• Standards used across domains / surveys
• Enabling cross-domain usage
• Leading to transparency
• Encourages efficient sharing of IT tools
• Facilitates sharing data across domains / organisations
• Possible to measure quality (process, data)
The big picture: using standards
Statistical production, supported by:
• GSIM: reference information model
• GSBPM: process step categories
• CSPA: service specification
• VTL: validation expressions
• SDMX: data modelling
Connections from data providers
1) Connect to the repository
[Diagram labels: National software (National Structural Validation, National Content Validation); Common Repository (SDMX Registry, VTL Repository); Eurostat (Statistical Service A, Statistical Service B)]
Connections from data providers
2a) Use ESS service: replicated
[Diagram labels: ESS Structural Validation, ESS Content Validation; Structural Validation, Content Validation; Statistical Service A, Statistical Service B; Common Repository (SDMX Registry, VTL Repository)]
Connections from data providers
2b) Use ESS service: shared
[Diagram labels: Structural Validation, Content Validation; Statistical Service A, Statistical Service B; Common Repository (SDMX Registry, VTL Repository)]
Connections from data providers
3) Connect to the process
[Diagram labels: Structural Validation, Content Validation; Statistical Service A, Statistical Service B; Common Repository (SDMX Registry, VTL Repository)]
Structural Validation (SDMX Registry)
SDMX compliance
• Valid SDMX-ML file
• Coded according to the DSD
• Mandatory fields present
• Correct data types
Basic logical checks
• Dataflow definition
• Sender ID and REF_AREA
• Table ID is present
• Value "NaN" and OBS_STATUS
• EMBARGO_DATE and CONF_STATUS
• PRICES and REF_YEAR_PRICE

Content Validation (VTL Repository)
Basic content checks
• Missing or unexpected series
• Hole in series
• Zero values
• Negative values
General plausibility and consistency (within file)
• Additivity of breakdowns
• Outliers
• Consistency between prices
• Unadjusted and adjusted series
Advanced plausibility and consistency (across files)
• Revisions
• Quarterly versus Annual
• Same series across tables
Cross-domain or source checks
• Balance of Payments
• Trade statistics
• Labour market statistics
• Data published by NSI or IO
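As an illustration of two of the content checks named above ("Additivity of breakdowns" and "Quarterly versus Annual"), here is a minimal Python sketch. The service itself is described as using VTL expressions; Python is used here only for illustration. The function names, the data layout and the tolerances are assumptions, and the quarterly-sum rule as written only makes sense for flow series where the four quarters are expected to add up to the annual figure.

```python
from __future__ import annotations

import math


def check_additivity(
    total: float, components: list[float], rel_tol: float = 1e-6
) -> bool:
    """True if the components of a breakdown add up to the reported total."""
    return math.isclose(total, sum(components), rel_tol=rel_tol, abs_tol=1e-9)


def quarterly_vs_annual(
    annual: dict[int, float],
    quarterly: dict[tuple[int, int], float],
    rel_tol: float = 1e-3,  # assumed tolerance, not taken from the slides
) -> list[int]:
    """Return the years where the sum of four quarters differs from the annual value."""
    failing = []
    for year, annual_value in sorted(annual.items()):
        quarters = [quarterly.get((year, q)) for q in (1, 2, 3, 4)]
        if any(value is None for value in quarters):
            continue  # incomplete year: nothing to compare
        if not math.isclose(sum(quarters), annual_value, rel_tol=rel_tol):
            failing.append(year)
    return failing
```

Years with fewer than four reported quarters are skipped rather than reported, on the assumption that a separate "hole in series" check covers missing observations.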
Validation Roadmap: NAPS-S
What | Structural | Content
Development of service | Q3-4/2015 | Q1/2017
Pilot with countries | Q1/2016 | 03-04/2017
Corrections & deployment | Q2/2016 | Q2-3/2017
Go-live | Q3-4/2016 | Q4/2017
Comments | Based on modified version of EDIT | Looping in of VTL language
Phase 1 Setup
[Diagram labels: All countries; EDAMIS; Regular production; Selected TF members; Eurostat Process Manager; Structural Validation Svc PoC]
Step 1: Call Structural Validation Svc
• If OK: deliver to "reduced" production system
• If not OK: deliver report to EDAMIS feedback channel
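The Step 1 routing rule can be sketched as follows. This is only an illustration under stated assumptions: the mandatory field list, the report structure and the destination names are hypothetical, and the real service validates SDMX-ML files against the DSD rather than Python dictionaries.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed mandatory fields, for illustration only; the real list comes from the DSD.
MANDATORY_FIELDS = ("REF_AREA", "TIME_PERIOD", "OBS_VALUE")


@dataclass
class StructuralReport:
    errors: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_structure(observations: list[dict]) -> StructuralReport:
    """Check that every observation carries all mandatory fields."""
    report = StructuralReport()
    for index, obs in enumerate(observations):
        for name in MANDATORY_FIELDS:
            if obs.get(name) in (None, ""):
                report.errors.append(f"observation {index}: missing {name}")
    return report


def route(report: StructuralReport) -> str:
    """If OK, deliver to the "reduced" production system; otherwise send the report back."""
    return "reduced_production_system" if report.ok else "edamis_feedback_channel"
```

For example, `route(validate_structure(observations))` returns the destination for a parsed file, and `report.errors` holds the messages that would go into the feedback report.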
Conval Workflow?
[Diagram labels: All countries; EDAMIS; Regular production; Selected TF members; Eurostat Process Manager; Structural Validation; Content Validation Svc PoC]
Step 2: Call Content Validation Svc
• If OK: deliver to "reduced" production system
• If not OK: deliver report to EDAMIS feedback channel
Open questions (???)
o Warnings
o Supporting metadata / footnotes inside the SDMX message
o Advanced validation, e.g. visualisation
o Judgement call: error or warning
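The slide leaves the treatment of warnings open. One possible convention, shown here purely as a hypothetical sketch, is that only errors block delivery while warnings travel with the file as information. The severity levels, function name and destination names are assumptions, not the agreed workflow.

```python
from __future__ import annotations

from enum import Enum


class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"


def decide_delivery(
    findings: list[tuple[Severity, str]],
) -> tuple[str, list[str]]:
    """Return (destination, messages); only errors block delivery in this sketch."""
    messages = [f"{severity.value}: {text}" for severity, text in findings]
    blocked = any(severity is Severity.ERROR for severity, _ in findings)
    destination = (
        "edamis_feedback_channel" if blocked else "reduced_production_system"
    )
    return destination, messages
```

Under this convention a file with only warnings is still delivered, and the judgement call on whether a finding is an error or a warning is made when the validation rule is defined.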
Demonstration: Scenario 3
• Use EDAMIS to transmit data
• Data provider perspective
• Use Eurostat process manager
• Eurostat workflow defined
• IS4STAT Input Hall: Eurostat process monitor
More information
• National Accounts Conval Webinar
o Scope of the content validation service
o High level setup
o Full validation workflow
o Sample report and implementation timeline
https://circabc.europa.eu/d/a/workspace/SpacesStore/605d7bcd-59754835-ab3d-7e54a54215f4/ESA%20VALTF%20Conval%20Webinar_0.mp4
Daniel SURANYI
Department: ESTAT.C
Email: daniel.suranyi@ec.europa.eu