EDIT data validation system Ewa Stacewicz EUROSTAT VALIDATION

  • Slides: 22
Download presentation
EDIT data validation system Ewa Stacewicz EUROSTAT VALIDATION TEAM 18/05/2017 1 Eurostat

EDIT data validation system Ewa Stacewicz EUROSTAT VALIDATION TEAM 18/05/2017 1 Eurostat

Content of the presentation A. Introduction B. Objects in EDIT, main principles of EDIT

Content of the presentation A. Introduction B. Objects in EDIT, main principles of EDIT C. Ways of using EDIT D. Using EDIT - integration with other tools E. Hands-on training 2 Eurostat

EDIT Introduction • Data validation and editing tool • Allowing users to: • import

EDIT Introduction • Data validation and editing tool • Allowing users to: • import data • run validation programs • browse or export validation results • Friendly web-based User Interface • Available as a standalone application – can be installed locally on the user's PC and serve for the validation of confidential data 3 Eurostat 3

EDIT as a part of the ESS • EDIT is the statistical data validation

EDIT as a part of the ESS • EDIT is the statistical data validation software used for the validation services provided by EUROSTAT • EDIT was selected as a base of the validation service for the implementation of several projects under the ESS. VIP Programme Some statistics: • EDIT is used at Eurostat in 27 statistical domains • EDIT is used by the Member States and other institutions in 13 statistical domains 4 Eurostat

User role 1. Dataset upload according to a Format data file EDIT dataset VALIDATION

User role 1. Dataset upload according to a Format data file EDIT dataset VALIDATION FLOW 2. Create validation Job (1 -click) EDIT dataset + validation Program validation Job 3. Browse Error Report or export it Eurostat

Programmer role • Programmer role - manages the metadata needed by the user to

Programmer role • Programmer role - manages the metadata needed by the user to execute programs • Implements Formats (DSD) • Develops Programs for Datasets validation containing validation rules and dataset operations, prepares Lookups for code lists check • Sets up the unattended mode configuration Eurostat 6

Programmer role Program development • Custom EDIT Scripting Language designed for data validation •

Programmer role Program development • Custom EDIT Scripting Language designed for data validation • Programs contain validation rules / computations: Cell oriented – only one cell involved, e. g. OBS_VALUE > 0 Horizontal – a whole record involved, e. g. GEO=’BE’ -> OBS_VALUE > 0 Vertical – multiple records involved, e. g. in GEO dimension OBS_VALUE[BE 21] = OBS_VALUE[BE 211] + OBS_VALUE[BE 212] Hierarchical – multiple data sets involved, e. g. comparing MICRO data with AGGRE data of another dataset) • Possible dataset operations: Copy, Merge, Alter, Aggregate, … • Handling external files containing data and metadata. Eurostat 7

Validation rule layout RECORD CODECHECK_INDIC_IN { CONDITION in. Lookup(INDIC_IN, INDIC_IN_CL, "CODE"); ELSE { CODELIST_CODES

Validation rule layout RECORD CODECHECK_INDIC_IN { CONDITION in. Lookup(INDIC_IN, INDIC_IN_CL, "CODE"); ELSE { CODELIST_CODES : = CODELIST_CODES + 1; } ERRMSG "INDIC_IN is of an invalid value" SEVERITY "Error" (INDIC_IN); } • Rule name; • Rule type (RECORD, VERTICAL, NORECORD); • Rule body: • Condition = logical expression referencing fields in the data set; • 'Then' or 'Else' compute part - optional; • Error part (error message, severity, selected data to be displayed). Eurostat 8

Administrator role • Administrator role - manages domains, users and permissions • Domain -

Administrator role • Administrator role - manages domains, users and permissions • Domain - self-contained grouping of EDIT elements available to a group of users; • Contains Formats, Datasets, Programs • No object in two different domains can interact with each other • Users have access to all datasets and Jobs results within a Domain 9 Eurostat

Accepted data formats Gesmes / BOP ITS, BOP FDI UNA: +. ? ' UNB+UNOC:

Accepted data formats Gesmes / BOP ITS, BOP FDI UNA: +. ? ' UNB+UNOC: 3+FR 2+4 D 0+100929: 1637+IREF 000243++GESMES/TS' UNH+MREF 000001+GESMES: 2: 1: E 6' BGM+74' CSV (with or without header) /SBS, CVTS, TOURISM NAD+Z 02+ECB' 9 H; 2008; LT; 2; B-N_X_K 642; 11930; 16236; ; ; UNIT; ; ; TT 0; ; ; D 08 NAD+MR+4 D 0' 9 H; 2008; LT; 3; B-N_X_K 642; 11930; 1001; ; ; UNIT; ; ; TT; ; ; D 08 NAD+MS+FR 2' IDE+10+EUROSTAT_BOP_01 reporting' 9 H; 2008; LT; 4; B-N_X_K 642; 11930; 529; ; ; UNIT; ; ; TT; ; ; D 08 DSI+BOP_FDI_A' 9 H; 2008; LT; 30; B-N_X_K 642; 11930; 17766; ; ; UNIT; ; ; TT; ; ; D 08 STS+3+7' 9 H; 2008; LT; 2; B-E; 11930; 1138; ; ; UNIT; ; ; TT; ; ; D 08 DTM+242: 201009291637: 203' 9 H; 2008; LT; 3; B-E; 11930; 104; ; ; UNIT; ; ; TT; ; ; D 08 DTM+Z 02: 20072009: 702' 9 H; 2008; LT; 4; B-E; 11930; 61; ; ; UNIT; ; ; TT; ; ; D 08 IDE+5+EUROSTAT_BOP_01' GIS+AR 3' GIS+1: : : -' ARR++A: FR: N: 2: 330: N: 4 A: E: 9999: 20072009: 702: 0: A: F+0: A: F‘ <= multi-year 2007, 2008, 2009 observations ARR++A: FR: N: 2: 330: N: 4 F: E: 9999: 20072009: 702: 0: A: F+0: A: F' ARR++A: FR: N: 2: 330: N: 7 Z: E: 9999: 20072009: 702: 0: A: F+0: A: F' ARR++A: FR: N: 2: 330: N: A 1: E: 1100: 9999: 20072009: 702: 5824: A: F+5930: A: F+4204: A: F' ARR++A: FR: N: 2: 330: N: A 1: E: 1495: 9999: 20072009: 702: 5828: A: F+5932: A: F+4206: A: F' FLR example 1 001 E 20100121814 00 001 E 20100121816 93 001 E 20100121814 99 001 E 20100125290334 804. 822 5295. 549 6166. 24 581. 371 FLR example 2 2010010011 010252000405595911005909580 E 010252000405595911004009600 E 01 ZZZZZ Eurostat 2691. 966 237. 543 2734482. 0 341202. 0 0. 0 10

Ways of using EDIT • As a web service – called by other applications

Ways of using EDIT • As a web service – called by other applications • Standalone – running on a PC • Client – server – running in a Data Centre 11 Eurostat

EDIT variants for external users EDIT Public Web server EDIT standalone version for local

EDIT variants for external users EDIT Public Web server EDIT standalone version for local installation user from MS user from ESTAT • Local authentication • All components running locally: Tomcat server & Postgre. SQL DB • Confidentiality is preserved • ECAS authentication • Non-confidential data • In preparation: remote EDIT validation service for confidential data 12 Eurostat

EDIT variants for ESTAT users EDIT ESTAT server in the secure environment EDIT ESTAT

EDIT variants for ESTAT users EDIT ESTAT server in the secure environment EDIT ESTAT server in the standard environment user from ESTAT • Confidential data • Non-confidential data • Unattended mode • EDAMIS back channel • Integration with other systems in Eurostat 13

EDAMIS Integration • EDAMIS can send data to EDIT by placing the files in

EDAMIS Integration • EDAMIS can send data to EDIT by placing the files in a configurable location; • EDIT detects metadata based on the EDAMIS naming convention; • EDIT performs the processing in unattended mode; • EDIT acts as a client for the EDAMIS Feedback Channel Web Service in order to publish the results of a job execution. 14 Eurostat

EDIT working in unattended mode ERROR REPORT 15 Eurostat

EDIT working in unattended mode ERROR REPORT 15 Eurostat

EDAMIS notification 16 Eurostat

EDAMIS notification 16 Eurostat

Validation report • It contains: • Job results – information about the job plus

Validation report • It contains: • Job results – information about the job plus an overview of the validation results; • Error statistics – summary of the errors; • Error report – detailed list of errors; • Acceptance/rejection algorithm implemented in the program. 17 Eurostat

Errors statistics 18 Eurostat

Errors statistics 18 Eurostat

Detailed error report 19 Eurostat

Detailed error report 19 Eurostat

Detailed statistics report 20 Eurostat

Detailed statistics report 20 Eurostat

More info on EDIT • Take part in one of the next EDIT Webinars:

More info on EDIT • Take part in one of the next EDIT Webinars: • 14 June 2017 • 08 September 2017 - registration via ESTAT-VALIDATION@ec. europa. eu • For next webinars please check the section Editing & Validation Events on: https: //webgate. ec. europa. eu/fpfis/mwikis/ESSVal id. Serv 21 • Specific COD user manual will be available soon Eurostat

Thank you for your attention! Any questions? 22 Eurostat

Thank you for your attention! Any questions? 22 Eurostat