Eurostat EDIT 2012 Functional Presentation Eurostat EDIT Introduction

  • Slides: 34
Download presentation
Eurostat EDIT 2012 Functional Presentation Eurostat

Eurostat EDIT 2012 Functional Presentation Eurostat

EDIT Introduction EDIT allows users to import data, perform a set of predefined operations

EDIT Introduction EDIT allows users to import data, perform a set of predefined operations on the imported datasets and export data resulted from these processing operations. Validations / Computations Ø Record Ø Vertical Ø Hierarchical Dataset Operations Ø Copy, Merge, Alter, Aggregate, etc see Scripting manual. Eurostat

EDIT Introduction (2) Data Validation tool, allowing users to import data, run validation programs

EDIT Introduction (2) Data Validation tool, allowing users to import data, run validation programs and export results The validation process relies on a custom Scripting Language Web-based User Interface Data and Metadata isolated into independent Domains Eurostat

Specialised functions Time series outliers (Terror) Berthelot-Hidiroglu Sigma Gap Programmable functions Eurostat

Specialised functions Time series outliers (Terror) Berthelot-Hidiroglu Sigma Gap Programmable functions Eurostat

Technology Overview Web Based Interface Ø Unified interface for both the local version and

Technology Overview Web Based Interface Ø Unified interface for both the local version and the server deployment Ø EUROSTAT Look & Feel Ø Light interface, simplified workflows RDBMS (Oracle and Postgre. SQL) ECAS or local authentication ( end year: SMS) Eurostat

EDIT Integration Capabilities Exposes full API as Web Services Integrated with EDAMIS Ø detect

EDIT Integration Capabilities Exposes full API as Web Services Integrated with EDAMIS Ø detect incoming files and process them in unattended mode Ø publish validation results to the Feedback Channel Integrated with the SDMX Registry Ø fetch DSDs into EDIT structures Ø load codelists from the Registry Eurostat

EDAMIS Integration EDAMIS can send data to EDIT by placing the files in a

EDAMIS Integration EDAMIS can send data to EDIT by placing the files in a configurable location EDIT detects metadata based on the EDAMIS naming convention EDIT performs the processing in unattended mode EDIT acts as a client for the EDAMIS Feedback Channel Web Service in order to publish the results of a job execution Eurostat

SDMX Registry Integration EDIT can import DSDs or codelists EDIT acts as a client

SDMX Registry Integration EDIT can import DSDs or codelists EDIT acts as a client for the SDMX Registry Web Services in order to fetch DSD files and codelists data The DSD file is broken down into EDIT components Ø Key families are translated to EDIT formats Ø Codelists are translated to EDIT Lookups Ø An EDIT Program is created performing lookup validations and basic checks on the dimension fields A specific importer has been implemented to process codelist data Eurostat

EVE Integration EVE Rules can be imported into EDIT or executed from an external

EVE Integration EVE Rules can be imported into EDIT or executed from an external file during the EDIT Job Execution An EDIT component can translate EVE rules defined in XML files to EDIT Scripting Language Eurostat

Important principles 1) From Microdata to macrodata 2) Scripting principle (symbols/placeholders) 3) Editing seen

Important principles 1) From Microdata to macrodata 2) Scripting principle (symbols/placeholders) 3) Editing seen as a case of complex computations 4) Multidataset approach 5) Cube approach in computations Eurostat

Rule layout § Rule name § Rule type § Rule body § Error part

Rule layout § Rule name § Rule type § Rule body § Error part (msg, selected vars) § Then compute part § Else compute part Eurostat

EDIT Scripting Language Capabilities § Custom Scripting Language designed specifically for data validation §

EDIT Scripting Language Capabilities § Custom Scripting Language designed specifically for data validation § Tries to be as simple as possible and still flexible enough to fit the requirements of any existing domain § Allows the definition of Formats and validation Programs § Formats (Dataset Definitions) describe the structure of the data (Format Definition Language) § Validation Programs describe the validation rules and are composed from a set of steps with inputs and outputs (Program Definition Language) Eurostat

EDIT Standalone Installation § Standalone Installation supported for Windows XP and Windows 7 §

EDIT Standalone Installation § Standalone Installation supported for Windows XP and Windows 7 § Simple installation wizard § Shortcuts are created in the Start Menu Eurostat

EDIT User Types § User Ø Executes jobs on datasets § Programmer Ø Manages

EDIT User Types § User Ø Executes jobs on datasets § Programmer Ø Manages the Metadata needed by the User to execute jobs Ø Sets up the unattended mode configuration § Administrator Ø Manages users and permissions Eurostat

User Module Functionality § Change Password Ø Change the password of the user(when not

User Module Functionality § Change Password Ø Change the password of the user(when not logged through ECAS) § Dataset Import/Export Ø Import and export data to and from the System Ø Monitor any ongoing import/export processes § Job Execution Ø Execute validation programs on imported datasets Ø View the results of a Job Execution Eurostat

User Workflow Data Import Job Executio n Job Results Eurostat Data Export

User Workflow Data Import Job Executio n Job Results Eurostat Data Export

Programming Module Main Functional Capabilities § Formats and Programs Definition Ø Define Metadata using

Programming Module Main Functional Capabilities § Formats and Programs Definition Ø Define Metadata using the editors in the User Interface Ø Import/Export Metadata from/to external TXT files Ø Import/Export to Oracle § Data Import/Export Ø Import Auxiliary Data (lookup datasets) § Job Execution Ø Execute validation programs on imported datasets Eurostat

Programming Module Workflows Format Definition Program Definition Eurostat Auxiliary Data Import

Programming Module Workflows Format Definition Program Definition Eurostat Auxiliary Data Import

Programming Module - Format Definition Write FDL Script into a Text File Open EDIT

Programming Module - Format Definition Write FDL Script into a Text File Open EDIT Client Import FDL File using the Format Import Functionality Access the Format Editor Functionality Define the Format using the Editor Eurostat

Programming Module – Program Definition Write PDL Script into a Text File Open EDIT

Programming Module – Program Definition Write PDL Script into a Text File Open EDIT Client Access the Program Editor Functionality Eurostat Import PDL File using the Program Import Functionality Define the Program using the Editor

Programming Module – Import Auxiliary Data Open EDIT Client Access Dataset Import Functionality Eurostat

Programming Module – Import Auxiliary Data Open EDIT Client Access Dataset Import Functionality Eurostat Define and Execute Import Process

Programming Module – Unattended Mode Configuration Configure Incoming Data Locations Create Metadata and Import

Programming Module – Unattended Mode Configuration Configure Incoming Data Locations Create Metadata and Import Templates Eurostat Configure Dynamic Domain Programs

Administration Module Main Functional Capabilities § User Management Ø Manage the Users and their

Administration Module Main Functional Capabilities § User Management Ø Manage the Users and their permissions § User Group Management Ø Manage the User Groups and their members § Domain Management Ø Manage the Domains Eurostat

Dataset format CVTS FORMAT cvts_4 { DESCRIPTION "CVTS Format"; FIELDS { COUNTRY { DESCRIPTION

Dataset format CVTS FORMAT cvts_4 { DESCRIPTION "CVTS Format"; FIELDS { COUNTRY { DESCRIPTION "None"; CAPTION "None"; TYPE STRING; LENGTH 2; } ENTERPR { DESCRIPTION "None"; CAPTION "None"; TYPE NUMBER; LENGTH 6; } REFYEAR { DESCRIPTION "None"; CAPTION "None"; TYPE NUMBER; LENGTH 4; } WEIGHT { DESCRIPTION "None"; CAPTION "None"; TYPE DOUBLE; LENGTH 20. 10; } NACE_SP { DESCRIPTION "None"; CAPTION "None"; TYPE STRING; LENGTH 5; Eurostat

Program CVTS annex 8 PROGRAM cvts_4 { INPUT cvts_4 input. Data. Set; <= all

Program CVTS annex 8 PROGRAM cvts_4 { INPUT cvts_4 input. Data. Set; <= all input datasets we use for the validation INPUT LANGUAGES_LIST LANGUAGES; INPUT COUNTRIES_LIST COUNTRIES; INPUT NUTS_LIST NUTS; INPUT NACE_LIST NACE; STEPS { <= can be multi-step program (for example separately ERRORS. . WARNINGS ) VALIDATION annex 8_error { INPUT input. Data. Set; <= main dataset being validated LOOKUP LANGUAGES; LOOKUP COUNTRIES; <= lookup tables LOOKUP NUTS; LOOKUP NACE; ERROR err_annex 8_error; <= output log - error dataset RULES { RECORD FL 001 { CONDITION in. Lookup (COUNTRY, COUNTRIES, "CODE"); <= check validity of the COUNTRY code using lookup table ERRMSG "Rule FL 1 failed for field [COUNTRY]: See EU Manual for valid list of codes (annex 12)" SEVERITY "Error" (COUNTRY) ; } RECORD FL 002 { CONDITION (ENTERPR>=0 AND ENTERPR<=999996); ERRMSG "Rule FL 2 failed for field [ENTERPR]: In the range 0 to 999996" SEVERITY "Error" (ENTERPR) ; } RECORD FL 004 { CONDITION (str. To. Double (NACE_SP)>=2001 AND str. To. Double (NACE_SP)<=2020); ERRMSG "Rule FL 4 failed for field [NACE_SP]: In the range 2001 to 2020 - See Manual (annex 1)" SEVERITY "Error" (NACE_SP) ; } RECORD FL 005 { CONDITION in (SIZE_SP, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9); ERRMSG "Rule FL 5 failed for field [SIZE_SP]: In the range 0 to 9" SEVERITY "Error" (SIZE_SP) ; } RECORD FL 171 { CONDITION (NOT is. Null (A 1 bis)) -> in. Lookup (A 1 bis, NACE, "CODE"); ERRMSG "Rule FL 171 failed for field [A 1 bis]: NACE rev 1. 1" SEVERITY "Warning" (A 1 bis) ; } RECORD FL 172 { CONDITION (NOT is. Null (A 2 bis)) -> ( (A 2 bis >=0 and A 2 bis <= 999996) or A 2 bis = 999999); ERRMSG "Rule FL 172 failed for field [A 2 bis]: In the range of 0 -999996 or 999999" SEVERITY "Warning" (A 2 bis) ; } Eurostat

Complex program example (1) PROGRAM Computation. Examples { INPUT country. Dsd input. Data; STEPS

Complex program example (1) PROGRAM Computation. Examples { INPUT country. Dsd input. Data; STEPS { VALIDATION check. Values { INPUT input. Data; ERROR error. Data 1; RULES { RECORD pure. Record { PRICE : = 20; } RECORD conditional. Record { CONDITION is. Null(VALUE); THEN { VALUE : = PRICE * QUANTITY; } ELSE { PRICE : = VALUE / 5; QUANTITY : = VALUE / PRICE; } } VERTICAL pure. Vertical { EXPRESSION { KEYS COUNTRY, CTYPE, MONTH, PRODUCT; TRKEYS COUNTRY; VALUE['TOTAL'] : = nvl(VALUE['TOTAL'], 0); } } Eurostat

Complex program example (2) VERTICAL conditional. Vertical { EXPRESSION { KEYS COUNTRY, CTYPE, MONTH,

Complex program example (2) VERTICAL conditional. Vertical { EXPRESSION { KEYS COUNTRY, CTYPE, MONTH, PRODUCT; TRKEYS COUNTRY; CONDITION 2 * VALUE['TOTAL'] = sum(VALUE[*]) ; THEN { VALUE['FR'] : = VALUE['TOTAL'] / 3; VALUE['GB'] : = VALUE['TOTAL'] / 2; VALUE['TOTAL'] : = sum(VALUE[*]) - VALUE['TOTAL']; } ELSE { VALUE['TOTAL'] : = sum(VALUE[*]) - VALUE['TOTAL']; } } } VERTICAL multiple. Transpositions. Computation { EXPRESSION { KEYS COUNTRY, CTYPE, MONTH, PRODUCT; TRKEYS COUNTRY, PRODUCT; VALUE['TOTAL']['GAS'] : = nvl(VALUE['TOTAL']['GAS'], 0); } } Eurostat

Complex program example (3) VERTICAL multiple. Transpositions. Condition { EXPRESSION { KEYS COUNTRY, CTYPE,

Complex program example (3) VERTICAL multiple. Transpositions. Condition { EXPRESSION { KEYS COUNTRY, CTYPE, MONTH, PRODUCT; TRKEYS COUNTRY, PRODUCT; CONDITION VALUE['TOTAL']['GAS'] > 5000 ; THEN { VALUE['TOTAL']['GAS'] : = VALUE['TOTAL']['OIL'] * 2; VALUE['TOTAL']['GLD'] : = VALUE['TOTAL']['OIL'] * 5; } ELSE { VALUE['TOTAL']['GLD'] : = VALUE['TOTAL']['GAS'] + 3000; } } } DATAOPERATION sort. Data { SORT { INPUT input. Data; ORDER MONTH ASC, CTYPE ASC, COUNTRY ASC, PRODUCT ASC ; } } Eurostat

Accepted data formats Gesmes / BOP ITS, BOP FDI UNA: +. ? ' UNB+UNOC:

Accepted data formats Gesmes / BOP ITS, BOP FDI UNA: +. ? ' UNB+UNOC: 3+FR 2+4 D 0+100929: 1637+IREF 000243++GESMES/TS' UNH+MREF 000001+GESMES: 2: 1: E 6' 9 H; 2008; LT; 2; B-N_X_K 642; 11930; 16236; ; ; UNIT; ; ; TT 0; BGM+74' NAD+Z 02+ECB' 9 H; 2008; LT; 3; B-N_X_K 642; 11930; 1001; ; ; UNIT; ; ; TT; ; ; NAD+MR+4 D 0' 9 H; 2008; LT; 4; B-N_X_K 642; 11930; 529; ; ; UNIT; ; ; TT; ; NAD+MS+FR 2' 9 H; 2008; LT; 30; B-N_X_K 642; 11930; 17766; ; ; UNIT; ; ; TT; IDE+10+EUROSTAT_BOP_01 reporting' DSI+BOP_FDI_A' 9 H; 2008; LT; 2; B-E; 11930; 1138; ; ; UNIT; ; ; TT; ; ; D 08 STS+3+7' 9 H; 2008; LT; 3; B-E; 11930; 104; ; ; UNIT; ; ; TT; ; ; D 08 DTM+242: 201009291637: 203' 9 H; 2008; LT; 4; B-E; 11930; 61; ; ; UNIT; ; ; TT; ; ; D 08 DTM+Z 02: 20072009: 702' IDE+5+EUROSTAT_BOP_01' GIS+AR 3' GIS+1: : : -' ARR++A: FR: N: 2: 330: N: 4 A: E: 9999: 20072009: 702: 0: A: F+0: A: F‘ <= multi-year 2007, 2008, 2009 observations ARR++A: FR: N: 2: 330: N: 4 F: E: 9999: 20072009: 702: 0: A: F+0: A: F' ARR++A: FR: N: 2: 330: N: 7 Z: E: 9999: 20072009: 702: 0: A: F+0: A: F' ARR++A: FR: N: 2: 330: N: A 1: E: 1100: 9999: 20072009: 702: 5824: A: F+5930: A: F+4204: A: F' ARR++A: FR: N: 2: 330: N: A 1: E: 1495: 9999: 20072009: 702: 5828: A: F+5932: A: F+4206: A: F' CSV (with or without header) FLR example 001 E 20100121814 00 001 E 20100121816 93 001 E 20100121814 99 001 E 20100125290334 FLR example 2 2010010011 /SBS, CVTS, TOURISM 804. 822 5295. 549 6166. 24 581. 371 010252000405595911005909580 E 010252000405595911004009600 E 01 ZZZZZ Eurostat 2691. 966 237. 543 2734482. 0 341202. 0 0. 0 ; ; D 08 ; ; D 08

Program with parameter(s) PROGRAM SBS_ANNEX 1_Single. Series { INPUT SBS_DATA input 1; PARAMETER P_T

Program with parameter(s) PROGRAM SBS_ANNEX 1_Single. Series { INPUT SBS_DATA input 1; PARAMETER P_T NUMBER; PARAMETERSET PARAMETERS { P_T = 2009; } STEPS { VALIDATION Validation { INPUT input 1; ERROR Error. Log; RULES { VERTICAL Rule 001 { EXPRESSION { KEYS SERIES, YEAR, TER_UNIT, SIZECLASS, ECO_ACTIVITY, VARIABLE; TRKEYS VARIABLE; CONDITION SERIES = '1 A' AND YEAR = p_T AND count. Missing(aux_val['12150'], aux_val['12110'])=0 -> aux_val['12150'] <= aux_val['12110']; ERRMSG '12150 > 12110' SEVERITY 'Warning' (aux_val['12150'], aux_val['12110']) ; } } Eurostat

Functions, data types, operators (1) Functions Data Types There are four types of data:

Functions, data types, operators (1) Functions Data Types There are four types of data: · Boolean · Double · Number · String These following function calls are supported: A · abs(Double) – absolute value · ascii(Char) – returns the ASCII code for a character B · between(Double, Double) – verifies if a number is inside a closed interval · between(Double, Boolean) – verifies if a number is inside an interval allowing the user to specify if the interval is closed or open at each end · between(String, String) – same as above · between(String, Boolean) – same as above C · ceiling(Double) – ceiling for number (Ex: ceiling(3. 2) => 4) · chr(Integer) – returns the ASCII character for the ASCII code · concatenate(String…) – concatenate Strings · count. Missing(List) – returns the number of null values in the list · count(List) – returns the number of elements in the list E · exp(Double) F · first. Index. Of(String to. Search, String search. In, Double starting. From) – first occurance of the to. Search String in the search. In String · floor(Double) – floor for number (Ex: ceiling(3. 2) => 3) G · get. Row. Count(dataset. Reference) – returns the number of rows for the specified dataset reference Operators There are three types of operators in the SL Expressions: Boolean operators, used to evaluate expressions into a true/false result: o OR o AND o NOT o -> (implication) o = (equals) o <> (not equals) o > (greater than) o < (lower than) o >= (greater than or equal to) o <= (lower than or equal to) · Computation operators, used to produce a value result following evaluation: o + (plus) o – (minus) o * (multiply) o / (divide) · Assignment operator, used to assign a value to an operand o : = (supports assignment to NULL, value : = NULL) Eurostat

Functions, data types, operators (2) I · · · · · N · in(Boolean,

Functions, data types, operators (2) I · · · · · N · in(Boolean, Boolean List) – check to see if a value is inside a list in(Double, Double List) in(String, String List) identical. In. List(List) – checks that all elements in a list are identical is. Identical(List, …) – returns a boolean indicating if each list contains identical elements (elements are identical inside a single list) – Ex: is. Identical(price[*], quantity[*], value[*]) is. Unique(List, …) – returns a boolean indicating if the combination of elements from each list for all the index positions are unique– Ex: is. Unique(price[*], quantity[*], value[*]) is. True(Boolean) – checks if a boolean is true is. Null(Boolean/String/Double) – checks if a value is NULL in. Lookup(value 1, value 2, …, dataset. Reference, “field. Name 1”, “field. Name 2”, … ) – returns a boolean indicating whether or not the value or combination of values is defined for the fieldname or combination of fieldnames in the specified dataset · · O · · · printf(String, String/Double…) – offers the capabilities of the printf method pow(Double N, Double n) – N**n R · · · right(String S, Double N) – returns the last N characters from S round(Double N, Double n) – round N till n decimals (Ex: round(4. 46, 1) => 4. 5) rtrim(String) – trim right side of String S · last. Index. Of(String to. Search, String search. In, Double starting. From) – last occurance of the to. Search String in the search. In String length(String) – returns the length of the String like(String, String) – compares two strings in a SQL manner log(Double) ln(Double) ltrim(String) – trim left side of String lower(String) – switch to lower case left(String S, Double N) – returns the first N characters from S · · str(Double N 1, Double N 2, Double N 3) – Ex: str(30. 25, 7, 3)= “_BLANK_30. 250“ str. To. Double(String) – convert a String into a Double – return null if String cannot be converted substring(String, Double N, Double n) – substring starting from N, counting n characters sum(Double List) – sum of elements from list sqrt(Double) – returns the square root of the value T · · M · · occurs(String S, String s) – returns the number of occurences of s in S P L · nvl(Boolean, Boolean) – if first argument value is null return second argument nvl(Double, Double) nvl(String, String) max(Double List) – maximum value from a list min(Double List) – minimum value from a list mean(Double List) – computes the average value not counting null values missing. Mean(Double List) – computes the average value counting null values as zero mod(Double N, Double n) – N%n trim(String) – trim String transcode (“target. Field”, lookup. Reference, “lookup. Field”, lookup. Value) – performs a lookup based on the specified field and value and returns the value of the target field on the matching row U · · Eurostat upper(String) – switch to upper case unique. In. List(List) – checks if a list contains unique values

2013 functionalities § Scalability improvements § Gesmes full integration § Internationalisation and interface improvements

2013 functionalities § Scalability improvements § Gesmes full integration § Internationalisation and interface improvements Eurostat

THANK YOU FOR YOUR ATTENTION EDIT? Hmm Eurostat

THANK YOU FOR YOUR ATTENTION EDIT? Hmm Eurostat