Practical use cases of SDMX: Census Hub
Nadezhda Vlahova
Eurostat Unit B3: “IT for statistical production”
SDMX Basic course, 2016
The European Census Hub: key issues
• Dissemination of data from the 2011 population and housing censuses in the European Union
• Data that are methodologically comparable and structured according to “hypercubes” agreed with the Member States (Census Regulation)
• Easy access for users to detailed census data and metadata (advanced functionalities)
• Management of massive amounts of data produced and controlled by the Member States
• Maximum flexibility to cross-tabulate data from different sources
EU Census: implementing measures
• Regulation (EC) No 763/2008 on population and housing censuses authorises the European Commission to adopt implementing measures on:
  - the technical specifications of the topics and their breakdowns (Regulation (EC) No 1201/2009)
  - the programme of statistical data and metadata to be transmitted to Eurostat (Regulation (EU) No 519/2010)
  - quality reporting and the technical format for data transmission (Regulation (EU) No 1151/2010)
What do we need?
• An environment for disseminating massive amounts of harmonised data that is easy to use and reuse
• The possibility to compare and cross-tabulate topics of interest across data collected from different sources
• Countries' census data and metadata to be made available by 31 March 2014 and stored until 2025
• A single point to access and retrieve detailed census data and numerical and textual metadata
• A standardised approach for similar data collections
• No on-the-fly validation or aggregation: it is neither required nor supported
The Hub concept
• The Hub is based on the concept of data sharing: a group of partners agree to provide access to their data according to standard processes, formats and technologies.
• The SDMX Hub approach offers several advantages:
  - NSIs' systems are decoupled from the central hub through standard formats and exchange techniques
  - limited investment and reusability, with the advantage of using recognised international standards
  - the software (SDMX-RI) is supplied by Eurostat
Hub approach – PULL data for collection and dissemination
[Figure: pull and push data flows. Pull path: NSI, SDMX Registry (register, query), Hub and Eurostat Pull Requestor; received SDMX-ML data go through a Loader into Eurobase for dissemination. Push path: eDAMIS, verification/conversion (XSL to SDMX-ML), intermediate and input storage, data warehouse.]
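To make the pull principle concrete, here is a minimal in-memory sketch. It assumes each NSI endpoint can be modelled as a function that filters its own data by a query; the names `make_nsi_service` and `hub_pull`, the provider IDs and the data are all hypothetical. A real hub sends SDMX-ML queries to the NSIs' SDMX web services and receives SDMX-ML responses, which this sketch does not attempt to reproduce.

```python
from typing import Callable, Dict, List

Observation = Dict[str, str]
# Hypothetical model of an NSI endpoint: takes a query (dimension -> code)
# and returns the matching observations from the NSI's own database.
NsiService = Callable[[Dict[str, str]], List[Observation]]


def make_nsi_service(local_data: List[Observation]) -> NsiService:
    """Build a fake NSI endpoint that filters its local data by the query."""
    def service(query: Dict[str, str]) -> List[Observation]:
        return [obs for obs in local_data
                if all(obs.get(dim) == code for dim, code in query.items())]
    return service


def hub_pull(services: Dict[str, NsiService],
             query: Dict[str, str]) -> List[Observation]:
    """Send the same query to every provider and concatenate the answers.

    Each observation is tagged with its provider; nothing is validated or
    aggregated at the hub."""
    result: List[Observation] = []
    for provider, service in services.items():
        for obs in service(query):
            result.append({**obs, "PROVIDER": provider})
    return result


# Hypothetical providers and data.
services = {
    "NSI_A": make_nsi_service([{"GEO": "AA1", "SEX": "F", "OBS_VALUE": "100"},
                               {"GEO": "AA1", "SEX": "M", "OBS_VALUE": "95"}]),
    "NSI_B": make_nsi_service([{"GEO": "BB1", "SEX": "F", "OBS_VALUE": "42"}]),
}
print(hub_pull(services, {"SEX": "F"}))
```

The data stay with each NSI until a query arrives, and the hub only tags and concatenates the answers, in line with the requirement of no on-the-fly validation or aggregation.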
What is the Census Hub?
• An interoperability initiative (32 countries participating in the project)
• A business-driven project
Delivered:
• Census Hub: an information system to query and display SDMX-formatted data (e.g. 2011 census data) retrieved from different data providers
• SDMX Reference Infrastructure (SDMX-RI): a universal framework for exposing data stored in a legacy dissemination database and translating it into SDMX-ML format
Prerequisite: SDMX compliance* (1/2)
✓ Agreed standard/format for data exchange
✓ Established definitions, concepts, codelists, DSDs, transmission model and obligations
Requires:
• NSIs to create and transmit data according to:
  - the agreed format
  - the agreed mode of data exchange
  - the defined time period
* Prerequisite for SDMX tools and SDMX exchange
SDMX compliance (2/2)
[Figure: data preparation. Individual records in a microdatabase, described by statistical variables (topics), are turned into macrodata: aggregated datasets 1 to n, held as non-SDMX local dissemination data.]
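The step from individual records to aggregated datasets can be sketched as a count per combination of dimension values. This is an illustrative assumption, not any NSI's actual production process: the records, the codes (`R1`, `MAR`, `SIN`) and the `aggregate` helper are hypothetical, and real hypercubes also include total categories, which are omitted here.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical individual records (one dict per person).
records: List[Dict[str, str]] = [
    {"GEO": "R1", "SEX": "F", "LMS": "MAR"},
    {"GEO": "R1", "SEX": "M", "LMS": "MAR"},
    {"GEO": "R1", "SEX": "F", "LMS": "SIN"},
    {"GEO": "R2", "SEX": "F", "LMS": "SIN"},
]


def aggregate(records: List[Dict[str, str]],
              dims: Tuple[str, ...]) -> Dict[Tuple[str, ...], int]:
    """Count records per combination of the given dimensions (microdata -> macrodata)."""
    return dict(Counter(tuple(r[d] for d in dims) for r in records))


# Several aggregated datasets can be derived from the same microdatabase.
by_geo_sex = aggregate(records, ("GEO", "SEX"))
by_geo_lms = aggregate(records, ("GEO", "LMS"))
print(by_geo_sex)  # {('R1', 'F'): 2, ('R1', 'M'): 1, ('R2', 'F'): 1}
```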
Example: DSD for Table 6 (Marital Status)

Dimensions
Concept ID  Name                     Codelist
TIME        Time period or range     CL_TIME
GEO         Geographical area        CL_GEO
SEX         Sex                      CL_SEX
FST         Family status            CL_FST
LMS         Legal marital status     CL_LMS
CAS         Current activity status  CL_CAS
POB         Country/place of birth   CL_POB
COC         Country of citizenship   CL_COC
AGE         Age                      CL_AGE
FREQ        Frequency                CL_FREQ

Attributes
ID          Attachment level  Codelist
OBS_STATUS  Observation       CL_OBS_STATUS
OBS_LEVEL   Observation       CL_OBS_LEVEL
OBS_NOTE    Observation
HC_NOTE     Series

Measures
ID          Name
OBS_VALUE   Observation value
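A DSD is essentially a typed list of components. The sketch below encodes the Table 6 components listed above as Python data and adds a hypothetical completeness check, `missing_components`, that an NSI might run on a record before exposing it. The class and function names and the record's codes are illustrative assumptions, not part of SDMX-RI.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Component:
    """A DSD component: its concept ID and, if coded, its codelist ID."""
    concept_id: str
    codelist: Optional[str] = None  # None: uncoded component (e.g. a free-text note)


# Table 6 dimensions; each codelist ID follows the CL_<concept> pattern on the slide.
DIMENSIONS = [Component(c, f"CL_{c}") for c in
              ("TIME", "GEO", "SEX", "FST", "LMS", "CAS", "POB", "COC", "AGE", "FREQ")]
ATTRIBUTES = {  # attachment level -> attributes attached at that level
    "Observation": [Component("OBS_STATUS", "CL_OBS_STATUS"),
                    Component("OBS_LEVEL", "CL_OBS_LEVEL"),
                    Component("OBS_NOTE")],
    "Series": [Component("HC_NOTE")],
}
PRIMARY_MEASURE = Component("OBS_VALUE")


def missing_components(record: Dict[str, str]) -> List[str]:
    """Return IDs of dimensions and of the primary measure absent from a record."""
    required = [d.concept_id for d in DIMENSIONS] + [PRIMARY_MEASURE.concept_id]
    return [cid for cid in required if cid not in record]


# Hypothetical record: codes are placeholders, and FREQ is deliberately missing.
record = {"TIME": "2011", "GEO": "XX", "SEX": "F", "FST": "TOTAL", "LMS": "TOTAL",
          "CAS": "TOTAL", "POB": "TOTAL", "COC": "TOTAL", "AGE": "TOTAL",
          "OBS_VALUE": "123"}
print(missing_components(record))  # ['FREQ']
```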
Census datasets (hypercubes)
Hypercube HC06: GEO.L.SEX.FST.H.LMS.CAS.L.POB.M.COC.M.AGE.M
Number of categories per dimension, in the order shown: 57, 3, 17, 13, 6, 15, 13, 28
Theoretical number of cells = 57 × 3 × 17 × 13 × 6 × 15 × 13 × 28 = 1,238,033,160 cells (NB: for one country and one table…)
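The cell count is the product of the number of categories of each dimension, since every combination of categories is a potential cell. A short Python check, assuming the category counts pair with the dimensions in the order listed:

```python
from math import prod

# HC06 breakdowns paired with their number of categories, in slide order.
categories = {
    "GEO.L": 57, "SEX": 3, "FST.H": 17, "LMS": 13,
    "CAS.L": 6, "POB.M": 15, "COC.M": 13, "AGE.M": 28,
}

# Every combination of categories is one potential cell of the hypercube.
cells = prod(categories.values())
print(f"{cells:,}")  # 1,238,033,160
```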
Census datasets (hypercubes)
Hypercube HC04: GEO.L.SEX.HST.H.LOC.CAS.L.POB.L.COC.L.AGE.M
Using Eurobase naming conventions, its name would be: “Population in NUTS 2 regions by sex, five-year age groups, household status, size of the locality, broad groups of current activity status, place of birth and country of citizenship”.
We need to change the way in which we give access to data.
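Hypercube descriptors like the one above can be read mechanically: each dimension code may be followed by a suffix letter giving its breakdown. The sketch assumes that `L`, `M` and `H` denote low, medium and high levels of detail (consistent with GEO.L as NUTS 2 regions and AGE.M as five-year age groups here, but still an assumption); `parse_hypercube` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# Assumed meaning of the breakdown suffixes.
LEVELS = {"L": "low", "M": "medium", "H": "high"}


def parse_hypercube(descriptor: str) -> List[Tuple[str, Optional[str]]]:
    """Split a descriptor into (dimension, level) pairs.

    A level letter qualifies the dimension immediately before it; a
    dimension without a suffix gets None."""
    parsed: List[Tuple[str, Optional[str]]] = []
    for token in descriptor.split("."):
        if token in LEVELS and parsed and parsed[-1][1] is None:
            parsed[-1] = (parsed[-1][0], token)
        else:
            parsed.append((token, None))
    return parsed


for dim, level in parse_hypercube("GEO.L.SEX.HST.H.LOC.CAS.L.POB.L.COC.L.AGE.M"):
    print(dim, LEVELS[level] if level else "(no suffix)")
```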
SDMX exchange and supporting tools
[Figure: NSI process workflow from non-SDMX local data to dissemination/transmission. Elements shown: extract files, transform files, SDMX codes, SDMX file, SDMX Converter, processing for sending via eDAMIS; SDMX-RI with Mapping Assistant, test client and NSI client, exposing an NSI web service to the Hub. Legend: NSI software, Eurostat tools, NSI-developed software.]
SDMX-RI architecture overview
[Figure: on the internal network of the data providing organisation, non-SDMX local data are handled by the SDMX-RI user interfaces (Mapping Assistant, Web/Test Client) and by SDMX-RI “under the hood”, which delivers SDMX-formatted data to the data collecting organisation.]
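The Mapping Assistant relates a non-SDMX local database to the DSD. The idea can be sketched as two lookup tables, one from local columns to DSD concepts and one from local codes to SDMX codes. Everything here (`COLUMN_MAP`, `CODE_MAP`, `to_sdmx`, the column names and codes) is a hypothetical illustration; SDMX-RI keeps its mappings in its own mapping store and generates SDMX-ML from them.

```python
from typing import Dict, List

# Hypothetical rows from a non-SDMX local dissemination table.
local_rows: List[Dict[str, object]] = [
    {"region": "01", "gender": "1", "persons": 1200},
    {"region": "01", "gender": "2", "persons": 1250},
]

# Hypothetical mapping: local column -> DSD concept ...
COLUMN_MAP: Dict[str, str] = {"region": "GEO", "gender": "SEX", "persons": "OBS_VALUE"}
# ... and, for coded concepts, local code -> SDMX code.
CODE_MAP: Dict[str, Dict[str, str]] = {
    "GEO": {"01": "XX01"},
    "SEX": {"1": "M", "2": "F"},
}


def to_sdmx(row: Dict[str, object]) -> Dict[str, object]:
    """Rename local columns to DSD concepts and translate coded values.

    An unmapped local code raises KeyError, which surfaces gaps in the mapping."""
    out: Dict[str, object] = {}
    for column, value in row.items():
        concept = COLUMN_MAP[column]
        codes = CODE_MAP.get(concept)
        out[concept] = codes[value] if codes else value  # uncoded values pass through
    return out


print([to_sdmx(r) for r in local_rows])
```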
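Census Hub – Standard approach
[Figure: on the data provider (NSI) side, a non-SDMX local database, the Mapping Assistant and mapping store, a metadata repository, web and test clients, and SDMX web services, all based on the DSD; on the data collector side, the Census Hub sends an SDMX query to the NSI web services and receives an SDMX response. Banner: SDMX (standard, exchange, metadata, repository).]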
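The SDMX query in this exchange selects data by dimension values. The Census Hub DSDs are built in SDMX 2.0; the sketch below instead uses the SDMX 2.1 RESTful key syntax (dimension values joined by `.`, alternatives by `+`, an empty position as a wildcard) because it is compact to show, and it is not claimed to be the Hub's actual query format. The base URL, the dataflow ID `HC06` and the dimension order are hypothetical.

```python
from typing import Dict, List, Optional, Sequence
from urllib.parse import urlencode


def build_key(dimension_order: Sequence[str],
              selection: Dict[str, List[str]]) -> str:
    """SDMX 2.1 REST key: one position per dimension in DSD order,
    '+' between alternative codes, empty position = all codes."""
    return ".".join("+".join(selection.get(dim, [])) for dim in dimension_order)


def data_query_url(base: str, flow: str, key: str,
                   params: Optional[Dict[str, str]] = None) -> str:
    """Assemble a REST data query of the form {base}/data/{flow}/{key}[?params]."""
    url = f"{base}/data/{flow}/{key}"
    return f"{url}?{urlencode(params)}" if params else url


# Hypothetical dimension order (time is not part of the key; it is
# restricted with startPeriod/endPeriod instead), endpoint and dataflow.
ORDER = ["FREQ", "GEO", "SEX", "AGE"]
key = build_key(ORDER, {"GEO": ["XX01", "XX02"], "SEX": ["F"]})
url = data_query_url("https://nsi.example.org/rest", "HC06", key,
                     {"startPeriod": "2011", "endPeriod": "2011"})
print(url)
# https://nsi.example.org/rest/data/HC06/.XX01+XX02.F.?startPeriod=2011&endPeriod=2011
```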
The Data Hub is:
• A system of DSDs built in SDMX 2.0 and in use for 32 countries of the ESS
• A data dissemination portal based on the SDMX data model
  - communicating with data providers via SDMX web services
• No data processing (no editing or aggregation)
• Additional reusable modules for:
  - configuration management
  - handling SDMX structural metadata
• An innovative user interface: data extraction starts from a statistical concept
• Member States' status: 32 up and running
A data user can:
• Browse the Hub to define a dataset of interest, navigating via structural metadata:
  - search by topic (filters) and select data (level of detail, breakdowns)
  - select the layout (axes)
• View a table
• Export a file (CSV, Excel, SDMX-ML)
• Register as a user. Registered users can:
  - save, retrieve, modify or delete stored queries
  - receive an e-mail notification when offline queries have been executed
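Choosing a layout amounts to placing one dimension on the rows and another on the columns. A minimal sketch of that pivot followed by CSV export, using only the Python standard library; it assumes all other dimensions have already been fixed to a single value by the user's filters, and `pivot_to_csv` and the data are hypothetical, not the Hub's code.

```python
import csv
import io
from typing import Dict, List


def pivot_to_csv(observations: List[Dict[str, str]], row_dim: str,
                 col_dim: str, value: str = "OBS_VALUE") -> str:
    """Put row_dim on the rows and col_dim on the columns, then write CSV.

    Combinations with no observation are written as empty cells."""
    rows = sorted({o[row_dim] for o in observations})
    cols = sorted({o[col_dim] for o in observations})
    cells = {(o[row_dim], o[col_dim]): o[value] for o in observations}
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([f"{row_dim}/{col_dim}", *cols])
    for r in rows:
        writer.writerow([r, *(cells.get((r, c), "") for c in cols)])
    return buffer.getvalue()


# Hypothetical observations with all other dimensions already fixed.
obs = [
    {"GEO": "XX01", "SEX": "F", "OBS_VALUE": "1250"},
    {"GEO": "XX01", "SEX": "M", "OBS_VALUE": "1200"},
    {"GEO": "XX02", "SEX": "F", "OBS_VALUE": "800"},
]
print(pivot_to_csv(obs, row_dim="GEO", col_dim="SEX"), end="")
```

Swapping `row_dim` and `col_dim` transposes the table, which is all that changing the axes means in this simplified model.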
How the Hub works
[Figure: the Eurostat Census Hub connected to several National Statistical Institutes.]
https://ec.europa.eu/CensusHub2/
Reusability
• Installed within and outside the ESS
• Used for multiple domains
• Used for dissemination and reporting
• Supports data sharing, push and pull modes
• Generic and SDMX-based (2.0 and 2.1)
• Extensible and modular approach
• Free, open-source solution maintained by Eurostat
• Supports different platforms and DB vendors
For more information
ESTAT-CENSUSHUB@ec.europa.eu
https://webgate.ec.europa.eu/fpfis/mwikis/sdmx/index.php/Census_Hub
Thank you for your attention
DEMO