Restricted
An SDMX-based unified data catalogue (UDC)
Gabriele Becker / Massimo Bruschi
Statistical Information Systems, Monetary & Economic Department
Bank for International Settlements
MSIS – Meeting on the Management of Statistical Information Systems
Daejeon, 26-29 April 2010

The SDMX vision
- Need: up-to-date numbers, data documentation, good-quality data
- Data can be offered by: NSOs, CBs, IOs
- How to choose, filter out duplication, get the “freshest” data?
- Data providers (originators) offer their data “in SDMX”
- Dissemination = reporting = data sharing … single storage!
- SDMX registries help users and organisations to find data
- How “real” is this SDMX vision?
- What do we still need to learn?

The Unified Data Catalogue (UDC) concept
- Can we “implement” the vision?
- UDC: a single data catalogue that allows users to discover, select and retrieve statistical data from all registered data sources
- Discovery implies access to metadata:
  • DSD – data structure definitions
  • concepts and code-lists
  • category schemes
- An SDMX registry is a natural repository for this metadata
- Unified Data Catalogue feasibility study to analyse this

UDC study: Objectives
- Provide centralised access to a variety of internal and external data sources
- Generic search facilities against “registered” data sources
- Directly retrieve data and metadata from all data sources
- Use SDMX technical standards, SDMX registry, web services
- Broaden SDMX knowledge within BIS (business area and IT colleagues)

User stories
- Registrations
- Constraints
- GUI features
- Navigation / Search
- Query & retrieval
- Output handling
- Automation
- Security

UDC prototype architecture
- Simplistic approach: to search and retrieve data from a data source, all we need to know are the data structures and the source query language
- If a source follows the SDMX-IM, we also need a (web) service connected to it that can respond to an SDMX Query
- SDMX-enabled data source: “native” or “adaptable”
- SDMX-ML file + DSD + “file-query-handler” = simplest SDMX-enabled source

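The "file-query-handler" idea can be sketched as follows. This is a minimal illustration, not the prototype's code: it assumes the SDMX-ML file has already been parsed into Python dictionaries, and the dimension names, codes and the function `query_file` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical DSD: the ordered dimension names of one data structure.
DSD_DIMENSIONS = ["FREQ", "REF_AREA", "INDICATOR"]

# Hypothetical content of one SDMX-ML file, already parsed into series.
SERIES = [
    {"key": {"FREQ": "M", "REF_AREA": "CH", "INDICATOR": "CPI"}, "obs": {"2009-07": 103.1}},
    {"key": {"FREQ": "M", "REF_AREA": "DE", "INDICATOR": "CPI"}, "obs": {"2009-07": 107.0}},
    {"key": {"FREQ": "Q", "REF_AREA": "CH", "INDICATOR": "GDP"}, "obs": {"2009-Q2": 134.5}},
]


def query_file(series: List[dict], dimensions: List[str],
               selection: Dict[str, Set[str]]) -> List[dict]:
    """Return the series whose key matches the selection.

    A dimension missing from the selection acts as a wildcard.
    A dimension not defined in the DSD is rejected.
    """
    unknown = set(selection) - set(dimensions)
    if unknown:
        raise ValueError(f"dimensions not in DSD: {sorted(unknown)}")
    return [
        s for s in series
        if all(s["key"][dim] in codes for dim, codes in selection.items())
    ]


if __name__ == "__main__":
    for s in query_file(SERIES, DSD_DIMENSIONS, {"REF_AREA": {"CH"}}):
        print(s["key"], s["obs"])
```

The DSD is the only structural knowledge the handler needs, which is the point of the "simplest SDMX-enabled source" bullet: the file supplies the data, the DSD supplies the vocabulary for the query.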
Plan: schematic architecture
[Diagram. Recoverable labels: internal or external sources (SDMX files; SDMX data source with web service; mappable data source with SDMX query adapter web service); registrations; SDMX Registry web application; SDMX UDC GUI]

Components of the UDC prototype
- SDMX Registry (“off the shelf” SDMX tool)
  • Data structure definitions of all “connected” data sources
  • Registrations for all data flows for all connected data sources
  • URLs to SDMX files and SDMX query services
  • Updated via SDMX-ML messages or interactively (“KeyMaster”)
- UDC (developed for the study)
  • GUI to navigate the registry information
  • Queries the data sources
  • Retrieves data and presents them to the user
- SDMX query web services (developed for the study)
  • For the different types of data sources
- Data query services (partly existing, partly developed)
  • For each of the connected queryable data sources

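The registry's role for the UDC can be illustrated with a small in-memory stand-in. This sketch assumes nothing about the actual registry product or its interfaces: the classes `Registration` and `Registry`, the data flow and DSD identifiers, and the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Registration:
    """One registered way to obtain a data flow (hypothetical model)."""
    dataflow: str   # data flow identifier
    dsd: str        # identifier of the data structure definition
    kind: str       # "file" or "query-service"
    url: str        # where the SDMX file or SDMX query service lives


class Registry:
    """Minimal stand-in for the lookups the UDC performs against a registry."""

    def __init__(self) -> None:
        self._by_dataflow: Dict[str, List[Registration]] = {}

    def register(self, registration: Registration) -> None:
        self._by_dataflow.setdefault(registration.dataflow, []).append(registration)

    def resolve(self, dataflow: str) -> List[Registration]:
        """Return all registrations for a data flow, in registration order."""
        return list(self._by_dataflow.get(dataflow, []))


registry = Registry()
registry.register(Registration(
    "DF_PRICES", "DSD_PRICES", "query-service", "https://example.org/source-a/query"))
registry.register(Registration(
    "DF_PRICES", "DSD_PRICES", "file", "https://example.org/files/prices.xml"))

if __name__ == "__main__":
    for reg in registry.resolve("DF_PRICES"):
        print(reg.kind, reg.url)
```

In this reading the UDC never needs to know where a source lives in advance: it asks the registry for the registrations of a data flow and then talks to the returned file URL or query service.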
What we did: detailed architecture
[Diagram. Recoverable labels, by layer:]
- Data sources: BIS Data Bank (Linux, DBQL output); MSTAT cubes (Windows); MarkIT SQL database (Windows); SDMX-ML data files (.xml)
- Source access: SDMX-ML proxy daemon; TS web service; SQL stored procedures
- SDMX-ML query web services: /databank/query, /mstat/query, /markit/query
- SDMX Registry web application (Linux); UDC web application with read-only registry access
- Client PC (Windows): UDC GUI in Internet Explorer; SDMX-ML file browser

UDC GUI key features
- Browse the categories / data flows / provision registrations
- Browse selected DSD: dimensions, attributes, code-lists
- Build queries based on DSD (code selection)
- Run query and view results (simple table)
- Download results and DSDs in SDMX-ML format
- Search by Concept / Codelist

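The "Search by Concept / Codelist" feature amounts to a lookup across all registered DSDs. The sketch below is an assumption about how such a search could work, not a description of the prototype: the DSDs are reduced to a mapping from concept to code-list, and every identifier is hypothetical.

```python
from typing import Dict, List

# Hypothetical registry content: for each DSD, the concept used by each
# dimension mapped to the code-list that enumerates its values.
DSDS: Dict[str, Dict[str, str]] = {
    "DSD_PRICES": {"FREQ": "CL_FREQ", "REF_AREA": "CL_AREA", "INDICATOR": "CL_INDICATOR"},
    "DSD_BANKING": {"FREQ": "CL_FREQ", "REP_CTY": "CL_AREA", "POSITION": "CL_POSITION"},
}


def find_by_concept(dsds: Dict[str, Dict[str, str]], concept: str) -> List[str]:
    """Return the DSDs that use the given concept, sorted by identifier."""
    return sorted(name for name, components in dsds.items() if concept in components)


def find_by_codelist(dsds: Dict[str, Dict[str, str]], codelist: str) -> List[str]:
    """Return the DSDs that reference the given code-list, sorted by identifier."""
    return sorted(name for name, components in dsds.items()
                  if codelist in components.values())


if __name__ == "__main__":
    print(find_by_concept(DSDS, "REF_AREA"))
    print(find_by_codelist(DSDS, "CL_AREA"))
```

The example also shows why searching by code-list can find more than searching by concept: two structures may share a code-list while naming the concept differently.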
Search by Concept/Codelist - 1
[Screenshot with numbered callouts 1-3]

Search by Concept/Codelist - 2
[Screenshot with numbered callouts 4-6]

UDC prototype: some results
- UDC can provide (unsecured) access to
  • BIS Data Bank: time series repository, SDMX-EDI IM, Linux, FAME, Sybase, own query language + query adapter
  • MSTAT OLAP: IBFS data in multi-dimensional cubes, MS Windows, SQL Server, SDMX Query to OLAP / MDX adapter
  • MSTAT Sandbox: research data in a relational database, MS Windows, SQL Server, DSD on unstructured dataset + SDMX / SQL adapter
  • SDMX-ML generic files + generic file adapter
- Practical use of registration, provisioning, constraints processing, …
- SDMX vision is real … with some practical issues

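An SDMX / SQL adapter of the kind listed above has to turn a selection of dimension codes into a query against relational tables. The following is a rough sketch under stated assumptions, not the adapter built for the study: the table `observations`, its columns, the dimension-to-column mapping and the function `build_sql` are hypothetical, and SQLite stands in for SQL Server so that the example runs with the standard library alone.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple

# Hypothetical mapping from DSD dimensions to columns of a relational table.
COLUMN_FOR_DIMENSION = {"REF_AREA": "country", "INDICATOR": "series_code"}


def build_sql(selection: Dict[str, Sequence[str]]) -> Tuple[str, List[str]]:
    """Translate a dimension/code selection into a parameterised SELECT."""
    clauses: List[str] = []
    params: List[str] = []
    for dimension in sorted(selection):
        column = COLUMN_FOR_DIMENSION[dimension]  # KeyError if unmapped
        codes = list(selection[dimension])
        if not codes:
            raise ValueError(f"no codes selected for {dimension}")
        placeholders = ", ".join("?" for _ in codes)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(codes)
    sql = "SELECT country, series_code, period, value FROM observations"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY country, series_code, period", params


def demo() -> list:
    """Run one translated query against a small in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations "
        "(country TEXT, series_code TEXT, period TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO observations VALUES (?, ?, ?, ?)",
        [("CH", "CPI", "2009-07", 103.1),
         ("DE", "CPI", "2009-07", 107.0),
         ("CH", "GDP", "2009-Q2", 134.5)])
    sql, params = build_sql({"REF_AREA": ["CH"], "INDICATOR": ["CPI"]})
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

Codes are passed as bound parameters rather than interpolated into the statement, so a code value can never alter the SQL; only column names come from the fixed mapping.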
Issues found (Aug. 2009, SDMX 2.0)
- Not possible to register compact or utility files in registry
- Not possible to register files using message groups and annotations, as these are not supported in registry
- Missing functionality in SDMX Query message
- Some issues with registry implementation
- Constraints processing on registry did not work
- ECB does not provide DSDs on their website (files are OK)
- Cross-platform communication with security not solved
- In general: access authorisation to queryable data sources is unresolved

Conclusions
- SDMX vision is real: the UDC works
- Enhancements to standards already part of SDMX 2.1
- Enhancements to registry implementation (e.g. industrial strength required)
- Non-SDMX issues (cross-platform connectivity and access authentication) exist and need to be looked into
- Current SDMX offerings from other organisations are rather diverse (message types, features used, version implemented)
- Diverse offerings make requirements for a UDC more complex

Next steps for the BIS
- UDC can be a central part of future BIS environment
- Road to UDC will take a few years
- Continue the feasibility study in the next year
- Refine UDC
  • More data sources
  • More user facilities for search and navigation
- Work with SDMX standards experts on issues found
- Work with other SDMX data providers

Thank you!
gabriele.becker@bis.org
massimo.bruschi@bis.org