Combining Metadata Standards: Approaches and Benefits
Arofan Gregory
Open Data Foundation

Overview
• Recent events of interest
• The Standards: Comparison and Explanation
• Emerging Implementation Approaches
  – DDI and SDMX
  – SDMX and the Semantic Web Technologies
  – Classifications & Multiple Standards
• Ideas about Future Work

Recent Events of Interest
Note: Some of these events/implementations have been or will be described in detail in other papers; they are only mentioned here.
• Schloss Dagstuhl, Germany, November 2009 (DDI 3 Workshop)
  – SDMX 2.0–DDI 3 field-level mapping work started
  – Topic: DDI and the Semantic Web???

Recent Events of Interest (2)
• Semantic Web and SDMX
  – ONS hosted a 2-day meeting in the UK, February 2009 (produced a draft “SDMX-RDF”)
  – Banca d’Italia has a prototype project
  – New project launched at the University of Tilburg in the Netherlands (RDF expression of OECD SDMX data)
• Australian Bureau of Statistics (ABS) starts looking at SDMX and DDI to support the data production lifecycle
  – Prototype implementations
  – Some other NSIs also very interested

Recent Events of Interest (3)
• Classifications and ISO/IEC 11179
  – Australia: Government agencies looking to exchange classifications with ABS from an existing ISO/IEC 11179 system, using SDMX, DDI
  – Statistics Canada: Evaluation of IMDB (ISO/IEC 11179-based metadata repository) for use in coordination with the Canadian RDC Network (based on DDI 3)

What Does This Mean?
• Not a complete list of events/implementations, but…
• Indicates the interest we are seeing in the combined use of standards!
  – These are not just experiments!
  – Organizations are looking at implementation in a serious way now

Characterizing the Standards
• SDMX:
  – Data structures and formats
  – Reference metadata structures and formats
  – Web-services architecture based on registry services
  – Content-oriented guidelines
• ISO/IEC 11179:
  – Model for managing concepts and data elements
  – Metadata registries and lifecycle
• ISO 19115:
  – Standard metadata model for geographies
  – Used by DDI as geographical model

Characterizing the Standards (2)
• Dublin Core:
  – Citation metadata
  – Widely used in the Semantic Web
  – Used natively by DDI for citations
• Semantic Web / “Linked Data” / RDF
  – See “Open Issues on the Semantic Web”
• DDI 3:
  – Will give more detail, as it is not as familiar to the METIS community…

Characterizing the Standards (3)
• DDI 1.*/2.* was a standard used by archives and data libraries
  – Based on a “codebook” model
  – Used by some NSIs, especially in the developing world because of the IHSN Metadata Management Toolkit
  – Used by the European network of data archives, CESSDA
  – Used by many data archives in North America
• Documentation of a single “Study” (survey)
  – Designed to help researchers find and use microdata
• DDI 3 is more ambitious
  – Capture and use of metadata throughout the entire data lifecycle

DDI 3 Lifecycle Model
Notice: This is very similar to a high-level view of the METIS model!

Characterizing the Standards (4)
• DDI 3 provides machine-actionable metadata to support “metadata-driven” systems throughout the lifecycle
  – Focus is on upstream metadata capture and reuse
• Describes tabulation/aggregation of microdata
• Provides support for comparison across surveys, detailed geography, data processing, register data
• Aggregate “NCube” model aligned with SDMX
• No architecture/web services support (yet)

An Observation…
• It is easy to say that two standards are “aligned”
  – Many of these standards were intentionally aligned as they were developed
• It is much more difficult to understand how to use them in combination effectively…

Approaches and Benefits
• SDMX and DDI
  – DDI microdata production/SDMX aggregate dissemination
  – Using SDMX data in DDI-based systems (combining aggregates and microdata)
  – Combined SDMX/DDI supporting the entire data lifecycle
  – DDI register data reported to SDMX collection system
• SDMX and the Semantic Web
• Classifications and the Standards

[Diagram labels: Surveys; Registers; Input data; DDI 3 Metadata; Cleaning, editing, estimation, aggregation, etc.; Dissemination data; Website/Web Service; SDMX-ML Data, Metadata, Structure]

DDI – SDMX: Benefits
• The benefits of this approach are those found by using the standards generally
  – Supports “metadata-driven” system for data production throughout the lifecycle (DDI)
  – Metadata-rich dissemination format, preferred by data collectors (SDMX)
  – Shared tools: SDMX registry services, Web Services for discovery and use of aggregates

SDMX – DDI: Integrating Aggregates and Microdata
• Scenario is common in some research
  – Economic data is often only available as aggregates
  – Challenge is to combine aggregates and other microdata

[Diagram labels: SDMX Web Service; SDMX-to-DDI 3 Transform; Surveys; Registers (DDI 3); Data archive/repository (DDI 3); Processing to produce integrated data and metadata (DDI 3)]

SDMX – DDI: Benefits
• Allows for easy use of official statistics by researchers
  – Solves problems of combining aggregates and microdata
• Note: This does not involve disaggregation of published data
  – Structural transformation only, to allow DDI 3 systems to process aggregates easily

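A "structural transformation only" can be pictured as reshaping series-oriented aggregates into one record per observation, with every published value copied unchanged. The following is a minimal sketch under stated assumptions: the function name `flatten_series`, the input layout, and the sample values are all hypothetical and illustrative, and the output is a plain list of records rather than actual DDI 3 XML.

```python
def flatten_series(dimension_ids, series):
    """Turn series-oriented aggregates into one record per observation.

    series maps a tuple of dimension codes to {time_period: value}.
    Values are copied unchanged: nothing is disaggregated or re-estimated.
    """
    records = []
    for key, observations in series.items():
        for period, value in sorted(observations.items()):
            # Pair each dimension id with its code, then add time and value.
            record = dict(zip(dimension_ids, key))
            record["TIME_PERIOD"] = period
            record["OBS_VALUE"] = value
            records.append(record)
    return records


# Illustrative values only, not real statistics.
aggregates = {
    ("A", "NL"): {"2007": 3.2, "2008": 3.9},
    ("A", "DE"): {"2008": 7.5},
}
rows = flatten_series(["FREQ", "REF_AREA"], aggregates)
for row in rows:
    print(row)
```

The record layout is the kind of rectangular structure that microdata-oriented tools expect, which is the point of the transform: the aggregates change shape, not content.
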
DDI + SDMX: The Data Lifecycle
• Uses a metadata model capable of expression as either SDMX or DDI, depending on the task
• Provides support for process management
  – Uses many features of SDMX (process model, structure sets, reporting taxonomies, etc.)
• Uses SDMX architecture/services model
  – Designed to allow incorporation of other standards

[Diagram labels: Process-management system (BPML) (SDMX); Surveys (DDI 3); Registers (DDI 3); Input data store; Dissemination data store; SDMX Registry; Web site/Print/Web Services; Data and metadata repositories/application databases (SDMX, DDI, etc.). Annotations: All registry interactions use SDMX; Interactions between systems are DDI or SDMX Web Services, as appropriate]

SDMX + DDI: Benefits
• Leverages Web-Services technologies (registry, event triggers, etc.) for efficient automation, migration, flexibility
• Choice of tools is broad
  – Use the “best” format for any given task
• All the benefits of the DDI-SDMX case
• Good support for process management as well as data management

SDMX and the Semantic Web Technologies
• Potentially applies to other standards as well (DDI, ISO/IEC 11179, etc.)
• Note that Semantic Web technologies only apply to dissemination
  – Not designed to support data production
• Terms:
  – “Raw data” in an SW context does not mean “raw data” as statisticians use the term
  – “Data” in an SW context means “anything that can be described using RDF”, not just numeric data

Assumptions
• Creation of a harmonized statistical model based on proven models/standards, but expressed as RDF (“ontology” or “vocabulary” in SW terms)
• Implementation of an “SDMX-RDF” in standard SDMX dissemination packages

[Diagram labels: Internal (production environment): SDMX-driven production system; Dissemination data store (SDMX); SDMX Web Service (SDMX-ML). “SDMX-RDF” Transform. External (dissemination to Web): Triplestore (SDMX-RDF); RDF; SPARQL Queries]

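The “SDMX-RDF” transform step in this architecture can be sketched as a function that turns one observation into RDF triples. This is only an illustration under stated assumptions: no agreed SDMX-RDF vocabulary is defined in this presentation, so the `http://example.org/sdmx-rdf/` namespace, the property names, and the function `observation_to_ntriples` are hypothetical, as are the sample dimension codes and value. The output uses the N-Triples line format.

```python
BASE = "http://example.org/sdmx-rdf/"  # hypothetical namespace, not a published vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def observation_to_ntriples(dataset_id, dimensions, value):
    """Return N-Triples lines describing one observation.

    dimensions maps dimension id -> code, e.g. {"FREQ": "A"}.
    """
    # Build a stable subject URI from the codes, ordered by dimension id.
    key = ".".join(dimensions[d] for d in sorted(dimensions))
    subject = f"<{BASE}obs/{dataset_id}/{key}>"
    lines = [f"{subject} <{RDF_TYPE}> <{BASE}Observation> ."]
    # One triple per dimension, pointing at a URI for the code.
    for dim_id in sorted(dimensions):
        code_uri = f"<{BASE}code/{dim_id}/{dimensions[dim_id]}>"
        lines.append(f"{subject} <{BASE}dimension/{dim_id}> {code_uri} .")
    # The observed value as a typed literal.
    lines.append(f'{subject} <{BASE}obsValue> "{value}"^^<{XSD_DECIMAL}> .')
    return lines


# Illustrative observation only, not real data.
triples = observation_to_ntriples(
    "DEMO", {"FREQ": "A", "REF_AREA": "NL", "TIME_PERIOD": "2008"}, "16.4"
)
print("\n".join(triples))
```

In this sketch a single observation with three dimensions yields five triples, each repeating the full subject URI, which gives a feel for why a triple-based expression grows quickly.
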
SDMX and the Semantic Web: Benefits
• Leverages the “Linked Data” phenomenon without requiring a deep understanding of RDF, etc.
• Uses existing standards/models and best practices to do “heavy lifting” (data production)
• Puts a lot of reliable, quality data into the “Linked Data Web”
  – Helps address issues of provenance

Warning
• RDF is verbose!
• 4.5 MB of GESMES/TS = 45 MB of “compact” SDMX-ML XML = 420 MB of RDF triples
• This may encourage the on-demand production of RDF data from web services, rather than static files

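The expansion factors implied by these figures can be worked out directly. A small sketch using only the three sizes quoted above; the variable names are illustrative.

```python
# Sizes quoted on the slide, in megabytes, for the same data.
sizes_mb = {
    "GESMES/TS": 4.5,
    "compact SDMX-ML": 45.0,
    "RDF triples": 420.0,
}

baseline = sizes_mb["GESMES/TS"]
# Expansion factor of each format relative to GESMES/TS.
expansion = {fmt: size / baseline for fmt, size in sizes_mb.items()}
# Expansion of RDF relative to compact SDMX-ML.
rdf_vs_sdmx_ml = sizes_mb["RDF triples"] / sizes_mb["compact SDMX-ML"]

for fmt, factor in expansion.items():
    print(f"{fmt}: {sizes_mb[fmt]:g} MB, about {factor:.1f}x GESMES/TS")
print(f"RDF vs compact SDMX-ML: about {rdf_vs_sdmx_ml:.1f}x")
```

On these numbers, RDF comes out at roughly 93 times the size of GESMES/TS and a little over 9 times the size of compact SDMX-ML.
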
Standards and Classifications
• Some maintainers of standard classifications are looking at expressing them in useful formats (SDMX, DDI)
  – This is an easy thing to do
  – It is very useful: promotes re-use, comparability, etc.
  – Could apply to Semantic Web RDF expressions as well as XML-based standards

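The idea of publishing one classification in several formats can be sketched by holding the codes once and serializing them twice. The example below makes several assumptions that should be read as such: the XML element names (`CodeList`, `Code`, `Description`) are a simplified stand-in and not the SDMX-ML or DDI schema, the `http://example.org/classification/` namespace and the two sample codes are hypothetical, and the RDF side uses the SKOS properties `prefLabel` and `broader` as one plausible choice. Labels are written without escaping, so the sketch only suits simple text.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/classification/"  # hypothetical namespace

# (code, label, parent code or None); illustrative entries only.
codes = [
    ("A", "Agriculture", None),
    ("A01", "Crop production", "A"),
]


def to_xml(scheme_id, items):
    """Serialize to a simplified XML code list (not a real SDMX-ML or DDI schema)."""
    root = ET.Element("CodeList", id=scheme_id)
    for code, label, parent in items:
        el = ET.SubElement(root, "Code", value=code)
        if parent is not None:
            el.set("parent", parent)
        ET.SubElement(el, "Description").text = label
    return ET.tostring(root, encoding="unicode")


def to_ntriples(scheme_id, items):
    """Serialize the same items as N-Triples using SKOS properties."""
    lines = []
    for code, label, parent in items:
        subject = f"<{BASE}{scheme_id}/{code}>"
        lines.append(f'{subject} <{SKOS}prefLabel> "{label}"@en .')
        if parent is not None:
            lines.append(f"{subject} <{SKOS}broader> <{BASE}{scheme_id}/{parent}> .")
    return lines


xml_text = to_xml("DEMO_CL", codes)
rdf_lines = to_ntriples("DEMO_CL", codes)
print(xml_text)
print("\n".join(rdf_lines))
```

Keeping a single source list and generating each format from it is what makes re-use and comparability cheap: the codes and hierarchy cannot drift apart between the XML and RDF expressions.
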
Ideas for Future Work
• Endorse SDMX – DDI mappings now being produced
• Develop an “SDMX-RDF” (?) or…
• Develop a harmonized statistical model for expression in RDF (based on DDI, SDMX, ISO/IEC 11179) (?)
  – Encourage tools developers to implement it in standard dissemination packages
• Publish standard classifications in standard formats

Summary
• Combined use of standards is becoming a reality
• Proactive engagement with the Semantic Web world could provide benefits to all concerned parties, as well as users