Leveraging the DDI Model for Linked Statistical Data

Leveraging the DDI Model for Linked Statistical Data in the Social, Behavioural, and Economic Sciences
DC 2012, 05.09.2012

Thomas Bosch, GESIS – Leibniz Institute for the Social Sciences, Germany, thomas.bosch@gesis.org
Richard Cyganiak, Digital Enterprise Research Institute, Ireland, richard@cyganiak.de
Joachim Wackerow, GESIS – Leibniz Institute for the Social Sciences, Germany, joachim.wackerow@gesis.org
Benjamin Zapilko, GESIS – Leibniz Institute for the Social Sciences, Germany, benjamin.zapilko@gesis.org

Agenda

What is DDI?
• DDI (Data Documentation Initiative)
• Established international standard for the documentation and management of data from the social, behavioral, and economic sciences
• Data model for statistical data
• Supports the entire research data lifecycle
• Focus on microdata
• Structured, high-quality metadata
  • Enables secondary analysis without the need to contact the primary researcher
  • Enables the re-use of metadata from existing studies when designing new studies
• Currently specified in XML Schema

How was the DDI Ontology developed?
• DDI subset of the most important DDI elements
• Use cases
  • Experts in the statistics domain formulated the use cases they consider most significant for solving frequent problems
  • Most important use case: discover microdata connected with multiple studies
• Convert existing DDI-XML documents to DDI-RDF automatically
  • Direct mapping
  • Generic mapping (Bosch and Mathiak, 2011)
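
To illustrate the idea of a direct mapping from DDI-XML to DDI-RDF, here is a minimal sketch in Python. It is not the mapping used by the authors. The XML is a simplified, DDI-like fragment, and the namespace `http://example.org/ddi-rdf#`, the base URI, and the generated property names (`hasVariable`, `name`, `label`) are assumptions made up for this example. Each element with an `id` becomes a resource, and each text-only child becomes a literal property.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace, used only for this illustration.
DISCO = "http://example.org/ddi-rdf#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Simplified, DDI-like XML; real DDI-XML is namespaced and far richer.
DDI_XML = """
<Study id="study1">
  <Title>Example Survey 2010</Title>
  <Variable id="v1">
    <Name>AGE</Name>
    <Label>Age of respondent</Label>
  </Variable>
  <Variable id="v2">
    <Name>SEX</Name>
    <Label>Sex of respondent</Label>
  </Variable>
</Study>
"""


def direct_mapping(xml_text, base="http://example.org/id/"):
    """Turn identified elements into resources and text children into literals."""
    triples = []

    def visit(element, parent_uri=None):
        uri = base + element.attrib["id"]
        # The element name becomes the class of the resource.
        triples.append((uri, RDF_TYPE, DISCO + element.tag))
        if parent_uri is not None:
            # Link the parent to the nested resource, e.g. study -> variable.
            triples.append((parent_uri, DISCO + "has" + element.tag, uri))
        for child in element:
            if "id" in child.attrib:
                visit(child, uri)
            else:
                text = (child.text or "").strip()
                triples.append((uri, DISCO + child.tag.lower(), text))

    visit(ET.fromstring(xml_text))
    return triples


triples = direct_mapping(DDI_XML)
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

A direct mapping like this mirrors the XML structure one to one. A generic mapping would instead be driven by a declarative description of which XML elements correspond to which ontology terms.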

Why DDI as Linked Data?
• Currently no such ontology available
• To increase visibility of data holdings using mainstream Web technologies
• To open DDI to the Linked Data community
• To process DDI-RDF by RDF tools
• To link DDI-RDF to other RDF data
• To better identify opportunities for merging datasets
• To enable inferencing
• To research microdata within the LOD cloud

What other metadata standards and vocabularies are used?
• Dublin Core Metadata Element Set, Version 1.1
• DCMI Metadata Terms
• SKOS
• SDMX
• RDF Data Cube Vocabulary
• ISO/IEC 11179
• ISO 19115
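
Reusing existing vocabularies in RDF usually comes down to referring to their terms by namespace. The sketch below lists the published namespace URIs of four of the vocabularies named above and expands prefixed names into full URIs. The prefix labels and the `expand` helper are conventions chosen for this example, and the sample terms are only illustrations; the slides do not say which terms the DDI ontology actually reuses.

```python
# Published namespace URIs of vocabularies named on the slide.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",        # Dublin Core Element Set 1.1
    "dcterms": "http://purl.org/dc/terms/",          # DCMI Metadata Terms
    "skos": "http://www.w3.org/2004/02/skos/core#",  # SKOS
    "qb": "http://purl.org/linked-data/cube#",       # RDF Data Cube Vocabulary
}


def expand(curie):
    """Expand a prefixed name such as 'skos:Concept' into a full URI."""
    prefix, _, local_name = curie.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local_name


# Illustrative terms only: coverage dimensions and a concept class.
for term in ("dcterms:temporal", "dcterms:spatial", "dcterms:subject", "skos:Concept"):
    print(term, "->", expand(term))
```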

Discovery Use Case
• Which studies are connected with a specific coverage consisting of the 3 dimensions time, country, and subject?
• What questions with a specific question text are contained in the study questionnaire?
• What questions are connected with a concept with a specific label?
• What questions are combined with a variable with an associated coverage consisting of the 3 dimensions time, country, and subject?
• What concepts are linked to particular variables or questions?
• What representation does a specific variable have?
• What codes and what categories are part of this representation?
• What variable label does a variable with a particular variable name have?
• What is the maximum value of a certain variable?
• What are the absolute and relative frequencies of a specific code?
• What data files contain the entire dataset?
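
Discovery questions of this kind can be read as graph pattern queries. In practice they would most likely be written in SPARQL against an RDF store; the following is only a pure-Python stand-in over an in-memory list of triples, so that it runs without extra libraries. All URIs, the property names (`questionText`, `concept`, `prefLabel`, `basedOn`, `name`), and the sample data are hypothetical and not taken from the DDI ontology.

```python
EX = "http://example.org/"

# Hypothetical sample data: two questions, their concepts, and one variable.
TRIPLES = [
    (EX + "q1", "questionText", "How old are you?"),
    (EX + "q1", "concept", EX + "c_age"),
    (EX + "q2", "questionText", "What is your sex?"),
    (EX + "q2", "concept", EX + "c_sex"),
    (EX + "c_age", "prefLabel", "Age"),
    (EX + "c_sex", "prefLabel", "Sex"),
    (EX + "v1", "basedOn", EX + "q1"),
    (EX + "v1", "name", "AGE"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def questions_for_concept_label(triples, label):
    """What questions are connected with a concept with a specific label?"""
    concepts = {s for s, _, _ in match(triples, p="prefLabel", o=label)}
    return sorted(s for s, _, o in match(triples, p="concept") if o in concepts)


def concepts_for_variable(triples, variable):
    """What concepts are linked to a particular variable (via its question)?"""
    questions = {o for _, _, o in match(triples, s=variable, p="basedOn")}
    return sorted(o for s, _, o in match(triples, p="concept") if s in questions)


print(questions_for_concept_label(TRIPLES, "Age"))
print(concepts_for_variable(TRIPLES, EX + "v1"))
```

Each function joins two triple patterns on a shared node, which is the same shape a SPARQL basic graph pattern would have.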

study | coverage

instrument | question | concept

values | value labels

variable | descriptive statistics

logical dataset | data file

conceptual model

Acknowledgements
• Archana Bidargaddi (NSD - Norwegian Social Science Data Services, Norway)
• Franck Cotton (INSEE - Institut National de la Statistique et des Études Économiques, France)
• Richard Cyganiak (DERI - Digital Enterprise Research Institute, Ireland)
• Daniel Gilman (BLS - Bureau of Labor Statistics, USA)
• Marcel Hebing (SOEP - German Socio-Economic Panel Study, Germany)
• Larry Hoyle (University of Kansas, USA)
• Jannik Jensen (DDA - Danish Data Archive, Denmark)
• Stefan Kramer (CISER - Cornell Institute for Social and Economic Research, USA)
• Amber Leahey (Scholars Portal Project - University of Toronto, Canada)
• Abdul Rahim (Metadata Technologies Inc., USA)
• John Shepherdson (UK Data Archive, UK)
• Dan Smith (Algenta Technologies Inc., USA)
• Humphrey Southall (Department of Geography, UK Portsmouth University, UK)
• Wendy Thomas (MPC - Minnesota Population Center, USA)
• Johanna Vompras (University Bielefeld Library, Germany)

Thank you for your attention!