Metadata in the social sciences

Metadata schemas in the social sciences
• Generic:
  • Dublin Core
  • DCAT
  • Schema.org (increasingly)
• Specific:
  • DDI: Data Documentation Initiative
  • REFI-QDA: Rotterdam Exchange Format Initiative for Qualitative Data Analysis software
  • TEI: Text Encoding Initiative
• Cross-domain:
  • ISO 19115: Geospatial
  • …

The Data Documentation Initiative
• The Data Documentation Initiative (DDI) is an international standard for describing the data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences.
• DDI is a free standard that can document and manage different stages of the research data lifecycle, such as conceptualization, collection, processing, distribution, discovery, and archiving.

DDI Products
• Current products:
  • Specifications
  • Controlled vocabularies
  • RDF vocabularies
• Developing products

Current products: specifications
• DDI-Codebook:
  • DDI-Codebook is an XML structure for describing codebooks (or data dictionaries) for a single study.
• DDI-Lifecycle:
  • DDI-Lifecycle (also expressed in XML) extends coverage beyond a single study and along the data lifecycle.
  • It can describe several waves of data collection, and even ad hoc collections of datasets grouped for comparison.
  • It is especially useful for serial data collection, as is common in statistical offices and long-running research projects.
• For the XML specifications, see: https://ddialliance.org/explore-documentation
• Other:
  • XKOS: Extended Knowledge Organization System
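
To make the DDI-Codebook idea concrete, here is a minimal Python sketch (standard library only) that builds a codebook fragment holding one study title and a list of variable descriptions. It assumes the DDI-Codebook 2.5 element names codeBook, stdyDscr, citation, titlStmt, titl, dataDscr, var, labl, catgry and catValu, and the namespace ddi:codebook:2_5. The output is not validated against the official schema, which requires and allows far more; the function names, input shape and example study are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed DDI-Codebook 2.5 namespace; check it against the schema version you target.
DDI_NS = "ddi:codebook:2_5"
# Serialize DDI elements with a default (unprefixed) namespace.
ET.register_namespace("", DDI_NS)


def q(tag: str) -> str:
    """Return the namespace-qualified name of a DDI-Codebook element."""
    return f"{{{DDI_NS}}}{tag}"


def build_codebook(title: str, variables: list) -> ET.Element:
    """Build a minimal (unvalidated) codebook: a study title plus variable descriptions.

    Each item in `variables` is a dict with "name", "label" and an optional
    "categories" dict mapping code values to labels (hypothetical input shape).
    """
    root = ET.Element(q("codeBook"))

    # Study description: citation > title statement > title
    study = ET.SubElement(root, q("stdyDscr"))
    citation = ET.SubElement(study, q("citation"))
    title_stmt = ET.SubElement(citation, q("titlStmt"))
    ET.SubElement(title_stmt, q("titl")).text = title

    # Data description: one <var> per variable, with optional <catgry> entries
    data_dscr = ET.SubElement(root, q("dataDscr"))
    for spec in variables:
        var = ET.SubElement(data_dscr, q("var"), name=spec["name"])
        ET.SubElement(var, q("labl")).text = spec["label"]
        for value, label in spec.get("categories", {}).items():
            category = ET.SubElement(var, q("catgry"))
            ET.SubElement(category, q("catValu")).text = value
            ET.SubElement(category, q("labl")).text = label
    return root


if __name__ == "__main__":
    codebook = build_codebook(
        "Example Household Survey",  # hypothetical study
        [
            {"name": "age", "label": "Age in years"},
            {"name": "sex", "label": "Sex of respondent",
             "categories": {"1": "Male", "2": "Female"}},
        ],
    )
    print(ET.tostring(codebook, encoding="unicode"))
```

The study-level and variable-level parts are kept separate, mirroring how a codebook separates what the study is from what each variable contains. Real output should be validated against the published schema before use.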

Controlled vocabularies
• The DDI Controlled Vocabularies are recommended sets of terms and definitions.
• https://ddialliance.org/controlled-vocabularies
• Available as HTML, XML, and XLS
• Also increasingly available through related international collaborations:
  • CESSDA Vocabularies Service: https://vocabularies.cessda.eu/
  • Export as SKOS, PDF, and HTML
  • Also accessible through an API
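
Controlled vocabularies pay off when metadata values are checked against them. The following Python sketch indexes a vocabulary by code and splits a record's values into accepted and rejected ones. The vocabulary name `ANALYSIS_UNIT`, its three codes and their definitions are hypothetical placeholders loosely inspired by DDI's analysis-unit vocabulary, not copied from it; in practice the terms would be loaded from one of the published formats listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One entry of a controlled vocabulary: a code and its definition."""
    code: str
    definition: str


def make_vocabulary(*terms: Term) -> dict:
    """Index terms by code for fast lookup."""
    return {term.code: term for term in terms}


# Hypothetical excerpt with placeholder definitions written for this sketch;
# take real codes and definitions from the published DDI Controlled Vocabularies.
ANALYSIS_UNIT = make_vocabulary(
    Term("Individual", "A single person."),
    Term("Household", "A group of persons sharing living arrangements."),
    Term("Organization", "A formal structure such as a company or association."),
)


def check_values(values: list, vocabulary: dict) -> tuple:
    """Split metadata values into (accepted, rejected) against a vocabulary."""
    accepted = [v for v in values if v in vocabulary]
    rejected = [v for v in values if v not in vocabulary]
    return accepted, rejected
```

For example, `check_values(["Individual", "Person"], ANALYSIS_UNIT)` returns `(["Individual"], ["Person"])`, flagging the free-text value that does not come from the vocabulary.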

RDF vocabularies
• https://ddialliance.org/Specification/RDF
• XKOS: Extended Knowledge Organization System (published)
  • Leverages the Simple Knowledge Organization System (SKOS) for managing statistical classifications and concept management systems.
  • XKOS extends SKOS for the needs of statistical classifications in two main directions:
    • First, it defines a number of terms for representing statistical classifications, with their structure and textual properties, as well as the relations between classifications.
    • Second, it refines SKOS semantic properties so that more specific relations between concepts can be expressed.
• Disco: DDI-RDF Discovery Vocabulary (in development)
  • Designed to support the discovery of microdata sets and related metadata using RDF technologies in the Web of Linked Data.
• PHDD: Physical Data Description (on hold)
  • On hold because of similar, related W3C work, e.g., CSV on the Web.
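
The first XKOS direction can be illustrated with a small Python sketch (standard library only) that writes a toy two-level classification as N-Triples. Plain SKOS supplies the items and their hierarchy (skos:Concept, skos:notation, skos:prefLabel, skos:broader); XKOS adds explicit levels. The XKOS namespace IRI, the use of xkos:ClassificationLevel, of xkos:depth typed as xsd:positiveInteger, and of skos:member to attach items to a level reflect our reading of the XKOS specification and should be checked against it. The base IRI, codes and labels are hypothetical, and the second direction (more specific concept relations) is not shown.

```python
from typing import Optional

SKOS = "http://www.w3.org/2004/02/skos/core#"
XKOS = "http://rdf-vocabulary.ddialliance.org/xkos#"  # assumed XKOS namespace IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/classification/"  # hypothetical base IRI


def iri(value: str) -> str:
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples statement; obj must already be a formatted term."""
    return f"{iri(subject)} {iri(predicate)} {obj} ."


def level(name: str, depth: int, members: list) -> list:
    """XKOS: a classification level with its depth and its member items."""
    lines = [
        triple(EX + name, RDF_TYPE, iri(XKOS + "ClassificationLevel")),
        triple(EX + name, XKOS + "depth", f'"{depth}"^^{iri(XSD + "positiveInteger")}'),
    ]
    lines += [triple(EX + name, SKOS + "member", iri(EX + m)) for m in members]
    return lines


def item(code: str, label: str, broader: Optional[str] = None) -> list:
    """SKOS: a classification item with notation, label and optional parent."""
    lines = [
        triple(EX + code, RDF_TYPE, iri(SKOS + "Concept")),
        triple(EX + code, SKOS + "notation", f'"{code}"'),
        triple(EX + code, SKOS + "prefLabel", f'"{label}"@en'),
        triple(EX + code, SKOS + "inScheme", iri(EX + "scheme")),
    ]
    if broader is not None:
        lines.append(triple(EX + code, SKOS + "broader", iri(EX + broader)))
    return lines


def example_classification() -> list:
    """A two-level toy classification (hypothetical codes and labels)."""
    return (
        [triple(EX + "scheme", RDF_TYPE, iri(SKOS + "ConceptScheme"))]
        + level("level1", 1, ["A"])
        + level("level2", 2, ["A01"])
        + item("A", "Agriculture")
        + item("A01", "Crop production", broader="A")
    )


if __name__ == "__main__":
    print("\n".join(example_classification()))
```

Writing N-Triples by hand keeps the sketch dependency-free; a real pipeline would more likely use an RDF library and load the published XKOS vocabulary.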

Developing products
DDI-CDI (Cross-Domain Integration)
• DDI-CDI was expected to be the “core” of a model-driven DDI:
  • A “next generation” after DDI-Lifecycle
• Implementation cases showed that something else was needed: a focus on data provenance and data integration.
• DDI-CDI has emerged as a companion to DDI-Codebook and DDI-Lifecycle, not a replacement for them.
• The SBE (social, behavioral, and economic sciences) community needs better data integration tools.
  • So do other domains!
Structured Data Transformation Language (SDTL)
• SDTL is an independent language for representing data transformation commands from statistical analysis packages and languages such as SPSS, Stata, SAS, R, and Python.
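
The idea behind SDTL, one package-independent description for equivalent commands written in different languages, can be sketched in Python. The sketch translates a Stata `generate` and an SPSS `COMPUTE` into the same neutral record. The record's field names and the parser functions are hypothetical and deliberately much simpler than the actual SDTL specification; only single-line assignments are handled.

```python
import re

# Hypothetical, simplified stand-in for an SDTL-style command. The field names
# below are NOT the real SDTL schema; they only illustrate the idea of one
# package-independent record per transformation.


def from_stata(line: str) -> dict:
    """Translate a Stata `generate` command into a neutral record."""
    m = re.fullmatch(r"\s*(?:generate|gen)\s+(\w+)\s*=\s*(.+?)\s*", line)
    if m is None:
        raise ValueError(f"unsupported Stata command: {line!r}")
    return {"command": "Compute", "variable": m.group(1),
            "expression": m.group(2), "source": "Stata"}


def from_spss(line: str) -> dict:
    """Translate an SPSS `COMPUTE` command (terminated by a period) into a neutral record."""
    m = re.fullmatch(r"\s*COMPUTE\s+(\w+)\s*=\s*(.+?)\s*\.\s*", line, flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"unsupported SPSS command: {line!r}")
    return {"command": "Compute", "variable": m.group(1),
            "expression": m.group(2), "source": "SPSS"}


def _without_source(record: dict) -> dict:
    """Drop the field that records which package a command came from."""
    return {k: v for k, v in record.items() if k != "source"}


def same_transformation(a: dict, b: dict) -> bool:
    """Compare two records while ignoring which package they came from."""
    return _without_source(a) == _without_source(b)
```

With these helpers, `from_stata("generate age2 = age * 2")` and `from_spss("COMPUTE age2 = age * 2.")` describe the same transformation. A real translator would also have to parse the expression itself, since operators and function names differ between packages; here the expressions happen to be written identically.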

DDI-CDI Goals and Purpose
• Design goal: create a useful, implementable product that:
  • is based on real use cases
  • is developed with modern systems in mind
  • employs a variety of models
  • complies with a range of specifications related to data description and provenance
  • fills in information that other standards do not capture (align rather than replace)
• For data: describes a single data point, a Datum, that can play different roles in different data structures and formats.
• For process/provenance: packages machine-level processes into a structure that relates to business processes described at a level users understand.
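
The Datum idea can be illustrated with a small Python sketch: the same data points are read from a rectangular (wide) table, re-emitted as long (unit, variable, value) records, and turned back into rows. The `Datum` class here is a simplified, hypothetical stand-in with three fields, not the DDI-CDI class model, which describes data points in much more detail; the example values are made up.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Datum:
    """Simplified data point: one value of one variable for one unit (hypothetical model)."""
    unit: str
    variable: str
    value: Any


def from_wide(rows: list, id_column: str) -> list:
    """Read data points from rectangular rows (one row per unit)."""
    return [
        Datum(row[id_column], variable, value)
        for row in rows
        for variable, value in row.items()
        if variable != id_column
    ]


def to_long(data: list) -> list:
    """Render data points in long/event form: one (unit, variable, value) per point."""
    return [(d.unit, d.variable, d.value) for d in data]


def to_wide(data: list, id_column: str) -> list:
    """Render data points back in rectangular form, one dict per unit."""
    rows = {}
    for d in data:
        rows.setdefault(d.unit, {id_column: d.unit})[d.variable] = d.value
    return list(rows.values())
```

For example, the rows `[{"id": "p1", "age": 34, "income": 52000}]` yield two Datum objects; the wide and long layouts are then just two renderings of the same points, which is the role-independence the slide describes.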

DDI-CDI Functionality
• https://ddialliance.org/products/developing-products-of-the-alliance
• https://ddialliance.atlassian.net/wiki/spaces/DDI4/pages/860815393/DDI+Cross+Domain+Integration+DDICDI+Review
• Describe data formats:
  • Rectangular/unit-record
  • Long/event
  • NoSQL/“big data”
  • Multi-dimensional
• Describe data provenance/process:
  • Procedural process
  • Declarative process
• Describe “foundational” metadata:
  • Codes/categories/classifications
  • Concepts, variables, etc.
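
The distinction between procedural and declarative process descriptions can be sketched in Python, assuming "procedural" means an explicitly ordered sequence of steps and "declarative" means a set of rules whose execution order is derived from their dependencies. Both descriptions below produce the same derived variables; the variable names, the threshold and the dict-based representation are hypothetical and are not DDI-CDI structures.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical threshold for a derived variable, for illustration only.
LOW_INCOME_THRESHOLD = 30000

# Procedural: an ordered list of (target, step); the order is part of the description.
PROCEDURE = [
    ("per_capita", lambda r: r["income"] / r["household_size"]),
    ("low_income", lambda r: r["per_capita"] < LOW_INCOME_THRESHOLD),
]

# Declarative: each rule states its target and inputs; no order is given.
RULES = {
    "low_income": (("per_capita",), lambda per_capita: per_capita < LOW_INCOME_THRESHOLD),
    "per_capita": (("income", "household_size"), lambda income, size: income / size),
}


def run_procedural(record: dict, steps: list) -> dict:
    """Apply each step in the stated order, without modifying the input record."""
    out = dict(record)
    for target, step in steps:
        out[target] = step(out)
    return out


def run_declarative(record: dict, rules: dict) -> dict:
    """Derive an execution order from rule dependencies, then apply the rules."""
    graph = {target: set(inputs) for target, (inputs, _) in rules.items()}
    out = dict(record)
    for name in TopologicalSorter(graph).static_order():
        if name in rules:  # source variables such as "income" need no rule
            inputs, rule = rules[name]
            out[name] = rule(*(out[i] for i in inputs))
    return out
```

The declarative form keeps dependencies explicit, which is useful for provenance: `RULES` records that `low_income` was derived from `per_capita`, which in turn came from `income` and `household_size`.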

DDI-CDI Domain-Independence
• Designed to be used by any domain
• Focuses on structure and on the generic aspects of the things it describes:
  • Generic elements such as variables and classifications
  • Does not cover domain-specific aspects such as semantics or lifecycles (these are provided by the domains)
• Complementary to domain-specific models, for example DDI-Lifecycle
• Well suited to combining data from more than one domain or system (cross-domain)