Managing the Metadata Lifecycle The Future of DDI


Managing the Metadata Lifecycle: The Future of DDI at GESIS and ICPSR
Peter Granda, ICPSR
Meinhard Moschner, GESIS
Mary Vardigan, ICPSR
Joachim Wackerow, GESIS
Wolfgang Zenk-Möltgen, GESIS

Research Data Life Cycle
[Diagram of life cycle stages: Archiving, Concept, Collection, Processing, Distribution, Discovery, Repurposing, Analysis]

Current Uses of DDI
• DDI 2 is used for many different purposes by many different archival institutions, e.g., metadata records for data catalogs, export to Web-based information systems such as Nesstar, long-term preservation, and PDF codebooks
• GESIS and ICPSR are developing procedures and systems to extend the use of DDI in their institutions

DDI 3 Expands in Scope
• To date, use of DDI has been mainly limited to the Distribution and Archiving stages of the data life cycle
• DDI 3 enables use of new elements and structures to extend markup to other stages of the life cycle, both earlier and later
• Emphasis is on projects and tasks already in process at each institution

DDI 3 Use at GESIS
• Structured Comments – Processing
• Translation of EVS Questionnaire – Collection
• Supporting Enhanced Publications – Analysis
• Continuity Guides: Trends by Concepts – Concept, Discovery, Repurposing

Extracting structured information in current workflow
• Example: building derived variables by SPSS
• SPSS setups contain commands and comments
• Necessary steps for using SPSS setups as information source for DDI
– Improving comments for automated extraction
  • formalize layout
  • add keywords from a list
– Extraction of structured comments and related commands by custom tool
– Transformation of this information into DDI 3 fragments

Extracting structured information in current workflow
    ***v* Variables/DerivedVariables
    * DESCRIPTION
    *   This section is on derived variables;
    ***v* DerivedVariables/w101_new
    * NAME
    *   w101_new
    * DESCRIPTION
    *   w101_new is a derived variable from w101;
    *   It has the original value from w101
    *   when w102 is equal 1
    *   otherwise it has the value 5;
    * USED VARIABLES
    *   w101, w102
    * SOURCE
    **.
    compute w101_new = 5.
    if ( w102 = 1 ) w101_new = w101.
    **
    * VERSION
    *   2009-04-18
    * AUTHOR
    *   Achim Wackerow
    * EMAIL
    *   joachim.wackerow@gesis.org
    ***.
[Diagram labels: SPSS, Result, Extractor, Report (HTML), DDI 3 fragments, GenerationInstruction, Description, Command]
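The extraction step can be illustrated with a minimal Python sketch. It is not the custom tool mentioned on the slides. It assumes that each structured comment starts with a `***v* path` header, ends with `***.`, uses upper-case keyword lines from a fixed list, and wraps the SPSS commands between `**.` and `**`. The line breaks in the sample and the function name `extract_blocks` are assumptions made for illustration.

```python
import re

# Hypothetical sample in the comment layout shown on the slide
# (the line breaks are assumed; the slide text was flattened).
SAMPLE = """\
***v* DerivedVariables/w101_new
* NAME
*   w101_new
* DESCRIPTION
*   w101_new is a derived variable from w101;
*   It has the original value from w101
*   when w102 is equal 1
*   otherwise it has the value 5;
* USED VARIABLES
*   w101, w102
* SOURCE
**.
compute w101_new = 5.
if ( w102 = 1 ) w101_new = w101.
**
* VERSION
*   2009-04-18
***.
"""

# Keyword list taken from the slide example; a real list may differ.
KEYWORDS = {"NAME", "DESCRIPTION", "USED VARIABLES", "SOURCE",
            "VERSION", "AUTHOR", "EMAIL"}

HEADER = re.compile(r"^\*\*\*(\w)\*\s+(\S+)$")


def extract_blocks(text):
    """Return one dict per structured comment block found in an SPSS setup."""
    blocks = []
    current = None      # block being filled, or None outside a block
    field = None        # keyword whose content lines are being collected
    in_source = False   # True between the '**.' and '**' markers
    for raw in text.splitlines():
        line = raw.rstrip()
        header = HEADER.match(line)
        if header:
            current = {"type": header.group(1),
                       "path": header.group(2),
                       "fields": {}}
            blocks.append(current)
            field, in_source = None, False
        elif current is None:
            continue                      # ordinary setup line, ignored
        elif line == "***.":
            current, field, in_source = None, None, False
        elif line == "**.":
            in_source = True
        elif line == "**":
            in_source = False
        elif in_source:
            # SPSS commands are kept verbatim as part of SOURCE
            current["fields"].setdefault("SOURCE", []).append(line)
        elif line.startswith("*"):
            content = line[1:].strip()
            if content in KEYWORDS:
                field = content
                current["fields"].setdefault(field, [])
            elif field and content:
                current["fields"][field].append(content)
    return blocks


blocks = extract_blocks(SAMPLE)
```

Each resulting dict pairs the description with the related commands, which is the information a later step would need to write a DDI 3 fragment.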

Translation of EVS Questionnaire
[Screenshots: DSDM; http://zacat.gesis.org]

Supporting Enhanced Publications
• Publications with references to data
• DDI 3.1 URN contains: Agency, Object, Version
• Example URN: <urn:ddi:3_1:VariableScheme.Variable=gesis.de.ddi:ZA3811_VarSch(1_0).V8(1_0)>
[Diagram: a publication with references (URNs) sends a URN to http://resolve.gesis.org, which finds the object and returns the URL of the documentation and/or data (e.g., http://www.gesis.org/docxyz); the document is then requested and returned. The DDI Alliance also appears in the diagram; its labels are only partly legible.]
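As an illustration of what a resolver would have to read from such a reference, the following Python sketch splits the example URN into its parts. The pattern is inferred only from the single example on the slide, not from the DDI specification, and the function name `parse_ddi_urn` is hypothetical.

```python
import re

# Pattern inferred from the slide's example URN only (an assumption):
# urn:ddi:<ddi version>:<Class.Class>=<agency>:<id(version).id(version)>
URN_PATTERN = re.compile(
    r"^urn:ddi:(?P<ddi_version>[^:]+):(?P<classes>[^=]+)="
    r"(?P<agency>[^:]+):(?P<objects>.+)$"
)
OBJECT_PATTERN = re.compile(r"(?P<id>[^.()]+)\((?P<version>[^()]+)\)")


def parse_ddi_urn(urn):
    """Split a DDI 3.1-style URN into agency, object classes, IDs, versions."""
    match = URN_PATTERN.match(urn.strip().strip("<>"))
    if match is None:
        raise ValueError(f"not a recognised DDI URN: {urn!r}")
    classes = match.group("classes").split(".")
    objects = [(m.group("id"), m.group("version"))
               for m in OBJECT_PATTERN.finditer(match.group("objects"))]
    if len(classes) != len(objects):
        raise ValueError("number of object classes and object IDs differ")
    return {
        "ddi_version": match.group("ddi_version"),
        "agency": match.group("agency"),
        # one (class, id, version) triple per level, outermost first
        "path": [(cls, oid, ver) for cls, (oid, ver) in zip(classes, objects)],
    }


parsed = parse_ddi_urn(
    "<urn:ddi:3_1:VariableScheme.Variable=gesis.de.ddi:ZA3811_VarSch(1_0).V8(1_0)>"
)
```

With the agency, object, and version separated, a resolver could look up the agency first and then ask that agency's service for the URL of the versioned object.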

Supporting Enhanced Publications
[Screenshot: DSDM DDI 3 EPE Simple Export Wizard 1.2.0]

Grouping Trends
• Continuity guides in different contexts
– Synoptical question / variable lists
– Documentation of changes in question wording / answer scales
• Systematic organization by conceptual categories
– CodebookExplorer tool (relational DB)
– Publication as HTML links on variable level in ZACAT
• Taking advantage of DDI 3 in the future
– Defining the standard and comparison
– Qualifying relations (e.g., q-text modified, scale modified, …)

Continuity guides
[Screenshot annotations: Literal question text over time; Conceptual categories; Deviations in answer categories]

Trends by concepts
[Screenshot annotations: Trend variables by study; Conceptual categories; Country 1; Country 2]

DDI 3
RESOURCE „Ex-post Standard“
  Universe
  Concept
  Comparison map
    Ø Equivalency
    Ø Relationship
    Ø Description
  Data Collection
    <dc:QuestionScheme id="QS">
      <dc:QuestionItem id="Q">
        <dc:QuestionText>
          <dc:LiteralText>
            <dc:Text>Do you …?</dc:Text>
          </dc:LiteralText>
        …
        <dc:CodeDomain>
          <r:CodeSchemeReference>
            <r:ID>CODS1</r:ID>
          </r:CodeSchemeReference>
  Logical Product
    <l:CategoryScheme id="CATS1">
      <l:Category id="Cat1">
        <r:Label>often</r:Label>
    …
    <l:CodeScheme id="CODS1">
      <l:CategorySchemeReference>
        <r:ID>CATS1</r:ID>
      </l:CategorySchemeReference>
      <l:Code isDiscrete="true">
        <l:CategoryReference>
          <r:ID>Cat1</r:ID>
        </l:CategoryReference>
        <l:Value>1</l:Value>
      </l:Code>
      …
Comparison between the resource and the study units
  Question text: modified
  Label: identical
  Values: different (generation instruction, scale reversed)
GROUP
  STUDY UNIT 1 … n
    DataCollection
      <dc:QuestionScheme id="QS">
        <dc:QuestionItem id="Qn">
          …
          <dc:Text>Have you …?</dc:Text>
          …
    LogicalProduct
      <l:CategoryScheme id="CATS1">
        <l:Category id="Cat1">
          <r:Label>often</r:Label>
      …
      <l:CodeScheme id="CODS1">
        …
        <l:Code isDiscrete="true">
          <l:CategoryReference>
            <r:ID>Cat1</r:ID>
          </l:CategoryReference>
          <l:Value>4</l:Value>
        </l:Code>
        …
GROUP
  STUDY UNIT 8-14
    DataCollection …
    LogicalProduct …
GROUP
  STUDY UNIT 15-x
    DataCollection …
    LogicalProduct …
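The comparison shown here (question text modified, labels identical, values different because the scale is reversed) can be sketched in Python as follows. This is only an illustration of qualifying relations against an ex-post standard. It works on plain dictionaries rather than DDI 3 XML, and the function name `qualify_relation` and all category labels other than "often" are hypothetical.

```python
def qualify_relation(standard, study):
    """Qualify how a study question relates to the ex-post standard.

    Both arguments are dicts with 'text' (question text) and
    'codes' (mapping of category label -> code value).
    """
    relation = {}
    relation["question_text"] = (
        "identical" if standard["text"] == study["text"] else "modified"
    )

    std_codes, study_codes = standard["codes"], study["codes"]
    relation["labels"] = (
        "identical" if set(std_codes) == set(study_codes) else "different"
    )

    # Compare code values for shared labels, ordered by the standard's values.
    shared = sorted(set(std_codes) & set(study_codes), key=std_codes.get)
    std_values = [std_codes[label] for label in shared]
    study_values = [study_codes[label] for label in shared]
    if std_values == study_values:
        relation["values"] = "identical"
    elif study_values == std_values[::-1]:
        relation["values"] = "different (scale reversed)"
    else:
        relation["values"] = "different"
    return relation


# "often" = 1 in the standard and 4 in the study unit, as on the slide;
# the remaining labels are invented to complete a four-point scale.
standard = {"text": "Do you …?",
            "codes": {"often": 1, "sometimes": 2, "rarely": 3, "never": 4}}
study = {"text": "Have you …?",
         "codes": {"often": 4, "sometimes": 3, "rarely": 2, "never": 1}}

relation = qualify_relation(standard, study)
```

A detected reversal is the kind of finding that could be recorded as a generation instruction, so that study values can be recoded to the standard.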

DDI 3 Use at ICPSR
• Information collected from data producers in precollection phase – Concept
• Metadata output from CAI applications – Data Collection
• Processor's dashboard – Data Processing
• Metadata mining: new faceted search tool to facilitate discovery through more precise searching – Data Discovery
• Relational database for comparison and harmonization across studies – Repurposing

SMDS Metadata Modules


OAIS: SIP, AIP, DIP and DDI
SIP
– Information from each life cycle stage (Concept, Collection, Processing) sent to the archive can be understood as a dynamic SIP.
– A combination of this information forms a traditional SIP.
– Self-archiving by web forms can be offered for the different stages.
– Sources: Custom Tools (e.g., forms-based), CAI Tools (MQDS etc.), information extracted from SPSS etc.
Archive: DDI as backbone for structured metadata
– The core structure is geared to efficient processing and reuse of metadata.
– It would be organised in a way where metadata can be reused and information can be ingested and distributed in a dynamic way.
– It can include just references to other reused metadata.
– Data / documents outside of DDI
AIP
– An AIP must be specially built, because the metadata forms the core of the archive.
– An AIP should include everything of one study; DDI can also be the main structure of the AIP. Data can be inline in DDI.
– The purpose of the AIP is comparable to PDF/A, where all fonts are included.
– An AIP would exist beside the core structure in the archive.
– An easy roundtrip should be possible between the core structure and the AIP.
DIP (Distribution Packages)
– The structured metadata combined with data
– Distribution: Web information system, search engines
– Discovery, Analysis (statistical packages, online analysis), Repurposing

DDI-based archive as collection of reusable components
• Metadata in DDI is structured in small items which can be identified and maintained by one or more institutions
• These parts can be
– the basis for comparison and metadata mining (discovery of new relationships)
– a candidate for reuse in other studies or new studies (like standard questions or variables)
[Diagram: Study 1 (study-specific information, items for reuse); New study; Repository of reusable components: standard concepts, standard questions, standard variables, harmonized information, controlled vocabularies]
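The idea of a repository of reusable components can be sketched as a small data structure. The Python below is only an illustration, assuming items are identified by agency, ID, and version and that a study stores references to shared items rather than copies. All class names, identifiers, and question texts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ItemKey:
    """Identifies one reusable item (hypothetical identification scheme)."""
    agency: str
    item_id: str
    version: str


@dataclass
class Repository:
    """Holds reusable items such as standard questions or variables."""
    items: dict = field(default_factory=dict)

    def register(self, key, content):
        if key in self.items:
            raise ValueError(f"{key} is already registered; publish a new version")
        self.items[key] = content

    def resolve(self, key):
        return self.items[key]


@dataclass
class Study:
    name: str
    own_items: list = field(default_factory=list)   # study-specific information
    reused: list = field(default_factory=list)      # ItemKey references only

    def all_items(self, repository):
        """Study-specific items followed by the resolved reused items."""
        return self.own_items + [repository.resolve(key) for key in self.reused]


def studies_sharing(key, studies):
    """Names of the studies that reference the same reusable item."""
    return [study.name for study in studies if key in study.reused]


repo = Repository()
age_question = ItemKey("example.agency", "Q_AGE", "1_0")
repo.register(age_question, "How old are you?")

study_1 = Study("Study 1", own_items=["Study-specific question"], reused=[age_question])
new_study = Study("New study", reused=[age_question])
```

Because both studies point at the same versioned item, `studies_sharing` gives a direct starting point for comparison across studies.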

Issues for Discussion
• Advantages and disadvantages of seeking to capture additional metadata throughout the data life cycle
• How much information to make available to funding agencies, data producers, and secondary users?
• Rules for structured documentation and delivery of items to archives for preservation
• An overall DDI tool to capture and curate all metadata and data – the Holy Grail???