Provenance of scientific information as experienced in DRIVER

6th e-Infrastructure Concertation Event, Lyon, 24th November 2008
Wolfram Horstmann, Bielefeld University / DRIVER

Notions of Provenance
• Where do data objects* originate from?
  – Scientific Work: examples
    • Instrumentation techniques: manufacturers of hardware and software
    • Methodologies: processes, e.g. gene sequencing
  – Technical/Local: examples
    • (Web) identifiers
    • Database, repository name
* Primary data, documents, metadata …

Why Provenance?
• Quoting / Citing / Referencing as a global scientific principle
  – „Reproducible research“
• Giving credit to authors / creators in distributed environments
• Original location / context has to be known
• Experienced in grid environments [1]

Provenance & Interoperability
• Re-Use / Sharing: “Addressing/Accessing”
  – Common view, common use
  – Unidirectional: no change of data objects!
• Federation: “Discovering in Context”
  – Remote representation of distributed data objects (DOs)
• Aggregation: “Contextualizing”
  – Add unchanged object in a context
• Processing/Annotation: “Changing”
  – Uni- vs. bidirectional: change of DOs and remote representation vs. back-storage (e.g. CVS)

Scenarios in DRIVER

Digital Scientific Data

Digital Object Collections ⊃ ⊃

Digital Object Repositories + + = + +

Digital Information Space

Conventional Web Data

„Simple“ Applications

Metadata Infrastructure

Basic Provenance Settings
• Indicate Production Situation
  – Metadata
    • Author, Instrumentation etc.
• Remote Representation
  – Indicate place of origin in remote systems
• Metadata as digital objects / first order citizens
  – Allow lineage representation
    • Credits in remote environments / versioning

Orders of Provenance
• 1st order: Metadata
  – Provenance attached to data
  – Minimal „knowledge“ required in application
  – Allows remote handling of data objects
  – Requires metadata infrastructure
  – Metadata introduce 2 objects: requires linkage
• 2nd order: context / compounds
  – Expresses multiple relations between objects
  – May introduce semantic model
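The two orders can be illustrated with a minimal Python sketch. Every name in it (DataObject, MetadataRecord, the relation labels, the example.org identifiers) is a hypothetical illustration and not part of DRIVER. First-order provenance is modelled as a metadata record that is an object in its own right and points to the data object it describes (the linkage). Second-order provenance is modelled as a set of relations between objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    identifier: str  # web-enabled identifier (URI) of the data object


@dataclass(frozen=True)
class MetadataRecord:
    identifier: str       # the record is a second object with its own identifier
    describes: str        # linkage: identifier of the data object it describes
    author: str           # production situation
    instrumentation: str  # production situation
    origin: str           # place of origin, e.g. a repository name


def first_order(records, object_id):
    """Return the metadata records attached to one data object."""
    return [r for r in records if r.describes == object_id]


def second_order(relations, object_id):
    """Return every (subject, relation, object) statement involving the object."""
    return [(s, p, o) for (s, p, o) in relations if object_id in (s, o)]


# Hypothetical sample data
data = DataObject("http://repo.example.org/objects/42")
records = [
    MetadataRecord(
        identifier="http://repo.example.org/records/42",
        describes=data.identifier,
        author="A. Author",
        instrumentation="sequencer",
        origin="Example Repository",
    )
]
relations = [
    (data.identifier, "isPartOf", "http://agg.example.org/collections/7"),
    ("http://repo.example.org/objects/43", "isVersionOf", data.identifier),
    ("http://repo.example.org/objects/99", "isPartOf", "http://agg.example.org/collections/7"),
]

attached = first_order(records, data.identifier)
context = second_order(relations, data.identifier)
```

The first lookup needs nothing beyond the linkage field, which matches the "minimal knowledge" point. The second only becomes useful once the relation labels carry agreed meaning, which is where a semantic model comes in.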

Provenance in DRIVER #1
• Simple Objects: OAI-PMH [2]
  – 1st order provenance
    • Metadata: minimum OAI-DC
  – 2nd order provenance
    • DRIVER explicit identifiers for repositories
    • OAI-PMH: inline representation („about“)
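As an illustration of the inline „about“ representation, the sketch below reads a provenance container of the kind described in the OAI-PMH provenance guidelines [2]. The element names and namespace follow those guidelines. The sample record, its identifiers and dates are made up, and the function name origin_chain is hypothetical.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"

# Made-up sample: a record that was harvested once from repository.example.org
SAMPLE_ABOUT = """
<about xmlns="http://www.openarchives.org/OAI/2.0/">
  <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
    <originDescription harvestDate="2008-11-20T10:00:00Z" altered="false">
      <baseURL>http://repository.example.org/oai</baseURL>
      <identifier>oai:repository.example.org:1234</identifier>
      <datestamp>2008-10-01</datestamp>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </originDescription>
  </provenance>
</about>
"""


def origin_chain(about_xml):
    """Return the harvesting history, most recent harvest first."""
    ns = {"p": PROV_NS}
    root = ET.fromstring(about_xml.strip())
    chain = []
    node = root.find("p:provenance/p:originDescription", ns)
    while node is not None:
        chain.append({
            "baseURL": node.findtext("p:baseURL", namespaces=ns),
            "identifier": node.findtext("p:identifier", namespaces=ns),
            "datestamp": node.findtext("p:datestamp", namespaces=ns),
            "harvestDate": node.get("harvestDate"),
            "altered": node.get("altered"),
        })
        # A nested originDescription records an earlier harvesting step
        node = node.find("p:originDescription", ns)
    return chain


chain = origin_chain(SAMPLE_ABOUT)
```

Each entry names the repository the record came from (baseURL) and its identifier there, which is the information an aggregator needs in order to link back to the place of origin.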

Semantic/Compound Data

„Semantic“ Applications

Provenance in DRIVER #2
• „Enhanced Publications“
  – Research project in DRIVER-II
  – Representation of data / document packages
  – Use of OAI-ORE

Provenance in OAI-ORE
• OAI-ORE: Object Re-Use and Exchange [4]
  – Uses Resource Maps < Named Graphs
  – Uses „lineage“ to represent explicit provenance
  – Future: explicit provenance model [7]?
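A rough sketch of how „lineage“ could be followed, assuming the ORE vocabulary terms aggregates, proxyFor, proxyIn and lineage from the namespace http://www.openarchives.org/ore/terms/. Triples are held as plain Python tuples rather than in an RDF library, and all resource URIs as well as the helper names objects and trace_back are hypothetical.

```python
ORE = "http://www.openarchives.org/ore/terms/"

# Hypothetical resources: one article aggregated first at its repository (A0),
# then re-used in an aggregator's compound object (A1).
R = "http://repo.example.org/article.pdf"
A0 = "http://repo.example.org/aggregation/0"
A1 = "http://aggregator.example.org/aggregation/1"
P0 = "http://repo.example.org/proxy/0"
P1 = "http://aggregator.example.org/proxy/1"

TRIPLES = [
    (A0, ORE + "aggregates", R),
    (A1, ORE + "aggregates", R),
    (P0, ORE + "proxyFor", R),
    (P0, ORE + "proxyIn", A0),
    (P1, ORE + "proxyFor", R),
    (P1, ORE + "proxyIn", A1),
    (P1, ORE + "lineage", P0),  # the resource behind P1 was discovered via A0
]


def objects(triples, subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def trace_back(triples, proxy):
    """Follow lineage links from a proxy; return the aggregations passed, newest first."""
    path, seen = [], set()
    while proxy is not None and proxy not in seen:
        seen.add(proxy)  # guard against accidental cycles
        path.extend(objects(triples, proxy, ORE + "proxyIn"))
        earlier = objects(triples, proxy, ORE + "lineage")
        proxy = earlier[0] if earlier else None
    return path


history = trace_back(TRIPLES, P1)
```

Here history lists the aggregator's compound object first and the originating aggregation second, which is the trace-back that lineage is meant to support.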

Summary
• Provenance essential for …
  – Indicating origin in distributed data spaces
    • Accessing / Addressing
    • Federation / Aggregation
    • Processing / Annotation
  – Document and data citation / trace-back
  – 1st order: describing data > metadata
  – 2nd order: describing context > semantic data

Lessons learnt in DRIVER
• Use web-enabled Identification (URI/UDDI etc.)
  – „Dark“ databases don't interoperate
• 1st order provenance at place of origin
  – Requires metadata to describe origin
  – Enables a metadata infrastructure
  – Introduces linkage problem
• 2nd order provenance in contexts
  – Requires data provider identification in federators / aggregators in order to link back
  – May require semantic model for context
  – Would benefit from a semantic infrastructure
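The link-back requirement can be sketched as follows, assuming the aggregator keeps a registry that maps each data provider identifier to an OAI-PMH base URL. The registry contents, provider identifiers and the function name link_back are hypothetical. The request parameters (verb, identifier, metadataPrefix) are those of a standard OAI-PMH GetRecord request.

```python
from urllib.parse import urlencode

# Hypothetical provider registry kept by an aggregator.
# A "dark" database has no web-enabled endpoint, hence no base URL.
PROVIDERS = {
    "repo-a": "http://repo-a.example.org/oai",
    "dark-db": None,
}


def link_back(provider_id, record_id, metadata_prefix="oai_dc"):
    """Build a GetRecord URL at the place of origin, or None if unreachable."""
    base_url = PROVIDERS.get(provider_id)
    if base_url is None:
        return None  # unknown or "dark" provider: no way to link back
    query = urlencode({
        "verb": "GetRecord",
        "identifier": record_id,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


url = link_back("repo-a", "oai:repo-a.example.org:1234")
dark = link_back("dark-db", "row-17")
```

Without the provider identifier stored next to each aggregated record, the aggregator would have no key for this lookup, which is the linkage problem noted above.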

Resources
[1] On provenance in the e-Science / grid environment
  – http://www.sigmod.org/sigmod/record/issues/0509/p31-special-sw-section-5.pdf
  – In gLite
    • http://www.cesnet.cz/doc/techzpravy/2007/glite-job-provenance/
    • http://twiki.ipaw.info/bin/view/Challenge
[2] On provenance in OAI-PMH
  – http://www.openarchives.org/OAI/2.0/guidelines-provenance.htm
[3] On provenance in OAI-ORE (referred to as ore:lineage)
  – http://www.openarchives.org/ore/meetings/Soton/ore_beyond_basics.pdf (general)
  – http://www.openarchives.org/ore/1.0/vocabulary (definition)
[4] Named Graphs, Provenance and Trust (Carroll et al.)
  – http://www4.wiwiss.fu-berlin.de/bizer/SWTSGuide/carroll-ISWC2004.pdf
[5] W3C: On provenance in RDF
  – http://www.w3.org/2001/12/attributions/
[6] Open Provenance Model
  – http://eprints.ecs.soton.ac.uk/14979/1/opm.pdf
[7] DRIVER: Digital Repository Infrastructure for European Research
  – http://www.driver-community.eu