Wf4Ever: Annotating research objects (Stian Soiland-Reyes)

Wf4Ever: Annotating research objects
Stian Soiland-Reyes, Sean Bechhofer
myGrid, University of Manchester
Open Annotation Rollout, Manchester, 2013-06-24
This work is licensed under a Creative Commons Attribution 3.0 Unported License

Motivation: Scientific workflows
Coordinated execution of services and linked resources
Dataflow between services:
  • Web services (SOAP, REST)
  • Command line tools
  • Scripts
  • User interactions
  • Components (nested workflows)
Method becomes:
  • Documented visually
  • Shareable as single definition
  • Reusable with new inputs
  • Repurposable with other services
  • Reproducible?
http://www.myexperiment.org/workflows/3355
http://www.biovel.eu/
http://www.taverna.org.uk/

But workflows are complex machines
  • Will it still work after a year? 10 years?
  • Expanding components, we see a workflow involves a series of specific tools and services which
    • Depend on datasets, software libraries, other tools
    • Are often poorly described or understood
    • Over time evolve, change, break or are replaced
  • User interactions are not reproducible, but can be tracked and replayed
Figure labels: Inputs, Components, Configuration, Outputs
http://www.myexperiment.org/workflows/3355

Electronic Paper Not Enough
Figure labels: Investigation, Hypothesis, Experiment, Result, Analysis, Conclusions, Publish, Data, Electronic paper
Open Research movement: Openly share the data of your experiments
http://figshare.com/
http://datadryad.org/
http://www.force11.org/beyondthepdf2

RESEARCH OBJECT (RO)
Research objects goal: Openly share everything about your experiments, including how those things are related
http://www.researchobject.org/

What is in a research object?
A Research Object bundles and relates digital resources of a scientific experiment or investigation:
  • Data used and results produced in experimental study
  • Methods employed to produce and analyse that data
  • Provenance and settings for the experiments
  • People involved in the investigation
  • Annotations about these resources, that are essential to the understanding and interpretation of the scientific outcomes captured by a research object
http://www.researchobject.org/

Gathering everything
  • Research Objects (RO) aggregate related resources, their provenance and annotations
  • Convey “everything you need to know” about a study/experiment/analysis/dataset/workflow
  • Shareable, evolvable, contributable, citable
  • ROs have their own provenance and lifecycles

Why Research Objects?
  i. To share your research materials (RO as a social object)
  ii. To facilitate reproducibility and reuse of methods
  iii. To be recognized and cited (even for constituent resources)
  iv. To preserve results and prevent decay (curation of workflow definition; using provenance for partial rerun)

A Research object
http://alpha.myexperiment.org/packs/387

Quality Assessment of a research object

Quality Monitoring

Annotations in research objects
  • Types: “This document contains a hypothesis”
  • Relations: “These datasets are consumed by that tool”
  • Provenance: “These results came from this workflow run”
  • Descriptions: “Purpose of this step is to filter out invalid data”
  • Comments: “This method looks useful, but how do I install it?”
  • Examples: “This is how you could use it”

Annotation guidelines – which properties?
  • Descriptions: dct:title, dct:description, rdfs:comment, dct:publisher, dct:license, dct:subject
  • Provenance: dct:created, dct:creator, dct:modified, pav:providedBy, pav:authoredBy, pav:contributedBy, roevo:wasArchivedBy, pav:createdAt
  • Provenance relations: prov:wasDerivedFrom, prov:wasRevisionOf, wfprov:usedInput, wfprov:wasOutputFrom
  • Social networking: oa:Tag, mediaont:hasRating, roterms:technicalContact, cito:isDocumentedBy, cito:isCitedBy
  • Dependencies: dcterms:requires, roterms:requiresHardware, roterms:requiresSoftware, roterms:requiresDataset
  • Typing: wfdesc:Workflow, wf4ever:Script, roterms:Hypothesis, roterms:Results, dct:BibliographicResource

What is provenance?
  • Attribution: who did it?
  • Activity: what happens to it?
  • Date and tool: when was it made? using what?
  • Derivation: how did it change?
  • Origin: where is it from?
  • Aggregation: what is it part of?
  • Attributes: what is it?
  • Annotations: what do others say about it?
  • Licensing: can I use it?
Image by Dr Stephen Dann, licensed under Creative Commons Attribution-ShareAlike 2.0 Generic
http://www.flickr.com/photos/stephendann/3375055368/

Attribution
  • Who collected this sample? Who helped?
  • Which lab performed the sequencing?
  • Who did the data analysis?
  • Who curated the results?
  • Who produced the raw data this analysis is based on?
  • Who wrote the analysis workflow?
Why do I need this?
  i. To be recognized for my work
  ii. Who should I give credits to?
  iii. Who should I complain to?
  iv. Can I trust them?
  v. Who should I make friends with?
Figure labels: Data, Alice, The lab; relations wasAttributedTo, actedOnBehalfOf; Roles; Agent types: Person, Organization, SoftwareAgent
Properties: prov:wasAttributedTo, prov:actedOnBehalfOf, dct:creator, dct:publisher, pav:authoredBy, pav:contributedBy, pav:curatedBy, pav:createdBy, pav:importedBy, pav:providedBy, ...
http://practicalprovenance.wordpress.com/

Derivation
  • Which sample was this metagenome sequenced from?
  • Which metagenomes was this sequence extracted from?
  • Which sequence was the basis for the results?
  • What is the previous revision of the new results?
Why do I need this?
  i. To verify consistency (did I use the correct sequence?)
  ii. To find the latest revision
  iii. To backtrack where a diversion appeared after a change
  iv. To credit work I depend on
  v. Auditing and defence for peer review
Figure labels: Sample, Metagenome, Sequence, Old results, New results; relations wasDerivedFrom, wasInfluencedBy, wasQuotedFrom, wasRevisionOf
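
The derivation questions above amount to following wasDerivedFrom and wasRevisionOf links backwards from a result. A minimal Python sketch of that walk, assuming a hypothetical in-memory table of links; the resource names are illustrative stand-ins for the figure labels, not identifiers from a real research object:

```python
# Hypothetical derivation links: each key was derived from (or is a
# revision of) its value, mirroring prov:wasDerivedFrom / prov:wasRevisionOf.
DERIVED_FROM = {
    "new_results": "old_results",
    "old_results": "sequence",
    "sequence": "metagenome",
    "metagenome": "sample",
}


def derivation_chain(resource, derived_from):
    """Return the resource followed by everything it was derived from."""
    chain = [resource]
    seen = {resource}
    while chain[-1] in derived_from:
        parent = derived_from[chain[-1]]
        if parent in seen:  # guard against accidental cycles in the data
            break
        chain.append(parent)
        seen.add(parent)
    return chain


print(" <- ".join(derivation_chain("new_results", DERIVED_FROM)))
```

A real provenance graph can have several sources per resource; this sketch assumes a single parent to keep the backtracking idea visible.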

Activities
  • What happened? When? Who?
  • What was used and generated?
  • Why was this workflow started?
  • Which workflow ran? Where?
Why do I need this?
  i. To see which analysis was performed
  ii. To find out who did what
  iii. What was the metagenome used for?
  iv. To understand the whole process (“make me a Methods section”)
  v. To track down inconsistencies
Figure labels: Lab technician, Alice, Sample, Sequencing, Metagenome, Workflow server, Workflow run, Workflow definition, Results, "2012-06-21"; relations hadRole, used, wasGeneratedBy, wasAssociatedWith, wasStartedBy, wasStartedAt, wasInformedBy, hadPlan

PROV model
Provenance Working Group
Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved.
http://www.w3.org/TR/prov-primer/

Provenance of what?
  • Who made the (content of) research object? Who maintains it?
  • Who wrote this document? Who uploaded it?
  • Which CSV was this Excel file imported from?
  • Who wrote this description? When? How did we get it?
  • What is the state of this RO? (Live or Published?)
  • What did the research object look like before? (Revisions) – are there newer versions?
  • Which research objects are derived from this RO?

Research object model at a glance
Figure: class diagram with these elements
  • Research Object: «ore:Aggregation», «ro:ResearchObject»
  • Resource: «ore:AggregatedResource», «ro:Resource»
  • Annotation: «oa:Annotation», «ro:AggregatedAnnotation»
  • Annotation graph: «trig:Graph»
  • Manifest: «ore:ResourceMap», «ro:Manifest»
  • Relations: ore:aggregates, ore:isDescribedBy, oa:hasTarget, oa:hasBody

Wf4Ever architecture
Figure labels: Research object, Manifest, Resource, ORE Proxy, Annotation, Annotation graph, Blob store, Graph store, External reference, REST resources, SPARQL queries; arrows "Uploaded to", "Redirects to", "If RDF, import as named graph"
http://www.wf4ever-project.org/wiki/display/docs/RO+API+6

Where do RO annotations come from?
  • Imported from uploaded resources, e.g. embedded in workflow-specific format (creator: unknown!)
  • Created by users filling in Title, Description etc. on website
  • By automatically invoked software agents, e.g.:
    • A workflow transformation service extracts the workflow structure as RDF from the native workflow format
    • Provenance trace from a workflow run, which describes the origin of aggregated output files in the research object

How we are using the OA model
  • Multiple oa:Annotation contained within the manifest RDF and aggregated by the RO.
  • Provenance (PAV, PROV) on oa:Annotation (who made the link) and body resource (who stated it)
  • Typically a single oa:hasTarget, either the RO or an aggregated resource.
  • oa:hasBody to a trig:Graph resource (read: RDF file) with the “actual” annotation as RDF: dct:title "The wonderful workflow" .
  • Multiple oa:hasTarget for relationships, e.g. graph body: roterms:inputSelected .
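
To make the pattern above concrete, here is a rough Python sketch that renders one oa:Annotation with several targets and an RDF-file body as Turtle text. It is an illustration only, not the Wf4Ever serialization: the function name, the example.org URIs and the file names are hypothetical, and the namespace URIs are the commonly published ones for OA and PAV, assumed rather than taken from the slides.

```python
OA = "http://www.w3.org/ns/oa#"   # assumed Open Annotation namespace
PAV = "http://purl.org/pav/"      # assumed PAV namespace


def annotation_turtle(annotation, targets, body, created_by):
    """Render one oa:Annotation as Turtle; all arguments are absolute URIs."""
    target_list = ", ".join(f"<{uri}>" for uri in targets)
    return "\n".join([
        f"@prefix oa: <{OA}> .",
        f"@prefix pav: <{PAV}> .",
        "",
        f"<{annotation}> a oa:Annotation ;",
        f"    oa:hasTarget {target_list} ;",
        f"    oa:hasBody <{body}> ;",
        f"    pav:createdBy <{created_by}> .",
        "",
    ])


# Hypothetical research object layout under example.org
RO = "http://example.org/ro/1/"
print(annotation_turtle(
    annotation=RO + ".ro/annotations/1",
    targets=[RO + "workflow.t2flow", RO + "input2.txt"],
    body=RO + ".ro/annotations/1.ttl",
    created_by="http://example.org/people/alice",
))
```

The body URI points at a separate RDF file, so the statements about the targets (for example a title or an input relation) live in that file rather than on the annotation itself.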

What should we also be using?
  • Motivations
    • myExperiment: commenting, describing, moderating, questioning, replying, tagging – made our own vocabulary as OA did not exist
  • Selectors on compound resources
    • E.g. description on processors within a workflow definition. How do you find this if you only know the workflow definition file?
    • Currently: Annotations on separate URIs for each component, described in workflow structure graph, which is body of annotation targeting the workflow definition file
  • Importing/referring to annotations from other OA systems (how to discover those?)

What is the benefit of OA for us?
  • Existing vocabulary – no need for our project to try to specify and agree on our own way of tracking annotations.
  • Potential interoperability with third-party annotation tools
    • E.g. We want to annotate a figure in a paper and relate it to a dataset in a research object – don’t want to write another tool for that!
  • Existing annotations (pre research object) in Taverna and myExperiment map easily to OA model

History lesson (AO/OAC/OA)
When forming the Wf4Ever Research Object model, we found:
  • Open Annotation Collaboration (OAC)
  • Annotation Ontology (AO)
What was the difference?
  • Technically, for Wf4Ever’s purposes: They are equivalent
  • Political choice: AO – supported by Utopia (Manchester)
We encouraged the formation of W3C Open Annotation Community Group and a joint model
Next: Research Object model v0.2 and RO Bundle will use the OA model – since we only used 2 properties, mapping is 1:1
http://www.wf4ever-project.org/wiki/display/docs/2011-09-26+Annotation+model+considerations

Saving a research object: RO bundle
  • Single, transferrable research object
  • Self-contained snapshot
  • Which files in ZIP, which are URIs? (Up to user/application)
  • Regular ZIP file, explored and unpacked with standard tools
  • JSON manifest is programmatically accessible without RDF understanding
  • Works offline and in desktop applications – no REST API access required
  • Basis for RO-enabled file formats, e.g. Taverna run bundle
  • Exchanged with myExperiment and RO tools

Workflow Results Bundle
ZIP folder structure (RO Bundle):
  • mimetype: application/vnd.wf4ever.robundle+zip
  • .ro/manifest.json
  • inputA.txt
  • outputA.txt
  • outputB/ (1.txt, 2.txt, 3.txt)
  • outputC.jpg
  • intermediates/
    • de/def2e58b-50e2-4949-9980-fd310166621a.txt
  • workflowrun.prov.ttl (RDF)
  • workflow
Figure labels: URI references, execution environment, attribution, Aggregating in Research Object
https://w3id.org/bundl
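
Because the bundle is a regular ZIP file, the layout above can be produced with standard tooling. The Python sketch below writes a small bundle in memory using only the standard library. The `write_bundle` helper, the file contents and the manifest keys (`aggregates`, `file`) are illustrative assumptions; storing `mimetype` first and uncompressed follows the ePub/ODF convention and should be checked against the bundle specification before relying on it.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(fileobj, files, manifest):
    """Write a ZIP with a mimetype entry, payload files and .ro/manifest.json."""
    with zipfile.ZipFile(fileobj, "w", zipfile.ZIP_DEFLATED) as bundle:
        # Assumption: mimetype goes first and is stored without compression
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for path, content in files.items():
            bundle.writestr(path, content)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))


# Hypothetical payload; real bundles would carry actual workflow outputs
buffer = io.BytesIO()
write_bundle(
    buffer,
    files={
        "outputA.txt": "example output\n",
        "workflowrun.prov.ttl": "# provenance trace placeholder\n",
    },
    manifest={"aggregates": [{"file": "/outputA.txt"},
                             {"file": "/workflowrun.prov.ttl"}]},
)

with zipfile.ZipFile(buffer) as bundle:
    print(bundle.namelist())
```

The result can be opened with any unzip tool, which is the point of the format: no REST API or RDF library is needed to look inside.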

RO Bundle: .ro/manifest.json
Figure: manifest listing with callouts
  • JSON-LD context: RDF (http://json-ld.org/)
  • RO provenance: Who made the RO? When? (http://orcid.org/)
  • What is aggregated? File in ZIP or external URI
  • Format
  • Who?
  • External URIs placed in folders
  • Embedded annotation
  • External annotation, e.g. blogpost
Note: JSON "quotes" not shown above for brevity
https://w3id.org/bundle
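
The manifest is meant to be usable as plain JSON, without any RDF tooling. The Python sketch below shows what that could look like for the questions in the callouts. The manifest content is invented for illustration, and the key names (`createdOn`, `createdBy`, `aggregates`, `file`, `uri`, `mediatype`, `annotations`, `about`, `content`) are assumptions that should be checked against the bundle specification.

```python
import json

# Hypothetical manifest; every value here is made up for illustration
MANIFEST_JSON = """
{
  "createdOn": "2013-06-24T12:00:00Z",
  "createdBy": {"name": "Alice", "uri": "http://example.org/people/alice"},
  "aggregates": [
    {"file": "/outputA.txt", "mediatype": "text/plain"},
    {"uri": "http://example.org/external/dataset.csv"}
  ],
  "annotations": [
    {"about": "/outputA.txt", "content": "/.ro/annotations/outputA.ttl"},
    {"about": "/outputA.txt", "content": "http://example.org/blog/post"}
  ]
}
"""


def summarize(manifest):
    """Split aggregated resources into files inside the ZIP and external URIs."""
    aggregates = manifest.get("aggregates", [])
    in_zip = [entry["file"] for entry in aggregates if "file" in entry]
    external = [entry["uri"] for entry in aggregates if "uri" in entry]
    return in_zip, external


manifest = json.loads(MANIFEST_JSON)
in_zip, external = summarize(manifest)
print("Created by:", manifest["createdBy"]["name"], "on", manifest["createdOn"])
print("Files in ZIP:", in_zip)
print("External URIs:", external)
for annotation in manifest["annotations"]:
    print(annotation["about"], "is annotated by", annotation["content"])
```

The same file can also be read as JSON-LD when an RDF view is wanted; the sketch deliberately ignores that path.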

Research Object as RDFa
http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/
Common Motifs in Scientific Workflows: An Empirical Analysis

  • Analytics for Taverna workflows
  • Analytics for Wings workflows
  • http://mayor2.dia.fi.upm.es/oeg-upm/files/dgarijo/motifAnalysisSite/

W3C community group for RO
http://www.w3.org/community/rosc/