
RWS Provenance Experiments in Kepler (Kepler + PR + RWS)
Norbert Podhorszki, Ilkay Altintas, Bertram Ludaescher
in collaboration with Shawn Bowers and Timothy McPhillips
Provenance Challenge @ GGF 18
Kepler+PF+RWS, Podhorszki, Altintas et al.

Initial Provenance Framework (IPAW'06, Altintas et al.)
• Vision:
  – Modeled as a separate concern in the system
    • Optional drag-and-drop feature
  – Listen to execution and save information (customizable):
    • Context: the who, what, where, when, and why associated with the run
    • Input data and its associated metadata
    • Workflow outputs and intermediate data products
    • Workflow definition (entities, parameters, connections): a specification of what exists in the workflow, which can have a context of its own
    • Information about the workflow evolution (workflow trail)

Kepler System Architecture (IPAW'06, Altintas et al.)
[Figure: architecture diagram. Component labels: Authentication; GUI; Vergil; Kepler GUI Extensions; Kepler Object Manager; SMS; Actor&Data SEARCH; Kepler Core Extensions; Type System Ext; Documentation; Smart Re-run / Failure Recovery; Provenance Recorder; Ptolemy.]

Kepler Provenance Recorder (IPAW'06, Altintas et al.)
• Parametric and customizable
  – Different report formats
  – Variable levels of verbosity
    • all, some, medium, on error
  – Multiple cache destinations
• Saves information on
  – User name, date, run, etc.

Read-Write-ReSet Model (IPAW'06, McPhillips et al.)
[Figure: actor A3 with incoming reads (r…r), outgoing writes (w…w), and an optional reset event [s!]; further labels: PS, "???"]
• r, r, …, r, w, w, …, w, r, …, r, w, …, w, … (firing)
• What about actor state? What about "real" dependencies?
• The reset event s defines when an actor "cuts off" dependencies
  – a semantic notion, known to the actor [developer] (or part of a higher-order scheme)
• r, r, …, r, w, w, …, w, [s!], r, …, r, w, …, w, …
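The effect of the reset event can be illustrated with a minimal Python sketch. It assumes the usual reading of the RWS model: a written token depends on every token the actor has read since its last reset. The function name and the event encoding are hypothetical and are not taken from the Kepler implementation.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# One event of a single actor: (kind, token); kind is "r", "w" or "s".
# Reset events carry no token, so their token is None.
Event = Tuple[str, Optional[str]]


def rws_dependencies(events: Iterable[Event]) -> Dict[str, List[str]]:
    """Map each written token to the tokens read since the last reset."""
    reads_since_reset: List[str] = []
    deps: Dict[str, List[str]] = {}
    for kind, token in events:
        if kind == "r":
            reads_since_reset.append(token)
        elif kind == "w":
            # Copy the list so later reads do not alter earlier results.
            deps[token] = list(reads_since_reset)
        elif kind == "s":
            # The reset "cuts off" dependencies on everything read so far.
            reads_since_reset.clear()
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return deps


# Hypothetical trace: r, r, w, [s!], r, w
with_reset = rws_dependencies(
    [("r", "t1"), ("r", "t2"), ("w", "t3"), ("s", None), ("r", "t4"), ("w", "t5")]
)
# Same trace without the reset event
without_reset = rws_dependencies(
    [("r", "t1"), ("r", "t2"), ("w", "t3"), ("r", "t4"), ("w", "t5")]
)
```

With the reset, t5 depends only on t4; without it, t5 also inherits t1 and t2, which is the over-approximation the reset event is meant to avoid.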

Goals of the PR+RWS Experiments
• Use the RWS model for Kepler workflows
  – both single-level and nested workflows (fun starts here :-)
• Extend the Kepler Provenance Recorder
  – Modify the methods of the provenance listener class
  – Classes to store execution data about the workflow
    • To generate the send-receive relations of the tokens correctly
    • To count actor firings correctly
• Disclaimer: initially, one workflow run is targeted
  – (but the approach can handle multiple actor firings due to pipeline parallelism)
  – future: queries over several runs and workflow provenance
  – (others in Kepler are already doing this; merge efforts in the future)

Implementation: Data Model
• Port-actor relationship
  – portTable(Port, Actor, type)
    • type is r (real) or v (virtual, i.e. transparent)
• Token-object relationship
  – tokenTable(Token, Object)
• Object-value relationship
  – objectTable(Object, Value, Type)
    • type is currently not recorded
• RWS trace
  – traceTable(Port, Event, Token, FiringCounter)
    • event is r (read), w (write), or s (state-reset)
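To make the four relations concrete, here is a small Python sketch that models each table as a list of named tuples and joins them to list the values an actor wrote. The field names, the helper values_written_by, and all sample rows are assumptions made for illustration; the slides only give the relation signatures.

```python
from typing import List, NamedTuple, Optional


class PortRow(NamedTuple):      # portTable(Port, Actor, type)
    port: str
    actor: str
    port_type: str              # "r" = real, "v" = virtual (transparent)


class TokenRow(NamedTuple):     # tokenTable(Token, Object)
    token: str
    obj: str


class ObjectRow(NamedTuple):    # objectTable(Object, Value, Type)
    obj: str
    value: str
    value_type: Optional[str] = None   # currently not recorded


class TraceRow(NamedTuple):     # traceTable(Port, Event, Token, FiringCounter)
    port: str
    event: str                  # "r", "w" or "s"
    token: str
    firing_counter: int


def values_written_by(actor: str, ports: List[PortRow], tokens: List[TokenRow],
                      objects: List[ObjectRow], trace: List[TraceRow]) -> List[str]:
    """Join trace -> port -> actor and token -> object -> value."""
    actor_ports = {p.port for p in ports if p.actor == actor}
    obj_of = {t.token: t.obj for t in tokens}
    value_of = {o.obj: o.value for o in objects}
    return [value_of[obj_of[e.token]]
            for e in trace
            if e.event == "w" and e.port in actor_ports]


# Hypothetical rows for a two-actor pipeline A -> B
ports = [PortRow("A.out", "A", "r"), PortRow("B.in", "B", "r"), PortRow("B.out", "B", "r")]
tokens = [TokenRow("t1", "o1"), TokenRow("t2", "o2")]
objects = [ObjectRow("o1", "in.img"), ObjectRow("o2", "out.gif")]
trace = [TraceRow("A.out", "w", "t1", 1),
         TraceRow("B.in", "r", "t1", 1),
         TraceRow("B.out", "w", "t2", 1)]

written_by_b = values_written_by("B", ports, tokens, objects, trace)
```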

Implementation: Class Hierarchy
• Extends the existing provenance execution listener with
  – Methods
  – More event listeners
  – Supporting classes
    • RWSPortInfo, RWSActorInfo
      – Data structures for building and containing info about the workflow (and counters for event record)
    • RWSEvent
      – Handles RWS events

Execution: Initialization phase
• initialize()
  – Generate RWS portMap
    • for each port: RWSPortInfo (info locally known at a port)
  – Generate RWS actorMap
    • for each actor: RWSActorInfo, RWSPortInfo (build connection info)
  – Create new RWS event list
  – Record static wf info → portTable

Execution: Event Handling and Modifications
• Just before run: validate()
  – Called before the model is executed
  – Subscribe to token listeners
  – The TokenSend and TokenGet event handling methods are extended here
• When the workflow is modified: changeExecuted()
  – Called when something is changed in the workflow
  – Re-generate RWS portMap

Execution: During the workflow run
• When a token event occurs
  – TokenSendEvent(): new RWS event w
    • Print sent token's info (token id, object id, value) → tokenTable, objectTable
    • For each connected transparent port: generate virtual TokenGet event
  – TokenGetEvent(): new RWS event r
    • If it is a transparent port: generate virtual TokenSend event
  – RWS events are recorded in traceTable
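The interplay of real and virtual events can be sketched in Python as follows. This is a simplified illustration, not the Kepler listener code: the class RWSRecorder, its method names, and the port names are hypothetical, the firing counter is omitted, and the sketch assumes transparent ports are not connected in a cycle.

```python
from typing import Dict, List, Set, Tuple


class RWSRecorder:
    """Records (port, event, token) rows and synthesizes virtual events."""

    def __init__(self, connections: Dict[str, List[str]], transparent_ports: Set[str]):
        self.connections = connections        # port -> ports it delivers to
        self.transparent = transparent_ports  # ports of composite actors
        self.trace: List[Tuple[str, str, str]] = []

    def token_send(self, port: str, token: str) -> None:
        self.trace.append((port, "w", token))
        # A transparent port raises no get event of its own,
        # so a virtual one is generated for it here.
        for dest in self.connections.get(port, []):
            if dest in self.transparent:
                self.token_get(dest, token)

    def token_get(self, port: str, token: str) -> None:
        self.trace.append((port, "r", token))
        # A transparent port forwards the token, recorded as a virtual send.
        if port in self.transparent:
            self.token_send(port, token)


# Hypothetical nested workflow: A.out -> S.in (transparent) -> S.C.in
rec = RWSRecorder(
    connections={"A.out": ["S.in"], "S.in": ["S.C.in"]},
    transparent_ports={"S.in"},
)
rec.token_send("A.out", "t1")   # real send by actor A
rec.token_get("S.C.in", "t1")   # real get by inner actor S.C
```

After these two real events the trace holds four rows: the write at A.out, a virtual read and a virtual write at S.in, and the read at S.C.in. The composite actor S therefore appears in the lineage even though it never fires a token event itself.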

A Kepler Workflow Implementation

RWS TRACE
Table         # of elements   size in KB
portTable     81              4
tokenTable    30              2
objectTable   30              3
traceTable    86              6

Query 1.a
Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed, etc.
Answer a. List of actors that contributed to the result (21 actors). They appear in the reverse of their execution order.
?- q1b_actors('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ActorList), print(ActorList).
[.pc.Convert_x, .pc.Slicer_x, .pc.SoftMean, .pc.Reslice2, .pc.Reslice4, .pc.Reslice1, .pc.RefImg, .pc.RefHdr, .pc.InputHdr3, .pc.AlignWarp2, .pc.InputHdr2, .pc.InputImg2, .pc.InputHdr4, .pc.InputImg4, .pc.AlignWarp1, .pc.InputHdr1, .pc.Reslice3, .pc.AlignWarp3, .pc.InputImg3, .pc.AlignWarp4, .pc.InputImg1]

Query 1.b
Answer b. List of intermediate values created by the workflow (26 values).
?- q1b_values('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ValueList), print(ValueList).
["/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage4/atlas-x.pgm",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.img",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.hdr",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.hdr",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.img",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.hdr",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.img",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.hdr",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.img",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.img",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.hdr",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp3.warp",
 "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.img",
 "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.hdr",
 "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy3.img",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp2.warp",
 "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy2.hdr",
 "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy2.img",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp4.warp",
 "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy4.hdr",
 "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy4.img",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp1.warp",
 "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.img",
 "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.hdr"]
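The idea behind such a lineage query can be sketched in Python as a backward traversal of the RWS trace. This is not the Prolog code behind q1b_actors and q1b_values, which the slides do not show. The sketch assumes that every actor resets its state after each firing, so a written token depends only on the tokens read in the same firing; the function name, argument layout, and sample data are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

TraceRow = Tuple[str, str, str, int]   # (port, event, token, firing counter)


def lineage(target_value: str, port_actor: Dict[str, str],
            token_value: Dict[str, str],
            trace: List[TraceRow]) -> Tuple[List[str], List[str]]:
    """Return (actors, values) that contributed to target_value."""
    writer: Dict[str, Tuple[str, int]] = {}        # token -> (actor, firing)
    reads: Dict[Tuple[str, int], List[str]] = {}   # (actor, firing) -> tokens read
    for port, event, token, firing in trace:
        key = (port_actor[port], firing)
        if event == "w":
            writer.setdefault(token, key)
        elif event == "r":
            reads.setdefault(key, []).append(token)

    actors: List[str] = []
    values: List[str] = []
    seen = set()
    queue = deque(t for t, v in token_value.items() if v == target_value)
    while queue:
        token = queue.popleft()
        if token in seen:
            continue
        seen.add(token)
        if token_value[token] not in values:
            values.append(token_value[token])
        if token in writer:
            key = writer[token]
            if key[0] not in actors:
                actors.append(key[0])
            # Walk back to everything the writing firing had read.
            queue.extend(reads.get(key, []))
    return actors, values


# Hypothetical pipeline: A writes t1, B reads t1 and writes t2
actors, values = lineage(
    "out.gif",
    port_actor={"A.out": "A", "B.in": "B", "B.out": "B"},
    token_value={"t1": "in.img", "t2": "out.gif"},
    trace=[("A.out", "w", "t1", 1), ("B.in", "r", "t1", 1), ("B.out", "w", "t2", 1)],
)
```

Because the traversal starts at the result and moves backwards, the actors come out in the reverse of their execution order, as in the answers above.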

Improved PC workflow (cf. COMAD wf)
• A more generic workflow that accepts any number of images
• Smaller number of actors
• This affects the number of values, as it requires additional array operations
• cf. also the COMAD approach and the Taverna approach (but we fire AlignWarp individually here)

RWS TRACE
Table         # of elements   size in KB
portTable     42              2
tokenTable    51              3
objectTable   39              4
traceTable    150             9

Query 1 (improved PC workflow)
Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed, etc.
Answer a. List of actors that contributed to the result (15 actors). They appear in the reverse of their execution order.
?- q1b_actors('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ActorList), print(ActorList).
[.pca.Convert, .pca.imgrepeat, .pca.Reslice, .pca.InputImg, .pca.Slicer, .pca.hdrrepeat, .pca.SoftMeanArray, .pca.imgarray, .pca.AlignWarp, .pca.RefHdr, .pca.RefImg, .pca.Ramp, .pca.seqXYZ, .pca.hdrarray, .pca.InputHdr]

Query 1 (improved PC workflow)
Answer b. List of intermediate values created by the workflow (33 values). In addition to the original file names, it includes internal data values (arrays).
?- q1b_values('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ValueList), print(ValueList).
["/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage4/atlas-x.pgm",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.hdr",
 "x",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.img",
 {"/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.img"},
 {"/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.hdr",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.hdr",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.hdr",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.hdr"},
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.img",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.img",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.img",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.img",
 "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp1.warp",
 "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.hdr",
 "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.img",
 "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.img",
 1, etc. ...

Nested workflow tricky example
[Figure: nested workflow containing the composite actor S]

The trick
• Multi-port of Ptolemy
  – two distinct channels going into S and out from S
  – A's output is delivered to S.C
  – B's output is delivered to S.D
  – S.C's output is delivered to E
  – S.D's output is delivered to F
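Why the channels matter can be shown with a short Python sketch that routes tokens by (port, channel) pairs rather than by port alone. The port names, the channel numbering, and the helper follow are hypothetical; only the four deliveries listed above come from the slide.

```python
from typing import Dict, List, Tuple

Endpoint = Tuple[str, int]   # (port name, channel index)

# Hypothetical wiring of the example: S has one multi-port input and
# one multi-port output, each carrying two distinct channels.
routes: Dict[Endpoint, Endpoint] = {
    ("A.out", 0): ("S.in", 0),
    ("B.out", 0): ("S.in", 1),
    ("S.in", 0): ("S.C.in", 0),
    ("S.in", 1): ("S.D.in", 0),
    ("S.C.out", 0): ("S.out", 0),
    ("S.D.out", 0): ("S.out", 1),
    ("S.out", 0): ("E.in", 0),
    ("S.out", 1): ("F.in", 0),
}


def follow(start: Endpoint, routes: Dict[Endpoint, Endpoint]) -> List[Endpoint]:
    """Follow a token from one endpoint until no further route exists."""
    path = [start]
    current = start
    # The second condition guards against cycles in the wiring.
    while current in routes and routes[current] not in path:
        current = routes[current]
        path.append(current)
    return path


path_from_a = follow(("A.out", 0), routes)
path_from_b = follow(("B.out", 0), routes)
path_from_c = follow(("S.C.out", 0), routes)
```

Both A and B deliver to the same port S.in, so a trace keyed by port name alone could not tell whether A's token reached S.C or S.D; the channel index removes that ambiguity.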

Lineage of actors and values
Who contributed to the value C.1 that arrived at E?
?- q1('"C.1"', ActorList, ValueList).
ActorList = ['.WF15.S.C', '.WF15.S', '.WF15.A']
ValueList = ['"C.1"', '1']
Who contributed to the value D.2 that arrived at F?
?- q1('"D.2"', ActorList, ValueList).
ActorList = ['.WF15.S.D', '.WF15.S', '.WF15.B']
ValueList = ['"D.2"', '2']

Single-level lineage of actors and values
Who contributed to the value C.1 that arrived at E?
?- q1b('"C.1"', ActorList, ValueList).
ActorList = ['.WF15.S', '.WF15.A']
ValueList = ['"C.1"', '1']
Who contributed to the value D.2 that arrived at F?
?- q1b('"D.2"', ActorList, ValueList).
ActorList = ['.WF15.S', '.WF15.B']
ValueList = ['"D.2"', '2']
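One way to picture the difference between the two queries is as a filter on the nesting depth of the actor names. The Python sketch below reproduces the single-level answers from the full lineage answers on the assumption that nested actors are exactly those with an extra name component after the workflow name; how q1b actually computes its result is not shown in the slides, and the function single_level is hypothetical.

```python
from typing import List


def single_level(actor_list: List[str], workflow: str = ".WF15") -> List[str]:
    """Keep only actors directly contained in the top-level workflow."""
    prefix = workflow + "."
    return [
        actor for actor in actor_list
        # A nested actor such as '.WF15.S.C' has a further '.' after the prefix.
        if actor.startswith(prefix) and "." not in actor[len(prefix):]
    ]


top_level_c1 = single_level([".WF15.S.C", ".WF15.S", ".WF15.A"])
top_level_d2 = single_level([".WF15.S.D", ".WF15.S", ".WF15.B"])
```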

Conclusions
• First attempt at combining the Kepler PR and the Kepler RWS provenance model
  – Both published at IPAW 2006
• Query 1 was successfully answered.
• Queries 2 and 3 are answerable, but have not been implemented yet.
• Queries on multiple runs and workflow design provenance are out of the scope of this initial prototype.
  – Other groups in Kepler are focusing on this.

Some related references
• Provenance Framework/Recorder:
  – I. Altintas, O. Barney, E. Jaeger-Frank. Provenance Collection Support in the Kepler Scientific Workflow System. IPAW 2006, Chicago, Illinois, May 2006.
• RWS Model:
  – Shawn Bowers, Timothy McPhillips, Bertram Ludaescher, Shirley Cohen, Susan B. Davidson. A Model for User-Oriented Data Provenance in Pipelined Scientific Workflows. International Provenance and Annotation Workshop (IPAW'06), Chicago, Illinois, USA, May 3-5, 2006.