Karma Provenance Framework v2
Provenance Challenge Workshop/GGF 18
Yogesh L. Simmhan, Beth Plale, Dennis Gannon, Srinath Perera
Indiana University
2006-09-13

Outline
• Architecture of Karma
• Workflow Setup & Collecting Provenance
• Provenance Traces
• “Canonical” Challenge Queries
• Suggested Variations

Provenance Collection: Challenges & Uses
• Linked Environments for Atmospheric Discovery (LEAD) project
  - Weather & severe storm prediction applications
• Provenance on workflow (process) & data products at fine granularity
• Dynamic, long-running workflows
• Helps scientists to search for workflows & data products, track workflow execution, and analyze & mine data products from runs

Karma Provenance Framework
• Lightweight: do not duplicate existing metadata cataloging effort
  - myLEAD personal metadata catalog
  - ResCat service & data registry
• Glue to integrate metadata on data & services with runtime workflow information [1]
• Scalability: 500 users, 100s of workflows, 10,000s of data products
[1] Performance Evaluation of the Karma Provenance Framework, Simmhan, Y., et al.; IPAW, 2006

Karma Architecture [2]
[Figure: Karma architecture. Components: Workflow Engine; Services 1 to 10 of a workflow instance; WS-Messenger Notification Broker (WS-Eventing message bus); Karma Provenance Service with Provenance Listener, Activity DB, Service API, and Provenance Query API; Provenance Browser client. Flows: the workflow engine and services publish provenance activities as notifications (Workflow-Started and -Finished; Application-Started and -Finished; Data-Produced and -Consumed); the Provenance Listener subscribes and listens to activity notifications; the client queries for workflow, process, and data provenance. Labels 10P, 10C, and 10P/10C: data products consumed & produced by each service.]
[2] A Framework for Collecting Provenance in Data-Centric Scientific Workflows, Simmhan, Y., et al., Submitted to ICWS Conference, 2006
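
The publish/subscribe flow named in the architecture labels can be illustrated with a minimal in-memory sketch. This is not Karma's implementation: `Broker`, `ProvenanceListener`, `Activity`, and the topic name are hypothetical stand-ins for the notification broker, the provenance listener, and the activity notifications, and a plain list stands in for the Activity DB.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """One provenance activity (hypothetical shape, for illustration only)."""
    kind: str          # e.g. "WorkflowStarted", "DataProduced"
    source: str        # workflow engine or service that published it
    detail: str = ""   # e.g. a data product ID


class Broker:
    """Stand-in for a notification broker: routes activities by topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, activity):
        for callback in self._subscribers[topic]:
            callback(activity)


class ProvenanceListener:
    """Subscribes to one topic and appends every activity to a list (the 'DB')."""

    def __init__(self, broker, topic):
        self.activity_db = []
        broker.subscribe(topic, self.activity_db.append)


broker = Broker()
listener = ProvenanceListener(broker, topic="workflow-10")
broker.publish("workflow-10", Activity("WorkflowStarted", "WorkflowEngine"))
broker.publish("workflow-10", Activity("DataProduced", "Service1", "data-1"))
broker.publish("other-topic", Activity("WorkflowStarted", "WorkflowEngine"))
recorded_kinds = [a.kind for a in listener.activity_db]
print(recorded_kinds)
```

Publishers never talk to the listener directly in this sketch; they only know the broker and a topic.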

Provenance Challenge Workflow
• Applications modeled as web services
  - GFac toolkit creates a service for each command-line application
  - Service invokes a shell-script wrapper of the application, passing command-line arguments
  - Created services are automatically instrumented to generate provenance using the Karma client library
• Workflow composed as GPEL* script
  - XBaya workflow composer GUI
  - Central GPEL workflow engine orchestrates execution
* Grid Process Execution Language, an extension of the Business Process Execution Language (BPEL)

Provenance Traces
• Data Provenance: get[Recursive]DataProvenance
  - What (ID), where (URL), when (Timestamp)
  - How (Process, inputs)

Provenance Traces
• Process Provenance: getProcessProvenance
  - What (ID), when (Timestamp), who (Invoker)
  - State (execution/completion status)
  - Input & output data products

Provenance Traces
• Workflow Trace: getWorkflowTrace
  - What (ID), when (Timestamp), who (Invoker)
  - State (execution/completion status)
  - Process provenance of workflow steps
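
The three trace types could be modeled roughly as follows. This is a sketch of the what/where/when/who/how fields listed on the slides, not Karma's actual schema: the class names, field names, and example values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataProvenance:
    data_id: str                        # what
    url: str                            # where
    timestamp: str                      # when
    produced_by: Optional[str] = None   # how: producing process
    inputs: List[str] = field(default_factory=list)  # how: input data IDs


@dataclass
class ProcessProvenance:
    process_id: str    # what
    timestamp: str     # when
    invoker: str       # who
    state: str         # execution/completion status
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


@dataclass
class WorkflowTrace:
    workflow_id: str   # what
    timestamp: str     # when
    invoker: str       # who
    state: str         # execution/completion status
    steps: List[ProcessProvenance] = field(default_factory=list)


# Made-up example values.
step = ProcessProvenance("SoftmeanService", "2006-09-11T10:00:00Z",
                         "workflow-1", "completed",
                         inputs=["reslice1.img"], outputs=["atlas.img"])
trace = WorkflowTrace("workflow-1", "2006-09-11T09:55:00Z",
                      "some-user", "completed", steps=[step])
print(trace.steps[0].outputs)
```

A workflow trace here is simply the workflow's own record plus the process provenance of each step, mirroring the last bullet above.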

Provenance Challenge Queries
• ! Answered by Karma Service API directly
• Answered by Karma Service API, with post-processing by client
• ~ Answered by access to backend DB (SQL)
• Not answered
Query: 1 2 3 4 5 6 7 8 9
Result: ! ! ~ ~ ~ ~

Provenance Challenge Queries: Q1
• Find everything that caused Atlas X Graphic to be as it is
! Answered by Karma Service API directly
• This is the recursive data provenance of the Atlas X Graphic file
• A call to getRecursiveDataProvenance('lead:uuid:1157946992-atlas-x.gif') returns this [www]
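
To illustrate what a recursive data provenance lookup computes, here is a minimal sketch over a toy in-memory store. It does not call the Karma service: the `PROVENANCE` dictionary, the shortened data IDs, and the intermediate service names other than SoftmeanService are invented for illustration.

```python
# Toy provenance store: data ID -> (producing process, input data IDs).
# All entries are illustrative assumptions, not output of the Karma service.
PROVENANCE = {
    "atlas-x.gif": ("ConvertService", ["atlas-x.pgm"]),
    "atlas-x.pgm": ("SlicerService", ["atlas.img"]),
    "atlas.img": ("SoftmeanService", ["reslice1.img", "reslice2.img"]),
}


def recursive_data_provenance(data_id):
    """Return a nested dict describing how data_id was derived."""
    if data_id not in PROVENANCE:
        # An original input: nothing is recorded upstream of it.
        return {"data": data_id, "produced_by": None, "inputs": []}
    process, inputs = PROVENANCE[data_id]
    return {
        "data": data_id,
        "produced_by": process,
        "inputs": [recursive_data_provenance(i) for i in inputs],
    }


trace = recursive_data_provenance("atlas-x.gif")
print(trace["produced_by"])
```

The result is a tree rooted at the queried data product, with one level per producing process.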

Provenance Challenge Queries: Q2
• Find the process that led to Atlas X Graphic, excluding all prior to softmean
Answered by Karma Service API, with post-processing by client
1. First call getDataProvenance
2. Then recursively get data provenance till 'SoftmeanService' is seen
• Returns this [www]

    let $dataList := ['lead:uuid:1157946992-atlas-x.gif']
    while ($dataList != empty) do
        // get data provenance for this level
        $dataProvenance = karma.getDataProvenance($dataList[0])
        // print process information & remove data from list
        print $dataProvenance; $dataList.delete(0)
        // found Softmean. Stop.
        if ($dataProvenance.getProducedBy() == 'SoftmeanService') break
        // get input data used by this data & recurse up the tree
        foreach ($inputData in $dataProvenance.getUsingData()) do
            $dataList.add($inputData)
    end
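
The client-side post-processing above can be written as a runnable sketch. `FakeKarma` is a hypothetical in-memory stand-in for the Karma client, and the record layout, data IDs, and intermediate service names are assumptions; only the stop condition on 'SoftmeanService' comes from the slide.

```python
from collections import deque


class FakeKarma:
    """Hypothetical in-memory stand-in for the Karma service client."""

    def __init__(self, records):
        self._records = records

    def get_data_provenance(self, data_id):
        # Returns None when no provenance is recorded for data_id.
        return self._records.get(data_id)


def processes_back_to(karma, data_id, stop_service):
    """Walk data provenance upstream, stopping once stop_service is seen."""
    visited = []
    queue = deque([data_id])
    while queue:
        record = karma.get_data_provenance(queue.popleft())
        if record is None:
            continue  # an original input with nothing upstream
        visited.append(record["produced_by"])
        if record["produced_by"] == stop_service:
            break  # found the stop service, as in the pseudocode
        queue.extend(record["inputs"])
    return visited


karma = FakeKarma({
    "atlas-x.gif": {"produced_by": "ConvertService", "inputs": ["atlas-x.pgm"]},
    "atlas-x.pgm": {"produced_by": "SlicerService", "inputs": ["atlas.img"]},
    "atlas.img": {"produced_by": "SoftmeanService", "inputs": ["reslice1.img"]},
    "reslice1.img": {"produced_by": "ResliceService", "inputs": []},
})
chain = processes_back_to(karma, "atlas-x.gif", "SoftmeanService")
print(chain)
```

Like the pseudocode, this stops the whole traversal at the first match, which suits a linear chain; for a branching graph one might skip only that branch instead of breaking out.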

Provenance Challenge: Q4
• Find all invocations of align_warp (with parameter "-m 12") that ran on a Monday
~ Answered by access to backend DB (SQL)
1. Use SQL query to get matching invocations
2. Call getProcessProvenance to get description of align_warp
• Returns this [www]

    SELECT invokee.workflow_id, invokee.service_id,
           invokee.workflow_node_id, invokee.workflow_timestep,
           invoker.workflow_id, invoker.service_id,
           invoker.workflow_node_id, invoker.workflow_timestep
    FROM invocation_state_table invocation, entity_table invokee,
         entity_table invoker, notification_table notifications
    WHERE invokee.entity_id = invocation.invokee_id
      AND invoker.entity_id = invocation.invoker_id
      AND notifications.source_id = invocation.invokee_id
      AND notifications.notification_type = 'ServiceInvoked'
      AND invokee.service_id = 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService'
      AND notifications.notification_xml LIKE '%<ModelMenuNumber>12</ModelMenuNumber>%'
      AND DayOfWeek(invocation.request_receive_time) = 2;  -- 1=Sunday, 2=Monday, ...
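
The same filter can be sketched on the client side over rows that have already been fetched. This is an illustration, not part of Karma: the row layout, the shortened service ID, and the helper names are assumptions. The day numbering follows the comment in the query (1=Sunday, 2=Monday, ...).

```python
from datetime import datetime


def day_of_week_sunday_first(ts):
    """Map a datetime to 1=Sunday ... 7=Saturday, as in the query's comment."""
    return ts.isoweekday() % 7 + 1  # isoweekday(): Monday=1 ... Sunday=7


def monday_invocations(rows, service_id, xml_fragment):
    """Keep rows for service_id whose notification XML contains xml_fragment
    and whose request was received on a Monday."""
    return [
        row for row in rows
        if row["service_id"] == service_id
        and xml_fragment in row["notification_xml"]
        and day_of_week_sunday_first(row["request_receive_time"]) == 2
    ]


FRAGMENT = "<ModelMenuNumber>12</ModelMenuNumber>"
rows = [  # made-up rows for illustration
    {"service_id": "AlignWarpService", "notification_xml": FRAGMENT,
     "request_receive_time": datetime(2006, 9, 11, 9, 30)},   # a Monday
    {"service_id": "AlignWarpService", "notification_xml": FRAGMENT,
     "request_receive_time": datetime(2006, 9, 12, 9, 30)},   # a Tuesday
    {"service_id": "AlignWarpService",
     "notification_xml": "<ModelMenuNumber>3</ModelMenuNumber>",
     "request_receive_time": datetime(2006, 9, 11, 9, 45)},   # Monday, other value
]
matches = monday_invocations(rows, "AlignWarpService", FRAGMENT)
print(len(matches))
```

Substring matching on the notification XML mirrors the LIKE clause; parsing the XML would be more robust if the element could appear with attributes or whitespace.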

Provenance Challenge: Q9
• Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files.
Not answered
• We do not expect to answer such queries through the provenance system
• We push the provenance information to external metadata management systems such as myLEAD, which can answer such “join” queries on data product metadata and provenance

Variations of Workflows
• Workflows with loops
• Workflows whose structure changes dynamically
  - or, as a simpler case, workflows with conditional branches
• Hierarchical composition of workflows
  - workflows invoking other workflows

Variations of Queries
• Find all [workflows | processes] with a particular execution status [completed | failed | waiting for input]
• Show the client view and service view of the provenance and check for differences
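
Both suggested queries are simple to express once provenance records are in hand. The sketch below assumes, purely for illustration, that each record carries an `id` and a `status`, and that a "view" is a list of (entity, event) pairs as recorded by one side of an invocation; none of these shapes come from Karma.

```python
def find_by_status(records, status):
    """Return the IDs of workflows or processes with the given status."""
    return [r["id"] for r in records if r["status"] == status]


def diff_views(client_view, service_view):
    """Report events seen by only one side of an invocation."""
    client, service = set(client_view), set(service_view)
    return {
        "only_client": sorted(client - service),
        "only_service": sorted(service - client),
    }


records = [  # made-up records
    {"id": "workflow-1", "status": "completed"},
    {"id": "workflow-2", "status": "failed"},
    {"id": "process-7", "status": "waiting for input"},
]
client_view = [("Service1", "invoked"), ("Service1", "result received")]
service_view = [("Service1", "invoked")]

failed = find_by_status(records, "failed")
differences = diff_views(client_view, service_view)
print(failed, differences)
```

An event that appears in only one view would be the kind of difference the second query asks to surface.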

Acknowledgements
• Alek Slominski (GPEL Engine)
• Satoshi Shirasuna (XBaya Composer)
• LEAD Members
• NSF
Questions
www.extreme.indiana.edu/karma

Sample Activities Published
• More here [www]

Karma DB Schema