Modeling Provenance through User views
Sarah Cohen-Boulakia, Shirley Cohen, Susan Davidson, Thunyarat (Bam) Amornpetchkul, Olivier Biton
Database group, University of Pennsylvania
Provenance Challenge, Sept. 2006

Our approach
• Model of provenance
  - Based on study of user requirements (CIPRES)
  - Based on careful studies of workflow systems (Kepler, myGrid, Chimera)
  - Minimal information to reason about provenance
  - No workflow system is proposed
• User views
  - Capability of workflow systems to group steps (forming boxes) and to zoom into boxes
  - Multi-granularity levels of provenance
• Implemented in Oracle 10g and Java
  - Relational framework augmented with transitive closure
  - Java/Spring/JDBC: object layer and user interface

Workflow Representation
[Figure: input data -> reslice (step-class) / 8. reslice (step) -> output data]
• Terminology
  - Step-classes (static)
  - An execution of a workflow generates a partial order of steps (dynamic)
  - Steps are instances of step-classes
  - Each step has input and output data

Provenance Trace
• Base tables
  - Data(dataid, name, type), DataAttributes(dataid, attribute, value)
    Example: Data(1, Anatomy Image); DataAttributes(1, center, UChicago), i.e. Center=UChicago
  - InstanceOf(Step, Step-Class, ts), StepParams(step, attribute, value), StageInstance(step, stage)
  - Input(stepId, dataId, ts) / Output(stepId, dataId, ts): stepId takes as input / produces dataId at time ts
• Views
  - Process(stepId, stepClass, input, output, time)
  - …

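The base tables and the Process view above can be sketched with SQLite from Python. This is a minimal sketch, not the authors' Oracle schema: the column types, the sample rows for step 8, the timestamps, and the way Process joins Input and Output (one row per input/output pair of a step) are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base tables named after the slide; column types are assumed.
CREATE TABLE Data(dataId INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE DataAttributes(dataId INTEGER, attribute TEXT, value TEXT);
CREATE TABLE InstanceOf(stepId INTEGER, stepClass TEXT, ts INTEGER);
CREATE TABLE Input(stepId INTEGER, dataId INTEGER, ts INTEGER);
CREATE TABLE Output(stepId INTEGER, dataId INTEGER, ts INTEGER);

-- Assumed definition: one Process row per (step, input item, output item).
CREATE VIEW Process AS
SELECT i.stepId AS stepId, c.stepClass AS stepClass,
       i.dataId AS input, o.dataId AS output, o.ts AS time
FROM InstanceOf c
JOIN Input  i ON i.stepId = c.stepId
JOIN Output o ON o.stepId = c.stepId;

-- Hypothetical sample rows: step 8 is an instance of step-class reslice.
INSERT INTO Data VALUES (1, 'Anatomy Image', 'image'), (2, 'Resliced Image', 'image');
INSERT INTO DataAttributes VALUES (1, 'center', 'UChicago');
INSERT INTO InstanceOf VALUES (8, 'reslice', 100);
INSERT INTO Input  VALUES (8, 1, 100);
INSERT INTO Output VALUES (8, 2, 101);
""")

rows = conn.execute(
    "SELECT stepId, stepClass, input, output, time FROM Process"
).fetchall()
print(rows)
```

Keeping Input and Output as separate timestamped tables and deriving Process as a view means the recursive provenance queries only need to touch one relation.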
Provenance Queries
Q1: Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is

    SELECT DISTINCT step, step-class, input, output
    FROM Process
    START WITH output = (SELECT ID FROM DataID WHERE name = 'Atlas X Graphic')
    CONNECT BY PRIOR input = output
    ORDER BY step;

• PRIOR implements transitive closure. It is necessary to return all the data used to (recursively) compute Atlas X Graphic.

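Oracle's START WITH / CONNECT BY PRIOR is vendor-specific. As a rough equivalent, the same backward walk can be written as a recursive common table expression. The sketch below uses SQLite from Python; the Process rows and step-class names are invented sample data, not the challenge trace, and Process is a plain table here rather than a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Process(step INTEGER, stepClass TEXT, input TEXT, output TEXT);
-- Hypothetical linear pipeline, plus one unrelated step (6).
INSERT INTO Process VALUES
  (1, 'align_warp', 'Anatomy Image',  'Warp Params'),
  (2, 'reslice',    'Warp Params',    'Resliced Image'),
  (3, 'softmean',   'Resliced Image', 'Atlas Image'),
  (4, 'slicer',     'Atlas Image',    'Atlas X Slice'),
  (5, 'convert',    'Atlas X Slice',  'Atlas X Graphic'),
  (6, 'convert',    'Atlas Y Slice',  'Atlas Y Graphic');
""")

QUERY = """
WITH RECURSIVE lineage(step, stepClass, input, output) AS (
    -- START WITH: the step(s) that produced the target item
    SELECT step, stepClass, input, output FROM Process WHERE output = ?
    UNION
    -- CONNECT BY PRIOR input = output: steps whose output feeds a known step
    SELECT p.step, p.stepClass, p.input, p.output
    FROM Process p JOIN lineage l ON p.output = l.input
)
SELECT DISTINCT step, stepClass, input, output FROM lineage ORDER BY step
"""

lineage = conn.execute(QUERY, ("Atlas X Graphic",)).fetchall()
for row in lineage:
    print(row)
```

Using UNION rather than UNION ALL removes duplicate rows during the recursion, which also keeps the walk from looping if the trace ever contained a cycle.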
Provenance Queries (Cont.)
• All the queries can be answered by our system
  - Code available on TWiki
• Using SQL
  - Connect by operators
  - Joins with several tables (e.g. Parameters, DataAttribute)
  - Minus and Union operators
• The generalization of Q7 (difference between workflows) is currently not answerable

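To illustrate how a join and a set-difference operator combine, here is a small sketch in SQLite from Python. SQLite spells Oracle's MINUS as EXCEPT. The table contents, the parameter name model, and the question asked are hypothetical and are not one of the challenge queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE InstanceOf(step INTEGER, stepClass TEXT, ts INTEGER);
CREATE TABLE StepParams(step INTEGER, attribute TEXT, value TEXT);
-- Hypothetical trace: two align_warp runs with different parameters.
INSERT INTO InstanceOf VALUES (1, 'align_warp', 10), (2, 'align_warp', 11), (3, 'reslice', 12);
INSERT INTO StepParams VALUES (1, 'model', '12'), (2, 'model', '6');
""")

# All align_warp steps, minus those run with model = 12.
QUERY = """
SELECT step FROM InstanceOf WHERE stepClass = 'align_warp'
EXCEPT
SELECT i.step
FROM InstanceOf i JOIN StepParams p ON p.step = i.step
WHERE p.attribute = 'model' AND p.value = '12'
ORDER BY step
"""

remaining = [row[0] for row in conn.execute(QUERY)]
print(remaining)
```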
Workflow Variant: User Views
[Figure: Box 1, Box 2; user views UBio, UBlackBox, UAdmin. UAdmin can see everything.]
• What are User views?
  - Level of detail the user wishes to track
  - Permissions given to the user
  - Ability of the user to see / know the sub-steps (distributed computation)
• Why use User Views?
  - Throw away unimportant intermediate results
  - Better understanding of the workflow
  - Reduce the amount of work to be redone

Querying within User Views
• Need information from
  - Workflow: step-class containment and user views
  - Cinput(sid, idid, tsi), Coutput(sid, idid, tso)
  - View UProcess(usr, step-class, input, output)
• Query: What are all the data items used to produce "Resliced Image 1"?

    SELECT *
    FROM UProcess upc
    WHERE usr = :userName
    START WITH outputName = 'Resliced Image 1'
    CONNECT BY PRIOR upc.input = upc.output;

• Answers per user view
  - UAdmin: Anatomy Header 1, Anatomy Image 1, Reference Image, Reference Header, Warp param 1
  - UBio: Anatomy Header 1, Anatomy Image 1, Reference Image, Reference Header
  - UBlackBox: empty answer!

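One way to read the three answers above is that each user view groups steps into boxes, and data produced and consumed inside the same box is invisible to that user. The sketch below is a toy model of that reading, not the authors' implementation. The step names, the box assignments, and the function used_to_produce are assumptions chosen so that the result matches the answers on the slide.

```python
from collections import defaultdict

# Hypothetical workflow fragment: step -> (inputs, outputs).
STEPS = {
    "align_warp": ({"Anatomy Header 1", "Anatomy Image 1",
                    "Reference Image", "Reference Header"}, {"Warp param 1"}),
    "reslice": ({"Warp param 1"}, {"Resliced Image 1"}),
    "softmean": ({"Resliced Image 1"}, {"Atlas Image"}),
}

# Assumed user views: step -> box. UAdmin sees every step separately.
VIEWS = {
    "UAdmin": {"align_warp": "A", "reslice": "B", "softmean": "C"},
    "UBio": {"align_warp": "Box 1", "reslice": "Box 1", "softmean": "Box 2"},
    "UBlackBox": {"align_warp": "Box", "reslice": "Box", "softmean": "Box"},
}


def used_to_produce(user, target):
    """Return the data items visible to `user` that lead to `target`."""
    box_of = VIEWS[user]
    ins, outs = defaultdict(set), defaultdict(set)
    for step, (inputs, outputs) in STEPS.items():
        ins[box_of[step]] |= inputs
        outs[box_of[step]] |= outputs
    # Simplification: an item both produced and consumed in a box is hidden,
    # even if some step outside the box also consumes it.
    visible = {b: (ins[b] - outs[b], outs[b] - ins[b]) for b in ins}
    # Walk backwards from the target over the visible box inputs.
    found, frontier = set(), {target}
    while frontier:
        item = frontier.pop()
        for box_in, box_out in visible.values():
            if item in box_out:
                new = box_in - found
                found |= new
                frontier |= new
    return found


for user in VIEWS:
    print(user, sorted(used_to_produce(user, "Resliced Image 1")))
```

In this model UBlackBox gets an empty answer because "Resliced Image 1" is itself internal to the single box, so no visible box output matches the starting point of the query.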
Conclusion, Perspectives
• Able to answer the queries, including queries on
  - Data and Step provenance
  - Immediate and Deep (recursive) provenance
• Variation of the workflow and queries considering user views
  - Multi-granularity levels of provenance
  - Only visible and necessary data are kept
• Open questions
  - What is the meaning of "stage" in a workflow (with respect to user views)?
  - What are we expecting as an answer to the difference between two workflows (cf. query 7)?
  - Are all the procedures of the workflow "biologically significant" (cf. user views)?

Acknowledgements
• Kepler Group
  - Shawn Bowers
  - Bertram Ludascher
  - Timothy McPhillips
• Biologists from the CIPRES project
• Members from the Database group, University of Pennsylvania
• This work is supported by NSF grants 0513778, 0415810, and 0612177

User interface