Provenance Semirings
T. J. Green, G. Karvounarakis, V. Tannen
University of Pennsylvania
Principles of Provenance (PrOPr), Philadelphia, PA, June 26, 2007
PROPR 2007

Provenance
●First studied in data warehousing
▪Lineage [Cui, Widom, Wiener 2000]
●Scientific applications (to assess quality of data)
▪Why-Provenance [Buneman, Khanna, Tan 2001]
●Our interest: P2P data sharing in the ORCHESTRA system (project headed by Zack Ives)
▪Trust conditions based on provenance
▪Deletion propagation

Annotated relations
●Provenance: an annotation on tuples
●Our observation: propagating provenance/lineage through views is similar to querying
▪Incomplete Databases (conditional tables)
▪Probabilistic Databases (independent tuple tables)
▪Bag Semantics Databases (tuples with multiplicities)
●Hence we look at queries on relations with annotated tuples

Incomplete databases: boolean C-tables

R (each tuple is annotated with a boolean variable):
  a b c    p
  d b e    r
  f g e    s

Semantics: a set of instances
  I(R) = { ∅, {abc}, {dbe}, {fge}, {abc, dbe}, {abc, fge}, {dbe, fge}, {abc, dbe, fge} }

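
The set of instances above can be enumerated mechanically. The following is a minimal sketch, assuming the C-table is stored as a Python dict from tuples to variable names; the function name `possible_worlds` is illustrative and does not come from the slides.

```python
from itertools import product

# The slide's boolean C-table: each tuple carries one boolean variable.
R = {("a", "b", "c"): "p", ("d", "b", "e"): "r", ("f", "g", "e"): "s"}

def possible_worlds(ctable):
    """Return the set of instances obtained from all truth assignments."""
    variables = sorted(set(ctable.values()))
    worlds = set()
    for values in product([False, True], repeat=len(variables)):
        valuation = dict(zip(variables, values))
        # A tuple is present in a world iff its variable is true there.
        worlds.add(frozenset(t for t, v in ctable.items() if valuation[v]))
    return worlds

worlds = possible_worlds(R)
print(len(worlds))  # 8
```

Each of the three variables is independent, so the eight valuations give eight distinct instances, from the empty instance up to all of R.
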
Imielinski & Lipski (1984): queries on C-tables

R:
  a b c    p
  d b e    r
  f g e    s

Union of conjunctive queries (UCQ):
  q(x, z) :- R(x, _, z), R(_, _, z)
  q(x, z) :- R(x, y, _), R(_, y, z)

q(R):
  a c    (p ∧ p) ∨ (p ∧ p)              = p
  a e    p ∧ r                          = p ∧ r
  d c    r ∧ p                          = p ∧ r
  d e    (r ∧ r) ∨ (r ∧ r) ∨ (r ∧ s)    = r
  f e    (s ∧ s) ∨ (s ∧ s) ∨ (s ∧ r)    = s

Example: under the valuation p = true, r = false, s = true, q(R) = {ac, fe}.

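
The conditions on q(R) can be sanity-checked against the possible-worlds semantics: for every valuation, running q on the corresponding instance should give exactly the output tuples whose condition is true. This is a small sketch under the assumption that the simplified conditions (p, p ∧ r, p ∧ r, r, s) are hard-coded; the names `q`, `CONDITIONS` and `agrees_on_all_worlds` are hypothetical helpers, not part of the original work.

```python
from itertools import product

R = {("a", "b", "c"): "p", ("d", "b", "e"): "r", ("f", "g", "e"): "s"}

# Simplified conditions of the output C-table, one per output tuple.
CONDITIONS = {
    ("a", "c"): lambda v: v["p"],
    ("a", "e"): lambda v: v["p"] and v["r"],
    ("d", "c"): lambda v: v["r"] and v["p"],
    ("d", "e"): lambda v: v["r"],
    ("f", "e"): lambda v: v["s"],
}

def q(world):
    """Evaluate the UCQ on an ordinary (set-semantics) instance."""
    out = set()
    for (x1, y1, z1) in world:
        for (x2, y2, z2) in world:
            if z1 == z2:          # q(x, z) :- R(x, _, z), R(_, _, z)
                out.add((x1, z1))
            if y1 == y2:          # q(x, z) :- R(x, y, _), R(_, y, z)
                out.add((x1, z2))
    return out

def agrees_on_all_worlds():
    for values in product([False, True], repeat=3):
        v = dict(zip("prs", values))
        world = {t for t, var in R.items() if v[var]}
        from_conditions = {t for t, cond in CONDITIONS.items() if cond(v)}
        if q(world) != from_conditions:
            return False
    return True

print(agrees_on_all_worlds())  # True
```
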
Why-provenance/lineage
Which input tuples contribute to the presence of a tuple in the output?
[Cui, Widom, Wiener 2000] [Buneman, Khanna, Tan 2001]

R (annotated with tuple ids):
  a b c    p
  d b e    r
  f g e    s

q(R) (same query):
  a c    {p}
  a e    {p, r}
  d c    {p, r}
  d e    {r, s}
  f e    {r, s}

C-tables vs. Why-provenance

       C-table calculations              Why-provenance calculations
  a c  (p ∧ p) ∨ (p ∧ p)                 ({p} ∪ {p}) ∪ ({p} ∪ {p})
  a e  p ∧ r                             {p} ∪ {r}
  d c  r ∧ p                             {r} ∪ {p}
  d e  (r ∧ r) ∨ (r ∧ r) ∨ (r ∧ s)       ({r} ∪ {r}) ∪ ({r} ∪ {r}) ∪ ({r} ∪ {s})
  f e  (s ∧ s) ∨ (s ∧ s) ∨ (s ∧ r)       ({s} ∪ {s}) ∪ ({s} ∪ {s}) ∪ ({s} ∪ {r})

The structure of the calculations is the same!

Another analogy, with bag semantics

R (annotated with tuple multiplicities):
  a b c    2
  d b e    5
  f g e    1

q(R) (same query):

       C-table calculations              Multiplicity calculations
  a c  (p ∧ p) ∨ (p ∧ p)                 2·2 + 2·2       = 8
  a e  p ∧ r                             2·5             = 10
  d c  r ∧ p                             5·2             = 10
  d e  (r ∧ r) ∨ (r ∧ r) ∨ (r ∧ s)       5·5 + 5·5 + 5·1 = 55
  f e  (s ∧ s) ∨ (s ∧ s) ∨ (s ∧ r)       1·1 + 1·1 + 1·5 = 7

For de and fe the first rule contributes two derivations and the second rule one, which is why each has three terms.

The structure of the calculations is the same!

Abstracting the structure of these calculations

           C-tables   Bags   Why-provenance   Abstract
  join        ∧        ·          ∪              ·
  union       ∨        +          ∪              +

Abstract calculations:
  a c    (p · p) + (p · p)
  a e    p · r
  d c    r · p
  d e    (r · r) + (r · r) + (r · s)
  f e    (s · s) + (s · s) + (s · r)

These expressions capture the abstract structure of the calculations, which encodes the logical derivation of the output tuples.
We shall use these expressions as provenance.

Positive K-relational algebra
●We define an RA+ on K-relations:
▪· corresponds to join
▪+ corresponds to union and projection
▪0 and 1 are used for selection predicates
▪Details in the paper (but recall how we evaluated the UCQ q earlier and we will see another example later)

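
As a rough illustration of how these operators could look in code, here is a sketch that represents a K-relation as a dict from tuples to annotations and passes the semiring operations in explicitly. The `Semiring` tuple, the positional `join`/`project` signatures and the helper `_add` are assumptions made for this example; the paper's definitions are the authoritative ones.

```python
from collections import namedtuple

# Hypothetical container for (K, +, ·, 0, 1).
Semiring = namedtuple("Semiring", ["plus", "times", "zero", "one"])

def _add(rel, tup, k, K):
    """Accumulate annotation k onto tup; tuples annotated 0 are absent."""
    total = K.plus(rel.get(tup, K.zero), k)
    if total == K.zero:
        rel.pop(tup, None)
    else:
        rel[tup] = total

def union(A, B, K):                       # + combines alternatives
    out = dict(A)
    for t, k in B.items():
        _add(out, t, k, K)
    return out

def project(A, cols, K):                  # + merges collapsed tuples
    out = {}
    for t, k in A.items():
        _add(out, tuple(t[c] for c in cols), k, K)
    return out

def join(A, B, on, K):                    # · combines joint use
    out = {}
    for ta, ka in A.items():
        for tb, kb in B.items():
            if all(ta[i] == tb[j] for i, j in on):
                _add(out, ta + tb, K.times(ka, kb), K)
    return out

def select(A, pred, K):                   # predicate maps to 1 or 0
    out = {}
    for t, k in A.items():
        _add(out, t, K.times(k, K.one if pred(t) else K.zero), K)
    return out

def q(R, K):
    rule1 = project(join(R, R, [(2, 2)], K), [0, 2], K)  # R(x,_,z), R(_,_,z)
    rule2 = project(join(R, R, [(1, 1)], K), [0, 5], K)  # R(x,y,_), R(_,y,z)
    return union(rule1, rule2, K)

BAGS = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)

R_bag = {("a", "b", "c"): 2, ("d", "b", "e"): 5, ("f", "g", "e"): 1}
print(q(R_bag, BAGS)[("d", "e")])  # 55
```

The same `q` runs unchanged over `BOOL`; only the semiring argument differs, which is the point of the abstraction.
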
RA+ identities imply semiring structure!
●Common RA+ identities
▪Union and join are associative, commutative
▪Join distributes over union
▪etc. (but not idempotence!)

These identities hold for RA+ on K-relations iff (K, +, ·, 0, 1) is a commutative semiring:
▪(K, +, 0) is a commutative monoid
▪(K, ·, 1) is a commutative monoid
▪· distributes over +, etc.

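
The semiring laws can be spot-checked on a finite sample of elements. The sketch below is only a refutation test (passing on a sample does not prove the laws in general), and the function name `satisfies_semiring_laws` is made up for this example.

```python
from itertools import product

def satisfies_semiring_laws(plus, times, zero, one, samples):
    """Check the commutative-semiring laws on all triples from samples."""
    for a, b, c in product(samples, repeat=3):
        if plus(a, b) != plus(b, a) or times(a, b) != times(b, a):
            return False                      # commutativity
        if plus(plus(a, b), c) != plus(a, plus(b, c)):
            return False                      # + associative
        if times(times(a, b), c) != times(a, times(b, c)):
            return False                      # · associative
        if times(a, plus(b, c)) != plus(times(a, b), times(a, c)):
            return False                      # · distributes over +
        if plus(a, zero) != a or times(a, one) != a:
            return False                      # identities
        if times(a, zero) != zero:
            return False                      # 0 annihilates
    return True

def is_idempotent(plus, samples):
    return all(plus(a, a) == a for a in samples)

nat = range(4)
print(satisfies_semiring_laws(lambda a, b: a + b, lambda a, b: a * b, 0, 1, nat))  # True
print(is_idempotent(lambda a, b: a + b, nat))  # False: bags are not idempotent
```

The second check shows why idempotence cannot be among the required identities: it fails for bag semantics, which is one of the intended instances.
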
Calculations on annotated tables are particular cases

  (B, ∨, ∧, false, true)             usual relational algebra
  (ℕ, +, ·, 0, 1)                    bag semantics
  (PosBool(B), ∨, ∧, false, true)    boolean C-tables
  (P(Ω), ∪, ∩, ∅, Ω)                 probabilistic event tables
  (P(X), ∪, ∪, ∅, ∅)                 lineage/why-provenance

Provenance Semirings
●X = {p, r, s, …}: indeterminates (provenance “tokens” for base tuples)
●ℕ[X]: multivariate polynomials with coefficients in ℕ and indeterminates in X
●(ℕ[X], +, ·, 0, 1) is the most “general” commutative semiring: its elements abstract calculations in all semirings
●ℕ[X]-relations are the relations with provenance!
▪The polynomials capture the propagation of provenance through (positive) relational algebra

A provenance calculation

  q(x, z) :- R(x, _, z), R(_, _, z)
  q(x, z) :- R(x, y, _), R(_, y, z)

R:
  a b c    p
  d b e    r
  f g e    s

q(R):
         Provenance polynomial    Why-provenance
  a c    2p²                      {p}
  a e    pr                       {p, r}
  d c    pr                       {p, r}
  d e    2r² + rs                 {r, s}
  f e    2s² + rs                 {r, s}

The tuples de and fe have the same why-provenance but different polynomials.

●Not just why- but also how-provenance (encodes derivations)!
●More informative than why-provenance

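
To make the table concrete, the sketch below builds the polynomials for de and fe from their derivations and then evaluates them in two other semirings. It assumes a polynomial is stored as a `Counter` from sorted monomials (tuples of variable names) to natural-number coefficients; `var`, `poly_add`, `poly_mul` and `evaluate` are illustrative names, not an API from the paper.

```python
from collections import Counter

def var(name):
    """The polynomial consisting of a single indeterminate."""
    return Counter({(name,): 1})

def poly_add(f, g):
    return f + g  # coefficients are positive, so Counter addition is safe

def poly_mul(f, g):
    out = Counter()
    for m1, c1 in f.items():
        for m2, c2 in g.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out

def evaluate(poly, valuation, plus, times, zero, one):
    """Interpret a polynomial in another semiring via a valuation of X."""
    total = zero
    for monomial, coeff in poly.items():
        term = one
        for x in monomial:
            term = times(term, valuation[x])
        for _ in range(coeff):       # coefficient n means n-fold sum
            total = plus(total, term)
    return total

p, r, s = var("p"), var("r"), var("s")
# de: two derivations from rule 1 (r·r, r·s) and one from rule 2 (r·r).
de = poly_add(poly_add(poly_mul(r, r), poly_mul(r, s)), poly_mul(r, r))
# fe: two derivations from rule 1 (s·s, s·r) and one from rule 2 (s·s).
fe = poly_add(poly_add(poly_mul(s, s), poly_mul(s, r)), poly_mul(s, s))

bags = dict(plus=lambda a, b: a + b, times=lambda a, b: a * b, zero=0, one=1)
why = dict(plus=frozenset.union, times=frozenset.union,
           zero=frozenset(), one=frozenset())

print(evaluate(de, {"p": 2, "r": 5, "s": 1}, **bags))  # 55
tokens = {x: frozenset({x}) for x in "prs"}
print(evaluate(de, tokens, **why) == evaluate(fe, tokens, **why))  # True
```

Evaluating with the multiplicities 2, 5, 1 recovers the bag-semantics results, while evaluating with sets of tokens collapses de and fe to the same why-provenance even though the polynomials differ.
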
Further work
●Application: P2P data sharing in the ORCHESTRA system:
▪Need to express trust conditions based on provenance of tuples
▪Incremental propagation of deletions
▪Semiring provenance itself is incrementally maintainable
●Future extensions:
▪Full relational algebra: for difference we need semirings with “proper subtraction”
▪Richer data models: nested relations/complex values, XML