Change-Centric Management of Versions in an XML Warehouse

- Slides: 26

Change-Centric Management of Versions in an XML Warehouse
Amélie Marian (Columbia University)
Serge Abiteboul, Grégory Cobéna, Laurent Mignet (INRIA-Rocquencourt)

Overview
- The Xyleme Project
- Change Management
- Version Management
  – XIDs
  – XML Diff
  – Deltas
  – Storage of XML document versions
  – Implementation and experiments
VLDB-Sept 2001 Amélie Marian

The Xyleme Project
- A dynamic XML data warehouse with high-level services:
  – User-friendly query engine
  – Semantic data integration
  – Version management
  – Query subscription and change monitoring services
- The Xyleme project is now finished
- Start-up also called Xyleme

Change Management
- Version Management
- Learning about Changes
- Monitoring Changes: Query Subscription
- Querying the Past: Temporal Queries

Version Management
Our requirements:
- Obtain the current version
- Get the modifications since time t
- Subscribe to change notifications, query changes
- Compute temporal queries
- Rebuild the version Vi of a document at time ti

Getting the Documents
- XML documents are fetched from the web
- We only have snapshots of the documents
[Figure: two snapshots of a Catalog document, each a list of products (Pr) with a name (N) and a price (P). Version 1: Camera 300, TV 100, VCR 200. Version 2: TV 100, DVD 500, VCR 150.]

XIDs
- Unique identifiers are needed to track XML nodes through time:
  – Track changes on a specific node (e.g. a product in a catalog)
  – Reconstruct the history of a node
- But physically adding an ID attribute to each node is expensive storage-wise
- XIDs attach persistent IDs to every node in a storage-efficient manner

XIDs
- XIDs are stored separately as a list (XID-map):
  – List of the node IDs in a postorder traversal of the tree
  – XIDnext: gives the next available XID
- Compact representation
- Document is not modified
[Figure: a document tree whose nodes are labelled with their XIDs.]
XID-map: (1-3, 14-15, 7-13|16)
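
The compact textual form of the XID-map can be illustrated with a short Python sketch. It assumes the syntax suggested by the example above: comma-separated ranges of XIDs in postorder, followed by `|` and the next available XID. The function names `parse_xid_map` and `format_xid_map` are hypothetical and are not taken from the Xyleme implementation.

```python
def parse_xid_map(text):
    """Expand '(1-3, 14-15, 7-13|16)' into (list of XIDs in postorder, next XID).

    The syntax is inferred from the slide example; single XIDs such as '4'
    are accepted as an assumption.
    """
    body = text.strip().strip("()")
    ranges_part, _, next_part = body.partition("|")
    xids = []
    for chunk in ranges_part.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        low, sep, high = chunk.partition("-")
        low = int(low)
        high = int(high) if sep else low
        xids.extend(range(low, high + 1))
    next_xid = int(next_part) if next_part.strip() else None
    return xids, next_xid


def format_xid_map(xids, next_xid):
    """Compress a postorder list of XIDs back into the range notation."""
    runs = []
    for xid in xids:
        if runs and xid == runs[-1][1] + 1:
            runs[-1][1] = xid          # extend the current run
        else:
            runs.append([xid, xid])    # start a new run
    parts = [f"{lo}-{hi}" if lo != hi else str(lo) for lo, hi in runs]
    return "(" + ", ".join(parts) + "|" + str(next_xid) + ")"


xids, next_xid = parse_xid_map("(1-3, 14-15, 7-13|16)")
print(xids)      # [1, 2, 3, 14, 15, 7, 8, 9, 10, 11, 12, 13]
print(next_xid)  # 16
```

Because consecutive XIDs collapse into ranges, a document that has changed little keeps a very short map, which is the storage argument made on the slide.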

XML Diff
- We implemented an XML diff algorithm to compute changes between two versions of a document:
  – Use of XML structure for matching
  – Content matching
- Linear in the size of the document
- XML diff has two roles:
  – Match nodes
  – Build the delta
- Ongoing work on improving the XML diff

Node Matching using a Diff Algorithm
[Figure: Version 1 and Version 2 of the Catalog tree annotated with XIDs. The Camera product (XID 5) is deleted, the VCR price (XID 13) is updated from 200 to 150, and a DVD product (XIDs 17-21) is inserted as the second child of Catalog (XID 16).]
XID-map of Version 1: (1-16|17)
Diff(V1, V2):
  delete(5)
  update(13, 150)
  insert(16, 2, (17-21))
New XID-map: (6-10, 17-21, 11-16|22)
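
The Catalog example can be replayed with a small Python sketch that applies the three operations to Version 1 and lists the resulting XIDs in postorder. The `Node` class, the helper names and the order in which operations are applied (deletes, then updates, then inserts by position) are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    xid: int
    label: str
    children: list = field(default_factory=list)


def find(node, xid, parent=None):
    """Return (node, parent) for the node carrying `xid`, or (None, None)."""
    if node.xid == xid:
        return node, parent
    for child in node.children:
        hit, hit_parent = find(child, xid, node)
        if hit is not None:
            return hit, hit_parent
    return None, None


def postorder(node):
    """Yield XIDs in postorder, the order used by the XID-map."""
    for child in node.children:
        yield from postorder(child)
    yield node.xid


def product(first_xid, name, price):
    """Build a Pr subtree whose five nodes take consecutive XIDs in postorder."""
    x = first_xid
    return Node(x + 4, "Pr", [
        Node(x + 1, "N", [Node(x, name)]),
        Node(x + 3, "P", [Node(x + 2, price)]),
    ])


def apply_delta(root, delta):
    """Apply delete/update/insert operations; insert positions are 1-based."""
    for op in delta:
        if op[0] == "delete":
            node, parent = find(root, op[1])
            parent.children.remove(node)
    for op in delta:
        if op[0] == "update":
            node, _ = find(root, op[1])
            node.label = op[2]
    inserts = sorted((op for op in delta if op[0] == "insert"), key=lambda op: op[2])
    for _, parent_xid, position, subtree in inserts:
        parent, _ = find(root, parent_xid)
        parent.children.insert(position - 1, subtree)


catalog = Node(16, "Catalog", [
    product(1, "Camera", "300"),
    product(6, "TV", "100"),
    product(11, "VCR", "200"),
])
diff_v1_v2 = [
    ("delete", 5),
    ("update", 13, "150"),
    ("insert", 16, 2, product(17, "DVD", "500")),
]
apply_delta(catalog, diff_v1_v2)
print(list(postorder(catalog)))
# [6, 7, 8, 9, 10, 17, 18, 19, 20, 21, 11, 12, 13, 14, 15, 16]
```

The printed sequence corresponds to the new XID-map (6-10, 17-21, 11-16|22): matched nodes keep their XIDs, and only the inserted subtree receives fresh ones.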

Edit-Scripts = SEQUENCE
- Sequences of basic operations over XML trees:
  – Delete(n)
  – Update(n, v)
  – Insert(m, k, T)
  – Move(n, k, m)
- An Edit Script can be applied to a document D if its operations are consistent with D
- An Edit Script applied to a document D will result in a unique document D'
- Several Edit Scripts applied to a document D can result in the same document D'

Deltas (Δ) = SET
- We introduce an alternative way of representing changes: Deltas
- Δi,j (unit delta) contains the set of operations needed to go from Vi to Vj (Diff(Vi, Vj))
- A Delta (Δ) over a document D is the sequence of unit deltas over D: Δ = {Δ1,2, ..., Δk-1,k}
- There is an (almost) unique delta from Vi to Vj
- We represent Deltas as XML documents

Shortcomings of Deltas
- Deltas are not reversible and cannot be composed (information on position is missing)
Storage Policies
  a) V1, Δ1,2, …, Δnow-1,now
  b) Δ2,1, …, Δnow,now-1, Vnow
  c) V1, Δ2,1, …, Δnow,now-1
  d) Δ1,2, …, Δnow-1,now, Vnow
- Only a) and b) are lossless
- But we would like to have fast access to:
  – Vnow
  – Δi,now

Completed Deltas (Δ+)
- Completed deltas contain more information:
  – Delete(m, k, T)
  – Update(n, ov, nv)
  – Insert(m, k, T)
  – Move(n, k, m, p, q)
- Completed Deltas can be reversed and composed
- Completed Deltas are in the spirit of some logs in DB systems

Example of XML Δ+
<delta>
  <unit_delta> … </unit_delta>
  <unit_delta>
    <time from="1" to="2"/>
    <delete parent="16" position="1" xid-map="(1-5)">
      <Product>
        <Name>Camera</Name>
        <Price>300</Price>
      </Product>
    </delete>
    <update xid="13" new_value="150" old_value="200"/>
    <insert parent="16" position="2" xid-map="(17-21)">
      <Product>
        <Name>DVD</Name>
        <Price>500</Price>
      </Product>
    </insert>
  </unit_delta>
</delta>

Operations on Deltas
- Compute with version:
  – Vi ∘ Δ+i,j = Vj
  – Vi ∘ Δi,j = Vj
- Reverse: (Δ+i,j)⁻¹ = Δ+j,i
- Compose: Δ+i,j ; Δ+j,k = Δ+i,k
- Simplify: Δ+i,j → Δi,j
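
Reversal is possible because a completed delta records both sides of each change. The Python sketch below inverts a completed unit delta by swapping insert with delete and old value with new value. The tuple layout and the names `invert_op` and `invert` are assumptions for illustration. Move is left out because the slides do not define its arguments, and subtrees are represented only by their XID-map strings.

```python
def invert_op(op):
    """Return the operation that undoes `op`.

    Assumed layouts: ("delete" | "insert", parent_xid, position, subtree)
    and ("update", xid, old_value, new_value).
    """
    kind = op[0]
    if kind == "delete":
        _, parent, position, subtree = op
        return ("insert", parent, position, subtree)
    if kind == "insert":
        _, parent, position, subtree = op
        return ("delete", parent, position, subtree)
    if kind == "update":
        _, xid, old_value, new_value = op
        return ("update", xid, new_value, old_value)
    raise ValueError(f"unsupported operation: {kind}")


def invert(unit_delta):
    """Turn the completed unit delta from Vi to Vj into the one from Vj to Vi."""
    return {
        "from": unit_delta["to"],
        "to": unit_delta["from"],
        # A list keeps the example deterministic; a delta is conceptually a set.
        "ops": [invert_op(op) for op in unit_delta["ops"]],
    }


# The completed unit delta between versions 1 and 2 of the Catalog example.
delta_1_2 = {
    "from": 1,
    "to": 2,
    "ops": [
        ("delete", 16, 1, "(1-5)"),
        ("update", 13, "200", "150"),
        ("insert", 16, 2, "(17-21)"),
    ],
}
delta_2_1 = invert(delta_1_2)
print(delta_2_1["ops"][1])  # ('update', 13, '150', '200')
```

A plain delta stores only `delete(5)` and `update(13, 150)`, so neither the deleted subtree nor the old price could be restored from it.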

Storage of Versions
- For a document D (or a query result Q), we store:
  – Current version: Vk
  – XID-map (as text) of Vk
  – Current Δ+ = {Δ+1,2, ..., Δ+k-1,k}
- When a new version k+1 arrives:
  – Compute the XML diff between Vk and Vk+1, and from it Δ+k,k+1
  – Replace the current version with Vk+1
  – Replace the XID-map
  – Append Δ+k,k+1 to Δ+
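
The storage scheme above (current version plus the sequence of completed unit deltas) can be sketched in Python as follows. This is a deliberately simplified model: a document is a flat dictionary from XID to text value, so tree structure, positions and the XID-map are not represented, and `diff` is a trivial dictionary comparison rather than the XML diff. The class `VersionStore` and all function names are hypothetical, and version 3 in the usage example is invented for illustration.

```python
def diff(old, new):
    """Completed unit delta between two flat documents (dict: XID -> value)."""
    ops = []
    for xid in sorted(old.keys() - new.keys()):
        ops.append(("delete", xid, old[xid]))            # keeps the deleted value
    for xid in sorted(new.keys() - old.keys()):
        ops.append(("insert", xid, new[xid]))
    for xid in sorted(old.keys() & new.keys()):
        if old[xid] != new[xid]:
            ops.append(("update", xid, old[xid], new[xid]))  # old and new value
    return ops


def undo(doc, ops):
    """Apply the inverse of a completed unit delta to `doc`."""
    result = dict(doc)
    for op in ops:
        if op[0] == "delete":
            result[op[1]] = op[2]      # bring the deleted node back
        elif op[0] == "insert":
            del result[op[1]]          # drop the inserted node
        else:
            result[op[1]] = op[2]      # restore the old value
    return result


class VersionStore:
    """Keeps only the current version and the completed unit deltas."""

    def __init__(self, first_version):
        self.current = dict(first_version)
        self.deltas = []               # deltas[i] leads from version i+1 to i+2

    def commit(self, new_version):
        self.deltas.append(diff(self.current, new_version))
        self.current = dict(new_version)

    def rebuild(self, number):
        """Rebuild version `number` (1-based) by undoing deltas from the end."""
        doc = dict(self.current)
        for ops in reversed(self.deltas[number - 1:]):
            doc = undo(doc, ops)
        return doc


v1 = {1: "Camera", 3: "300", 6: "TV", 8: "100", 11: "VCR", 13: "200"}
v2 = {6: "TV", 8: "100", 11: "VCR", 13: "150", 17: "DVD", 19: "500"}
v3 = {6: "TV", 8: "90", 11: "VCR", 13: "150", 17: "DVD", 19: "500"}  # hypothetical

store = VersionStore(v1)
store.commit(v2)
store.commit(v3)
print(store.rebuild(1) == v1)  # True
```

Access to the current version is immediate, while older versions cost one delta reversal per step back, which matches the requirement of fast access to Vnow.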

Levels of Versioning
- Full versioning is expensive, so we support different levels of versioning:
  – Full Versioning: Vnow + Δ+
  – Partial Versioning: Vnow + Δ
  – Last Version Update: Vnow + Δnow-1,now
  – Change Support: Vnow + XML diff computed for Query Subscription
  – Not Versioned: Vnow

Implementation
- Version Manager and XML diff implemented in C++
- A change simulator was implemented for tests
- A GUI was implemented

GUI Interface

Deltas Statistics
- Reasonable when there are not many modifications
- Relatively expensive for small documents
- Depends on the quality of the diff

Deltas Statistics (2)
- Modifications affect 30% of the document
- From left to right:
  – Snapshots
  – Completed Deltas
  – Deltas: composition and previous version reconstruction are not possible
  – Composed Completed Deltas: advantages of Completed Deltas but coarser granularity and higher cost

Conclusion
- Management of versions based on change representation:
  – Representation in tree data (XML)
  – Study of storage policies
  – Implementation of running prototypes
- Completed Deltas: a set of modifications
  – Mathematical properties of completed deltas (algebraic group)
- Current work on Query Subscription, Continuous Queries and Changes over Collections of Documents

Example: Edit-Scripts vs. Deltas
[Figure: Version 0 is a node P with a single child A; Version 1 is P with children C, B, A.]
- A possible Edit-Script: Insert(B, 1, P), Insert(C, 1, P)
- The Delta: Insert(B, 2, P), Insert(C, 1, P)
- Edit-Scripts use relative positions (at the time of the operation); Deltas use absolute positions (final)
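
The difference between relative and absolute positions can be checked with a few lines of Python. The children of P are modelled as a plain list and each insert as a `(node, position)` pair with 1-based positions. The function names are hypothetical, and applying a delta by sorting its inserts on final position is one possible strategy assumed here, valid for this insert-only case.

```python
def apply_edit_script(children, script):
    """Apply inserts one after another; each position refers to the list
    as it stands when that operation runs, so the order matters."""
    result = list(children)
    for node, position in script:
        result.insert(position - 1, node)
    return result


def apply_delta(children, delta):
    """Apply a set of inserts whose positions are positions in the final
    version; sorting by position makes the outcome order-independent."""
    result = list(children)
    for node, position in sorted(delta, key=lambda op: op[1]):
        result.insert(position - 1, node)
    return result


version0 = ["A"]
edit_script = [("B", 1), ("C", 1)]   # a sequence
delta = {("B", 2), ("C", 1)}         # a set

print(apply_edit_script(version0, edit_script))        # ['C', 'B', 'A']
print(apply_edit_script(version0, edit_script[::-1]))  # ['B', 'C', 'A']
print(apply_delta(version0, delta))                    # ['C', 'B', 'A']
```

Reordering the edit script changes the result, whereas the delta has no order to get wrong. This is why a delta from one version to another is (almost) unique.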

Example: Missing Information for Delta Composition (Δ(0, 2))
[Figure: Version 0 is P with children C, A; Version 1 is P with children C, B, A; Version 2 is P with children B, D, A.]
- Δ(0, 1): Insert(B, 2, P)
- Δ(1, 2): Delete(C), Insert(D, 2, P)
- Δ+(1, 2): Delete(C, 1, P), Insert(D, 2, P)
- Deltas do not give information on parents and positions of deleted elements
  → Positions of inserted elements in the composition cannot be computed