Tools for XML Data Exchange (Dan Suciu, AT&T)
Tools for XML Data Exchange
Dan Suciu, AT&T Labs
Joint work with Mary Fernandez

XML Has Many Facets
• XML for fancier Web pages
  – XML generated with structural editors
• XML for messaging
  – generated during applications
• XML for Data Exchange
  – generated from legacy data

XML in Data Exchange
• communities agree on a common DTD
• export their data in XML
• exchange over HTTP
• applications understand only that DTD

An Example of XML Data
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book>
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>

XML Exchange Vision
[Diagram. Labels: relational data, object-relational, legacy data, XML Data, WEB (HTTP), Transform, Integrate, Warehouse, application, application]

Tools
• export legacy data to XML
  – RXL
• query/transform/integrate XML data
  – XML-QL
• compress XML data
  – XMill
• store/process incoming XML data
  – STORED

XML-QL: A Query Language for XML
• http://www.w3.org/TR/NOTE-xml-ql (8/98)
• W3C new Working Group on QL (9/99)
• XML-QL characteristics:
  – relationally complete (like SQL)
  – XML input, XML output
  – queries, transforms, integrates XML data
[Deutsch et al., 1999 (WWW8)]

Querying in XML-QL
Pattern (the where clause):
  where <book language="french">
          <publisher> <name> Morgan Kaufmann </name> </publisher>
          <author> $a </author>
        </book> in "www.a.b.c/bib.xml"
  construct $a

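
The pattern on this slide can be read as a tree-matching condition that binds $a once per matching author. The following is a minimal Python sketch of that reading, not an XML-QL implementation. The sample document, the <bib> wrapper element, the author names, and the function name match_authors are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the document at "www.a.b.c/bib.xml".
BIB_XML = """
<bib>
  <book language="french">
    <publisher><name>Morgan Kaufmann</name></publisher>
    <author>Author One</author>
    <author>Author Two</author>
  </book>
  <book language="english">
    <publisher><name>Morgan Kaufmann</name></publisher>
    <author>Author Three</author>
  </book>
  <book language="french">
    <publisher><name>Freeman</name></publisher>
    <author>Author Four</author>
  </book>
</bib>
"""


def match_authors(root):
    """Collect the bindings of $a for books that satisfy the where-pattern."""
    bindings = []
    for book in root.iter("book"):
        # <book language="french"> ... </book>
        if book.get("language") != "french":
            continue
        # <publisher> <name> Morgan Kaufmann </name> </publisher>
        names = [(n.text or "").strip() for n in book.findall("publisher/name")]
        if "Morgan Kaufmann" not in names:
            continue
        # <author> $a </author>: one binding per author child
        for author in book.findall("author"):
            bindings.append((author.text or "").strip())
    return bindings


authors = match_authors(ET.fromstring(BIB_XML))
```

Only the first book satisfies both the attribute test and the publisher test, so only its authors are bound.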
Transformations in XML-QL
Template (the construct clause):
  where <book language=$l>
          <author> $a </>
        </> in "www.a.b.c/bib.xml"
  construct <result> <author> $a </> <lang> $l </> </>
Note: </> abbreviates </book> or </result> or ...
Each result has the form:
  <result> <author> ... </author> <lang> ... </lang> </result>

Transformations in XML-QL
Skolem Functions in Templates
  where <book language=$l>
          <author> $a </>
        </> in "www.a.b.c/bib.xml"
  construct <result> <author id=F($a)> $a </> <lang> $l </> </>
Each result has the form:
  <result> <author> ... </author> <lang> ... </lang> </result>

Data Integration in XML-QL
  { where <book> <isbn> $n </> <title> $t </> </> in "www.books.com"
    construct <result id=F($n)> <title> $t </> </> }
  { where <review> <isbn> $n </> <review> $r </> </> in "www.reviews.com"
    construct <result id=F($n)> <review> $r </> </> }
Each result has the form:
  <result id=".."> <title> ... </title> <review> ... </review> </result>

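
The two blocks on this slide share the Skolem term F($n), so results built from different sources for the same ISBN are fused into one <result> element. Below is a minimal Python sketch of that fusing behaviour, assuming in-memory lists in place of the two sources. The sample data, the <results> wrapper, and the function name integrate are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two sources: (isbn, title) and (isbn, review).
BOOKS = [("isbn-1", "Title A"), ("isbn-2", "Title B")]
REVIEWS = [("isbn-1", "Review of A"), ("isbn-3", "Review of C")]


def integrate(books, reviews):
    """Fuse titles and reviews into one <result> per Skolem id F(isbn)."""
    root = ET.Element("results")  # wrapper element, an assumption
    by_id = {}

    def result_for(isbn):
        # The same Skolem term always denotes the same element.
        skolem_id = f"F({isbn})"
        if skolem_id not in by_id:
            by_id[skolem_id] = ET.SubElement(root, "result", id=skolem_id)
        return by_id[skolem_id]

    for isbn, title in books:
        ET.SubElement(result_for(isbn), "title").text = title
    for isbn, review in reviews:
        ET.SubElement(result_for(isbn), "review").text = review
    return root


merged = integrate(BOOKS, REVIEWS)
merged_xml = ET.tostring(merged, encoding="unicode")
```

An ISBN that appears in only one source still yields a <result>, with just the title or just the review.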
RXL: Export Legacy Data To XML
• legacy data
  – fragmented into many flat relations
  – 3rd normal form
  – schema is proprietary
• XML data
  – nested
  – un-normalized
  – schema designed by agreement

RXL: An Example
• relational database: Store, SB, Book
• virtual XML view:
  <store> <name> n1 </name> <book> ... </book> ... </store>
  <store> <name> n2 </name> <book> ... </book> … </store>

A Simple RXL Query
• specify XML view declaratively
  from Store, SB, Book
  where Store.sid=SB.sid and SB.bid=Book.bid
  construct <store ID=f(Store.sid)>
              <name> Store.name </name>
              <book> Book.title </book>
            </store>

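
One way to read the RXL query on this slide is as a relational join whose rows are grouped by the Skolem term f(Store.sid) into nested <store> elements. The sketch below materializes such a view with Python's sqlite3 and ElementTree modules. It is an illustration under assumptions, not the RXL system: the column types, the sample rows, the <view> wrapper, and the choice to emit <name> once per store are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_db():
    """Create a tiny in-memory database with hypothetical sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Store (sid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Book  (bid INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE SB    (sid INTEGER, bid INTEGER);
        INSERT INTO Store VALUES (1, 'n1'), (2, 'n2');
        INSERT INTO Book  VALUES (10, 'The Calculus'), (11, 'Another Title');
        INSERT INTO SB    VALUES (1, 10), (1, 11), (2, 10);
    """)
    return db


def materialize_view(db):
    """Run the from/where part as SQL, then nest rows by f(Store.sid)."""
    rows = db.execute("""
        SELECT Store.sid, Store.name, Book.title
        FROM Store, SB, Book
        WHERE Store.sid = SB.sid AND SB.bid = Book.bid
        ORDER BY Store.sid, Book.bid
    """)
    root = ET.Element("view")  # wrapper element, an assumption
    stores = {}
    for sid, name, title in rows:
        if sid not in stores:
            # One <store> element per distinct value of f(Store.sid).
            store = ET.SubElement(root, "store", ID=f"f({sid})")
            ET.SubElement(store, "name").text = name
            stores[sid] = store
        ET.SubElement(stores[sid], "book").text = title
    return root


view = materialize_view(build_db())
view_xml = ET.tostring(view, encoding="unicode")
```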
RXL: Querying the XML View
• users ask XML-QL queries:
  – find stores that sell "The Calculus"
  where <store> <name> $n </name> <book> The Calculus </book> </store>
  construct <result> $n </result>

RXL: Query Composition
[Diagram: the tables Store, SB, Book are mapped by RXL to the XML view below; the XML-QL query is posed against that view.]
  <store> <name> n1 </name> <book> ... </book> ... </store>
  <store> <name> n2 </name> <book> ... </book> … </store>
The system composes the query with the view:
  from Store, SB, Book
  where Store.sid=SB.sid and SB.bid=Book.bid and Book.title="The Calculus"
  construct <result> Store.name </result>

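
The composed query touches only the relational tables, so the XML view never has to be materialized. A minimal Python sketch of evaluating it follows, assuming a SQLite database with the same three tables. The sample rows, the name COMPOSED_QUERY, and the function stores_selling are hypothetical, and the translation to SQL is written by hand here rather than derived automatically.

```python
import sqlite3
from xml.sax.saxutils import escape

# Hypothetical sample data: stores n1 and n2 sell "The Calculus", n3 does not.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Store (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Book  (bid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE SB    (sid INTEGER, bid INTEGER);
    INSERT INTO Store VALUES (1, 'n1'), (2, 'n2'), (3, 'n3');
    INSERT INTO Book  VALUES (10, 'The Calculus'), (11, 'Another Title');
    INSERT INTO SB    VALUES (1, 10), (2, 10), (3, 11);
""")

# The from/where part of the composed query, written as SQL.
COMPOSED_QUERY = """
    SELECT Store.name
    FROM Store, SB, Book
    WHERE Store.sid = SB.sid AND SB.bid = Book.bid AND Book.title = ?
    ORDER BY Store.sid
"""


def stores_selling(db, title):
    """Evaluate the composed query and apply its construct clause."""
    names = [name for (name,) in db.execute(COMPOSED_QUERY, (title,))]
    return "".join(f"<result>{escape(name)}</result>" for name in names)


answer = stores_selling(db, "The Calculus")
```

The selection on Book.title is pushed into the SQL where clause, which is the point of composing the query with the view instead of filtering exported XML.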
Compressing XML Data
• for exchange and archiving
• can use a general tool (gzip)
• but a specialized tool (XMill) compresses roughly twice as well

XMill Example: Weblogs
Original log line:
  202.239.238.16|GET / HTTP/1.0|text/html|200|1997/10/01-00:02|-|4478|-|-|http://www02.so-net.or.jp/|Mozilla/3.01 [ja] (Win95; I)
The same entry in XML:
  <apache:entry>
    <apache:host>202.239.238.16</apache:host>
    <apache:requestLine>GET / HTTP/1.0</apache:requestLine>
    <apache:contentType>text/html</apache:contentType>
    <apache:statusCode>200</apache:statusCode>
    <apache:date>1997/10/01-00:02</apache:date>
    <apache:byteCount>4478</apache:byteCount>
    <apache:referer>http://www02.so-net.or.jp/</apache:referer>
    <apache:userAgent>Mozilla/3.01 [ja] (Win95; I)</apache:userAgent>
  </apache:entry>

XMill Example: Weblogs
  weblog.dat:       15.9 MB
  weblog.xml:       24.2 MB
  weblog.dat.gz:     1.6 MB
  weblog.xml.gz:     2.1 MB

  Command                            Output        Size
  xmill -p // weblog.xml             weblog1.xmi   1.75 MB
  xmill weblog.xml                   weblog2.xmi   1.33 MB
  xmill -f settings.pz weblog.xml    weblog3.xmi   0.82 MB

XMill: Fine Tuning the Compression
  -p//apache:host=>seqcomb(u8 "." u8)
  -p//apache:userAgent=>seq(e "/" e)
  -p//apache:byteCount=>u
  -p//apache:statusCode=>e
  -p//apache:contentType=>e
  -p//apache:requestLine=>seq("GET " rep("/" e) " HTTP/1." e)
  -p//apache:date=>seq(u "/" u8 "-" u8 ":" di)
  -p//apache:referer=>or(seq("file:" t) seq("http://" or(seq(rep("." e) "/" rep("/" e)) rep("." e))) t)

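
Each -p option above ties a path expression to a compressor for the values found under that path. The Python sketch below illustrates the underlying idea only loosely: it groups text values by element path into separate containers, compresses each container with zlib, and packs dotted host addresses into raw bytes in the spirit of a small-integer compressor. It is not XMill's file format or algorithm. The unprefixed element names, the sample data, and all function names are assumptions, and the document structure is not stored.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_into_containers(xml_text):
    """Group text values by the path of the element that holds them."""
    containers = defaultdict(list)

    def walk(elem, parent_path):
        path = parent_path + "/" + elem.tag
        text = (elem.text or "").strip()
        if text:
            containers[path].append(text)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return dict(containers)


def compress_containers(containers):
    """Compress each container separately so similar values sit together."""
    return {
        path: zlib.compress("\n".join(values).encode("utf-8"))
        for path, values in containers.items()
    }


def pack_hosts(hosts):
    """Encode dotted-quad hosts as four raw bytes each before compression."""
    return b"".join(bytes(int(part) for part in h.split(".")) for h in hosts)


# Hypothetical sample log with 50 entries.
SAMPLE = "<log>" + "".join(
    f"<entry><host>10.0.0.{i % 7}</host><statusCode>200</statusCode>"
    f"<byteCount>{1000 + i}</byteCount></entry>"
    for i in range(50)
) + "</log>"

containers = split_into_containers(SAMPLE)
compressed = compress_containers(containers)
packed_hosts = zlib.compress(pack_hosts(containers["/log/entry/host"]))
```

Keeping all status codes in one container and all hosts in another is what lets a type-specific encoding be chosen per path.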
Storing XML Data
• Scenario:
  – receive a large XML data instance
  – want to store and manage it
• Could build an XML management system from scratch (eXcelon)
• Preferably: use existing database systems

Storing XML: Ternary Relation
[Figure: a paper with title, author, author, and year edges, drawn as a graph over object ids &o1 ... &o6 with leaf values "The Calculus", "…", "…", "1986", shown next to the relations Ref and Val that store it.]
[Florescu, Kossmann 1999]

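
The ternary-relation idea can be sketched as one table of labelled edges plus one table of leaf values. The Python example below shreds a small document into two SQLite tables and answers a path query with joins. The column names (source, label, dest, oid, value), the synthetic root object, the author names in the sample, and the function name shred are assumptions for illustration, not the schema from the cited work.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count


def shred(xml_text):
    """Store an XML tree as edges in Ref and leaf text in Val."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Ref (source TEXT, label TEXT, dest TEXT)")
    db.execute("CREATE TABLE Val (oid TEXT, value TEXT)")
    ids = count(1)

    def new_oid():
        return f"&o{next(ids)}"

    def store(elem, parent_oid):
        oid = new_oid()
        # One Ref row per edge: parent object, element name, child object.
        db.execute("INSERT INTO Ref VALUES (?, ?, ?)", (parent_oid, elem.tag, oid))
        text = (elem.text or "").strip()
        if len(elem) == 0 and text:
            db.execute("INSERT INTO Val VALUES (?, ?)", (oid, text))
        for child in elem:
            store(child, oid)

    root_oid = new_oid()  # synthetic root object, an assumption
    store(ET.fromstring(xml_text), root_oid)
    return db


# Hypothetical sample document; the author names are made up.
PAPER = """
<paper>
  <title>The Calculus</title>
  <author>A. Author</author>
  <author>B. Author</author>
  <year>1986</year>
</paper>
"""

db = shred(PAPER)

# Path query paper/title expressed as a self-join on Ref plus a join with Val.
titles = db.execute("""
    SELECT v.value
    FROM Ref p
    JOIN Ref t ON t.source = p.dest
    JOIN Val v ON v.oid = t.dest
    WHERE p.label = 'paper' AND t.label = 'title'
""").fetchall()
```

Every additional step in a path expression costs another self-join on Ref, which is the usual trade-off of this generic layout.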
Storing XML: Derive Schema from DTD
• DTD:
  <!ELEMENT employee (name, address, project*)>
  <!ELEMENT address (street, city, state, zip)>
• ODMG classes:
  class Employee public type tuple (name: string, address: Address, project: List(Project))
  class Address public type tuple (street: string, …)
• [Christophides et al. 1994, Shanmugasundaram et al. 1999]

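
The mapping from element declarations to tuple types can be sketched mechanically. The Python example below handles only flat, comma-separated content models with an optional trailing *, which is enough for the DTD fragment on this slide. The regular expression, the function name derive_classes, the rule that undeclared or #PCDATA elements become string, and the extra <!ELEMENT project ...> declaration are assumptions added for illustration.

```python
import re

# Matches flat declarations such as <!ELEMENT a (b, c*)>; nested groups
# and alternation are out of scope for this sketch.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)>")

DTD = """
<!ELEMENT employee (name, address, project*)>
<!ELEMENT address (street, city, state, zip)>
<!ELEMENT project (title, budget)>
"""
# The project declaration above is hypothetical; it is not on the slide.


def derive_classes(dtd_text):
    """Map each structured element to a list of (field, type) pairs."""
    decls = {
        name: [part.strip() for part in body.split(",")]
        for name, body in ELEMENT_RE.findall(dtd_text)
    }
    structured = {name for name, parts in decls.items() if parts != ["#PCDATA"]}
    classes = {}
    for name in structured:
        fields = []
        for part in decls[name]:
            repeated = part.endswith("*")
            child = part.rstrip("*")
            # Structured children become class references, the rest strings.
            type_name = child.capitalize() if child in structured else "string"
            if repeated:
                type_name = f"List({type_name})"
            fields.append((child, type_name))
        classes[name.capitalize()] = fields
    return classes


classes = derive_classes(DTD)
```

With the sample DTD, the Employee entry has the same three fields and types as the ODMG class on the slide.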
STORED Approach: Mine Data to Derive Schema
[Figure: semistructured paper data (paper elements with year, author, and title children; authors with fn and ln) mapped to the derived relations Paper1 and Paper2.]
[Deutsch et al. 1999]

Summary
• XML: simple (?), lightweight syntax
• Challenge: build bridges to existing database tools
• XML in data exchange: YES
• XML as a new data model: NO

More Info
http://www.research.att.com/~suciu
Data on the Web: From Relations to Semistructured Data and XML, Morgan Kaufmann, 1999