Tools for XML Data Exchange
Dan Suciu
AT&T Labs
Joint work with Mary Fernandez


XML Has Many Facets
• XML for fancier Web pages
  – XML generated with structural editors
• XML for messaging
  – generated by applications at run time
• XML for Data Exchange
  – generated from legacy data

XML in Data Exchange
• communities agree on a common DTD
• export their data in XML
• exchange over HTTP
• applications understand only that DTD

An Example of XML Data

    <book>
      <publisher> Addison-Wesley </publisher>
      <author> Serge Abiteboul </author>
      <first-name> Rick </first-name> <last-name> Hull </last-name>
      <author> Victor Vianu </author>
      <title> Foundations of Databases </title>
      <year> 1995 </year>
    </book>
    <book>
      <publisher> Freeman </publisher>
      <author> Jeffrey D. Ullman </author>
      <title> Principles of Database and Knowledge Base Systems </title>
      <year> 1998 </year>
    </book>

XML Exchange Vision
[Figure: diagram with the labels application (twice), object-relational data, relational data, legacy data, XML Data, Integrate, Transform, Warehouse, and WEB (HTTP)]

Tools
• export legacy data to XML
  – RXL
• query/transform/integrate XML data
  – XML-QL
• compress XML data
  – XMill
• store/process incoming XML data
  – STORED

XML-QL: A Query Language for XML
• http://www.w3.org/TR/NOTE-xml-ql (8/98)
• new W3C Working Group on QL (9/99)
• XML-QL characteristics:
  – relationally complete (like SQL)
  – XML input, XML output
  – queries, transforms, integrates XML data
[Deutsch et al., 1999 (WWW8)]

Querying in XML-QL
Pattern (where clause):

    where <book language="french">
            <publisher> <name> Morgan Kaufmann </name> </publisher>
            <author> $a </author>
          </book> in "www.a.b.c/bib.xml"
    construct $a
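To make the pattern semantics concrete, here is a minimal Python sketch that emulates the query above with the standard library. It is not an XML-QL implementation. The document contents, the `<bib>` root element, and the function name are assumptions made for illustration; only the pattern itself comes from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of bib.xml.
BIB = """
<bib>
  <book language="french">
    <publisher><name>Morgan Kaufmann</name></publisher>
    <author>Author A</author>
    <author>Author B</author>
  </book>
  <book language="english">
    <publisher><name>Morgan Kaufmann</name></publisher>
    <author>Author C</author>
  </book>
  <book language="french">
    <publisher><name>Freeman</name></publisher>
    <author>Author D</author>
  </book>
</bib>
"""

def french_mk_authors(xml_text):
    """Return every binding of $a for the pattern on the slide."""
    root = ET.fromstring(xml_text)
    bindings = []
    for book in root.iter("book"):
        # <book language="french"> ...
        if book.get("language") != "french":
            continue
        # ... <publisher> <name> Morgan Kaufmann </name> </publisher> ...
        names = [n.text.strip() for n in book.findall("publisher/name") if n.text]
        if "Morgan Kaufmann" not in names:
            continue
        # ... <author> $a </author>: each matching author binds $a once.
        bindings.extend(a.text.strip() for a in book.findall("author") if a.text)
    return bindings

authors = french_mk_authors(BIB)
print(authors)
```

Only the first book satisfies both conditions, so the sketch yields one binding of `$a` per author of that book.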

Transformations in XML-QL
Template (construct clause):

    where <book language=$l>
            <author> $a </>
          </> in "www.a.b.c/bib.xml"
    construct <result>
                <author> $a </>
                <lang> $l </>
              </>

Note: </> abbreviates </book> or </result> or ...

Result:

    <result> <author> ... </author> <lang> ... </lang> </result>

Transformations in XML-QL
Skolem Functions in Templates

    where <book language=$l>
            <author> $a </>
          </> in "www.a.b.c/bib.xml"
    construct <result>
                <author id=F($a)> $a </>
                <lang> $l </>
              </>

Result:

    <result> <author> ... </author> <lang> ... </lang> </result>
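A Skolem function can be read as "one fresh object ID per distinct argument". The Python sketch below illustrates that reading only. The `Skolem` class, the ID format, and the sample bindings are all assumptions for illustration and do not reflect how an XML-QL engine represents IDs.

```python
import itertools

class Skolem:
    """Assigns one fresh object ID per distinct argument tuple."""

    def __init__(self, name):
        self.name = name
        self._ids = {}
        self._counter = itertools.count(1)

    def __call__(self, *args):
        # The same arguments always return the same ID.
        if args not in self._ids:
            self._ids[args] = f"{self.name}{next(self._counter)}"
        return self._ids[args]

F = Skolem("F")

# Hypothetical ($a, $l) bindings produced by the where clause.
bindings = [("Author A", "french"), ("Author B", "italian"), ("Author A", "english")]

authors = {}  # author ID -> author name
for a, l in bindings:
    authors[F(a)] = a  # repeated $a values collapse onto one ID

print(authors)
```

Because `F` is keyed on `$a` alone, the author who appears with two languages still receives a single ID.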

Data Integration in XML-QL

    { where <book>
              <isbn> $n </>
              <title> $t </>
            </> in "www.books.com"
      construct <result id=F($n)>
                  <title> $t </>
                </> }
    { where <review>
              <isbn> $n </>
              <review> $r </>
            </> in "www.reviews.com"
      construct <result id=F($n)>
                  <review> $r </>
                </> }

Result:

    <result id=".."> <title> ... </title> <review> ... </review> </result>
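The two blocks above write into the same `<result>` element whenever they produce the same Skolem term `F($n)`. A minimal Python sketch of that fusion follows. The sample ISBNs, titles, reviews, and the string form of the ID are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical bindings from the two sources: (isbn, title) and (isbn, review).
books = [("111", "Title One"), ("222", "Title Two")]
reviews = [("111", "Worth reading"), ("333", "Orphan review")]

def integrate(books, reviews):
    """Fuse both sources into one <result> element per ISBN."""
    results = {}  # Skolem term F(isbn) -> <result> element

    def result_for(isbn):
        key = f"F({isbn})"
        if key not in results:
            results[key] = ET.Element("result", {"id": key})
        return results[key]

    for isbn, title in books:
        ET.SubElement(result_for(isbn), "title").text = title
    for isbn, review in reviews:
        ET.SubElement(result_for(isbn), "review").text = review
    return results

merged = integrate(books, reviews)
xml_111 = ET.tostring(merged["F(111)"], encoding="unicode")
print(xml_111)
```

An ISBN that occurs in only one source still gets a `<result>`, containing just the title or just the review, which mirrors an outer join.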

RXL: Export Legacy Data To XML
• legacy data
  – fragmented into many flat relations
  – 3rd normal form
  – schema is proprietary
• XML data
  – nested
  – un-normalized
  – schema designed by agreement

RXL: An Example
• relational database: Store, SB, Book
• virtual XML view:

    <store> <name> n1 </name> <book> ... </book> ... </store>
    <store> <name> n2 </name> <book> ... </book> ... </store>

A Simple RXL Query
• specify XML view declaratively

    from Store, SB, Book
    where Store.sid=SB.sid and SB.bid=Book.bid
    construct <store ID=f(Store.sid)>
                <name> Store.name </name>
                <book> Book.title </book>
              </store>
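As a rough illustration of what the view denotes, the Python sketch below runs the same join in SQLite and groups the rows into one `<store>` element per `Store.sid`. It materializes the view, whereas the slides describe a virtual one, so it shows the mapping and not RXL's evaluation strategy. The column types and sample rows are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Store (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Book  (bid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE SB    (sid INTEGER, bid INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO Store VALUES (1, 'n1'), (2, 'n2');
    INSERT INTO Book  VALUES (10, 'The Calculus'), (11, 'Algebra');
    INSERT INTO SB    VALUES (1, 10), (1, 11), (2, 11);
""")

def materialize_view(conn):
    """Build one <store> element per store, as in the construct clause."""
    rows = conn.execute("""
        SELECT Store.sid, Store.name, Book.title
        FROM Store, SB, Book
        WHERE Store.sid = SB.sid AND SB.bid = Book.bid
        ORDER BY Store.sid, Book.bid
    """)
    stores = {}
    for sid, name, title in rows:
        if sid not in stores:
            # ID=f(Store.sid): the Skolem term groups rows by store.
            store = ET.Element("store", {"ID": f"f({sid})"})
            ET.SubElement(store, "name").text = name
            stores[sid] = store
        ET.SubElement(stores[sid], "book").text = title
    return list(stores.values())

view = [ET.tostring(s, encoding="unicode") for s in materialize_view(conn)]
for line in view:
    print(line)
```

The dictionary keyed on `sid` plays the role of the Skolem function `f`: rows with the same `sid` contribute `<book>` children to the same `<store>`.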

RXL: Querying the XML View
• users ask XML-QL queries:
  – find stores that sell "The Calculus"

    where <store>
            <name> $n </name>
            <book> The Calculus </book>
          </store>
    construct <result> $n </result>

RXL: Query Composition
• the RXL view over Store, SB, Book:

    <store> <name> n1 </name> <book> ... </book> ... </store>
    <store> <name> n2 </name> <book> ... </book> ... </store>

• system composes the XML-QL query with the view:

    from Store, SB, Book
    where Store.sid=SB.sid and SB.bid=Book.bid and Book.title="The Calculus"
    construct <result> Store.name </result>
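The composed query touches only the relational tables, so the XML view never has to be built. A minimal Python sketch under that reading follows: the composed query is hand-translated to SQL and run against SQLite. The sample rows, the function name, and the SQL translation are assumptions for illustration. The composition algorithm itself is not shown.

```python
import sqlite3
from xml.sax.saxutils import escape

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Store (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Book  (bid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE SB    (sid INTEGER, bid INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO Store VALUES (1, 'n1'), (2, 'n2');
    INSERT INTO Book  VALUES (10, 'The Calculus'), (11, 'Algebra');
    INSERT INTO SB    VALUES (1, 10), (1, 11), (2, 11);
""")

# Hand-written SQL counterpart of the composed from/where clauses.
COMPOSED = """
    SELECT Store.name
    FROM Store, SB, Book
    WHERE Store.sid = SB.sid AND SB.bid = Book.bid AND Book.title = ?
    ORDER BY Store.name
"""

def stores_selling(conn, title):
    """Evaluate the composed query and emit one <result> per store."""
    names = [name for (name,) in conn.execute(COMPOSED, (title,))]
    return "".join(f"<result>{escape(name)}</result>" for name in names)

answer = stores_selling(conn, "The Calculus")
print(answer)
```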

Compressing XML Data
• for exchange and archiving
• can use a general tool (gzip)
• but a specialized tool (XMill) compresses twice as well

XMill Example: Weblogs

    202.239.238.16|GET / HTTP/1.0|text/html|200|1997/10/01-00:02|-|4478|-|-|http://www02.so-net.or.jp/|Mozilla/3.01 [ja] (Win95; I)

    <apache:entry>
      <apache:host>202.239.238.16</apache:host>
      <apache:requestLine>GET / HTTP/1.0</apache:requestLine>
      <apache:contentType>text/html</apache:contentType>
      <apache:statusCode>200</apache:statusCode>
      <apache:date>1997/10/01-00:02</apache:date>
      <apache:byteCount>4478</apache:byteCount>
      <apache:referer>http://www02.so-net.or.jp/</apache:referer>
      <apache:userAgent>Mozilla/3.01 [ja] (Win95; I)</apache:userAgent>
    </apache:entry>

XMill Example: Weblogs

    weblog.dat:       15.9 MB     weblog.xml:       24.2 MB
    weblog.dat.gz:     1.6 MB     weblog.xml.gz:     2.1 MB

    xmill -p // weblog.xml             weblog1.xmi:  1.75 MB
    xmill weblog.xml                   weblog2.xmi:  1.33 MB
    xmill -f settings.pz weblog.xml    weblog3.xmi:  0.82 MB

XMill: Fine Tuning the Compression

    -p//apache:host=>seqcomb(u8 "." u8)
    -p//apache:userAgent=>seq(e "/" e)
    -p//apache:byteCount=>u
    -p//apache:statusCode=>e
    -p//apache:contentType=>e
    -p//apache:requestLine=>seq("GET " rep("/" e) " HTTP/1." e)
    -p//apache:date=>seq(u "/" u8 "-" u8 ":" di)
    -p//apache:referer=>or(seq("file:" t) seq("http://" or(seq(rep("." e) "/" rep("/" e)) rep("." e))) t)
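The settings above attach a compressor to each element path, which suggests that values of the same element are compressed together. The Python sketch below illustrates only that grouping idea, using `zlib` for every group. It is an assumption-laden toy and not XMill's algorithm: the generated log, the element names, and the size comparison are all hypothetical, and the structure encoding is not reversible.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict

def make_log(n):
    """Build a hypothetical weblog-like document with n entries."""
    entries = []
    for i in range(n):
        entries.append(
            f"<entry><host>10.0.{i % 7}.{i % 251}</host>"
            f"<statusCode>{200 if i % 9 else 404}</statusCode>"
            f"<byteCount>{1000 + 37 * i}</byteCount></entry>"
        )
    return "<log>" + "".join(entries) + "</log>"

def split_containers(xml_text):
    """Separate the tag sequence from the text, grouping text by tag."""
    structure, containers = [], defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        structure.append(elem.tag)
        if elem.text and elem.text.strip():
            containers[elem.tag].append(elem.text.strip())
    return structure, containers

def grouped_size(xml_text):
    """Total bytes after compressing the tags and each group separately."""
    structure, containers = split_containers(xml_text)
    total = len(zlib.compress(" ".join(structure).encode()))
    for values in containers.values():
        total += len(zlib.compress("\n".join(values).encode()))
    return total

doc = make_log(2000)
plain = len(zlib.compress(doc.encode()))
grouped = grouped_size(doc)
print(f"whole document: {plain} bytes, grouped by element: {grouped} bytes")
```

The printed sizes depend on the synthetic data and say nothing about XMill's actual ratios, for which the measurements on the weblog slides are the reference.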

Storing XML Data
• Scenario:
  – receive a large XML data instance
  – want to store and manage it
• Could build an XML management system from scratch (eXcelon)
• Preferably: use existing database systems

Storing XML: Ternary Relation
[Figure: a data graph with objects &o1 to &o6, edge labels paper, title, author, author, and year, and leaf values "The Calculus", "…", "…", and "1986", stored in the relations Ref and Val]
[Florescu, Kossmann 1999]
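One plausible reading of this slide is that every edge of the XML tree becomes a row of a ternary relation `Ref` and every leaf text becomes a row of `Val`. The Python sketch below follows that reading. The column names (`src`, `label`, `dst`, `oid`, `value`), the author values, and the `shred` function are assumptions, not the schema of the cited work.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Author values are hypothetical; the slide elides them.
DOC = """
<paper>
  <title>The Calculus</title>
  <author>Author A</author>
  <author>Author B</author>
  <year>1986</year>
</paper>
"""

def shred(xml_text, conn):
    """Store an XML tree as edges in Ref and leaf text in Val."""
    conn.executescript("""
        CREATE TABLE Ref (src TEXT, label TEXT, dst TEXT);
        CREATE TABLE Val (oid TEXT, value TEXT);
    """)
    counter = 0

    def new_oid():
        nonlocal counter
        counter += 1
        return f"&o{counter}"

    def visit(elem, parent_oid):
        oid = new_oid()
        conn.execute("INSERT INTO Ref VALUES (?, ?, ?)", (parent_oid, elem.tag, oid))
        text = (elem.text or "").strip()
        if len(elem) == 0 and text:
            conn.execute("INSERT INTO Val VALUES (?, ?)", (oid, text))
        for child in elem:
            visit(child, oid)

    # The root object (&o1) is the source of the edge to the top element.
    visit(ET.fromstring(xml_text), new_oid())

conn = sqlite3.connect(":memory:")
shred(DOC, conn)
edges = conn.execute("SELECT src, label, dst FROM Ref ORDER BY rowid").fetchall()

# Titles of papers from 1986: one self-join of Ref per path step.
titles = conn.execute("""
    SELECT t.value
    FROM Ref p, Ref rt, Val t, Ref ry, Val y
    WHERE p.label = 'paper'
      AND rt.src = p.dst AND rt.label = 'title' AND t.oid = rt.dst
      AND ry.src = p.dst AND ry.label = 'year'  AND y.oid = ry.dst
      AND y.value = '1986'
""").fetchall()
print(edges)
print(titles)
```

The query shows the cost of this generic layout: each step of a path expression turns into another join on `Ref`.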

Storing XML: Derive Schema from DTD
• DTD:

    <!ELEMENT employee (name, address, project*)>
    <!ELEMENT address (street, city, state, zip)>

• ODMG classes:

    class Employee public type tuple (name: string, address: Address, project: List(Project))
    class Address public type tuple (street: string, …)

• [Christophides et al. 1994, Shanmugasundaram et al. 1999]
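The same DTD-to-class mapping can be sketched in Python with dataclasses. This is an illustration of the idea, not the algorithm of the cited papers. The slide does not show the content model of `project`, so the `Project` class below assumes a bare text value. The sample employee and the `load_employee` helper are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET

@dataclass
class Address:
    # <!ELEMENT address (street, city, state, zip)>
    street: str
    city: str
    state: str
    zip: str

@dataclass
class Project:
    # Assumption: the DTD for project is not on the slide.
    text: str

@dataclass
class Employee:
    # <!ELEMENT employee (name, address, project*)>
    name: str
    address: Address
    project: List[Project] = field(default_factory=list)  # project* becomes a list

def load_employee(xml_text):
    """Parse one <employee> element into the derived classes."""
    e = ET.fromstring(xml_text)
    a = e.find("address")
    address = Address(*(a.findtext(tag) for tag in ("street", "city", "state", "zip")))
    projects = [Project(p.text or "") for p in e.findall("project")]
    return Employee(e.findtext("name"), address, projects)

emp = load_employee(
    "<employee><name>Ann</name>"
    "<address><street>1 Main St</street><city>Springfield</city>"
    "<state>NJ</state><zip>00000</zip></address>"
    "<project>P1</project><project>P2</project></employee>"
)
print(emp)
```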

STORED Approach: Mine Data to Derive Schema
[Figure: tree of paper elements with year, author (fn, ln), and title children, alongside the labels Paper1 and Paper2]
[Deutsch et al. 1999]

Summary
• XML: simple (?), lightweight syntax
• Challenge: build bridges to existing database tools
• XML in data exchange: YES
• XML as a new data model: NO

More Info
http://www.research.att.com/~suciu
Data on the Web: From Relations to Semistructured Data and XML, Morgan Kaufmann, 1999