Introduction to XML and XQuery Guangjun Kevin Xie


















































Introduction to XML and XQuery Guangjun (Kevin) Xie Nov 28, 2005 York University

Road Map
- XML data model
- XML data vs. relational data
- XPath 2.0
- XQuery
- Processing XQuery

XML Data Model: XML Information Set (Infoset)
- An infoset is an abstract data set containing all the information in an XML document.
- It provides a consistent set of definitions for referring to the information in a well-formed XML document.
- Infosets usually result from parsing XML documents, but they can also be synthetic:
  - built through an API, such as DOM
  - produced by transforming an existing infoset
- An infoset consists of a number of information items.

XML Data Model: XML Infoset
- "Information set" and "information item" are similar in meaning to the generic terms "tree" and "node".
- An information item is an abstract description of some part of an XML document.
- Each information item has a set of associated named properties, written as [property name].

XML Data Model: Information Items
- There are 11 types of information items:
  1. Document Information Item
  2. Element Information Items
  3. Attribute Information Items
  4. Character Information Items
  5. Processing Instruction Information Items
  6. Unexpanded Entity Reference Information Items
  7. Comment Information Items
  8. The Document Type Declaration Information Item
  9. Unparsed Entity Information Items
  10. Notation Information Items
  11. Namespace Information Items
- We will discuss the first 3 today.

XML Data Model: Document Information Item
- There is exactly one document information item in an infoset.
- All other information is accessible through its properties:
  - [children]: processing instructions, comments, etc.
  - [document element]: the element item corresponding to the document element
  - [version]: the XML version of the document
  - etc.

XML Data Model: Element Information Items
- There is one element item for each element in the XML document.
- The "root" element item is the [document element] property of the document information item.
- Properties:
  - [namespace name]: the namespace part of the element name
  - [local name]: the local part of the element name
  - [children]: all other information items inside this element
  - [attributes]: the attribute information items of this element
  - [parent]: the information item containing this item
  - etc.

XML Data Model: Attribute Information Items
- There is one attribute item for each attribute of an XML element.
- Properties:
  - [namespace name]: the namespace part of the attribute name
  - [local name]: the local part of the attribute name
  - [attribute type]: the data type of this attribute
  - [owner element]: the element information item containing this attribute
  - etc.

XML Data Model: Infoset Example

    <?xml version="1.0"?>
    <msg:message doc:date="19990421"
        xmlns:doc="http://doc.example.org/namespaces/doc"
        xmlns:msg="http://message.example.org/"
    >Phone home!</msg:message>

The information set contains:
- A document information item.
- An element information item with namespace name "http://message.example.org/", local part "message", and prefix "msg".
- An attribute information item with namespace name "http://doc.example.org/namespaces/doc", local part "date", prefix "doc", and normalized value "19990421".
- Three namespace information items for the http://www.w3.org/XML/1998/namespace, http://doc.example.org/namespaces/doc, and http://message.example.org/ namespaces.
- Two attribute information items for the namespace attributes.
- Eleven character information items for the character data.
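To make the item counts above concrete, here is a minimal sketch that parses the same document with Python's standard `xml.etree.ElementTree`. ElementTree is not an infoset API; it is used here only as an approximation, and the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<?xml version="1.0"?>'
    '<msg:message doc:date="19990421" '
    'xmlns:doc="http://doc.example.org/namespaces/doc" '
    'xmlns:msg="http://message.example.org/"'
    '>Phone home!</msg:message>'
)

root = ET.fromstring(DOC)

# ElementTree folds [namespace name] and [local name] into "{namespace}local".
namespace_name, local_name = root.tag[1:].split("}")

# Namespace declarations are not reported as ordinary attributes here,
# so only doc:date shows up in the attribute mapping.
attributes = dict(root.attrib)

# One character information item per character of element content.
character_count = len(root.text)

print(namespace_name, local_name, attributes, character_count)
```

Note that the prefixes ("msg", "doc") are not kept by ElementTree, which is one way a concrete API can expose less than the full infoset.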

XML Data Model: Infoset Example
[Figure: tree diagram of the infoset for the example document. Legend: document information item, element information item, attribute information item, character information item. Nodes shown: version=1.0, msg:message, doc:date, xmlns:doc, xmlns:msg, and the characters of "Phone home!".]


XML Data vs Relational Data
- Relational databases stem from commercial data processing.
  - The information usually has a regular structure.
- XML has its roots in text document processing.
  - The information often has an irregular structure.
- Both are general models, capable of representing all forms of information.
- Their different heritages cause them to be optimized for different types of applications.

XML Data vs Relational Data: Nesting
- XML model
  - Deeply nested structure
  - Flexible (not predefined)
  - Queries are easily handled by the "descendant" axis in XPath 2.0
- Relational model
  - Flat table structure
  - Primary and foreign keys represent nesting relationships
  - Complex and flexible nesting may result in awkward queries

XML Data vs Relational Data: Metadata
- XML model
  - Metadata is mixed with ordinary data
  - High ratio of metadata to ordinary data
- Relational model
  - Metadata is easily factored out
  - Queries that involve metadata are difficult
  - Example: find the names of the columns containing the value "red"

XML Data vs Relational Data: Ordering
- XML model
  - Intrinsic ordering that cannot be derived from values
  - Example: the order of the sentences in a book is essential
  - This poses a challenge for the query language
- Relational model
  - Ordering depends on values
  - Rows are not considered to have an ordering

XML Data vs Relational Data: Null Values
- XML model
  - A missing value is represented by the absence of an element
  - Retrieving a missing value results in an empty list
  - Rules are needed for how to handle the empty list
- Relational model
  - A "null" value represents a missing value
  - There are rules for operators in the presence of null
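The XML side of this comparison can be sketched in Python with the standard `xml.etree.ElementTree` module. The "Unpriced Draft" book and the helper names are hypothetical, and the rule chosen for the empty list (a comparison against nothing is false) mirrors the existential semantics of XPath general comparisons.

```python
import xml.etree.ElementTree as ET

BOOKS = ET.fromstring(
    "<bookstore>"
    "<book><title>Learning XML</title><price>39.95</price></book>"
    # Hypothetical book with no <price> element: the value is simply absent.
    "<book><title>Unpriced Draft</title></book>"
    "</bookstore>"
)

def prices(book):
    """Return the prices of a book as a list; empty when <price> is absent."""
    return [float(p.text) for p in book.findall("price")]

def is_cheap(book, limit=40.0):
    # Rule for the empty list: no price means no price can satisfy the test.
    return any(p < limit for p in prices(book))

priced, unpriced = BOOKS.findall("book")
print(prices(priced), prices(unpriced), is_cheap(unpriced))
```

A relational engine would instead store NULL in the price column and rely on its three-valued logic for the same comparison.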

XML Data vs Relational Data: Structural Transformations
- XML model
  - Queries operate on XML documents and generate new XML documents
  - XPath 2.0: navigating inside a document
  - XQuery: joining elements, constructing new elements and structures
- Relational model
  - Queries operate on tables and generate new tables

XML Data vs Relational Data: Definition
- XML model
  - Mixture of primitive data and nested elements
  - Elements may be optional
  - Constraints on cardinality and order
  - This poses challenges for type inference
  - Example: how to prove that the output satisfies a given schema?
- Relational model
  - Defined by specifying the properties of columns
  - All rows have the same columns
  - Relatively simple


XPath 2.0: What's XPath?
- XPath is a specification for addressing parts of an XML document.
  - XPath 2.0 provides a way to locate an individual node or a set of nodes in the XML data model.
- XPath 2.0 is closely related to XQuery.
  - Both use the same data model, which is based on the XML Infoset.
  - XQuery uses XPath to refer to information in the data model.
- XPath 2.0 uses path expressions to navigate XML documents and to select nodes in them.
- A path expression typically evaluates to a sequence of nodes; other XPath expressions can yield atomic values.
- Path expressions look very much like the paths used in a traditional computer file system.
- XPath 2.0 is a W3C recommendation.

XPath 2.0: Data Model
- Represents various values, including:
  - the input and the output of a query
  - all values of expressions used during intermediate calculations
- Based on the XML Infoset data model
- Shared with XQuery
- Models XML data as trees
  - Sequence-based data model
  - Sequences represent sets of trees or tree fragments
  - Everything is a sequence
  - Sequences never contain other sequences

XPath 2.0: Data Model
- A tree whose root node is a Document Node is referred to as a document.
- A tree whose root node is not a Document Node is referred to as a fragment.

XPath 2.0: Data Model
- Every instance of the data model is a sequence.
- A sequence is an ordered collection of zero or more items.
- An item is either a node or an atomic value.
- A sequence may contain nodes, atomic values, or any mixture of nodes and atomic values.
- A single item appearing on its own is modeled as a sequence containing one item.
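The sequence rules above (ordered, flat, and a lone item treated as a one-item sequence) can be imitated with a small Python helper. This is only an analogy using Python lists; the `sequence` function is a hypothetical name, not part of any XPath library.

```python
def sequence(*values):
    """Build a flat sequence: nested lists are spliced in, single items are wrapped."""
    flat = []
    for value in values:
        if isinstance(value, list):
            # Sequences never contain other sequences, so splice the members in.
            flat.extend(sequence(*value))
        else:
            flat.append(value)
    return flat

# Roughly what the XPath 2.0 expression (1, (2, 3), ((4)), ()) denotes.
print(sequence(1, [2, 3], [[4]], []))
```

The empty list plays the role of the empty sequence, which simply disappears when it is combined with other items.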

XPath 2.0: Data Model
- There are seven kinds of nodes in the data model:
  - Document node
  - Element node
  - Attribute node
  - Text node
  - Namespace node
  - Processing instruction node
  - Comment node

XPath 2.0: Sample XML Document (books.xml)

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <bookstore>
      <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
      </book>
      <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>
      <book category="WEB">
        <title lang="en">XQuery Kick Start</title>
        <author>James McGovern</author>
        <author>Per Bothner</author>
        <author>Kurt Cagle</author>
        <author>James Linn</author>
        <author>Vaidyanathan Nagarajan</author>
        <year>2003</year>
        <price>49.99</price>
      </book>
      <book category="WEB">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
      </book>
    </bookstore>

XPath 2.0: Example

/bookstore/book evaluates to a sequence of nodes, each node corresponding to a book element:

    <book category="COOKING">
      <title lang="en">Everyday Italian</title>
      <author>Giada De Laurentiis</author>
      <year>2005</year>
      <price>30.00</price>
    </book>
    <book category="CHILDREN">
      <title lang="en">Harry Potter</title>
      <author>J K. Rowling</author>
      <year>2005</year>
      <price>29.99</price>
    </book>
    <book category="WEB">
      <title lang="en">XQuery Kick Start</title>
      <author>James McGovern</author>
      <author>Per Bothner</author>
      <author>Kurt Cagle</author>
      <author>James Linn</author>
      <author>Vaidyanathan Nagarajan</author>
      <year>2003</year>
      <price>49.99</price>
    </book>
    <book category="WEB">
      <title lang="en">Learning XML</title>
      <author>Erik T. Ray</author>
      <year>2003</year>
      <price>39.95</price>
    </book>

//book evaluates to the same result.
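As a rough illustration of the two path expressions, the sketch below uses Python's `xml.etree.ElementTree`, which supports only a limited XPath subset rather than XPath 2.0. The document is a trimmed copy of books.xml (titles only), and the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = ET.fromstring("""
<bookstore>
  <book category="COOKING"><title lang="en">Everyday Italian</title></book>
  <book category="CHILDREN"><title lang="en">Harry Potter</title></book>
  <book category="WEB"><title lang="en">XQuery Kick Start</title></book>
  <book category="WEB"><title lang="en">Learning XML</title></book>
</bookstore>
""")

# /bookstore/book: the context element is already <bookstore>,
# so the child step is written as the relative path "book".
child_step = BOOKSTORE.findall("book")

# //book: search all descendants for <book> elements.
descendant_step = BOOKSTORE.findall(".//book")

titles = [book.findtext("title") for book in child_step]
print(titles)
print(child_step == descendant_step)
```

Both calls return the same four elements in document order, because every book in this document is a direct child of the bookstore element.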
XPath 2.0: Example

//book[@category="WEB"] evaluates to a sequence containing 2 book element nodes:

    <book category="WEB">
      <title lang="en">XQuery Kick Start</title>
      <author>James McGovern</author>
      <author>Per Bothner</author>
      <author>Kurt Cagle</author>
      <author>James Linn</author>
      <author>Vaidyanathan Nagarajan</author>
      <year>2003</year>
      <price>49.99</price>
    </book>
    <book category="WEB">
      <title lang="en">Learning XML</title>
      <author>Erik T. Ray</author>
      <year>2003</year>
      <price>39.95</price>
    </book>

XPath 2.0: Example
- some $x in //book satisfies $x/price > 49
  evaluates to a sequence containing the atomic value true.
- every $x in //book satisfies $x/price > 49
  evaluates to a sequence containing the atomic value false.
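The two quantified expressions correspond closely to Python's built-in `any` and `all`. The sketch below is an analogy rather than an XPath 2.0 evaluation; it uses only the four prices from books.xml, and the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = ET.fromstring(
    "<bookstore>"
    "<book><price>30.00</price></book>"
    "<book><price>29.99</price></book>"
    "<book><price>49.99</price></book>"
    "<book><price>39.95</price></book>"
    "</bookstore>"
)

prices = [float(book.findtext("price")) for book in BOOKSTORE.iter("book")]

# some $x in //book satisfies $x/price > 49
some_over_49 = any(price > 49 for price in prices)

# every $x in //book satisfies $x/price > 49
every_over_49 = all(price > 49 for price in prices)

print(some_over_49, every_over_49)
```

Only the 49.99 book exceeds 49, which is why the existential test succeeds and the universal test fails.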
XPath 2.0: Example

/bookstore/book[position()=1] evaluates to a sequence containing one element node:

    <book category="COOKING">
      <title lang="en">Everyday Italian</title>
      <author>Giada De Laurentiis</author>
      <year>2005</year>
      <price>30.00</price>
    </book>
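Predicates such as the attribute test and the positional test in these examples can be tried out with Python's `xml.etree.ElementTree`. Its XPath subset has no `position()` function, so the positional predicate is written in the abbreviated form `[1]`. The document is a trimmed copy of books.xml, and the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = ET.fromstring("""
<bookstore>
  <book category="COOKING"><title lang="en">Everyday Italian</title></book>
  <book category="CHILDREN"><title lang="en">Harry Potter</title></book>
  <book category="WEB"><title lang="en">XQuery Kick Start</title></book>
  <book category="WEB"><title lang="en">Learning XML</title></book>
</bookstore>
""")

# //book[@category="WEB"]: keep only books whose category attribute is WEB.
web_books = BOOKSTORE.findall(".//book[@category='WEB']")

# /bookstore/book[position()=1]: abbreviated here as book[1].
first_book = BOOKSTORE.findall("book[1]")

print([book.findtext("title") for book in web_books])
print(first_book[0].get("category"))
```

Positions are 1-based, as in XPath, so `[1]` selects the first book child rather than the second.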


XQuery: What's XQuery?
- The language for querying XML data
  - XQuery is a language for finding and extracting elements and attributes from XML documents.
- XQuery is to XML what SQL is to relational databases.
  - Many of the concepts and techniques used in SQL processing and optimization can be applied to XQuery processing and optimization.

XQuery: What's XQuery?
- XQuery is built on XPath 2.0 expressions.
  - XQuery 1.0 and XPath 2.0 share the same data model.
  - They support the same functions and operators.
  - Understanding XPath 2.0 is essential to understanding XQuery.
- XQuery is supported by all the major database vendors:
  - IBM
  - Oracle
  - Microsoft
  - etc.

XQuery: What's XQuery?
- Closed with respect to its data model
  - The value of every expression in the language is guaranteed to be in the data model.
  - XPath 2.0 is also closed.
- Designed to be a functional language
  - No side effects
  - Processes and produces sequences
- XQuery is becoming a W3C standard.
  - The current draft version is XQuery 1.0.
  - It is not yet a W3C Recommendation (XQuery is a Working Draft).

XQuery: FLWOR Expression
- The For clause iteratively binds a variable to each item in a sequence.
- The Let clause binds a variable to a whole sequence.
- The Where clause applies conditions to the bindings produced by the For clause.
- The Order By clause sorts the output of the For clause.
- The Return clause returns a sequence.
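For readers more familiar with imperative code, the five FLWOR clauses can be mapped onto a plain Python loop. This is an analogy, not an XQuery engine: the data is a trimmed, hypothetical subset of a bibliography document, and `addison_wesley_titles` is an illustrative name.

```python
import xml.etree.ElementTree as ET

BIB = ET.fromstring(
    "<bib>"
    '<book year="1994"><title>TCP/IP Illustrated</title>'
    "<publisher>Addison-Wesley</publisher></book>"
    '<book year="1992"><title>Advanced Programming in the Unix environment</title>'
    "<publisher>Addison-Wesley</publisher></book>"
    '<book year="2000"><title>Data on the Web</title>'
    "<publisher>Morgan Kaufmann Publishers</publisher></book>"
    "</bib>"
)

def addison_wesley_titles(bib):
    results = []
    for book in bib.findall("book"):              # for $b in /bib/book
        title = book.findtext("title")            # let $t := $b/title
        year = book.get("year")                   # let $y := $b/@year
        if (book.findtext("publisher") == "Addison-Wesley"
                and int(year) > 1991):            # where ...
            results.append((title, year))
    results.sort()                                # order by $t
    return results                                # return ($t, $y)

print(addison_wesley_titles(BIB))
```

In XQuery the ordering is applied to the bindings before the Return clause is evaluated; sorting the collected tuples afterwards gives the same result for this simple case.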

XQuery: Sample XML Document (bib.xml)

    <bib>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
        <author><last>Stevens</last><first>W.</first></author>
        <publisher>Addison-Wesley</publisher>
        <price>65.95</price>
      </book>
      <book year="1992">
        <title>Advanced Programming in the Unix environment</title>
        <author><last>Stevens</last><first>W.</first></author>
        <publisher>Addison-Wesley</publisher>
        <price>65.95</price>
      </book>
      <book year="2000">
        <title>Data on the Web</title>
        <author><last>Abiteboul</last><first>Serge</first></author>
        <author><last>Buneman</last><first>Peter</first></author>
        <author><last>Suciu</last><first>Dan</first></author>
        <publisher>Morgan Kaufmann Publishers</publisher>
        <price>39.95</price>
      </book>
      <book year="1999">
        <title>The Economics of Technology and Content for Digital TV</title>
        <editor>
          <last>Gerbarg</last><first>Darcy</first>
          <affiliation>CITI</affiliation>
        </editor>
        <publisher>Kluwer Academic Publishers</publisher>
        <price>129.95</price>
      </book>
    </bib>

XQuery: Sample XML Document (reviews.xml)

    <reviews>
      <entry>
        <title>Data on the Web</title>
        <price>34.95</price>
        <review>A very good discussion of semi-structured database systems and XML.</review>
      </entry>
      <entry>
        <title>Advanced Programming in the Unix environment</title>
        <price>65.95</price>
        <review>A clear and detailed discussion of UNIX programming.</review>
      </entry>
      <entry>
        <title>TCP/IP Illustrated</title>
        <price>65.95</price>
        <review>One of the best books on TCP/IP.</review>
      </entry>
    </reviews>

XQuery: Sample XML Document (prices.xml)

    <prices>
      <book>
        <title>Advanced Programming in the Unix environment</title>
        <source>bstore2.example.com</source>
        <price>65.95</price>
      </book>
      <book>
        <title>Advanced Programming in the Unix environment</title>
        <source>bstore1.example.com</source>
        <price>65.95</price>
      </book>
      <book>
        <title>TCP/IP Illustrated</title>
        <source>bstore2.example.com</source>
        <price>65.95</price>
      </book>
      <book>
        <title>TCP/IP Illustrated</title>
        <source>bstore1.example.com</source>
        <price>65.95</price>
      </book>
      <book>
        <title>Data on the Web</title>
        <source>bstore2.example.com</source>
        <price>34.95</price>
      </book>
      <book>
        <title>Data on the Web</title>
        <source>bstore1.example.com</source>
        <price>39.95</price>
      </book>
    </prices>

XQuery: Example 1
- List the books published by Addison-Wesley after 1991, including their year and title.

Solution in XQuery:

    <bib>
    {
      for $b in doc("bib.xml")/bib/book
      where $b/publisher = "Addison-Wesley" and $b/@year > 1991
      return
        <book year="{ $b/@year }">
          { $b/title }
        </book>
    }
    </bib>

Result:

    <bib>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
      </book>
      <book year="1992">
        <title>Advanced Programming in the Unix environment</title>
      </book>
    </bib>

XQuery: Example 2
- Create a flat list of all the title-author pairs.

Solution in XQuery:

    for $b in doc("bib.xml")/bib/book,
        $t in $b/title,
        $a in $b/author
    return
      <result>
        { $t }
        { $a }
      </result>

Result:

    <result>
      <title>TCP/IP Illustrated</title>
      <author><last>Stevens</last><first>W.</first></author>
    </result>
    <result>
      <title>Advanced Programming in the Unix environment</title>
      <author><last>Stevens</last><first>W.</first></author>
    </result>
    <result>
      <title>Data on the Web</title>
      <author><last>Abiteboul</last><first>Serge</first></author>
    </result>
    <result>
      <title>Data on the Web</title>
      <author><last>Buneman</last><first>Peter</first></author>
    </result>
    <result>
      <title>Data on the Web</title>
      <author><last>Suciu</last><first>Dan</first></author>
    </result>

XQuery: Example 3
- For each book in the bibliography, list the title and authors.

Solution in XQuery:

    for $b in doc("bib.xml")/bib/book
    return
      <result>
        { $b/title }
        { $b/author }
      </result>

Result:

    <result>
      <title>TCP/IP Illustrated</title>
      <author><last>Stevens</last><first>W.</first></author>
    </result>
    <result>
      <title>Advanced Programming in the Unix environment</title>
      <author><last>Stevens</last><first>W.</first></author>
    </result>
    <result>
      <title>Data on the Web</title>
      <author><last>Abiteboul</last><first>Serge</first></author>
      <author><last>Buneman</last><first>Peter</first></author>
      <author><last>Suciu</last><first>Dan</first></author>
    </result>
    <result>
      <title>The Economics of Technology and Content for Digital TV</title>
    </result>

XQuery: Example 4
- For each book found in both bib.xml and reviews.xml, list the title of the book and its price from each source.

Solution in XQuery:

    <books-with-prices>
    {
      for $b in doc("bib.xml")//book,
          $a in doc("reviews.xml")//entry
      where $b/title = $a/title
      return
        <book-with-prices>
          { $b/title }
          <bib-price>{ $b/price/text() }</bib-price>
          <review-price>{ $a/price/text() }</review-price>
        </book-with-prices>
    }
    </books-with-prices>

Result:

    <books-with-prices>
      <book-with-prices>
        <title>TCP/IP Illustrated</title>
        <bib-price>65.95</bib-price>
        <review-price>65.95</review-price>
      </book-with-prices>
      <book-with-prices>
        <title>Advanced Programming in the Unix environment</title>
        <bib-price>65.95</bib-price>
        <review-price>65.95</review-price>
      </book-with-prices>
      <book-with-prices>
        <title>Data on the Web</title>
        <bib-price>39.95</bib-price>
        <review-price>34.95</review-price>
      </book-with-prices>
    </books-with-prices>
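The join in this example is a nested loop over two documents with an equality test on the title. A minimal Python sketch of the same idea follows; the inline documents are trimmed to titles and prices, the function name is hypothetical, and the sketch takes bib-price from bib.xml and review-price from reviews.xml.

```python
import xml.etree.ElementTree as ET

BIB = ET.fromstring(
    "<bib>"
    "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
    "<book><title>Advanced Programming in the Unix environment</title>"
    "<price>65.95</price></book>"
    "<book><title>Data on the Web</title><price>39.95</price></book>"
    "<book><title>The Economics of Technology and Content for Digital TV</title>"
    "<price>129.95</price></book>"
    "</bib>"
)
REVIEWS = ET.fromstring(
    "<reviews>"
    "<entry><title>Data on the Web</title><price>34.95</price></entry>"
    "<entry><title>Advanced Programming in the Unix environment</title>"
    "<price>65.95</price></entry>"
    "<entry><title>TCP/IP Illustrated</title><price>65.95</price></entry>"
    "</reviews>"
)

def books_with_prices(bib, reviews):
    """Nested-loop join on title, building a new result tree."""
    root = ET.Element("books-with-prices")
    for b in bib.iter("book"):                        # for $b in ...//book
        for a in reviews.iter("entry"):               #     $a in ...//entry
            if b.findtext("title") == a.findtext("title"):   # where
                item = ET.SubElement(root, "book-with-prices")
                ET.SubElement(item, "title").text = b.findtext("title")
                ET.SubElement(item, "bib-price").text = b.findtext("price")
                ET.SubElement(item, "review-price").text = a.findtext("price")
    return root

result = books_with_prices(BIB, REVIEWS)
rows = [(e.findtext("title"), e.findtext("bib-price"), e.findtext("review-price"))
        for e in result]
print(rows)
```

The book without a review drops out of the result, just as an inner join in SQL would drop an unmatched row.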

XQuery: Example 5
- List the titles and years of all books published by Addison-Wesley after 1991, in alphabetic order.

Solution in XQuery:

    <bib>
    {
      for $b in doc("bib.xml")//book
      where $b/publisher = "Addison-Wesley" and $b/@year > 1991
      order by $b/title
      return
        <book>
          { $b/@year }
          { $b/title }
        </book>
    }
    </bib>

Result:

    <bib>
      <book year="1992">
        <title>Advanced Programming in the Unix environment</title>
      </book>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
      </book>
    </bib>

XQuery: Example 6
- In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element with the book title as its title attribute.

Solution in XQuery:

    <results>
    {
      let $doc := doc("prices.xml")
      for $t in distinct-values($doc//book/title)
      let $p := $doc//book[title = $t]/price
      return
        <minprice title="{ $t }">
          <price>{ min($p) }</price>
        </minprice>
    }
    </results>

Result:

    <results>
      <minprice title="Advanced Programming in the Unix environment">
        <price>65.95</price>
      </minprice>
      <minprice title="TCP/IP Illustrated">
        <price>65.95</price>
      </minprice>
      <minprice title="Data on the Web">
        <price>34.95</price>
      </minprice>
    </results>
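The grouping pattern in this query (take the distinct titles, then aggregate the prices for each one) can be sketched in Python as follows. The inline document is a copy of prices.xml without the source elements, and `min_prices` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

PRICES = ET.fromstring(
    "<prices>"
    "<book><title>Advanced Programming in the Unix environment</title>"
    "<price>65.95</price></book>"
    "<book><title>Advanced Programming in the Unix environment</title>"
    "<price>65.95</price></book>"
    "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
    "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
    "<book><title>Data on the Web</title><price>34.95</price></book>"
    "<book><title>Data on the Web</title><price>39.95</price></book>"
    "</prices>"
)

def min_prices(doc):
    books = list(doc.iter("book"))
    # distinct-values($doc//book/title), keeping first-seen order
    titles = list(dict.fromkeys(b.findtext("title") for b in books))
    return {
        # let $p := $doc//book[title = $t]/price ... min($p)
        t: min(float(b.findtext("price")) for b in books
               if b.findtext("title") == t)
        for t in titles
    }

print(min_prices(PRICES))
```

This sketch keeps titles in first-seen order; XQuery leaves the order of `distinct-values` to the implementation, so a real engine may return the groups in a different order.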

XQuery: Sample XML Document (book.xml)

    <?xml version="1.0"?>
    <book>
      <title>Data on the Web</title>
      <author>Serge Abiteboul</author>
      <author>Peter Buneman</author>
      <author>Dan Suciu</author>
      <section id="intro" difficulty="easy">
        <title>Introduction</title>
        <p>Text...</p>
        <section>
          <title>Audience</title>
          <p>Text...</p>
        </section>
        <section>
          <title>Web Data and the Two Cultures</title>
          <p>Text...</p>
          <figure height="400" width="400">
            <title>Traditional client/server architecture</title>
            <image source="csarch.gif"/>
          </figure>
        </section>
      </section>
      <section id="syntax" difficulty="medium">
        <title>A Syntax For Data</title>
        <p>Text...</p>
        <figure height="200" width="500">
          <title>Graph representations of structures</title>
          <image source="graphs.gif"/>
        </figure>
        <p>Text...</p>
        <section>
          <title>Base Types</title>
        </section>
        <section>
          <title>Representing Relational Databases</title>
          <p>Text...</p>
          <figure height="250" width="400">
            <title>Examples of Relations</title>
            <image source="relations.gif"/>
          </figure>
        </section>
        <section>
          <title>Representing Object Databases</title>
          <p>Text...</p>
        </section>
      </section>
    </book>

XQuery: Example 7
- Prepare a (nested) table of contents, listing all sections and their titles. Preserve the original attributes of each <section> element, if any.

Solution in XQuery:

    declare function local:toc($book-or-section as element()) as element()*
    {
      for $section in $book-or-section/section
      return
        <section>
          { $section/@*, $section/title, local:toc($section) }
        </section>
    };

    <toc>
    {
      for $s in doc("book.xml")/book
      return local:toc($s)
    }
    </toc>

Result:

    <toc>
      <section id="intro" difficulty="easy">
        <title>Introduction</title>
        <section>
          <title>Audience</title>
        </section>
        <section>
          <title>Web Data and the Two Cultures</title>
        </section>
      </section>
      <section id="syntax" difficulty="medium">
        <title>A Syntax For Data</title>
        <section>
          <title>Base Types</title>
        </section>
        <section>
          <title>Representing Relational Databases</title>
        </section>
        <section>
          <title>Representing Object Databases</title>
        </section>
      </section>
    </toc>
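The recursion in this example translates almost line for line into Python. The sketch below is an analogy built on `xml.etree.ElementTree`; the inline document is a trimmed, hypothetical subset of book.xml, and `toc` and `build_toc` are illustrative names.

```python
import xml.etree.ElementTree as ET

BOOK = ET.fromstring("""
<book>
  <title>Data on the Web</title>
  <section id="intro" difficulty="easy">
    <title>Introduction</title>
    <p>Text...</p>
    <section><title>Audience</title><p>Text...</p></section>
    <section>
      <title>Web Data and the Two Cultures</title>
      <figure height="400" width="400"><title>Figure title</title></figure>
    </section>
  </section>
  <section id="syntax" difficulty="medium">
    <title>A Syntax For Data</title>
    <section><title>Base Types</title></section>
  </section>
</book>
""")

def toc(book_or_section):
    """Return new <section> elements mirroring the nested sections."""
    entries = []
    for section in book_or_section.findall("section"):   # child sections only
        entry = ET.Element("section", dict(section.attrib))  # $section/@*
        title = section.find("title")                         # $section/title
        if title is not None:
            ET.SubElement(entry, "title").text = title.text
        entry.extend(toc(section))                            # recursive call
        entries.append(entry)
    return entries

def build_toc(book):
    root = ET.Element("toc")
    root.extend(toc(book))
    return root

result = build_toc(BOOK)
print(ET.tostring(result, encoding="unicode"))
```

Because only section children are followed, paragraphs and figures (including figure titles) never reach the table of contents.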


Processing XQuery: Approaches for Querying XML Data
- Mapping XML data into relational data
  - Query with SQL
  - May produce too many relations
  - Loss of information may occur
    - Example: ordering, explicit hierarchical relationships between elements
- Using XML-specific query languages
  - Usually integrated with SQL and relational data management
  - SQL/XML or XQuery

Processing XQuery: IBM System RX SQL/XQuery Compiler
- A new XQuery parser is added to the existing relational query processing infrastructure.
- All components are extended to process XQuery.

Processing XQuery: Oracle XQuery Compilation Engine
- The parser converts XQuery into XQueryX.
- XQueryX is an XML representation of XQuery (another W3C candidate recommendation).
- An XML parser constructs a DOM tree from the XQueryX.
- Subsequent processing works on the DOM.
- The corresponding components are extended for XQuery too.

Processing XQuery: Microsoft XQuery Compilation
- XQuery is compiled into an XML algebra tree, which is an internal representation.
- A mapper traverses the algebra tree, converting each XML operator into a relational operator sub-tree.
- The algebra tree can then be optimized and executed by the relational query processor.
- Optimizations are rule-based.
