Querying XML: XPath, XQuery, and XSLT
Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems
October 27, 2005
Some slide content courtesy of Susan Davidson, Dan Suciu, & Raghu Ramakrishnan

Reminders
§ Homework 4 due 11/3
  § XQuery
§ Project plan due 11/3
  § Milestones
  § Division of responsibilities
  § For non-RSS projects: proposal including scope, milestones, and what you plan to demonstrate
§ Recitation: Friday 11:30-12:30, Levine 512

Talks of Interest
Today:
§ Anastassia Ailamaki, CMU, “FATES: Automatically-tuned Database Storage Management”, IRCS (3401 Walnut) Room 470 @ 3 PM
Tomorrow as part of “DB & Information Retrieval Day”:
§ Anastassia Ailamaki, CMU, “StagedDB: Designing Database Servers for New Hardware Trends”, Wu and Chen @ 11 AM
§ Sam Madden, MIT, “Data Management for Next Generation Wireless Sensor Networks”, Wu and Chen @ 12:30 PM
§ Andrei Broder, Yahoo!, “The next stage in Web IR: From query based Information Retrieval to context driven Information Supply”, Wu and Chen @ 2:30 PM

Querying XML
How do you query a directed graph? a tree?
The standard approach used by many XML, semistructured-data, and object query languages:
§ Define some sort of a template describing traversals from the root of the directed graph
§ In XML, the basis of this template is called an XPath

XPaths
In its simplest form, an XPath is like a path in a file system:
    /mypath/subpath/*/morepath
§ The XPath returns a node set representing the XML nodes (and their subtrees) at the end of the path
§ XPaths can have node tests at the end, returning only particular node types, e.g., text(), processing-instruction(), comment(), element(), attribute()
§ XPath is fundamentally an ordered language: it can query in order-aware fashion, and it returns nodes in order

Sample XML
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <dblp>
      <mastersthesis mdate="2002-01-03" key="ms/Brown92">
        <author>Kurt P. Brown</author>
        <title>PRPL: A Database Workload Specification Language</title>
        <year>1992</year>
        <school>Univ. of Wisconsin-Madison</school>
      </mastersthesis>
      <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
        <editor>Paul R. McJones</editor>
        <title>The 1995 SQL Reunion</title>
        <journal>Digital System Research Center Report</journal>
        <volume>SRC1997-018</volume>
        <year>1997</year>
        <ee>db/labs/dec/SRC1997-018.html</ee>
        <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
      </article>

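XML Data Model Visualized
[Figure: the sample document drawn as a tree. The root node has two children: the ?xml processing instruction (p-i) and the dblp element. dblp contains a mastersthesis element (attributes mdate, key; child elements author, title, year, school) and an article element (attributes mdate, key; child elements editor, title, journal, volume, year, ee, ee). Text values hang off the elements and attributes as leaves. Legend: root, p-i, element, attribute, text.]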

Some Example XPath Queries
§ /dblp/mastersthesis/title
§ /dblp/*/editor
§ //title/text()
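
The three queries above can be tried out with a rough Python sketch. It assumes the standard-library xml.etree.ElementTree module, which implements only a subset of XPath: paths are written relative to the root element rather than with a leading "/", and text() is replaced by reading each element's .text. The SAMPLE string is a trimmed copy of the sample document, closed with </dblp> so that it parses.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (closing tag added so it is well-formed).
SAMPLE = """<dblp>
  <mastersthesis mdate="2002-01-03" key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
  </mastersthesis>
  <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <year>1997</year>
  </article>
</dblp>"""

root = ET.fromstring(SAMPLE)  # root is the <dblp> element

# /dblp/mastersthesis/title
thesis_titles = [t.text for t in root.findall("mastersthesis/title")]

# /dblp/*/editor
editors = [e.text for e in root.findall("*/editor")]

# //title/text()  (ElementTree has no text() test, so read .text instead)
all_titles = [t.text for t in root.findall(".//title")]
```

Results come back in document order, which matches the ordered semantics of XPath described earlier.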

Context Nodes and Relative Paths
XPath has a notion of a context node: it’s analogous to a current directory
§ “.” represents this context node
§ “..” represents the parent node
§ We can express relative paths: subpath/sub-subpath/../.. gets us back to the context node
Ø By default, the document root is the context node
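
A small sketch of context nodes, again assuming Python's xml.etree.ElementTree and a made-up one-article document. In ElementTree the context node is simply the element on which find/findall is called, and ".." cannot climb above that element.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<dblp><article><title>Paper 1</title><year>1997</year></article></dblp>"
)
article = root.find("article")

# "." is the context node itself.
same = article.find(".")

# A relative path is evaluated from the context node.
year = article.find("./year")

# ".." steps from each matched title back up to its parent.
parents = root.findall("article/title/..")

# ElementTree cannot climb above the element the search started from.
outside = article.find("..")
```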

Predicates – Selection Operations
A predicate allows us to filter the node set based on selection-like conditions over sub-XPaths:
    /dblp/article[title = "Paper 1"]
which is equivalent to:
    /dblp/article[./title/text() = "Paper 1"]
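
The predicate can be sketched in Python with xml.etree.ElementTree, whose XPath subset supports the [tag='text'] form. The three-entry document below is invented for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<dblp>"
    "<article><title>Paper 1</title></article>"
    "<article><title>Paper 2</title></article>"
    "<inproceedings><title>Paper 1</title></inproceedings>"
    "</dblp>"
)

# /dblp/article[title = "Paper 1"]: keep only articles with a matching title child.
matches = root.findall("article[title='Paper 1']")
```

The inproceedings entry is skipped by the path step, and the second article is skipped by the predicate.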

Axes: More Complex Traversals
Thus far, we’ve seen XPath expressions that go down the tree (and up one step)
§ But we might want to go up, left, right, etc.
§ These are expressed with so-called axes:
    self::path-step
    child::path-step                 parent::path-step
    descendant::path-step            ancestor::path-step
    descendant-or-self::path-step    ancestor-or-self::path-step
    preceding-sibling::path-step     following-sibling::path-step
    preceding::path-step             following::path-step
§ The previous XPaths we saw were in “abbreviated form”
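
Python's xml.etree.ElementTree does not implement axes, so the sketch below imitates three of them by hand using a child-to-parent map. The helper names (ancestors, following_siblings, preceding_siblings) and the tiny document are assumptions made for this illustration, not part of any XPath library.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<dblp><article>"
    "<author>A</author><title>T</title><year>1997</year>"
    "</article></dblp>"
)

# ElementTree nodes do not know their parent, so build the map once.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """Imitates ancestor::*, nearest ancestor first."""
    out = []
    while node in parent_of:
        node = parent_of[node]
        out.append(node)
    return out

def following_siblings(node):
    """Imitates following-sibling::*, in document order."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]

def preceding_siblings(node):
    """Imitates preceding-sibling::*, listed in document order."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[:siblings.index(node)]

title = root.find("article/title")
up = [n.tag for n in ancestors(title)]
right = [n.tag for n in following_siblings(title)]
left = [n.tag for n in preceding_siblings(title)]
```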

Querying Order
§ We saw in the previous slide that we could query for preceding or following siblings or nodes
§ We can also query a node for its position according to some index:
  § fn:first(), fn:last() return the index of the first & last element matching the last step (standard XPath defines only last(), with positions counted from 1)
  § fn:position() gives the relative count of the current node
    child::article[fn:position() = fn:last()]
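
Positional predicates can be sketched with xml.etree.ElementTree, which supports [n], [last()] and [last()-n] in its XPath subset. Positions start at 1. The three keyed articles are invented for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<dblp><article key='a1'/><article key='a2'/><article key='a3'/></dblp>"
)

# child::article[position() = 1]
first = root.find("article[1]")

# child::article[position() = last()]
last = root.find("article[last()]")

# One before the last article.
second_to_last = root.find("article[last()-1]")
```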

Users of XPath
§ XML Schema uses simple XPaths in defining keys and uniqueness constraints
§ XQuery
§ XSLT
§ XLink and XPointer, hyperlinks for XML

XQuery
A strongly-typed, Turing-complete XML manipulation language
§ Attempts to do static typechecking against XML Schema
§ Based on an object model derived from Schema
Unlike SQL, fully compositional, highly orthogonal:
§ Inputs & outputs collections (sequences or bags) of XML nodes
§ Anywhere a particular type of object may be used, may use the results of a query of the same type
§ Designed mostly by DB and functional language people
Attempts to satisfy the needs of data management and document management
§ The database-style core is mostly complete (even has support for NULLs in XML!!)
§ The document keyword querying features are still in the works – shows in the order-preserving default model

XQuery’s Basic Form
§ Has an analogous form to SQL’s SELECT..FROM..WHERE..GROUP BY..ORDER BY
§ The model: bind nodes (or node sets) to variables; operate over each legal combination of bindings; produce a set of nodes
§ “FLWOR” statement [note case sensitivity!]:
    for {iterators that bind variables}
    let {collections}
    where {conditions}
    order by {order-conditions}    (older version was “SORTBY”)
    return {output constructor}

“Iterations” in XQuery
A series of (possibly nested) FOR statements assigning the results of XPaths to variables
    for $root in document("http://my.org/my.xml")
    for $sub in $root/rootElement,
        $sub2 in $sub/subElement, …
§ Something like a template that pattern-matches, produces a “binding tuple”
§ For each of these, we evaluate the WHERE and possibly output the RETURN template
§ document() or doc() function specifies an input file as a URI
  § Old version was “document”; now “doc” but it depends on your XQuery implementation

Two XQuery Examples
    <root-tag> {
      for $p in document("dblp.xml")/dblp/proceedings,
          $yr in $p/yr
      where $yr = "1999"
      return <proc> {$p} </proc>
    } </root-tag>

    for $i in document("dblp.xml")/dblp/inproceedings[author/text() = "John Smith"]
    return <smith-paper>
             <title>{ $i/title/text() }</title>
             <key>{ $i/@key }</key>
             { $i/crossref }
           </smith-paper>
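
For readers more used to imperative code, the second query can be mimicked in Python. This is only an analogy, assuming xml.etree.ElementTree and a made-up two-paper document: the loop plays the role of "for", the if plays the role of the predicate, and building an Element plays the role of the "return" constructor. The crossref copy is left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for dblp.xml.
doc = ET.fromstring(
    "<dblp>"
    "<inproceedings key='conf/x/1'><author>John Smith</author><title>Paper 1</title></inproceedings>"
    "<inproceedings key='conf/x/2'><author>Jane Doe</author><title>Paper 2</title></inproceedings>"
    "</dblp>"
)

smith_papers = []
for i in doc.findall("inproceedings"):                       # for $i in .../inproceedings
    if any(a.text == "John Smith" for a in i.findall("author")):  # [author/text() = "John Smith"]
        paper = ET.Element("smith-paper")                    # return <smith-paper> ...
        ET.SubElement(paper, "title").text = i.findtext("title")
        ET.SubElement(paper, "key").text = i.get("key")
        smith_papers.append(paper)

result_xml = [ET.tostring(p, encoding="unicode") for p in smith_papers]
```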

Nesting in XQuery
Nesting XML trees is perhaps the most common operation
In XQuery, it’s easy – put a subquery in the return clause where you want things to repeat!
    for $u in document("dblp.xml")/universities
    where $u/country = "USA"
    return <ms-theses-99>
             { $u/title }
             { for $mt in $u/../mastersthesis
               where $mt/year/text() = "1999" and ______
               return $mt/title }
           </ms-theses-99>

Collections & Aggregation in XQuery
In XQuery, many operations return collections
§ XPaths, sub-XQueries, functions over these, …
§ The let clause assigns the results to a variable
Aggregation simply applies a function over a collection, where the function returns a value (very elegant!)
    let $allpapers := document("dblp.xml")/dblp/article
    return <article-authors>
             <count> { fn:count(fn:distinct-values($allpapers/author)) } </count>
             { for $paper in doc("dblp.xml")/dblp/article
               let $pauth := $paper/author
               return <paper> {$paper/title}
                        <count> { fn:count($pauth) } </count>
                      </paper> }
           </article-authors>
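
The same two aggregates, sketched in Python under the assumption of a small invented document: a set stands in for fn:distinct-values, and len stands in for fn:count.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for dblp.xml.
doc = ET.fromstring(
    "<dblp>"
    "<article><author>A</author><author>B</author><title>P1</title></article>"
    "<article><author>B</author><title>P2</title></article>"
    "</dblp>"
)

all_papers = doc.findall("article")  # let $allpapers := .../dblp/article

# fn:count(fn:distinct-values($allpapers/author))
distinct_authors = {a.text for p in all_papers for a in p.findall("author")}
total_authors = len(distinct_authors)

# Per paper: title plus fn:count($paper/author)
per_paper = [(p.findtext("title"), len(p.findall("author"))) for p in all_papers]
```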

Collections, Ctd.
Unlike in SQL, we can compose aggregations and create new collections from old:
    <result> {
      let $avgItemsSold := fn:avg(
        for $order in document("my.xml")/orders/order
        let $totalSold := fn:sum($order/item/quantity)
        return $totalSold )
      return $avgItemsSold
    } </result>
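
A Python sketch of the same composition: an inner sum per order feeds an outer average. The orders document is invented for the example.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical stand-in for my.xml: one order of 2 + 3 items, one order of 1 item.
orders = ET.fromstring(
    "<orders>"
    "<order><item><quantity>2</quantity></item><item><quantity>3</quantity></item></order>"
    "<order><item><quantity>1</quantity></item></order>"
    "</orders>"
)

# Inner aggregate: fn:sum($order/item/quantity) for each order.
totals = [
    sum(int(q.text) for q in order.findall("item/quantity"))
    for order in orders.findall("order")
]

# Outer aggregate: fn:avg over the collection produced by the inner query.
avg_items_sold = mean(totals)
```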

Distinct-ness
In XQuery, DISTINCT-ness happens as a function over a collection
§ But since we have nodes, we can do duplicate removal according to value or node
§ Can do fn:distinct-values(collection) to remove duplicate values, or fn:distinct-nodes(collection) to remove duplicate nodes
    for $years in fn:distinct-values(doc("dblp.xml")//year/text())
    return $years
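
The difference between value-based and node-based duplicate removal can be sketched in Python, assuming an invented document with two articles from the same year. Two year nodes with equal text are one value but two distinct nodes.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<dblp>"
    "<article><year>1997</year></article>"
    "<article><year>1997</year></article>"
    "<article><year>1992</year></article>"
    "</dblp>"
)
years = doc.findall(".//year")

# By value: equal text collapses to one entry (order of first appearance kept).
distinct_values = list(dict.fromkeys(y.text for y in years))

# By node: only the very same node counts as a duplicate, so listing every
# node twice and deduplicating on identity gives back the original three.
distinct_nodes = list({id(y): y for y in years + years}.values())
```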

Sorting in XQuery
§ SQL actually allows you to sort its output, with a special ORDER BY clause (which we haven’t discussed, but which specifies a sort key list)
§ XQuery borrows this idea
§ In XQuery, what we order is the sequence of “result tuples” output by the return clause:
    for $x in document("dblp.xml")/proceedings
    order by $x/title/text()
    return $x
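
In Python terms, the order by clause behaves like a sort key applied to the bound items before they are returned. A sketch with three invented proceedings entries:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<dblp>"
    "<proceedings><title>VLDB</title></proceedings>"
    "<proceedings><title>ICDE</title></proceedings>"
    "<proceedings><title>SIGMOD</title></proceedings>"
    "</dblp>"
)

# order by $x/title/text()
ordered = sorted(doc.findall("proceedings"), key=lambda x: x.findtext("title"))
ordered_titles = [x.findtext("title") for x in ordered]
```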

What If Order Doesn’t Matter?
By default:
§ SQL is unordered
§ XQuery is ordered everywhere!
§ But unordered queries are much faster to answer
XQuery has a way of telling the query engine to avoid preserving order:
§ unordered { for $x in (mypath) … }

Querying & Defining Metadata – Can’t Do This in SQL
Can get a node’s name by querying node-name():
    for $x in document("dblp.xml")/dblp/*
    return node-name($x)
Can construct elements and attributes using computed names:
    for $x in document("dblp.xml")/dblp/*,
        $year in $x/year,
        $title in $x/title/text()
    return
      element {node-name($x)} {
        attribute {fn:concat("year-", $year)} { $title }
      }
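
A Python sketch of the same idea, assuming xml.etree.ElementTree and an invented two-entry document: element names are ordinary data (.tag), and new elements and attributes can be built from names computed at run time.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<dblp>"
    "<article><title>The 1995 SQL Reunion</title><year>1997</year></article>"
    "<mastersthesis><title>PRPL</title><year>1992</year></mastersthesis>"
    "</dblp>"
)

# node-name($x) for each child of dblp
names = [x.tag for x in doc]

# element {node-name($x)} { attribute {"year-" + year} { title } }
built = [
    ET.Element(x.tag, {"year-" + x.findtext("year"): x.findtext("title")})
    for x in doc
]
```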

XQuery Summary
Very flexible and powerful language for XML
§ Clean and orthogonal: can always replace a collection with an expression that creates collections
§ DB and document-oriented (we hope)
§ The core is relatively clean and easy to understand
Turing Complete – we’ll talk more about XQuery functions soon

XSL(T): The Bridge Back to HTML
§ XSL (Extensible Stylesheet Language) is actually divided into two parts:
  § XSL:FO: formatting for XML
  § XSLT: a special transformation language
§ We’ll leave XSL:FO for you to read off www.w3.org, if you’re interested
§ XSLT is actually able to convert from XML to HTML, which is how many people do their formatting today
  § Products like Apache Cocoon generally translate XML to HTML on the server side

A Different Style of Language
§ XSLT is based on a series of templates that match different parts of an XML document
  § There’s a policy for what rule or template is applied if more than one matches (it’s not what you’d think!)
  § XSLT templates can invoke other templates
  § XSLT templates can be nonterminating (beware!)
§ XSLT templates are based on XPath “match”es, and we can also apply other templates (potentially to “select”ed XPaths)
§ Within each template, we describe what should be output
§ (Matches to text default to outputting it)

An XSLT Stylesheet
    <xsl:stylesheet version="1.1">
      <xsl:template match="/dblp">
        <html><head>This is DBLP</head>
        <body>
          <xsl:apply-templates />
        </body>
        </html>
      </xsl:template>
      <xsl:template match="inproceedings">
        <h2><xsl:apply-templates select="title" /></h2>
        <p><xsl:apply-templates select="author"/></p>
      </xsl:template>
      …
    </xsl:stylesheet>
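
Python's standard library has no XSLT processor, so the sketch below only imitates the dispatch idea of the stylesheet: look up a rule by element name, and fall back to a built-in rule that outputs text and recurses into children. The names apply_templates, dblp_rule, inproceedings_rule and TEMPLATES are invented for this illustration, and real XSLT matching (patterns, priorities, modes) is considerably richer.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<dblp><inproceedings>"
    "<title>Paper 1</title><author>Smith</author>"
    "</inproceedings></dblp>"
)

def apply_templates(node, templates):
    """Use the rule registered for this element name, else the built-in rule."""
    rule = templates.get(node.tag)
    if rule is not None:
        return rule(node, templates)
    # Built-in rule: output the text and recurse into the children.
    out = node.text or ""
    for child in node:
        out += apply_templates(child, templates) + (child.tail or "")
    return out

def dblp_rule(node, templates):
    # Like <xsl:apply-templates /> inside an HTML skeleton.
    body = "".join(apply_templates(c, templates) for c in node)
    return "<html><body>" + body + "</body></html>"

def inproceedings_rule(node, templates):
    # Like apply-templates with select="title" and select="author".
    title = "".join(apply_templates(t, templates) for t in node.findall("title"))
    authors = "".join(apply_templates(a, templates) for a in node.findall("author"))
    return "<h2>" + title + "</h2><p>" + authors + "</p>"

TEMPLATES = {"dblp": dblp_rule, "inproceedings": inproceedings_rule}
html = apply_templates(ET.fromstring(DOC), TEMPLATES)
```

The title and author elements have no rule of their own, so they fall through to the built-in rule and simply emit their text.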

Results of XSLT Stylesheet
Input document:
    <dblp>
      <inproceedings>
        <title>Paper 1</title>
        <author>Smith</author>
      </inproceedings>
      <inproceedings>
        <author>Chakrabarti</author>
        <author>Gray</author>
        <title>Paper 2</title>
      </inproceedings>
    </dblp>
Result:
    <html><head>This is DBLP</head>
    <body>
      <h2>Paper 1</h2>
      <p>Smith</p>
      <h2>Paper 2</h2>
      <p>Chakrabarti</p>
      <p>Gray</p>
    </body>
    </html>

What XSLT Can and Can’t Do
§ XSLT is great at converting XML to other formats
  § XML to diagrams in SVG; HTML; LaTeX
  § …
§ XSLT doesn’t do joins (well), it only works on one XML file at a time, and it’s limited in certain respects
  § It’s not a query language, really
  § … But it’s a very good formatting language
§ Most web browsers (post Netscape 4.7x) support XSLT and XSL formatting objects
§ But most real implementations use XSLT with something like Apache Cocoon
§ You may want to use XSL/XSLT for your projects – see www.w3.org/TR/xslt for the spec

Querying XML
We’ve seen three XML manipulation formalisms today:
§ XPath: the basic language for “projecting and selecting” (evaluating path expressions and predicates) over XML
§ XQuery: a statically typed, Turing-complete XML processing language
§ XSLT: a template-based language for transforming XML documents
§ Each is extremely useful for certain applications!

Views in SQL and XQuery
§ A view is a named query
§ We use the name of the view to invoke the query (treating it as if it were the relation it returns)
SQL:
    CREATE VIEW V(A, B, C) AS
      SELECT A, B, C FROM R
      WHERE R.A = '123'
  Using the view:
    SELECT * FROM V, R
    WHERE V.B = 5 AND V.C = R.C
XQuery:
    declare function V() as element(content)* {
      for $r in doc("R")/root/tree,
          $a in $r/a, $b in $r/b, $c in $r/c
      where $a = "123"
      return <content>{$a, $b, $c}</content>
    }
  Using the view:
    for $v in V()/content,
        $r in doc("r")/root/tree
    where $v/b = $r/b
    return $v

What’s Useful about Views
Providing security/access control
§ We can assign users permissions on different views
§ Can select or project so we only reveal what we want!
Can be used as relations in other queries
§ Allows the user to query things that make more sense
Describe transformations from one schema (the base relations) to another (the output of the view)
§ The basis of converting from XML to relations or vice versa
§ This will be incredibly useful in data integration, discussed soon…
Allow us to define recursive queries

Materialized vs. Virtual Views
§ A virtual view is a named query that is actually recomputed every time – it is merged with the referencing query
    CREATE VIEW V(A, B, C) AS
      SELECT A, B, C FROM R
      WHERE R.A = '123'

    SELECT * FROM V, R
    WHERE V.B = 5 AND V.C = R.C
§ A materialized view is one that is computed once and its results are stored as a table
  § Think of this as a cached answer
  § These are incredibly useful!
  § Techniques exist for using materialized views to answer other queries
  § Materialized views are the basis of relating tables in different schemas
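
The difference can be observed with a small sketch using Python's sqlite3 module and an in-memory database. SQLite has no materialized views, so as an assumption of this sketch the "materialized" view is imitated by a snapshot table (V_mat, a made-up name) filled once with CREATE TABLE ... AS SELECT. The rows in R are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE R (A TEXT, B INTEGER, C INTEGER)")
con.executemany("INSERT INTO R VALUES (?, ?, ?)", [("123", 5, 1), ("456", 5, 2)])

# Virtual view: just a named query, re-evaluated on every use.
con.execute("CREATE VIEW V AS SELECT A, B, C FROM R WHERE A = '123'")

# Stand-in for a materialized view: the query result stored once as a table.
con.execute("CREATE TABLE V_mat AS SELECT A, B, C FROM R WHERE A = '123'")

# The base relation changes after both were defined.
con.execute("INSERT INTO R VALUES ('123', 5, 3)")

virtual_rows = con.execute("SELECT COUNT(*) FROM V").fetchone()[0]
materialized_rows = con.execute("SELECT COUNT(*) FROM V_mat").fetchone()[0]
con.close()
```

The virtual view sees the new tuple immediately, while the snapshot is stale until it is refreshed, which is the maintenance issue taken up next.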

Views Should Stay Fresh
§ Views (sometimes called intensional relations) behave, from the perspective of a query language, exactly like base relations (extensional relations)
§ But there’s an association that should be maintained:
  § If tuples change in the base relation, they should change in the view (whether it’s materialized or not)
  § If tuples change in the view, that should reflect in the base relation(s)

View Maintenance and the View Update Problem
§ There exist algorithms to incrementally recompute a materialized view when the base relations change
§ We can try to propagate view changes to the base relations
  § However, there are lots of views that aren’t easily updatable:

    R          S          R⋈S
    A  B       B  C       A  B  C
    1  2       2  4       1  2  4
    2  2       2  3       1  2  3   <- delete?
                          2  2  4
                          2  2  3

§ We can ensure views are updatable by enforcing certain constraints (e.g., no aggregation), but this limits the kinds of views we can have!
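
The difficulty can be checked with a few lines of Python, treating the relations as sets of tuples. The sketch assumes the tuple to delete from the view is (1, 2, 3); the join helper is written just for this example. Removing that tuple through either base relation also removes a second view tuple.

```python
R = {(1, 2), (2, 2)}   # R(A, B)
S = {(2, 4), (2, 3)}   # S(B, C)

def join(r, s):
    """Natural join of R(A, B) and S(B, C) on B."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}

view = join(R, S)
target = (1, 2, 3)
wanted = view - {target}          # what a "clean" delete should leave behind

# Option 1: delete the contributing tuple from R.
after_r = join(R - {(1, 2)}, S)   # also loses (1, 2, 4)

# Option 2: delete the contributing tuple from S.
after_s = join(R, S - {(2, 3)})   # also loses (2, 2, 3)
```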

Next Time
§ Can we have views in XML over tables in relations?
  § … Or vice versa?
§ What other things can we use views for?