XML Data Management XQuery Werner Nutt Requirements for

  • Slides: 51
Download presentation
XML Data Management XQuery Werner Nutt

XML Data Management XQuery Werner Nutt

Requirements for an XML Query Language David Maier, W 3 C XML Query Requirements:

Requirements for an XML Query Language David Maier, W 3 C XML Query Requirements: • Closedness: output must be XML • Composability: wherever a set of XML elements is required, a subquery is allowed as well • Support for key operations: – selection – extraction, projection – restructuring

Requirements for an XML Query Language • Can benefit from a schema, but should

Requirements for an XML Query Language • Can benefit from a schema, but should also be applicable without • Retains the order of nodes • Formal semantics: – structure of results should be derivable from query – defines equivalence of queries • Queries should be representable in XML

How Does One Design a Query Language? • In most query languages, there are

How Does One Design a Query Language? • In most query languages, there are two aspects to a query: – Retrieving data (e. g. , from … where … in SQL) – Creating output (e. g. , select … in SQL) • Retrieval consists of – Pattern matching (e. g. , from … ) – Filtering (e. g. , where … )

XQuery Principles • Data Model identical with the XPath data model – documents are

XQuery Principles • Data Model identical with the XPath data model – documents are ordered, labeled trees – nodes have identity – nodes can have simple or complex types (defined in XML Schema) • A query result is an ordered list/sequence of items

XQuery Principles (cntd) • XQuery can be used without schemas, but can be checked

XQuery Principles (cntd) • XQuery can be used without schemas, but can be checked against DTDs and XML schemas • XQuery is a functional language – no statements – evaluation of expressions – function definitions

The Recipes DTD (Reminder) <!ELEMENT recipes (recipe*)> <!ELEMENT recipe (title, ingredient+, preparation, nutrition)> <!ELEMENT

The Recipes DTD (Reminder) <!ELEMENT recipes (recipe*)> <!ELEMENT recipe (title, ingredient+, preparation, nutrition)> <!ELEMENT title (#PCDATA)> <!ELEMENT ingredient (ingredient*, preparation? )> <!ATTLIST ingredient name CDATA #REQUIRED amount CDATA #IMPLIED unit CDATA #IMPLIED> <!ELEMENT preparation (step+)> <!ELEMENT step (#PCDATA)> <!ELEMENT nutrition EMPTY> <!ATTLIST nutrition calories CDATA #REQUIRED fat CDATA #REQUIRED>

A Query over the Recipes Document <titles> {for $r in doc("recipes. xml")//recipe return $r/title}

A Query over the Recipes Document <titles> {for $r in doc("recipes. xml")//recipe return $r/title} </titles> returns <titles> <title>Beef Parmesan with Garlic Angel Hair Pasta</title> <title>Ricotta Pie</title> … </titles>

Query Features Part to be returned as it is given {To be evaluated} <titles>

Query Features Part to be returned as it is given {To be evaluated} <titles> doc(String) returns input document {for $r in doc("recipes. xml")//recipe return $r/title} </titles> Iteration $var - variables XPath Sequence of results, one for each variable binding

Features: Summary • The result is a new XML document • A query consists

Features: Summary • The result is a new XML document • A query consists of parts that are returned as is • . . . and others that are evaluated (everything in {. . . } ) • Calling the function doc(String) returns an input document • XPath is used to retrieve node sets and

XPath is a Fragment of XQuery • doc("recipes. xml")//recipe[1]/tit le returns an element <title>Beef

XPath is a Fragment of XQuery • doc("recipes. xml")//recipe[1]/tit le returns an element <title>Beef Parmesan with Garlic Angel Hair Pasta</title> • doc("recipes. xml")//recipe[positi on()<=3] a list of elements

Beware: Attributes in XPath • doc("recipes. xml")//recipe[1]/ing redient[1] a constructor for an attribute node

Beware: Attributes in XPath • doc("recipes. xml")//recipe[1]/ing redient[1] a constructor for an attribute node /@name → attribute name {"beef cube steak"} a value of type string • string(doc("recipes. xml")//recipe

Beware: Attributes in XPath (cntd. ) • <first-ingredient> {string(doc("recipes. xml")//recip e[1] /ingredient [1]/@name)} </first-ingredient>

Beware: Attributes in XPath (cntd. ) • <first-ingredient> {string(doc("recipes. xml")//recip e[1] /ingredient [1]/@name)} </first-ingredient> an element with string content → <first-ingredient>beef cube steak</first-

Beware: Attributes in XPath (cntd. ) • <first-ingredient> {doc("recipes. xml")//recipe[1] /ingredien t[1]/@name} </first-ingredient> an

Beware: Attributes in XPath (cntd. ) • <first-ingredient> {doc("recipes. xml")//recipe[1] /ingredien t[1]/@name} </first-ingredient> an element with an attribute • Note: The XML that we write down is only the surface structure → <first-ingredient name="beef cube of the data model that is underlying XQuery steak"/>

Beware: Attributes in XPath (cntd. ) • <first-ingredient old. Name="{doc("recipes. xml")//rec ipe[1] /ingredient[ 1]/@name}">

Beware: Attributes in XPath (cntd. ) • <first-ingredient old. Name="{doc("recipes. xml")//rec ipe[1] /ingredient[ 1]/@name}"> Beef </first-ingredient> An attribute is cast as a string → <first-ingredient old. Name="beef cube

Constructor Syntax For all constituents of documents, there are constructors element constructor element first-ingredient

Constructor Syntax For all constituents of documents, there are constructors element constructor element first-ingredient attribute constructor { attribute old. Name {string(doc("recipes. xml")//recipe[1] /ingredient[1]/@name)},

Iteration with the For-Clause Syntax: for $var in xpath-expr Example: for $r in doc("recipes.

Iteration with the For-Clause Syntax: for $var in xpath-expr Example: for $r in doc("recipes. xml")//recipe return string($r) • The expression creates a list of bindings for a variable for$var $r in doc("recipes. xml")//recipe If $var for $v in doc("vegetables. xml")//vegetab occurs in an. . . expression exp, return

What Does This Return? for $i in (1, 2, 3) for $j in (1,

What Does This Return? for $i in (1, 2, 3) for $j in (1, 2, 3) return element {concat("x", $i * $j)} {$i * $j}

Nested For-clauses: Example <my-recipes> {for $r in doc("recipes. xml")//recipe return <my-recipe title="{$r/title}"> {for $i

Nested For-clauses: Example <my-recipes> {for $r in doc("recipes. xml")//recipe return <my-recipe title="{$r/title}"> {for $i in $r//ingredient return <my-ingredient> {string($i/@name)} </my-ingredient> } </my-recipes> Returns my-recipes with titles as attributes and my-ingredients with names as text content

The Let Clause Syntax: let $var : = xpath-expr • binds variable $var to

The Let Clause Syntax: let $var : = xpath-expr • binds variable $var to a list of nodes, with the nodes in document order • does not iterate over the list • allows one to keep intermediate results for reuse (not possible in SQL)

Let Clause: Example <calory-content> {let $ooreps : = doc("recipes. xml")//recipe Note the implicit [.

Let Clause: Example <calory-content> {let $ooreps : = doc("recipes. xml")//recipe Note the implicit [. //ingredient/@name="olive string concatenation oil"] for $r in $ooreps return <calories> Calories of recipes with olive oil {$r/title/text()}

Let Clause: Example (cntd. ) The query returns: <calory-content> <calories>Beef Parmesan: 1167</calories> <calories>Linguine alla

Let Clause: Example (cntd. ) The query returns: <calory-content> <calories>Beef Parmesan: 1167</calories> <calories>Linguine alla Pescadora: 532</calories> </calory-content>

The Where Clause Syntax: where <condition> • occurs before return clause • similar to

The Where Clause Syntax: where <condition> • occurs before return clause • similar to predicates in XPath • comparisons on nodes: “=“ for node equality “<<“ and “>>” for document order • for $r in doc("recipes. xml")//recipe Example: where $r//ingredient/@name="olive oil" return. . .

Quantifiers • Syntax: some/every $var in <node-set> satisfies <expr> • $var is bound to

Quantifiers • Syntax: some/every $var in <node-set> satisfies <expr> • $var is bound to all nodes in <node-set> • Test succeeds if <expr> is true for some/every binding

Quantifiers (Example) • Recipes that have some compound for $r in doc("recipes. xml")//recipe ingredient

Quantifiers (Example) • Recipes that have some compound for $r in doc("recipes. xml")//recipe ingredient where some $i in $r/ingredient satisfies $i/ingredient return $r/title • for $r in doc("recipes. xml")//recipe where every $i in $r/ingredient satisfies not($i/ingredient) return $r/title Recipes where every top level ingredient non-compound is

Element Fusion “To every recipe, add the attribute calories!” <result> {let $rs : =

Element Fusion “To every recipe, add the attribute calories!” <result> {let $rs : = doc("recipes. xml")//recipe for $r in $rs return <recipe> an attribute an element {$r/nutrition/@calories} {$r/title}

Element Fusion (cntd. ) The query result: <result> <recipe calories="1167"> <title>Beef Parmesan with Garlic

Element Fusion (cntd. ) The query result: <result> <recipe calories="1167"> <title>Beef Parmesan with Garlic Angel Hair Pasta</title> </recipe> <recipe calories="349"><title>Ricotta Pie</title></recipe> <recipe calories="532"><title>Linguine Pescadoro</title></recipe> <recipe calories="612"><title>Zuppa Inglese</title></recipe> <recipe calories="8892"> <title>Cailles en Sarcophages</title> </recipe> </result>

Fusion with Mixed Syntax We mix constructor and XML–Syntax: element result {let $rs :

Fusion with Mixed Syntax We mix constructor and XML–Syntax: element result {let $rs : = doc("recipes. xml")//recipe for $r in $rs return <recipe> {attribute calories {$r/nutrition/@calories}}

The Same with Constructor Syntax Only element result {let $rs : = doc("recipes. xml")//recipe

The Same with Constructor Syntax Only element result {let $rs : = doc("recipes. xml")//recipe for $r in $rs return element recipe { attribute calories {$r/nutrition/@calories}, $r/title

Join “Pair every ingredient with the recipes where it is used!” let $rs :

Join “Pair every ingredient with the recipes where it is used!” let $rs : = doc("recipes. xml")//recipe for $i in $rs//ingredient for $r in $rs Join condition where $r//ingredient/@name=$i/@name

Join (cntd. ) The query result: <usedin name="beef cube steak"> <title>Beef Parmesan with Garlic

Join (cntd. ) The query result: <usedin name="beef cube steak"> <title>Beef Parmesan with Garlic Angel Hair Pasta</title> </usedin>, <usedin name="onion, sliced into thin rings"> <title>Beef Parmesan with Garlic Angel Hair Pasta</title> </usedin>, <usedin name="green bell pepper, sliced in rings"> <title>Beef Parmesan with Garlic Angel Hair Pasta</title> </usedin>

Join Exercise Return the ingredients that • occur with different amounts in different context

Join Exercise Return the ingredients that • occur with different amounts in different context and return • the recipes where they are used • together with the amount being used in those recipes, while returning every pair only once.

Document Inversion “For every ingredient, return all the recipes where it is used!” <result>

Document Inversion “For every ingredient, return all the recipes where it is used!” <result> {let $rs : = Join condition doc("recipes. xml")//recipe for $i in $rs//ingredient return <ingredient>

Document Inversion (cntd. ) The query result: <result> <ingredient amount="1" name="Alchermes liquor" unit="cup"> <title>Zuppa

Document Inversion (cntd. ) The query result: <result> <ingredient amount="1" name="Alchermes liquor" unit="cup"> <title>Zuppa Inglese</title> </ingredient> … <ingredient amount="2" name="olive oil" unit="tablespoon"> <title>Beef Parmesan with Garlic Angel Hair Pasta</title> <title>Linguine Pescadoro</title> </ingredient> …

Eliminating Duplicates The function distinct-values(Node Set) – extracts the values of a sequence of

Eliminating Duplicates The function distinct-values(Node Set) – extracts the values of a sequence of nodes – creates a duplicate free list of values Note the coercion: nodes are cast as values! Example: let $rs : = doc("recipes. xml")//recipe return distinct-values($rs//ingredient/@name) yields

Avoiding Multiple Results in a Join We want that every ingredient is listed only

Avoiding Multiple Results in a Join We want that every ingredient is listed only once: Eliminate duplicates using distinct-values! <result> {let $rs : = doc("recipes. xml")//recipe for $in in distinct-values( $rs//ingred ient/@name)

Avoiding Multiple Results (cntd. ) The query result: <result> <recipes with="beef cube steak"> <title>Beef

Avoiding Multiple Results (cntd. ) The query result: <result> <recipes with="beef cube steak"> <title>Beef Parmesan with Garlic Angel Hair Pasta</title> </recipes> <recipes with="onion, sliced into thin rings"> <title>Beef Parmesan with Garlic Angel Hair Pasta</title> </recipes>. . . <recipes with="salt"> <title>Linguine Pescadoro</title> <title>Cailles en Sarcophages</title> </recipes>. . .

The Order By Clause Syntax: order by expr [ ascending | descending ] for

The Order By Clause Syntax: order by expr [ ascending | descending ] for $iname in doc("recipes. xml")//@name order by $iname descending return string($iname) yields

The Order By Clause (cntd. ) The interpreter must be told whether the values

The Order By Clause (cntd. ) The interpreter must be told whether the values should be regarded as numbers or as strings (alphanumerical sorting is default) for $r in $rs order by number($r/nutrition/@calories) return $r/title Note:

FLWOR Expresssions (pronounced “flower”) We have now seen the main ingredients of XQuery: •

FLWOR Expresssions (pronounced “flower”) We have now seen the main ingredients of XQuery: • For and Let clauses, which can be mixed • a Where clause imposing conditions • an Order by clause, which determines the order of results

Conditionals if (expr) then expr else expr Example let $is : = doc("recipes. xml")//ingredient

Conditionals if (expr) then expr else expr Example let $is : = doc("recipes. xml")//ingredient for $i in $is[not(ingredient)] let $u : = if (not($i/@unit)) then attribute unit {"pieces"}

Conditionals (cntd. ) We use the conditional to construct variants of ingredients: let $is

Conditionals (cntd. ) We use the conditional to construct variants of ingredients: let $is : = doc("recipes. xml")//ingredient for $i in $is[not(ingredient)] let $u : = if Collects all attributes in a list (not($i/@unit)) and adds a unit if needed then attribute {"unit"} {"pieces"}

Conditionals (cntd. ) The query result: <ingredient name="beef cube steak" amount="1. 5" unit="pound"/>, .

Conditionals (cntd. ) The query result: <ingredient name="beef cube steak" amount="1. 5" unit="pound"/>, . . . <ingredient name="eggs" amount="12" unit="pieces"/>,

Grouping and Aggregation functions count, sum, avg, min, max Example: The number of simple

Grouping and Aggregation functions count, sum, avg, min, max Example: The number of simple ingredients per recipe for $r in doc("recipes. xml")//recipe return

Grouping and Aggregation (cntd. ) The query result: <number title="Beef Parmesan with Garlic Angel

Grouping and Aggregation (cntd. ) The query result: <number title="Beef Parmesan with Garlic Angel Hair Pasta"> 11</number>, <number title="Ricotta Pie">12</number>, <number title="Linguine Pescadoro">15</number>, <number title="Zuppa Inglese">8</number>, <number title="Cailles en Sarcophages">30</number>

Nested Aggregation “The recipe with the maximal number of calories!” let $rs : =

Nested Aggregation “The recipe with the maximal number of calories!” let $rs : = doc("recipes. xml")//recipe let $max. Cal : = max($rs//@calories) for $r in $rs where $r//@calories = $max. Cal

Exercises Write queries that produce • A list, containing for every recipe the recipe's

Exercises Write queries that produce • A list, containing for every recipe the recipe's title element and an element with the number of calories • The same, ordered according to calories • The same, alphabetically ordered according to title • The same, ordered according to the fat

Sample Solution <results> {for $r in doc("recipes. xml")//recipe return <recipe> {attribute title {$r/title}, for

Sample Solution <results> {for $r in doc("recipes. xml")//recipe return <recipe> {attribute title {$r/title}, for $i in $r/ingredient return if (not($i/ingredient)) then $i else <ingredient> {$i/@*} </ingredient> } </recipe> } </results>

Function Declaration User-defined Functions declare function local: fac($n as xs: integer) as xs: integer

Function Declaration User-defined Functions declare function local: fac($n as xs: integer) as xs: integer { if ($n = 0) then 1 else $n * local: fac($n - 1) }; Function Call

Example: Nested Ingredients declare function local: nest($n as xs: integer, content as xs: string)

Example: Nested Ingredients declare function local: nest($n as xs: integer, content as xs: string) as element() { if ($n = 0) then element ingredient{$content} else element ingredient{local: nest($n - 1, $content)} }; local: nest(3, "Stuff")

What Does this Function Return? declare function local: depth($n as node()) as xs: integer

What Does this Function Return? declare function local: depth($n as node()) as xs: integer { if (fn: empty($n/*)) then 1 else let $cdepths : = for $c in $n/* return local: depth($c) return fn: max($cdepths) + 1 };