Introduction to XQuery
Bob DuCharme
www.snee.com/bob
bob@snee.com
these slides: www.snee.com/xml

What is XQuery?
“A query language that uses the structure of XML intelligently can express queries across all these kinds of data, whether physically stored in XML or viewed as XML via middleware. This specification describes a query language called XQuery, which is designed to be broadly applicable across many types of XML data sources.”
“XQuery 1.0: An XML Query Language,” W3C Working Draft

History
• February 1998: XML (Rec)
• November 1999: XSLT 1.0, XPath 1.0 (Recs)
• (as of 8 June 2005): XPath 2.0, XSLT 2.0, XQuery 1.0 in “last call Working Draft” status
• Steps for a W3C “standard”:
  – Working Draft
  – Last Call Working Draft
  – Candidate Recommendation
  – Proposed Recommendation
  – Recommendation

input1.xml sample document
    <doc>
      <p>This is a sample file.</p>
      <p>This line <emph>really</emph> has an inline element.</p>
      <p>This line doesn't.</p>
      <p>Do <emph>you</emph> like inline elements?</p>
    </doc>

Our first query
Querying from the command line:
    java net.sf.saxon.Query "{doc('input1.xml')//p[emph]}"
Result:
    <?xml version="1.0" encoding="UTF-8"?>
    <p>This line <emph>really</emph> has an inline element.</p>
    <p>Do <emph>you</emph> like inline elements?</p>

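To try the same path expression without a file on disk, the sample document can be bound to a variable inside the query. This is only a sketch: the variable name $doc is an assumption, and the inlined element stands in for doc('input1.xml').

```xquery
(: Sketch: the sample document is inlined, so no input file is needed.
   $doc is a hypothetical variable name, not part of the original query. :)
let $doc :=
  <doc>
    <p>This is a sample file.</p>
    <p>This line <emph>really</emph> has an inline element.</p>
    <p>This line doesn't.</p>
    <p>Do <emph>you</emph> like inline elements?</p>
  </doc>
(: Select every p descendant that has an emph child. :)
return $doc//p[emph]
```

The predicate [emph] keeps only the two p elements that contain an emph child, matching the result shown on the slide.
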
Query stored in a file
• xq1.xqy:
    (: Here is an XQuery comment. :)
    doc('data1.xml')//p[emph]
• Executing it:
    java net.sf.saxon.Query xq1.xqy

Simplifying the command line
• Linux shell script xquery:
    java net.sf.saxon.Query $1 $2 $3 $4 $5 $6
• Windows batch file xquery.bat:
    java net.sf.saxon.Query %1 %2 %3 %4 %5 %6
  (assuming saxon8.jar is in the classpath)
• Executing either:
    xquery xq1.xqy

Data for more serious examples
• RecipeML: DTD and documentation
  http://www.formatdata.com/recipeml
• Squirrel's RecipeML Archive
  http://dsquirrel.tripod.com/recipeml/indexrecipes2.html
• My sample: 294 files

RecipeML: typical structure
    <recipeml version="0.5">
      <recipe>
        <head>
          <title>Walnut Vinaigrette</title>
          <categories><cat>Dressings</cat></categories>
          <yield>1</yield>
        </head>
        <ingredients>
          <ing>
            <amt><qty>1</qty><unit>cup</unit></amt>
            <item>Canned No Salt Chicken</item></ing>
          <!-- more ing elements -->
        </ingredients>
        <directions>
          <step>Bring chicken broth to a boil.</step>
          <!-- more step elements -->
        </directions>
      </recipe>
    </recipeml>

Saxon and collection() function
• Argument to the function names a document in this format:
    <collection>
      <doc href="_Band__Sloppy_Joes.xml"/>
      <doc href="_Cheese__Fricadelle.xml"/>
      <!-- more doc elements... -->
      <doc href="Walton_Mountain_Coffee_Cake.xml"/>
      <doc href="Walty's_Dressing.xml"/>
      <doc href="Wan_Tan_(Wonton).xml"/>
    </collection>

Looking for some sugar
    collection('recipeml/docs.xml')/recipeml/recipe/head/title
      [//ingredients/ing/item[contains(., 'sugar')]]

A more SQL-like approach
    for $ingredient in collection('recipeml/docs.xml')//ingredients/ing/item[contains(., 'sugar')]
    return $ingredient/../../../head/title

Outputting well-formed XML
    <sweets>
    {
      let $target := 'sugar'
      for $ingredient in collection('recipeml/docs.xml')//ingredients/ing/item[contains(., $target)]
      return $ingredient/../../../head/title
    }
    </sweets>

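Iterating over item elements returns a title once per matching ingredient, so a recipe with two kinds of sugar is listed twice. One possible alternative, sketched here with hypothetical inline data in place of collection('recipeml/docs.xml'), is to iterate over recipe elements and test the ingredients in a where clause. The recipe titles and the $data variable are made up for illustration.

```xquery
(: Sketch with hypothetical inline data; a real query would read
   collection('recipeml/docs.xml') instead of $data. :)
let $target := 'sugar'
let $data := (
  <recipeml version="0.5">
    <recipe>
      <head><title>Sample Sugar Cookies</title></head>
      <ingredients>
        <ing><item>brown sugar</item></ing>
        <ing><item>white sugar</item></ing>
      </ingredients>
    </recipe>
  </recipeml>,
  <recipeml version="0.5">
    <recipe>
      <head><title>Sample Vinaigrette</title></head>
      <ingredients>
        <ing><item>walnut oil</item></ing>
      </ingredients>
    </recipe>
  </recipeml>
)
return
  <sweets>{
    (: One pass per recipe, so each title appears at most once. :)
    for $recipe in $data/recipe
    where $recipe/ingredients/ing/item[contains(., $target)]
    return $recipe/head/title
  }</sweets>
```

Because the loop variable is the recipe itself, no parent steps are needed to get from the ingredient back to the title.
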
FLWOR expressions
• for
• let
• where
• order by
• return
"a FLWOR expression ... supports iteration and binding of variables to intermediate results. This kind of expression is often useful for computing joins between two or more documents and for restructuring data."

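The five clauses can be seen together in one small query. The following is a minimal sketch over made-up inline data; the element names recipes and big, the variable names, and the sample titles are all assumptions for illustration.

```xquery
(: Sketch only: hypothetical inline data, not taken from the RecipeML sample. :)
let $recipes :=
  <recipes>
    <recipe><title>Party Punch</title><yield>30</yield></recipe>
    <recipe><title>Walnut Vinaigrette</title><yield>1</yield></recipe>
    <recipe><title>Chili</title><yield>24</yield></recipe>
  </recipes>
for $r in $recipes/recipe                  (: for: iterate over each recipe :)
let $servings := xs:integer($r/yield)      (: let: bind an intermediate result :)
where $servings > 20                       (: where: filter :)
order by $servings descending              (: order by: sort :)
return <big servings="{$servings}">{string($r/title)}</big>
```

With this data the query returns two big elements, Party Punch (30) first and Chili (24) second.
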
Extracting subsets: XPath vs. FLWOR approach
• Get the title element for each recipe whose yield is greater than 20:
    collection('recipeml/docs.xml')/recipeml/recipe/head/title[../yield > 20]
• Go through all the documents in the collection, and for any with a yield of more than 20, get the title:
    for $doc in collection('recipeml/docs.xml')/recipeml
    where $doc/recipe/head/yield > 20
    return $doc/recipe/head/title

Doing more with the for clause variable
    (: Create an HTML page linking to recipes that serve more than 20 people. :)
    <html><head><title>Food for a Crowd</title></head>
    <body>
    <h1>Food for a Crowd</h1>
    {
      for $doc in collection('recipeml/docs.xml')
      where $doc/recipeml/recipe/head/yield > 20
      return <p><a href="{document-uri($doc)}">
        {$doc/recipeml/recipe/head/title/text()}
      </a></p>
    }
    </body></html>

Calling functions from a let clause
    (: Which recipe(s) serve the most people? :)
    let $maxYield := max(collection('recipeml/docs.xml')/recipeml/recipe/head/yield)
    return collection('recipeml/docs.xml')/recipeml/recipe[head/yield = $maxYield]

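The same pattern can be tried on inline data, binding the recipes to a variable once so the path is not repeated. This is a sketch under stated assumptions: the three recipe elements and the $recipes variable are hypothetical, and a tie is included to show why the comment says "recipe(s)".

```xquery
(: Sketch only: hypothetical inline recipes stand in for the collection. :)
let $recipes := (
  <recipe><head><title>Party Punch</title><yield>30</yield></head></recipe>,
  <recipe><head><title>Walnut Vinaigrette</title><yield>1</yield></head></recipe>,
  <recipe><head><title>Big Batch Chili</title><yield>30</yield></head></recipe>
)
(: max() converts the untyped yield values to numbers before comparing. :)
let $maxYield := max($recipes/head/yield)
(: The predicate keeps every recipe that ties for the largest yield. :)
return $recipes[head/yield = $maxYield]/head/title
```

Both recipes with a yield of 30 are returned, since the predicate is an equality test and not a single-item lookup.
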
distinct-values and order by
    (: A sorted list of the unique ingredients in the recipe collection,
       with URLs to link to the recipes. :)
    <ingredients>
    {
      for $ingr in distinct-values(
        collection('recipeml/docs.xml')/recipeml/recipe/ingredients/ing/item)
      order by $ingr
      return <item name="{$ingr}">
      {
        for $doc in collection('recipeml/docs.xml')
        where $doc/recipeml/recipe/ingredients/ing/item = $ingr
distinct-values and order by, continued
        return <title url="{document-uri($doc)}">
          {$doc/recipeml/recipe/head/title/text()}
        </title>
      }
      </item>
    }
    </ingredients>

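The two-level loop above (an outer loop over distinct values, an inner loop that finds the documents containing each value) can be tried in a single self-contained query. The sketch below uses a simplified, hypothetical structure in which item elements sit directly under recipe, and it omits the url attribute because inline elements have no document URI.

```xquery
(: Sketch only: simplified hypothetical data, not the RecipeML structure. :)
let $recipes :=
  <recipes>
    <recipe><title>Scones</title><item>flour</item><item>butter</item></recipe>
    <recipe><title>Pound Cake</title><item>butter</item><item>sugar</item></recipe>
  </recipes>
return
  <ingredients>{
    (: Outer loop: one pass per distinct ingredient string, sorted. :)
    for $ingr in distinct-values($recipes/recipe/item)
    order by $ingr
    return
      <item name="{$ingr}">{
        (: Inner loop: every recipe that lists this ingredient. :)
        for $r in $recipes/recipe
        where $r/item = $ingr
        return <title>{string($r/title)}</title>
      }</item>
  }</ingredients>
```

Note that distinct-values compares the strings as they are, so differences in case or surrounding whitespace produce separate entries.
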
Excerpt from output
    <ingredients>
      <!-- some item elements removed -->
      <item name=" (12-oz) tomato paste ">
        <title url="file:/C:/dat/recipeml/_Best_Ever__Pizza_Sauce.xml">"Best Ever" Pizza Sauce</title>
      </item>
      <item name=" Baking Powder">
        <title url="file:/c:/dat/recipeml/_Blondie__Brownies.xml">"Blondie" Brownies</title>
        <title url="file:/c:/dat/recipeml/Walnut_Pound_Cake.xml">Walnut Pound Cake</title>
      </item>
      <item name=" Baking Soda ">
        <title url="file:/c:/dat/recipeml/_Faux__Sourdough.xml">"Faux" Sourdough</title>
      </item>
      <item name=" Baking potatoes ">
        <title url="file:/c:/dat/recipeml/_Indian_Chili_.xml">"Indian Chili"</title>
      </item>
      <item name=" Baking powder ">
        <title url="file:/c:/dat/recipeml/_Best__Apple_Nut_Pudding.xml">"Best" Apple Nut Pudding</title>
        <title url="file:/c:/dat/recipeml/_Gold_Room__Scones.xml">"Gold Room" Scones</title>
        <title url="file:/c:/dat/recipeml/_Outrageous_Chocolate_Chipper.xml">"Outrageous" Chocolate-Oatmeal Chipper (Cooki</title>
      </item>
      <item name="Baking soda">
        <title url="file:/c:/dat/recipeml/_First__Ginger_Cookies.xml">"First" Ginger Molasses Cookies</title>
        <title url="file:/c:/dat/recipeml/_Foot_in_the_Cake.xml">"Foot in the Fire" Chocolate Cake</title>
      </item>
      <item name="Tomato paste">
        <title url="file:/C:/dat/recipeml/Crawfish_Etouff'ee.xml">"Frank's Place" Crawfish Etouff'ee </title>
        <title url="file:/C:/dat/recipeml/Hamburger____Ground_Meat_Balti.xml">"Hamburger" / Ground Meat Balti </title>
        <title url="file:/C:/dat/recipeml/Indian_Chili_.xml">"Indian Chili"</title>
      </item>
      <!-- some item elements removed -->
    </ingredients>

RecipeML: varying markup richness
• One way to do it:
    <ing><item> (12-oz) tomato paste </item></ing>
• Another way:
    <ing>
      <amt>
        <qty>12</qty>
        <unit>oz</unit>
      </amt>
      <item>tomato paste</item>
    </ing>

Normalizing data with declared functions
    (: A sorted list of the unique ingredients in the recipe collection, with
       URLs to link to them. Ingredient names get normalized by functions
       declared in the query prolog. :)
    declare namespace sn = "http://www.snee.com/ns/misc/";

    declare function sn:normIngName($ingName) as xs:string {
      (: Normalize ingredient name. :)
      (: remove parenthesized expression that may begin string,
         e.g. in "(10 ozs) Rotel diced tomatoes" :)
      let $normedName := replace($ingName, "^\(.*?\)\s*", "")
      (: convert to all lower-case :)
      let $normedName := lower-case($normedName)
      (: replace multiple spaces with a single one :)
      let $normedName := normalize-space($normedName)
      return $normedName
    };

Normalizing data with functions, part 2 of 3
    declare function sn:normIngList($ingList) as item()* {
      (: Normalize a list of ingredient names. :)
      for $ingName in $ingList
      return sn:normIngName($ingName)
    };

    <ingredients>
    {
      let $normIngNames :=
        sn:normIngList(collection('recipeml/docs.xml')//ing/item)
Normalizing data with functions, part 3 of 3
      for $ingr in distinct-values($normIngNames)
      order by $ingr
      return <item name="{$ingr}">
      {
        for $doc in collection('recipeml/docs.xml'),
            $i in $doc/recipeml/recipe/ingredients/ing/item
        where sn:normIngName($i) = $ingr
        return <title url="{document-uri($doc)}">
          {$doc/recipeml/recipe/head/title/text()}
        </title>
      }
      </item>
    }
    </ingredients>

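The effect of normalization can be checked on a few of the ingredient names from the earlier output excerpt. This is a sketch, not the query from the slides: the function body is condensed into one expression, and the pattern is a variation that also skips leading whitespace (an assumption added here so that " (12-oz) tomato paste " loses its parenthesized prefix).

```xquery
declare namespace sn = "http://www.snee.com/ns/misc/";

(: Variation on the slides' function: "^\s*" is added to the pattern,
   an assumption made here so names with leading spaces are handled. :)
declare function sn:normIngName($ingName) as xs:string {
  normalize-space(lower-case(replace($ingName, "^\s*\(.*?\)\s*", "")))
};

(: Four raw names taken from the output excerpt. :)
for $name in distinct-values(
  for $raw in (" (12-oz) tomato paste ", "Tomato paste",
               " Baking Soda ", "Baking soda")
  return sn:normIngName($raw))
order by $name
return $name
```

The four raw names collapse to two normalized values, "baking soda" and "tomato paste".
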
Specs at http://www.w3.org/TR
• XQuery 1.0: An XML Query Language
• XQuery 1.0 and XPath 2.0 Formal Semantics
• XQuery 1.0 and XPath 2.0 Data Model
• XSLT 2.0 and XQuery 1.0 Serialization
• XQuery 1.0 and XPath 2.0 Functions and Operators
• XML Query Use Cases

Other resources
• eXist: http://www.exist-db.org
• MarkLogic: http://www.marklogic.com
• Mike Kay, “Comparing XSLT and XQuery”: http://idealliance.org/proceedings/xtech05/papers/02-03-01/
• http://www.w3.org/TR:
  – XQuery Update Requirements
  – XQuery 1.0 and XPath 2.0 Full-Text