CSE 636 Data Integration XML Query Languages XQuery
- Slides: 84
CSE 636 Data Integration XML Query Languages XQuery
XQuery • • http: //www. w 3. org/TR/xquery/ (11/05) Functional Programming Language Operates on XML Sources Returns XML 2
XQuery Components • XQuery is composed of – – Path expressions Element constructors FLWOR expressions … and more … 3
Path Expressions doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER Evaluate expression by collecting all elements which satisfy the path CUSTOMER_ORDERS CUSTOMER EMAIL NAME Sue sw@nah. com CUSTOMER ORDER NAME EMAIL Tom td@mbi. com NO CARRIER ITEM 1897 UPS CUSTOMER ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NAME EMAIL Ann ag@tii. org SKU QTY C 5 1 SKU B 7 SKU P 5 QTY 2 QTY 1 ITEM SKU QTY C 5 2 4
Element Construction <ORDERS> { doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER } </ORDERS> A complete, executable query returning the ORDERS tree 1. Evaluate expression inside {. . . } ORDERS 2. Connect into tree ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NO CARRIER ITEM 1897 UPS SKU B 7 SKU P 5 SKU QTY C 5 1 ORDER QTY 2 QTY 1 ITEM SKU QTY C 5 2 5
Introduction to for Expression Our path query … <ORDERS> { doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER } </ORDERS> … can be rewritten using a for expression: <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER return $order } </ORDERS> ORDERS ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NO CARRIER ITEM 1897 UPS SKU B 7 SKU P 5 SKU QTY C 5 1 ORDER QTY 2 QTY 1 ITEM SKU QTY C 5 2 6
Topics • • • For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 7
Example with where We take our previous query and add a where clause: <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return $order } </ORDERS> The output is the same as in the previous example, except non-UPS carriers are removed. ORDERS ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NO CARRIER ITEM 1897 UPS SKU B 7 SKU P 5 SKU QTY C 5 1 ORDER QTY 2 QTY 1 ITEM SKU QTY C 5 2 8
FLWOR Expressions: The for Clause <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return $order } </ORDERS> CUSTOMER_ORDERS ORDER NAME EMAIL Tom td@mbi. com NO CARRIER ITEM 1897 UPS CUSTOMER ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NAME EMAIL Ann ag@tii. org EMAIL NAME Sue sw@nah. com CUSTOMER The for variable ranges over result of in expression SKU QTY C 5 1 SKU B 7 SKU P 5 QTY 2 CUSTOMER QTY 1 ITEM SKU QTY C 5 2 9
FLWOR Expressions: The where Clause <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return $order } </ORDERS> CUSTOMER_ORDERS ORDER NAME EMAIL Tom td@mbi. com NO CARRIER ITEM 1897 UPS CUSTOMER ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NAME EMAIL Ann ag@tii. org EMAIL NAME Sue sw@nah. com CUSTOMER Selects only orders with UPS as the carrier SKU QTY C 5 1 SKU B 7 SKU P 5 QTY 2 CUSTOMER QTY 1 ITEM SKU QTY C 5 2 10
FLWOR Expressions: The return Clause <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return $order } </ORDERS> Every $order that qualified is added to the return list: ORDER NO CARRIER ITEM 1878 UPS SKU B 7 QTY 2 NO CARRIER ITEM 1897 UPS SKU P 5 QTY 1 ITEM SKU QTY C 5 2 11
FLWOR Expressions: Final Result <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return $order } </ORDERS> ORDERS … is constructed into the ORDERS element to complete the example. The list coming from the FLWOR expression … ORDER NO CARRIER ITEM 1878 UPS SKU B 7 QTY 2 NO CARRIER ITEM 1897 UPS SKU P 5 QTY 1 ITEM SKU QTY C 5 2 12
Example with Element Construction <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return <ID> { data($order/NO) } </ID> } </ORDERS> Here, the return statement constructs elements from values • The “data” function returns the value of an element • The return statement also contains tags • The next slide illustrates how the following result is created: ORDERS ID 1878 ID 1897 13
Return – Element Construction <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return <ID> { data($order/NO) } </ID> } </ORDERS> ORDERS 4. Connect into tree ID 1878 3. New element construction 2. Path selection 1. Bring in selected items as before ORDER ID 1897 ORDER NO CARRIER ITEM 1878 UPS NO CARRIER ITEM 1897 UPS SKU B 7 SKU P 5 QTY 2 QTY 1 ITEM SKU QTY C 5 2 14
FLWOR Expressions: The let Clause <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return <ID> { data($order/NO) } </ID> } </ORDERS> Our previous example can be rewritten using extra variable bindings to improve clarity: <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $carrier : = $order/CARRIER let $id : = data($order/NO) where $carrier = "UPS" return <ID> { $id } </ID> ORDERS } </ORDERS> ID 1878 ID 1897 15
FLWOR Expressions: The order by Clause For this example, we prepare a list of customers sorted by customer name <CUSTOMERS> { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER let $name : = $customer/NAME order by $customer/NAME ascending return <CUSTOMER> {$customer/NAME} </CUSTOMER> } </CUSTOMERS> CUSTOMERS CUSTOMER NAME Ann NAME Sue NAME Tom 16
Topics • • • For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 17
Type Conversions • In the context of functions and operators, values are automatically extracted from elements: <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return <ID> { concat("ORDER-", $order/NO) } </ID> } </ORDERS> 18
Type Conversions • $order/NO binds to an element • concat(…) requires a string • Value of the element is automatically extracted • Same happens to lists containing a single element or value 19
Type Conversions All other cases result in errors <ORDERS> { <ID> { concat("ORDER-", doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER/NO) } </ID> } </ORDERS> • Path expression above binds to lists • Cannot extract a value from a list of many items! 20
Type Conversions • The data() function can be used to explicitly extract the value: <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return <ID> { concat("ORDER-", data($order/NO)) } </ID> } </ORDERS> 21
Type Conversions • Automatic extraction of values does not occur in element construction • In that case, the data() function is required: <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return <ID> { data($order/NO) } </ID> } </ORDERS> 22
Topics • • • For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 23
For-Let-Where-Order By-Return (FLWOR) Let’s take a more in-depth look at the variable bindings in the query developed previously <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $carrier : = $order/CARRIER let $id : = data($order/NO) where $carrier = "UPS" return <ID> { $id } </ID> } </ORDERS> for $var 1 in expr let $var 2 : = expr for and let clauses generate a list of tuples of variable bindings, preserving input order return expr where clause applies a predicate, eliminating some of the tuples order by expr order by clause imposes an order on the remaining tuples return clause is executed for each remaining tuple, generating a list of trees 24
FLWOR Variable Bindings <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $carrier : = $order/CARRIER, $id : = data($order/NO) where $carrier = "UPS" return <ID> { $id } </ID> } </ORDERS> for/let $order where return result $carrier $id ORDER NO CARRIER ITEM 1861 FEDEX SKU QTY 1 C 5 CARRIER FEDEX 1861 NO ORDERS ORDER NO CARRIER ITEM 1878 UPS SKU QTY 2 B 7 ORDER CARRIER UPS 1878 YES ID 1878 NO CARRIER ITEM 1897 UPS SKU QTY 1 1 P 5 CARRIER UPS 1897 YES ID 1897 ID 1878 ID 1897 25
for vs. let for • Binds node variables iteration for $x in expr – binds $x to each element in the list expr let • Binds collection variables one value let $x : = expr – binds $x to the entire list expr – Useful for common subexpressions and for aggregations 26
for vs. let for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER return <result> { $order } </result> Returns: <result> <ORDER>…</ORDER></result> … let $order : = doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER return <result> { $order } </result> Returns: <result> <ORDER>…</ORDER> … </result> 27
for vs. let <POPULAR_ITEMS> { for $sku in distinct-values(doc(“co")//ITEM/SKU) let $items : = doc(“co")//ORDER/ITEM[SKU = $sku] let $qty. Total : = sum($items/QTY) where $qty. Total > 1 return <ITEM> { $sku } </ITEM> } </POPULAR_ITEMS> • distinct-values – a function that eliminates duplicate values – can be applied to simple elements and atomic values • sum – a (aggregate) function that returns the sum of integers 28
for vs. let <POPULAR_ITEMS> { for $sku in distinct-values(doc(“co")//ITEM/SKU) let $items : = doc(“co")//ORDER/ITEM[SKU = $sku] let $qty. Total : = sum($items/QTY) where $qty. Total > 1 return <ITEM> { $sku } </ITEM> } </POPULAR_ITEMS> for/let where return $sku $items ITEM C 5 SKU QTY 1 C 5 ITEM $qty. Total 3 YES ITEM C 5 YES ITEM B 7 NO SKU QTY 2 C 5 ITEM B 7 SKU QTY 2 B 7 ITEM 2 P 5 SKU QTY P 5 1 1 result POPULAR_ITEMS ITEM C 5 ITEM B 7 29
for vs. let Find items whose quantity is larger than average: let $avg. Qty : = avg(doc(“co”)//ITEM/QTY) for $item in doc(“co”)//ITEM where $item/QTY > $avg. Qty return $item for/let where return $avg. Qty $items ITEM 1. 5 NO SKU QTY 1 C 5 ITEM 1 1. 5 SKU QTY 2 C 5 ITEM 2 YES 1. 5 SKU QTY 2 B 7 ITEM 2 YES 1. 5 SKU QTY P 5 1 1 NO let $avg. Qty $qty. Total 1. 5 ITEM SKU QTY 2 C 5 ITEM SKU QTY 2 B 7 30
Topics • • • For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 31
Joins • Joins are expressed using a FLWOR with two loop variables – two for clauses • A where condition specifies how the loop variables relate 32
Join Example Combine orders… ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NO CARRIER ITEM 1897 UPS SKU B 7 SKU P 5 SKU QTY C 5 1 ORDER QTY 2 SHIPPER … with shipper info … QTY 1 ITEM SKU QTY C 5 2 SHIPPER PICKUP NAME 2 PM FEDEX NAME UPS PICKUP 5 PM ORDERS … to produce order deadlines ORDER ID DEADLINE 1861 2 PM ORDER ID DEADLINE 1878 5 PM ORDER ID DEADLINE 1897 5 PM 33
Join Example Query <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER for $shipper in doc("s")/SHIPPERS/SHIPPER let $id : = data($order/NO) let $time : = data($shipper/PICKUP) where $order/CARRIER = $shipper/NAME return <ORDER> <ID>{$id}</ID> <DEADLINE>{$time}</DEADLINE> </ORDER> } </ORDERS> Uses multiple for statements to generate Cartesian product of tuples Uses where statement to filter Cartesian product 34
Join Conditions for/let $order $shipper $id $time where return SHIPPER NO CARRIER ITEM 1861 FEDEX ORDER NAME PICKUP FEDEX 2 PM SHIPPER 1861 2 PM YES NO CARRIER ITEM 1878 UPS ORDER NAME PICKUP FEDEX 2 PM 1878 2 PM NO NO CARRIER ITEM 1897 UPS ORDER NAME PICKUP 1897 FEDEX 2 PM NO NO CARRIER ITEM 1861 FEDEX ORDER NAME PICKUP UPS 5 PM NO NO CARRIER ITEM 1878 UPS ORDER NAME PICKUP UPS 5 PM NO CARRIER ITEM 1897 UPS NAME PICKUP UPS 5 PM … ORDER … SHIPPER … … SHIPPER 1861 … SHIPPER 1878 5 PM … SHIPPER 1897 5 PM YES ORDER ID 1861 DEADLINE 2 PM ORDER ID 1878 DEADLINE 5 PM ORDER … … ID 1897 DEADLINE 5 PM 35
Condensed Join Table <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER, $shipper in doc("s")/SHIPPERS/SHIPPER let $id : = data($order/NO), $time : = data($shipper/PICKUP) where $order/CARRIER = $shipper/NAME return <ORDER> <ID>{$id}</ID> In future examples, <DEADLINE>{$time}</DEADLINE> non-joined rows are removed, </ORDER> as are join where conditions: } </ORDERS> for/let $order $shipper $id $time return SHIPPER NO CARRIER ITEM 1861 FEDEX ORDER NAME PICKUP FEDEX 2 PM SHIPPER 1861 2 PM ID DEADLINE 1861 2 PM ORDER NO CARRIER ITEM 1878 UPS ORDER NAME PICKUP UPS 5 PM 1878 5 PM ID 1878 NO CARRIER ITEM 1897 UPS NAME PICKUP UPS 5 PM … ORDER … SHIPPER DEADLINE 5 PM ORDER 1897 5 PM … … ID 1897 DEADLINE 5 PM 36
Topics • • • For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 37
Nested Queries • Nested queries produce hierarchical results • An outer FLWOR loop contains an inner FLWOR loop • Typically, a where condition in the inner FLWOR specifies how the loops relate 38
Nested Query Example SHIPPER Combine shippers… … with orders … SHIPPER NAME PICKUP FEDEX 2 PM NAME UPS ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS SKU QTY C 5 1 PICKUP 5 PM ORDER SKU B 7 NO CARRIER ITEM 1897 UPS QTY 2 SKU P 5 QTY 1 ITEM SKU QTY C 5 2 SHIPPER_ORDERS … to produce orders for each shipper SHIPPER NAME FEDEX ORDER 1861 SHIPPER NAME ORDER UPS 1878 1897 39
Nested Query <SHIPPER_ORDERS> { for $shipper in doc("s")/SHIPPERS/SHIPPER let $name : = $shipper/NAME return <SHIPPER> { $name } { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id : = data($order/NO) where $name = $order/CARRIER return <ORDER> { $id } </ORDER> } </SHIPPER_ORDERS> • Outer loop binds $shipper and $name variables • For each $shipper, $name pair, inner loop binds $order and $id variables • Inner where clause removes $order, $id pairs that don’t match outer element • Inner loop constructs elements from inner variables • Outer loop constructs elements from outer variables and from elements constructed in inner loop 40
Join Conditions OUTER LOOP $shipper INNER LOOP OUTER LOOP $id where return $name $order ORDER YES 1861 NO CARRIER ITEM 1878 UPS ORDER 1878 NO NO CARRIER ITEM 1897 UPS ORDER NO … NO CARRIER ITEM 1861 FEDEX ORDER SHIPPER NAME FEDEX … NAME PICKUP FEDEX 2 PM NAME FEDEX ORDER 1861 … … NO 1861 NO CARRIER ITEM 1878 UPS ORDER 1878 YES ORDER 1878 NO CARRIER ITEM 1897 UPS YES ORDER 1897 … NO CARRIER ITEM 1861 FEDEX ORDER SHIPPER … NAME PICKUP UPS 5 PM NAME UPS SHIPPER NAME ORDER UPS 1878 1897 … … 41
Condensed Nested Query Table <SHIPPER_ORDERS> { for $shipper in doc("s")/SHIPPERS/SHIPPER let $name : = $shipper/NAME return <SHIPPER> { $name } { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id : = data($order/NO) where $name = $order/CARRIER In future examples, return <ORDER> { $id } </ORDER> } non-matched inner rows </SHIPPER> are removed, as are } </SHIPPER_ORDERS> where conditions: OUTER LOOP $shipper SHIPPER NAME PICKUP UPS 5 PM $name $order ORDER NAME FEDEX NAME UPS NO CARRIER ITEM 1861 FEDEX ORDER 1861 YES ORDER 1861 NO CARRIER ITEM 1878 UPS ORDER 1878 YES ORDER 1878 … SHIPPER OUTER LOOP $id where return … NAME PICKUP FEDEX 2 PM INNER LOOP … … NO CARRIER ITEM 1897 UPS YES ORDER 1897 SHIPPER NAME FEDEX ORDER 1861 SHIPPER NAME ORDER UPS 1878 1897 42
Topics • • • For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 43
Boolean Expressions • In this section we examine various types of Boolean expressions that may appear in WHERE clauses 44
Functions in Boolean Expressions <ORDER_IDS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id : = data($order/NO), $lc : = count($order/ITEM) where $lc > 1 return <ID> { $id } </ID> } </ORDER_IDS> for/let $order where return result $id $lc ORDER NO CARRIER ITEM 1861 FEDEX SKU QTY 1 C 5 1861 1 NO ORDERS_IDS ORDER NO CARRIER ITEM 1878 UPS SKU QTY 2 B 7 ORDER NO CARRIER ITEM 1897 UPS SKU QTY 1 1 P 5 C 5 1878 1 NO ID 1897 2 YES ID 1897 45
Disjunctions <ORDER_IDS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id : = data($order/NO), $lc : = count($order/ITEM ) where $lc > 1 or $order/CARRIER = "FEDEX" return <ID> { $id } </ID> } </ORDER_IDS> for/let $order where return result $id $lc ORDER NO CARRIER ITEM 1861 FEDEX SKU QTY 1 C 5 1861 1 YES ID 1861 ORDERS_IDS ORDER NO CARRIER ITEM 1878 UPS SKU QTY 2 B 7 ORDER NO CARRIER ITEM 1897 UPS SKU QTY 1 1 P 5 C 5 1878 1897 1 2 NO YES ID 1861 ID 1897 46
Existential Quantification <ORDER_IDS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id : = data($order/NO) where some $sku in $order/ITEM/SKU satisfies $sku = "C 5" return <ID> { $id } </ID> } </ORDER_IDS> for/let $order where return result $id $sku ORDER NO CARRIER ITEM 1861 FEDEX SKU QTY 1 C 5 1861 SKU C 5 YES ID 1861 ORDERS_IDS ORDER NO CARRIER ITEM 1878 UPS SKU QTY 2 B 7 ORDER NO CARRIER ITEM 1897 UPS SKU QTY 1 1 P 5 C 5 1878 1897 SKU B 7 SKU P 5 SKU C 5 NO YES ID 1861 ID 1897 47
Universal Quantification <ORDER_IDS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id : = data($order/NO) where every $sku in $order/ITEM/SKU satisfies $sku = "C 5" return <ID> { $id } </ID> } </ORDER_IDS> for/let $order where return result $id $sku ORDER NO CARRIER ITEM 1861 FEDEX SKU QTY 1 C 5 1861 SKU C 5 YES ORDERS_IDS ORDER NO CARRIER ITEM 1878 UPS SKU QTY 2 B 7 ORDER NO CARRIER ITEM 1897 UPS SKU QTY 1 1 P 5 C 5 ID 1861 1878 1897 SKU B 7 SKU P 5 SKU C 5 NO ID 1861 NO 48
Topics • • • For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 49
Conditionals Example Tree Combine customers … CUSTOMER NAME Ann NAME Tom NAME Sue MEMBER … with member info … NAME Tom MEMBER STATUS GOLD NAME Bob STATUS SILVER MEMBER NAME Sue STATUS GOLD CUSTOMERS … to add MEMBER tag to customer data CUSTOMER NAME Ann MEMBER NO CUSTOMER NAME Tom MEMBER YES CUSTOMER NAME Sue MEMBER YES 50
Conditionals Example Query <CUSTOMERS> { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER let $name : = $customer/NAME return <CUSTOMER> {$name} { if (some $member in doc("m")/MEMBERS/MEMBER satisfies $member/NAME = $name) then <MEMBER>YES</MEMBER> else <MEMBER>NO</MEMBER> } </CUSTOMERS> • For each customer, the existential quantification statement checks for the existence of a matching member • If a matching member is found, the MEMBER YES tags are output; otherwise, the MEMBER NO tags are output 51
Conditionals Table <CUSTOMERS> { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER let $name : = $customer/NAME return <CUSTOMER> {$name} { if (some $member in doc("m")/MEMBERS/MEMBER satisfies $member/NAME = $name) then <MEMBER>YES</MEMBER> else <MEMBER>NO</MEMBER> } </CUSTOMERS> $customer $name CUSTOMER NAME Ann CUSTOMER NAME Tom CUSTOMER NAME Sue if/then/else return some $member result NAME Ann NAME Tom NAME Sue MEMBER NO MEMBER NAME STATUS Tom GOLD MEMBER NAME STATUS Sue SILVER MEMBER YES CUSTOMER NAME MEMBER Ann NO CUSTOMER NAME MEMBER Tom YES CUSTOMER NAME Sue MEMBER YES 52
Topics • • • For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 53
Simple Aggregation <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id : = data($order/NO) let $ic : = count($order/ITEM) return <ORDER> <ID> {$id} </ID> <IC> {$ic} </IC> </ORDER> } </ORDERS> for/let $order return $id $ic ORDER NO CARRIER ITEM 1861 FEDEX SKU QTY 1 C 5 ORDER 1861 1 NO CARRIER ITEM 1897 UPS SKU QTY 1 1 P 5 C 5 ID 1861 IC 1 ORDERS ORDER NO CARRIER ITEM 1878 UPS SKU QTY 2 B 7 ORDER result ORDER 1878 1 ID 1878 IC 1 ORDER ID IC 1861 1 1878 1 1897 2 ORDER 1897 2 ID 1897 IC 2 54
Conditional Aggregation <ORDERS> { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id : = data($order/NO) let $items : = for $i in $order/ITEM where $i/SKU = "C 5" return $i let $ic : = count($items) return <ORDER> <ID> {$id} </ID> <IC> {$ic} </IC> </ORDER> } </ORDERS> $order ORDER $id $items $ic return where return $i ITEM NO CARRIER ITEM 1861 FEDEX SKU QTY 1 C 5 ORDER 1861 NO CARRIER ITEM 1878 UPS SKU QTY 2 B 7 ORDER 1878 NO CARRIER ITEM 1897 UPS SKU QTY 1 1 P 5 C 5 SKU QTY C 5 1 YES ORDER ITEM SKU QTY C 5 1 1 ITEM 1897 SKU QTY P 5 1 ITEM SKU QTY C 5 1 IC 1 ORDER ITEM SKU QTY B 7 2 ID 1861 NO 0 NO YES ID 1878 IC 0 ORDER ITEM SKU QTY C 5 1 1 ID 1897 IC 1 55
Topics • • • For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 56
Missing Data Join Example • We will link CUSTOMER_ORDERS with MEMBERS • There are customers that are not members 57
Missing Data Join Trees Combine customers … CUSTOMER NAME Ann NAME Tom NAME Sue MEMBER … with member info … NAME Tom MEMBER STATUS GOLD NAME Bob STATUS SILVER MEMBER NAME Sue STATUS GOLD CUSTOMERS … to produce Prioritized customers CUSTOMER NAME Ann CUSTOMER NAME Tom PRIORITY GOLD CUSTOMER NAME Sue PRIORITY SILVER 58
Missing Data Join Query <CUSTOMERS> { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER for $member in doc("m")/MEMBERS/MEMBER let $name : = $customer/NAME let $status : = data($member/STATUS) where $name = $member/NAME return <CUSTOMER> {$name} <PRIORITY>{$status}</PRIORITY> </CUSTOMER> } </CUSTOMERS> 59
Missing Data Join Table <CUSTOMERS> { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER for $member in doc("m")/MEMBERS/MEMBER let $name : = $customer/NAME let $status : = data($member/STATUS) where $name = $member/NAME return <CUSTOMER> {$name} <PRIORITY>{$status}</PRIORITY> </CUSTOMER> } </CUSTOMERS> for/let/join $customer $member CUSTOMER NAME Ann CUSTOMER return $name $status NAME Ann Result for Ann is missing! MEMBER NAME Tom CUSTOMER NAME STATUS Tom GOLD MEMBER NAME Sue NAME STATUS Sue SILVER CUSTOMER NAME Tom GOLD NAME Sue SILVER NAME PRIORITY Tom GOLD CUSTOMER NAME Sue PRIORITY SILVER 60
Missing Data Join Problem CUSTOMERS Wanted: CUSTOMER NAME Ann CUSTOMER NAME Tom PRIORITY GOLD CUSTOMER NAME Sue PRIORITY SILVER CUSTOMERS Got: CUSTOMER NAME Tom PRIORITY GOLD CUSTOMER NAME Sue PRIORITY SILVER The result we want is analogous to an SQL outer join 61
Missing Data Join Solution Query <CUSTOMERS> { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER for $member in doc("m")/MEMBERS/MEMBER let $name : = $customer/NAME let $status : = data($member/STATUS) Our join query … where $name = $member/NAME return <CUSTOMER> {$name} <PRIORITY>{$status}</PRIORITY> </CUSTOMER> } </CUSTOMERS> … can be restructured into a nested query: <CUSTOMERS> { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER let $name : = $customer/NAME return <CUSTOMER> {$name} { for $member in doc("m")/MEMBERS/MEMBER let $status : = data($member/STATUS) where $name = $member/NAME return <PRIORITY>{$status}</PRIORITY> } </CUSTOMERS> 62
Missing Data Join Solution Table <CUSTOMERS> { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER let $name : = $customer/NAME return <CUSTOMER> {$name} { for $member in doc("m")/MEMBERS/MEMBER let $status : = data($member/STATUS) where $name = $member/NAME return <PRIORITY>{$status}</PRIORITY> } </CUSTOMERS> OUTER LOOP INNER LOOP $customer $name $member CUSTOMER NAME Ann CUSTOMER NAME Tom CUSTOMER NAME Sue OUTER LOOP $status return CUSTOMER NAME Ann MEMBER NAME Tom NAME Sue NAME STATUS Tom GOLD MEMBER GOLD PRIORITY GOLD NAME STATUS Sue SILVER PRIORITY SILVER NAME Ann CUSTOMER NAME PRIORITY Tom GOLD CUSTOMER NAME Sue PRIORITY SILVER 63
Missing Data Joins vs. Nested Queries • In joins, tuples with any missing data are eliminated – equivalent to an SQL natural or inner join • In nested queries, tuples are output in spite of missing data – equivalent to an SQL outer join 64
Nested Query Problem • How to remove tuples that have some missing data • How to force inner join functionality in a nested query 65
Missing Data Nested Query Example • Suppose we want a list, by product, of all items on order – perhaps for pulling the items from stock • For each product, we want bundles, separate quantities for each order • We don’t want to list products with no items on order 66
Missing Data Nested Query Trees PRODUCT Combine products … PRODUCT NAME SKU B 7 Battery SKU C 5 NAME Cable ITEM … with order items … SKU C 5 QTY 1 PRODUCT SKU C 4 ITEM SKU B 7 QTY 2 SKU P 5 NAME Case PRODUCT SKU P 5 NAME Phone ITEM QTY 1 SKU QTY C 5 2 ITEMS_ON_ORDER … to items on order PRODUCT SKU B 7 NAME BUNDLE Battery 2 PRODUCT SKU C 5 NAME BUNDLE Cable 1 2 PRODUCT SKU P 5 NAME BUNDLE Phone 1 67
Missing Data Nested Query <ITEMS_ON_ORDER> { for $p in doc("p")/PRODUCTS/PRODUCT let $sku : = $p/SKU let $name : = $p/NAME return <PRODUCT> {$sku} {$name} { for $i in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER/ITEM let $qty : = data($i/QTY) where $sku = $i/SKU return <BUNDLE> {$qty} </BUNDLE> } </PRODUCT> } </ITEMS_ON_ORDER> 68
Missing Data Nested Query Table OUTER LOOP $p INNER LOOP OUTER LOOP $qty return $sku $name $i PRODUCT SKU NAME B 7 Battery SKU B 7 PRODUCT ITEM NAME SKU Battery B 7 2 BUNDLE 2 SKU QTY C 5 1 ITEM 1 BUNDLE 1 SKU C 5 2 BUNDLE 2 QTY 2 ITEM PRODUCT SKU C 5 NAME Cable PRODUCT SKU NAME C 4 Case PRODUCT SKU P 5 NAME Phone SKU C 4 NAME Case SKU P 5 NAME Phone QTY 2 QTY 1 PRODUCT SKU C 5 NAME BUNDLE Cable 1 2 PRODUCT SKU NAME C 4 Case PRODUCT ITEM SKU P 5 SKU NAME BUNDLE B 7 Battery 2 1 BUNDLE 2 SKU P 5 NAME BUNDLE Phone 2 69
Missing Data Nested Query Problem ITEMS_ON_ORDER Wanted: PRODUCT SKU B 7 PRODUCT NAME BUNDLE Battery 2 SKU C 5 Got: NAME BUNDLE Battery 2 NAME BUNDLE Cable 1 2 SKU P 5 NAME BUNDLE Phone 1 ITEMS_ON_ORDER PRODUCT SKU B 7 PRODUCT SKU C 5 NAME BUNDLE Cable 1 2 PRODUCT SKU C 4 NAME Case SKU P 5 NAME BUNDLE Phone 1 The result we want is analogous to an SQL inner (natural) join 70
Missing Data Nested Query Solution <ITEMS_ON_ORDER> { for $p in doc("p")/PRODUCTS/PRODUCT let $sku : = $p/SKU let $name : = $p/NAME return Our nested query … <PRODUCT> {$sku} {$name} { for $i in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER/ITEM let $qty : = data($i/QTY) where $sku = $i/SKU return <BUNDLE> {$qty} </BUNDLE> } </PRODUCT> } </ITEMS_ON_ORDER> … can be restructured with the inner for loop moved to a variable in the outer loop … <ITEMS_ON_ORDER> { for $p in doc("p")/PRODUCTS/PRODUCT let $sku : = $p/SKU, $name : = $p/NAME let $bundle : = for $i in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER/ITEM let $qty : = data($i/QTY) … and a where clause where $sku = $i/SKU can be added to remove return <BUNDLE> {$qty} </BUNDLE> outer elements with where not(empty($bundle)) no inner elements return <PRODUCT> {$sku} {$name} {$bundle} </PRODUCT> } </ITEMS_ON_ORDER> 71
Missing Data Nested Query Solution Table $p where return $sku $name $bundle $qty return $i PRODUCT SKU NAME B 7 Battery SKU B 7 ITEM NAME SKU Battery B 7 2 BUNDLE 2 SKU QTY C 5 1 ITEM 1 BUNDLE 1 SKU C 5 2 BUNDLE 2 QTY 2 ITEM PRODUCT SKU C 5 NAME Cable PRODUCT SKU NAME C 4 Case PRODUCT SKU P 5 NAME Phone SKU C 4 SKU P 5 QTY 2 NAME Case NAME Phone PRODUCT YES SKU NAME BUNDLE B 7 Battery 2 PRODUCT YES SKU C 5 NAME BUNDLE Cable 1 2 NO ITEM SKU P 5 QTY 1 1 BUNDLE 2 YES PRODUCT SKU P 5 NAME BUNDLE Phone 2 72
Topics • • • For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 73
Advanced Example Trees PRODUCT Combine products … … with orders … PRODUCT NAME SKU Battery B 7 SKU C 5 PRODUCT NAME Cable SKU C 4 PRODUCT NAME Case NAME Phone SKU P 5 ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NO CARRIER ITEM 1897 UPS ITEM SKU B 7 SKU P 5 SKU QTY C 5 2 SKU QTY C 5 1 ORDER QTY 2 QTY 1 PRODUCT_ORDERS … to produce orders for each product PRODUCT SKU B 7 ORDER 1878 PRODUCT SKU C 5 ORDER 1861 1897 PRODUCT SKU C 4 PRODUCT SKU P 5 ORDER 1897 74
Advanced Example Query (: By the way, this is a comment : ) <PRODUCT_ORDERS> { for $product in doc("p")/PRODUCTS/PRODUCT return <PRODUCT> {$product/SKU} { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id : = data($order/NO) where some $sku in $order/ITEM/SKU satisfies $sku = $product/SKU return <ORDER> { $id } </ORDER> } </PRODUCT_ORDERS> For each product (outer for loop), loop through all orders (inner for loop) Where statement filters out orders which don’t contain the product under consideration 75
Advanced Example Exercises • Preparation of the query table (table of variable bindings) is left as an exercise • How can the query be rewritten to – – eliminate products with no orders? add a <no_orders/> tag to products with no orders? sort by SKU? add a total quantity ordered count under each product? 76
Topics • • • For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 77
Sequences • Ordered lists of nodes, either element, attribute or text nodes, or a combination thereof • Can be constructed in for/let clauses for $product in doc("p")/PRODUCTS/PRODUCT • Or manually in the return clause for $product in doc(“p")/PRODUCTS/PRODUCT return ( <SKU>{data($product/SKU)}</SKU>, <NAME>{data($product/NAME)}</NAME> ) • Not needed if a parent element constructor is present for $product in doc(“p")/PRODUCTS/PRODUCT return <PRODUCT> <SKU>{data($product/SKU)}</SKU> <NAME>{data($product/NAME)}</NAME> </PRODUCT> 78
Sequences • Concatenation ($seq 1, $seq 2) • Union $seq 1 union $seq 2 $seq 1 | $seq 2 – Example: for $product in doc(“p")/PRODUCTS/PRODUCT union doc(“co")//ITEM return $product • Intersection $seq 1 intersect $seq 2 • Difference $seq 1 except $seq 2 • Union, Intersection and Difference remove duplicates 79
Topics • • • For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 80
User-Defined Functions • Useful for recursion declare function local: depth($e as element()) as xs: integer { if (empty($e/*)) then 1 else max( for $child in $e/* return local: depth($child) + 1 ) }; Query Prolog for $a in doc(“co")/CUSTOMER_ORDERS return local: depth($a) • “local” prefix is reserved for user-defined functions 81
Global Variables • Also declared in the query prolog declare variable $threshold : = 2; for $order in doc(“co")//ORDER let $total. Qty : = sum($order//QTY) where $total. Qty > $threshold return $order • Can be used to parameterize your queries 82
XQuery and XML Schemas • XML Schemas can be used within XQuery to validate: – Input documents – Query Result import schema namespace in="http: //www. cse. buffalo. edu/in" at “in. xsd"; import schema namespace out="http: //www. cse. buffalo. edu/out" at “out. xsd"; validate{ <out: CUSTOMER_ORDERS> { for $custs in doc(“co”)/in: CUSTOMER_ORDERS/* return $custs } </out: CUSTOMER_ORDERS> } 83
References • XQuery Tutorial – Yannis Papakonstantinou – http: //www. db. ucsd. edu/people/yannis/XQuery. Tutorial. htm • W 3 C's XQuery homepage – http: //www. w 3. org/XML/Query/ • XML School – http: //www. w 3 schools. com 84
- Xquery injection
- Xquery vs xslt
- Xquery distinct
- Data integration with xml and semantic web technologies
- Sql query for xml
- 636 dewey decimal
- Maradmin 636/20
- Iterative query vs recursive query
- Query tree and query graph
- Query tree and query graph
- Relational query languages in dbms
- Formal relational query languages
- Find the id name dept_name
- Formal relational query languages
- Relational query languages
- Adam doupe cse 340
- Cse 340 principles of programming languages
- Etl in data cleaning and preprocessing stands for
- Mashups meaning
- Data mashups and gis are data integration technologies.
- Entity identification problem in data integration
- Three dimensions of corporate strategy
- Backwards intergration
- Integration
- Extracting data from xml
- Xml data mining
- Query tools in data mining
- Query decomposition and data localization
- Research problems in data warehousing
- Enrico franconi
- Polybase data virtualization
- Query driven approach in data warehouse
- Attribute data query in gis
- Starnet query model in data warehouse
- List the primitives that specify a data mining task.
- Data mining languages and system architecture
- Data mining
- Cse 572
- Voice data integration
- Data acquisition and integration
- Continuous integration data warehouse
- Vertical movie
- Soa data integration
- Open source data integration software
- Open studio for data integration
- Data integration in the life sciences
- Data collection integration
- Data acquisition and integration
- Data integration web services
- Virtual data integration architecture
- String matching in data integration
- Core data integration
- Forrester data virtualization
- Data warehouse integration services
- Data integration statistics
- Click test
- Numerical integration of discrete data
- Sap pdmi
- Data integration problems approaches and perspectives
- Integration layer in data warehouse
- Attributesynonym
- Heterogeneous data integration
- Sas information delivery portal
- "data integration"
- Zig web framework
- Xml user interface language
- парсер в браузере
- Xray xml editor
- Dublin core xml
- Java soap xml 파싱
- Java xml datei einlesen
- Html stand for
- Xml stands for *
- Ncoaug
- Syntax xml
- Advantages of xml
- Textml server
- Specifications and constraints
- Xml meaning
- Xml vs xbrl
- Xml based web services
- Vtd-xml
- Sas xml import
- Vi format xml
- Kml to xml