Honey I Shrunk the XQuery An XML Algebra
Honey, I Shrunk the XQuery! —— An XML Algebra Optimization Approach Xin Zhang, Bradford Pielech and Elke A. Rundensteiner WIDM 2002 DSRG, Worcester Polytechnic Institute 1
XML and Relational The look and feel of an XML query system combined with the maturity and technology support of RDB + XML Flexible and powerful way to: 1) Represent data on the web 2) Exchange data between applications WIDM 2002 Relational Database 1) Widely used to store business data 2) Efficient, reliable, secure 3) Provides standard querying (SQL) DSRG, Worcester Polytechnic Institute 2
Architecture User XAT User XQuery View XAT User Query Results in XML XAT Merger XAT Decorrelator XAT Generator Virtual XML XML Document User XAT XAT Optimizer XAT SQL Generator XAT Executor Tuples XAT View XQuery SQL RDBMS XAT: XML Algebra Tree WIDM 2002 Virtual XMLXML Document XMLDocument DSRG, Worcester Polytechnic Institute 3
GOAL: XQuery level optimization WIDM 2002 DSRG, Worcester Polytechnic Institute 4
Running Example <prices> <book> <title>TCP/IP Illustrated</title> <price>65. 95</price> </book> <title>Data on the Web</title> <price>34. 95</price> </book> </prices> <result> FOR $t IN document(“prices. xml”)/book/title RETURN $t </result> <prices> FOR $book IN document(“dxv. xml”)/book/row $prices IN document(“dxv. xml”)/prices/row WHERE $book/bid = $prices/bid RETURN <book> $book/title, $prices/price </book> </prices> WIDM 2002 <results> <title>TCP/IP Illustrated</title> <title>Data on the Web</title> </results> <dxv> <prices> <book> <row> <bid>001</bid> <price>65. 95</price> <title>TCP/IP Illustrated</title> </row> <row> <bid>002</bid> <price>34. 95</price> <title>Data on the Web</title> </row> </prices> </book> </dxv> Bid Title Bid Price 001 TCP/IP Illustrated 001 65. 95 002 Data on the Web 002 34. 95 DSRG, Worcester Polytechnic Institute 5
XAT View XAT Merger View XAT Decorrelator View XQuery User XAT XAT Generator User XQuery XAT XAT Optimizer SQL Generator XAT Executor User XML Algebra Tree (XAT) User Query 1: <result> FOR $t IN document(“prices. xml”)/book/title RETURN $t </result> 2: T<results>$t</result>col 3 3: 6: 7: WIDM 2002 col 3 Agg R 0, book/title$t S”prices. xml”R 0 DSRG, Worcester Polytechnic Institute 6
User XAT XAT View XAT Merger View XAT View XQuery XAT Decorrelator XAT Generator User XQuery XAT XAT Optimizer XAT SQL Generator XAT Executor View XML Algebra Tree (XAT) <prices> FOR $book IN document(“dxv. xml”)/book/row $prices IN document(“dxv. xml”)/prices/row WHERE $book/bid = $prices/bid RETURN <book> $book/title, $prices/price </book> </prices> 22: 11: T<prices>col 5</prices>col 4 12: Agg WIDM 2002 View Query 26: 27: $book, bidcol 6 28: $prices, bidcol 7 31: T<book> [col 10][col 12] </book>col 5 23: $book, titlecol 10 14: 25: col 6=col 7 $prices, pricecol 12 15: R 1, /book/row$book S“dxv. xml” R 1 DSRG, Worcester Polytechnic Institute 20: 21: R 3, /prices/row$prices S“dxv. xml” R 3 7
User XAT XAT View XAT Merger View XAT View XQuery XAT Decorrelator XAT Generator User XQuery XAT Optimizer XAT SQL Generator XAT Executor Merged XML Algebra Tree (XAT) User Query View Query 1: 2: col 3 26: 11: T<prices>col 5</prices>col 4 T<results>$t</result>col 3 3: 6: 7: Agg R 0, book/title$t col 4 R 0 12: 23: Agg T<book> [col 10][col 12] </book>col 5 WIDM 2002 27: $book, bidcol 6 28: $prices, bidcol 7 31: $book, titlecol 10 14: 25: col 6=col 7 $prices, pricecol 12 15: R 1, /book/row$book S“dxv. xml” R 1 DSRG, Worcester Polytechnic Institute 20: 21: R 3, /prices/row$prices S“dxv. xml” R 3 8
Outline n XAT Optimization: n n n XAT Rewrite XAT Cleanup Preliminary Evaluation Related Work Summary WIDM 2002 DSRG, Worcester Polytechnic Institute 9
User XAT XAT View XAT Merger View XAT View XQuery XAT Decorrelator XAT Generator User XQuery XAT Optimizer XAT SQL Generator XAT Executor XAT Rewrite n n Query Optimization at Logic Level. Goal: n n n Redundancy Elimination. Computation Pushdown. Technique: n n Equivalence Rewrite Rules. Heuristics: n n n WIDM 2002 Pushdown Navigates Remove Construction of Intermediate Result Combine Multiple Operators. DSRG, Worcester Polytechnic Institute 10
Before Navigation Pushdown User Query 1: 2: View Query col 3 26: T<results>$t</result>col 3 3: 6: Agg R 0, book/title$t col 4 R 0 7: 22: 23: T<book> [col 10][col 12] </book>col 5 12: Agg WIDM 2002 27: $book, bidcol 6 28: $prices, bidcol 7 31: $book, titlecol 10 11: T<prices>col 5</prices>col 4 14: 25: col 6=col 7 $prices, pricecol 12 15: R 1, /book/row$book S“dxv. xml” R 1 DSRG, Worcester Polytechnic Institute 20: 21: R 3, /prices/row$prices S“dxv. xml” R 3 11
After Navigation Pushdown User Query View Query 22: 1: 2: col 3 26: col 6=col 7 31: T<results>$t</result>col 3 3: Agg R 0, book/title$t 6: 11: T<book> [col 10][col 12] </book>col 5 T<prices>col 5</prices>R 0 23: $book, titlecol 10 27: $book, bidcol 6 14: 12: R 1, /book/row$book 25: 28: $prices, pricecol 12 $prices, bidcol 7 R 3, /prices/row$prices 20: Agg 15: WIDM 2002 S“dxv. xml” R 1 DSRG, Worcester Polytechnic Institute 21: S“dxv. xml” R 3 12
After Tagger Cancel Out User Query 1: View Query col 3 31: 2: JOIN col 6=col 7 T<results>$t</result>col 3 3: Agg 23: $book, title$t 27: $book, bidcol 6 14: R 1, /book/row$book 15: WIDM 2002 25: 28: DSRG, Worcester Polytechnic Institute $prices, bidcol 7 R 3, /prices/row$prices 20: S“dxv. xml” R 1 $prices, pricecol 12 21: S“dxv. xml” R 3 13
Outline n XAT Optimization n n XAT Rewrite XAT Cleanup Preliminary Evaluation Related Work Summary WIDM 2002 DSRG, Worcester Polytechnic Institute 14
XAT View XAT Merger View XAT Decorrelator View XQuery User XAT XAT Generator User XQuery XAT Optimizer XAT SQL Generator XAT Executor XAT Cleanup n Why: n n SQL engine cannot reduce redundancy in XQuery. How: n Data Redundancy by Schema Cleanup n n n Tree Redundancy by Unused Operator Cutting n n n WIDM 2002 Each operator produced, consumed and modified some columns. Minimum schema is then computed. Cutting matrix generation. Required columns analysis. Operator cutting. DSRG, Worcester Polytechnic Institute 15
XAT Operator Properties n Produced n n n Consumed n n n Desc: New column generated by operator. Example: , S, T Desc: Columns required by operator. Example: , Modified n n WIDM 2002 Desc: Columns modified by operator. Example: , , DSRG, Worcester Polytechnic Institute 16
Schema Computation Node Parent 1 1: 2: T<results>$t</result>col 3 Agg 31: col 6=col 7 23: $book, title$t 27: $book, bidcol 6 25: $book $prices, pricecol 12 28: 20: R 1, /book/row $prices, bidcol 7 R 3, /prices/row$prices 21: 15: Consumed Old Schema {} {col 3} col 3 3: 14: Produced S“dxv. xml” R 1 WIDM 2002 S“dxv. xml” R 3 2 1 {col 3} {$t} {col 3, R 1, $book, col 6, $t, R 3, $prices, col 7, col 12} 3 2 {} {} {R 1, $book, col 6, $t, R 3, $prices, col 7, col 12} 31 3 {} {col 6, col 7} {R 1, $book, col 6, $t, R 3, $prices, col 7, col 12} 23 31 {$t} {$book} {R 1, $book, col 6, $t} 27 23 {col 6} {$book} {R 1, $book, col 6} 14 27 {$book} {R 1, $book} 15 14 {R 1} {} {R 1} 25 31 {col 12} {$prices} {R 3, $prices, col 7, col 12} 28 25 {col 7} {$prices} {R 3, $prices, col 7} 20 28 {$prices} {R 3, $prices} 21 20 {R 3} {} DSRG, Worcester Polytechnic Institute {R 3} 17
Intuition: Don’t keep anything that’s not used later. Schema Computation # 1: 2: 1 col 3 T<results>$t</result> 3: col 3 Agg 31: col 6=col 7 23: 27: $book, title $t $book, bidcol 6 25: 14: $prices, pricecol 12 28: 20: R 1, /book/row$book col 3 $t col 6 col 7 $book R 1 col 12 $prices R 3 2 1 3 2 31* 3 23 31 27 23 14 27 15 14 25 31 28 25 20 28 21 20 P R 3, /prices/row$prices S“dxv. xml” R 1 WIDM 2002 S“dxv. xml” R 3 New Schema {col 3} C {$t} C P P C {col 6, $t} C {$book, col 6} P C {$book} P {R 1} P C {col 7, col 12} C {$prices, col 7} $prices, bidcol 7 21: 15: Parent() P P C {$prices} P {R 3} *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. DSRG, Worcester Polytechnic Institute 18
Schema Cleanup Result Node Original Schema Minimum Schema 1 {col 3, R 1, $book, col 6, $t, R 3, $prices, col 7, col 12} {col 3} 2 {col 3, R 1, $book, col 6, $t, R 3, $prices, col 7, col 12} {col 3} 3 {R 1, $book, col 6, $t, R 3, $prices, col 7, col 12} {$t} 31 {R 1, $book, col 6, $t, R 3, $prices, col 7, col 12} {$t} 23 {R 1, $book, col 6, $t} {col 6, $t} 27 {R 1, $book, col 6} {$book, col 6} 14 {R 1, $book} {$book} 15 {R 1} 25 {R 3, $prices, col 7, col 12} {col 7, col 12} 28 {R 3, $prices, col 7} {$prices, col 7} 20 {R 3, $prices} {$prices} 21 {R 3} WIDM 2002 DSRG, Worcester Polytechnic Institute 19
XAT Cleanup n Schema Cleanup n n n Each operator produced, consumed and modified some columns. Minimum schema is then computed. Unused Operator Cutting n n n WIDM 2002 Cutting matrix generation. Required columns analysis. Operator cutting. DSRG, Worcester Polytechnic Institute 20
Cutting Matrix n Purpose: n n Equations: n n n Get rid of the unused operators. Propagation of modified Propagation of required Identify cuttable node. WIDM 2002 DSRG, Worcester Polytechnic Institute 21
Matrix Computation 1: col 3 3: 31: 27: Agg JOIN col 6=col 7 $book, title $t $book, bidcol 6 25: 20: 14: R 1, /book/row$book $prices, pricecol 12 28: $prices, bidcol 7 R 3, /prices/row$prices col 3 $t col 6 col 7 $book R 1 col 12 $prices R 3 - - - - C C P C Cut? C 2 1 P C 3 2 - - 31* 3 23 31 27 23 14 27 15 14 25 31 28 25 20 28 21 20 P C P C P C P *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 21: 15: Parent() 1 T<results>$t</result>col 3 2: 23: # S“dxv. xml” R 3 S“dxv. xml” R 1 WIDM 2002 DSRG, Worcester Polytechnic Institute 22
Intuition: Give me only the required columns in order to get the final result. Matrix Computation (Cont. 1) 1: col 3 # T<results>$t</result> 2: 3: 31: 23: 27: JOIN $book, title col 6=col 7 $t $book, bidcol 6 25: R 1, /book/row$book $prices, pricecol 12 28: $prices, bidcol 7 R 3, /prices/row$prices col 3 $t R R 2 1 P C 3 2 - M 31* 3 23 31 27 23 14 27 15 14 25 31 28 25 20 28 21 20 col 6 col 7 - - C C P $book R 1 R R - - col 12 $prices R 3 - - - P C Cut? C P C P *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 21: 15: Parent() 1 Agg 20: 14: col 3 S“dxv. xml” R 1 WIDM 2002 DSRG, Worcester Polytechnic Institute 23
Matrix Computation (Cont. 2) 1: col 3 3: 31: 27: Agg JOIN $book, title col 6=col 7 $t $book, bidcol 6 25: 20: 14: R 1, /book/row$book $prices, pricecol 12 28: $prices, bidcol 7 R 3, /prices/row$prices col 3 $t R R 2 1 P C 3 2 - M 31* 3 23 31 27 23 14 27 15 14 25 31 28 25 20 28 21 20 col 6 col 7 - - C C P $book R 1 R R - - col 12 $prices R 3 - - - Cut? X C P P P C X P X *We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted. 21: 15: Parent() 1 T<results>$t</result>col 3 2: 23: # S“dxv. xml” R 3 S“dxv. xml” R 1 WIDM 2002 DSRG, Worcester Polytechnic Institute 24
XAT after Cutting 1: col 3 3: 31: Agg JOIN 23: $book, title$t 27: $book, bidcol 6 14: R 1, /book/row$book Agg 3: col 6=col 7 25: $prices, pricecol 12 28: $prices, bidcol 7 R 3, /prices/row$prices 21: 15: T<results>$t</result>col 3 2: 20: col 3 1: T<results>$t</result>col 3 2: Reduced To 23: 14: 15: $book, title$t R 1, /book/row$book S“dxv. xml” R 1 S“dxv. xml” R 3 S“dxv. xml” R 1 WIDM 2002 DSRG, Worcester Polytechnic Institute 25
User XAT View XAT XAT Merger View XQuery XAT Decorrelator XAT Generator User XQuery XAT Optimizer XAT SQL Generator XAT Executor SQL Generated 1: 2: T<results>$t</result> 3: 31: 23: 27: SELECT col 3 FROM Agg WHERE JOIN $book, title$t $book, bidcol 6 25: col 3 T<results>$t</result>col 3 2: 3: Agg 28: R 1, /book/row$book $prices, bidcol 7 $book, title$t 23: $prices, pricecol 12 14: R 1, /book/row$book S“dxv. xml” R 1 15: R 3, /prices/row$prices 21: 15: 1: col 6=col 7 20: 14: “$book”. title as “$t”, “$book”. bid as “col 6”, “$prices”. price as “col 12”, “$prices”. bid as “col 7” book “$book”, prices “$prices” “col 6”=“col 7” S“dxv. xml” R 3 SELECT FROM “$book”. title as “$t”, book “$book”, S“dxv. xml” R 1 WIDM 2002 DSRG, Worcester Polytechnic Institute 26
Outline n XAT Optimization n n XAT Rewrite XAT Cleanup Preliminary Evaluation Related Work Summary WIDM 2002 DSRG, Worcester Polytechnic Institute 27
Preliminary Evaluation n Experiment Setup n n n Data Setup n n n XQuery over Kweelt Parser PIII 800 256 MB, Win 2 k Pro. Synthetic Data Synthetic Queries Query Execution n WIDM 2002 Native XML Engine. DSRG, Worcester Polytechnic Institute 28
Performance Gain in Execution WIDM 2002 DSRG, Worcester Polytechnic Institute 29
Query Engine Overhead XAT View XAT Merger View XAT Decorrelator XAT Generator View XQuery User XAT User XQuery XAT Rewrite XAT Optimizer XAT SQL Generator XAT Executor XAT Cleanup Total: 32, 522 ms WIDM 2002 DSRG, Worcester Polytechnic Institute 30
Outline n XAT Optimization n n XAT Rewrite XAT Cleanup Preliminary Evaluation Related Work Summary WIDM 2002 DSRG, Worcester Polytechnic Institute 31
Related Work n Rainbow: n Optimize on XAT. (static analysis) n Algebra level rewriting. SQL Optimization n n Algebra based optimization. Static analysis. XQuery by Views: Optimize in SQL. n XPERANTO[VLDBJ 2000]: XQGM vs. XAT n n Silk. Route[IEEE 2001(24: 2)]: n n Extension by UDFs for XML features. Generate SQL Efficiently. AGORA[VLDB 2000]: n WIDM 2002 Syntax level rewriting. DSRG, Worcester Polytechnic Institute 32
Summary n n n Efficient XQuery Processing XML Algebra Tree (XAT) XAT Optimization: n n Rewrite by using equivalent rules Cleanup n n n Schema cleanup Operator cutting Prototype system implementation. WIDM 2002 DSRG, Worcester Polytechnic Institute 33
Questions? (Futures!) http: //davis. wpi. edu/dsrg/rainbow https: //sourceforge. net/projects/rainbow-engine/ Special Thanks: Brian Murphy, Luping Ding, DSRG group. WIDM 2002 DSRG, Worcester Polytechnic Institute 34
User XAT View XAT Merger WIDM 2002 View XAT Decorrelator View XQuery User XAT Generator User XQuery XAT Optimizer XAT DSRG, Worcester Polytechnic Institute SQL Generator XAT Executor 35
Schema Computation Node Parent Produced Consumed 1: col 3 1 2: T<results>$t</result>col 3 3: Agg 31: col 6=col 7 23: $book, title$t 25: $prices, pricecol 12 28: $prices, bid col 7 27: $book, bidcol 6 14: R 1, /book/row $book 20: R 3, /prices/row$prices S“dxv. xml” R 3 21: 15: S“dxv. xml” R 1 WIDM 2002 Minimum Schema {} {col 3} 2 1 {col 3} {$t} {col 3} 3 2 {} {} {$t} 31 3 {} {col 6, col 7} {$t} 23 31 {$t} {$book} {col 6, $t} 27 23 {col 6} {$book, col 6} 14 27 {$book} {R 1} {$book} 15 14 {R 1} {} {R 1} 25 31 {col 12} {$prices} {col 7, col 12} 28 25 {col 7} {$prices, col 7} 20 28 {$prices} {R 3} {$prices} 21 20 {R 3} {} DSRG, Worcester Polytechnic Institute 36
After Tagger Cancel Out User Query 1: 2: View Query T<results>$t</result> 3: 26: col 3 Agg 31: col 3 23: $book, title$t 27: $book, bidcol 6 14: R 1, /book/row$book 15: WIDM 2002 col 6=col 7 25: 28: DSRG, Worcester Polytechnic Institute $prices, bidcol 7 R 3, /prices/row$prices 20: S“dxv. xml” R 1 $prices, pricecol 12 21: S“dxv. xml” R 3 37
- Slides: 37