XQuery Implementation in a Relational Database System Shankar
XQuery Implementation in a Relational Database System Shankar Pal Istvan Cseri, Oliver Seeliger, Michael Rys, Gideon Schaller, Wei Yu, Dragan Tomic, Adrian Baras, Brandon Berg, Denis Churin, Eugene Kogan SQL Server Microsoft Corp VLDB 2005 - Sep 1 S. Pal et al.
Overview o Background n n n o Main topic – XQuery compilation n o XML Support in SQL Server 2005 Ord. Path labeling of XML nodes XML indexes – PATH, VALUE, PROPERTY Architecture XML operators Mapping XML operators to relational+ ops Conclusions VLDB 2005 - Sep 1 S. Pal et al. 2
Background XML Support in SQL Server 2005 Create table DOCS ( ID int primary key, XDOC xml) o o XML stored in an internal, binary form (‘blob’) Optionally typed by a collection of XML schemas n o 3 of 5 methods on XML data type: n n n o o Used for storage and query optimizations query(): returns XML type value(): returns scalar value exist(): checks conditions on XML nodes XML indexing More information at http: //msdn. microsoft. com/xml VLDB 2005 - Sep 1 S. Pal et al. 3
Background XQuery embedded in SQL o Retrieve section titles from <book> wrapped in new <topic> elements: SELECT ID, XDOC. query(' for $s in /BOOK/SECTION return <topic> {data($s/TITLE)} </topic> ') FROM DOCS VLDB 2005 - Sep 1 S. Pal et al. 4
Background XQuery – supported features o o o o XQuery clauses “for”, “where”, “return” and “order by” XPath axes – child, descendant, parent, attribute, self and descendant-or-self Functions – numeric, string, Boolean, nodes, context, sequences, aggregate, constructor, data accessor SQL Server extension functions to access SQL variable and column data within XQuery Numeric operators (+, -, *, div, mod) Value comparison operators (eq, ne, lt, gt, le, ge) General comparison operators (=, !=, <, >, <=, >=) VLDB 2005 - Sep 1 S. Pal et al. 5
Background [SIGMOD 04] ORDPATH Label of Nodes node 1 is ancestor of node 2 ORDPATH (node 1) is prefix of ORDPATH (node 2) BOOK 1 @ISBN 1. 1 Section 1. 3 Title 1. 3. 1 Figure 1. 3. 3 Section 1. 5 Title 1. 5. 1 Figure 1. 5. 3 node 1 precedes node 2 in document order ORDPATH(1. 3) < Descendant_Limit (1. 3) = 1. 4 ORDPATH (node 1) ≤< id ORDPATH (node 2) VLDB 2005 - Sep 1 S. Pal et al. 6
Background [VLDB 2004] Indexing XML column o Primary XML index on an XML column n n o Creates B+tree on data model content of the XML nodes Adds column Path_ID for the reversed, encoded path from each XML node to root of XML tree Ord. Path labeling schema is used for XML nodes n n Relative order of nodes Document hierarchy VLDB 2005 - Sep 1 S. Pal et al. 7
Background XML example INSERT INTO my. Table VALUES (7, ‘<Book xmlns="myns" ISBN = "1 -55860 -3612"> <Section> <Title>Bad Bugs</Title> </Section> <Title> Tree frogs </Title> <Figure>…</Figure> </Section> </Book>’) VLDB 2005 - Sep 1 S. Pal et al. 8
Background Primary XML Index Entries ID ORDPATH TAG NODETYPE VALUE PATH_ID 7 1 1 (Book) 10 (ns: b. T) NULL #1 7 1. 1 2 (ISBN) 2 (xs: string) '1 -55860 -…' #2#1 7 1. 3 3 (Section) 11 (ns: s. T) 7 1. 3. 1 4 (Title) 2 (xs: string) 'Bad Bugs' #4#3#1 7 1. 3. 3 5 (Figure) 12 (ns: f. T) NULL #5#3#1 7 1. 5 3 (Section) 11 (ns: s. T) NULL #3#1 7 1. 5. 1 4 (Title) 2 (xs: string) 'Tree frogs' #4#3#1 7 1. 5. 3 5 (Figure) 12 (ns: f. T) #5#3#1 NULL #3#1 - Encoding of tags & types stored in system meta-data Clustering key - Additional details not shown VLDB 2005 - Sep 1 S. Pal et al. 9
Background Secondary XML indexes o To speed up different classes of commonly occurring queries PATH path-based queries PATH_ID, VALUE, ID, ORDPATH VALUE value-based queries VALUE, PATH_ID, ORDPATH PROPERTY Object properties o ID, PATH_ID, VALUE, ORDPATH Statistics created on key columns of the primary and secondary XML indexes n Used for cost-based selection of secondary XML indexes VLDB 2005 - Sep 1 S. Pal et al. 10
Background Handling Types o If XML column is typed n o Untyped XML n n o Values are stored in XML blob and XML indexes with appropriate typing Values are stored as strings Convert to appropriate types for operations SQL typed values stored in primary XML index n n n Most SQL types are compatible with XQuery types (integer) Value comparisons on XML index columns suffice Some types (e. g. xs: datetime) are stored in internal format and processed specially VLDB 2005 - Sep 1 S. Pal et al. 11
XQuery Processing Architecture XQuery expression o XQuery Compiler: n n XQuery Compiler n n XML algebra tree (Xml. Op ops) Parses XQuery expr Checks static type correctness Type annotations Applies static optimiztns o o XML Operator Mapper n Relational Operator Tree (relational+ operators) n n Reln Query Processor VLDB 2005 - Sep 1 S. Pal et al. Path collapsing Rewrites using XML schemas Recursively traverses XML algebra tree Converts each Xml. Op to reln+ operator sub-tree Mapping depends upon existence of primary XML index 12
Examples of XML Operators Xml. Op_Select In: list of items, condition Out: items satisfying condition Xml. Op_Path In: simple paths, no predicates Opt: path context to collapse paths Out: eligible XML nodes Xml. Op_Apply In: two item lists Out: one item list Variable binding in “for” expression Xml. Op_Construct In: sub-nodes for element construction, otherwise value Out: constructed node VLDB 2005 - Sep 1 S. Pal et al. 13
XML Operator Mapping – Overview PATH Index XQUERY PK Ord. Path 1 PK XML 1 20 35 1 Primary XML Index 1 VALUE Index 1 20 20 20 35 PROPERTY Index 35 REL+ tree VLDB 2005 - Sep 1 Special handling for SELECT * | XDOC S. Pal et al. 14
New operators o Some produce N rows from M (≠ N) rows n n o Some are for efficiency n n n o XML_Reader – streaming, pull-model XML parser XML_Serializer – to serialize query result as XML Contains – to evaluate XQuery contains() Text. Add – to evaluate the XQuery function string() Data – to evaluate XQuery data() function Some are for specific needs n Check – validate XML during insertion or modification VLDB 2005 - Sep 1 S. Pal et al. 15
XML Operator Mapping o Following categories: n n n Mapping of XPath expressions Mapping of XQuery built-in functions VLDB 2005 - Sep 1 S. Pal et al. 16
Non-indexed XML, Full Path XML operator tree: o Xml. Op_Path PATH = “/BOOK/SECTION” Rel+ operator tree: XML_Serialize o XML_Reader (XDOC, “/BOOK/SECTION”) VLDB 2005 - Sep 1 S. Pal et al. XML_Reader produces subtrees of <SECTION> n Node table rows n Contains Ord. Path n No PK or PATH_ID XML_Serialize reassembles those row into XML data type n To output result 18
Sample query execution using Primary XML Index ID ORDPATH TAG NODETYPE VALUE PATHID 7 1 1 (Book) 10 (ns: b. T) NULL #1 7 1. 1 2 (ISBN) 2 (xs: string) '1 -55860 -…' #2#1 7 1. 3 3 (Section) 11 (ns: s. T) #3#1 7 1. 3. 1 4 (Title) 2 (xs: string) 'Bad Bugs' #4#3#1 7 1. 3. 3 5 (Figure) 12 (ns: f. T) NULL #5#3#1 7 1. 5 3 (Section) 11 (ns: s. T) NULL #3#1 7 1. 5. 1 4 (Title) 2 (xs: string) 'Tree frogs' #4#3#1 7 1. 5. 3 5 (Figure) 12 (ns: f. T) #5#3#1 Clustering key VLDB 2005 - Sep 1 NULL • /Book/Section #3#1 (by XML Op Mapper) S. Pal et al. 20
Indexed XML, Full Path XML_Serialize Apply Select ($b) o Assemble Subtree o Select Xml. Op_Path mapped to SELECT GET(PXI) – rows from primary XML index n GET (PXI) GET $b. Ord. P (PXI) ≤ Ord. P< DL($b) o Match PATH_ID Not shown: n JOIN with base table on PK Path_ID=#SECTION#BOOK VLDB 2005 - Sep 1 S. Pal et al. 21
XML index – PATH_ID VALUE ID ORDPATH #1 NULL 7 1 #2#1 '1 -55860 -…' 7 1. 1 #3#1 NULL 7 1. 3 #3#1 NULL 7 1. 5 #4#3#1 'Bad Bugs' 7 1. 3. 1 #4#3#1 'Tree frogs' 7 1. 5. 1 #5#3#1 NULL 7 1. 3. 3 #5#3#1 NULL 7 o Speeds up path evaluations o Example – /Book/Section #3#1 VLDB 2005 - Sep 1 S. Pal et al. 1. 5. 3 22
Indexed XML, Imprecise Paths XML_Serialize Apply Select ($s) Assemble subtree of <TITLE> /BOOK/SECTION// TITLE o Matched using LIKE operator on Path_ID GET (PXI) Path_ID LIKE #TITLE%#SECTION#BOOK VLDB 2005 - Sep 1 S. Pal et al. 23
Predicate Evaluation XML_Serialize /BOOK[@ISBN = “ 12”] o Search value compared Apply with VALUE column in PXI Apply Assemble o Collapsed path subtree of /BOOK/@ISBN <BOOK> n Induce index seeks Select ($b) Select n Reduce intermediate result size Path_ID=#@I o Parent check – Par($b) GET (PXI) Path_ID= #BOOK VLDB 2005 - Sep 1 SBN#BOOK & VALUE=“ 12” & Par($b) n o S. Pal et al. Using Ord. Path Value conversion might be needed 24
Ordinal Predicate o o /BOOK[n] Adds ranking column to the rows for <BOOK> elements n o Retrieves the nth <BOOK> node Special optimizations n n n [1] TOP 1 ascending [last()] TOP 1 descending Avoids sorting when input is sorted o Example – in XML_Serializer VLDB 2005 - Sep 1 S. Pal et al. 25
Error handling o Static type errors at compilation time n Raises static type errors if an expression could fail at runtime due to type safety violation o o o n o Addition of string to integer Querying non-existent node name in typed XML Non-singleton in “eq” Some can be fixed using explicit cast or ordinal specification Dynamic error converted to empty sequence n Yields correct result in predicates without negations VLDB 2005 - Sep 1 S. Pal et al. 26
“for” Iterator XML_Serialize for $s in /BOOK//SECTION where $s/@num >= 3 return $s/TITLE Apply o Assemble Apply ($s) <SECTION> Select Path_ID LIKE #TITLE#SECTION% Exists #BOOK & Par($s) Select ($s) GET (PXI) Select GET(PXI) Path_ID LIKE #SECTION%#BOOK VLDB 2005 - Sep 1 XML op for “for” is Xml. Op_Apply n n n o Path_ID LIKE #@num#SEC% #BK & VALUE >= 3 & Par($s) S. Pal et al. Maps to APPLY Binds $s and iterates over <SECTION> Determines its <TITLE> children Nested “for” and “for” with multiple bindings turn into nested APPLY n Each APPLY binds to a different variable 27
XQuery “order by” and “where” o Order by: n n n o Sorts rows based on order-by expression Adds a ranking column to these rows Ranking column converted into Ord. Path values o Yield the new order of the rows o Fits rest of query processing framework Where n n Becomes SELECT on input sequence Filters rows satisfying specified condition VLDB 2005 - Sep 1 S. Pal et al. 28
XQuery “return” o Return nodes sequence in document order n n o Use Ord. Path values and XML_Serialize operator New element and sequence constructions n Merge constructed and existing nodes into a single sequence (SWITCH_UNION) VLDB 2005 - Sep 1 S. Pal et al. 29
XQuery Functions & Operators o Built-in fn and op are mapped to relational fn and op if possible n o fn: count() Additional support for XQuery types, functions and operators that cannot be mapped directly n Intrinsics VLDB 2005 - Sep 1 S. Pal et al. 30
Optimizations o Exploiting Ordered Sets n n o Sorting information (Ord. Path) made available to further relational operators XML_Serialize is an example Using static type information n n Eliminates CONVERT() in operations Allows range scan on VALUE index VLDB 2005 - Sep 1 S. Pal et al. 31
Conclusions o Built-up infrastructure for query processing framework n n o o o Other XQuery features (such as “let” and typeswitch) can be implemented Data modification language o Fits into relational query processing framework XQuery features can be implemented using rel++ operators Optimizations pose the biggest challenges More cost-based optimizations can be done n n Enhanced costing model (e. g. choice of PXI) Matching materialized views VLDB 2005 - Sep 1 S. Pal et al. 32
Thank you! VLDB 2005 - Sep 1 S. Pal et al. 33
- Slides: 31