Recursive XML Schemas Recursive XML Queries and Relational

  • Slides: 34
Download presentation
Recursive XML Schemas, Recursive XML Queries, and Relational Storage: XML-to-SQL Query Translation Krishnamurthy, R.

Recursive XML Schemas, Recursive XML Queries, and Relational Storage: XML-to-SQL Query Translation Krishnamurthy, R. , Chakaravarthy, V. T. , Kaushik, R. , & Naughton, J. F. (2004) Proceedings of the 20 th International Conference on Data Engineering (ICDE’ 04) Presenter: Jochen Stoesser mailto: jstoesse@connect. carleton. ca April 5, 2005 Advanced Database Systems, Jochen Stoesser

Motivation (1) • General topic: XML-to-SQL query translation • 3 scenarios of XML shredding

Motivation (1) • General topic: XML-to-SQL query translation • 3 scenarios of XML shredding § schema-oblivious shredding § XML Publishing § schema-based shredding April 5, 2005 Advanced Database Systems, Jochen Stoesser 2

Motivation (2): Schema Graph • Tree schema graph (see example) • directed acyclic (DAG)

Motivation (2): Schema Graph • Tree schema graph (see example) • directed acyclic (DAG) schema graph multiple incoming edges • recursive schema graph Source: Krishnamurthy et al. (2004) April 5, 2005 Advanced Database Systems, Jochen Stoesser 3

Motivation (3): Schema-based Shredding • for each non-leaf node create separate relation § Book(id,

Motivation (3): Schema-based Shredding • for each non-leaf node create separate relation § Book(id, title) § Author(id, parentid) § Section(id, parentcode, title) § Para(id, parentid) § Figure(id, parentid, caption, image) • each leaf node is associated with a column name • parentid and parentcode to preserve structure April 5, 2005 Advanced Database Systems, Jochen Stoesser 4

Motivation (4) • Focus of the paper: XML-to-SQL query translation with schema-based shredding, especially

Motivation (4) • Focus of the paper: XML-to-SQL query translation with schema-based shredding, especially in the presence of § recursive XML query, e. g. /book//section/title // /descendant-or-self: : node()/ § • recursive XML schemas (>> example) not solved by § existing schema-based shredding algorithms, § schema-oblivious shredding algorithms, § XML Publishing algorithms April 5, 2005 Advanced Database Systems, Jochen Stoesser 5

Queries: SQL(p) • example: retrieve nodes 9 (titles of subsections, i. e. nodes 7)

Queries: SQL(p) • example: retrieve nodes 9 (titles of subsections, i. e. nodes 7) /book/section/title • path p = <book, section(4), section(7), title> • SQL(p): select from where S 2. title Book B, Section S 1, Section S 2 B. id=S 1. parentid and S 1. parentcode=1 and S 1. id=S 2. parentid and S 2. parentcode=2; >> April 5, 2005 Advanced Database Systems, Jochen Stoesser 6

Queries: Rto. L(l) • root-to-leaf SQL query • possibly multiple root-to-leaf paths p 1,

Queries: Rto. L(l) • root-to-leaf SQL query • possibly multiple root-to-leaf paths p 1, . . . , pm to leaf l • Rto. L(l) : = • retrieves all information that would be retrieved traversing all paths from the root to leaf l • Problem: recursive schema possibly infinite number of root-to-leaf paths Rto. L query: union of infinitely many queries April 5, 2005 Advanced Database Systems, Jochen Stoesser * 7

Query Translation • two stages: 1. Identify the paths in the XML schema graph

Query Translation • two stages: 1. Identify the paths in the XML schema graph that satisfy the query: Path. Id stage 2. Use annotations (schemas) from XML-to-Relational mapping to construct equivalent relational query: SQLGen stage XML schema Path. Id Mapping schema SQLGen SQL query XML query April 5, 2005 Advanced Database Systems, Jochen Stoesser 8

Path. Id Stage • Problems: § recursive schema: number of paths possibly infinite §

Path. Id Stage • Problems: § recursive schema: number of paths possibly infinite § DAG graph: exponential number of paths • General idea: § Represent matching paths as graph instead of enumerating to reflect shared information across multiple paths (will become important for SQLGen stage) § execute query on a schema graph and identify statisfying paths April 5, 2005 Advanced Database Systems, Jochen Stoesser 9

Path. Id Stage • Algorithm outline: § Take automaton AS representing schema graph S

Path. Id Stage • Algorithm outline: § Take automaton AS representing schema graph S § Translate Query into DFA AQ § Create cross-product automaton ASQ from AS and AQ, eliminate all dead states § ASQ contains all matching paths § View ASQ as mapping schema SSQ >> April 5, 2005 Advanced Database Systems, Jochen Stoesser 10

Path. Id Stage: XPath Semantics Query: //section//title Source: Krishnamurthy et al. (2004) April 5,

Path. Id Stage: XPath Semantics Query: //section//title Source: Krishnamurthy et al. (2004) April 5, 2005 Advanced Database Systems, Jochen Stoesser 11

SQLGen Stage XML schema Path. Id Mapping schema SQLGen XML query • informally: union

SQLGen Stage XML schema Path. Id Mapping schema SQLGen XML query • informally: union of all Rto. L in mapping schema SSQ corresponds to query result • problem: DAG and recursive graphs & queries § DAG: number of paths can be exponential in the size of the component § recursive graphs & queries: infinite number of matching paths April 5, 2005 Advanced Database Systems, Jochen Stoesser 12

SQLGen Stage • Solution: SQL 99 with-clause • Used to create temporary relations for

SQLGen Stage • Solution: SQL 99 with-clause • Used to create temporary relations for • Nodes in DAG components shared computation, reflects shared information contained in paths through DAG components decrease exponential to polynomial complexity! • Recursive components >> April 5, 2005 Advanced Database Systems, Jochen Stoesser 13

SQLGen: Algorithm (1) • Query: /E 0//E 10 c 1 • Find mapping schema

SQLGen: Algorithm (1) • Query: /E 0//E 10 c 1 • Find mapping schema SSQ (S=SSQ) c 2 • strongly connected, non-recursive components (i = Ei): {0}, {1}, {2}, {3}, {4}, {5}, {6}, {10}, {11} • merge adjacent components ci and cj if ci dominates cj c 3 c 1 = {0, 1, 2, 3, 4, 5, 6}, c 3 = {10, 11} E 11 • recursive component c 2 = {7, 8, 9} Source: Krishnamurthy et al. (2004) April 5, 2005 Advanced Database Systems, Jochen Stoesser 14

SQLGen: Algorithm (2), DAG components • further process in top-down topological order • c

SQLGen: Algorithm (2), DAG components • further process in top-down topological order • c 1 = {0, 1, 2, 3, 4, 5, 6} non-recursive DAG component • create temporary relation for each non-root node that is § leaf node § has child or parent in different component § multiple incoming/outgoing edges ( shared computation) • N 1 = {2, 3, 6} >> April 5, 2005 c 1 Advanced Database Systems, Jochen Stoesser 15

SQLGen: Algorithm (3), DAG components • for example for node E 6 N 1:

SQLGen: Algorithm (3), DAG components • for example for node E 6 N 1: shared computation with T 6 as ( select R 6. * from R 4, T 3 where R 4. id=R 6. parentid and T 3. id=R 4. parentid and R 6. parentcode=1 union all select R 6. * from R 5, T 3 where R 5. id=R 6. parentid and T 3. id=R 5. parentid and R 6. parentcode=2 ) April 5, 2005 Advanced Database Systems, Jochen Stoesser 16

SQLGen: Algorithm (4), recursive components • c 2 = {7, 8, 9} recursive component

SQLGen: Algorithm (4), recursive components • c 2 = {7, 8, 9} recursive component • temporary relation TR, schema is union of the schemas of nodes in TR • two parts: 1. Initialization part captures all incoming edges 2. Recursive part captures recursion >> April 5, 2005 Advanced Database Systems, Jochen Stoesser 17

SQLGen: Algorithm (5), initialization • incoming edges from other components: (2, 8) and (3,

SQLGen: Algorithm (5), initialization • incoming edges from other components: (2, 8) and (3, 7) shared computation • Q 1 = select R 8. *, id(8) as schemanode from T 2, R 8 where R 8. parentcode=2 and R 8. parentid=T 2. id • Q 2 = select R 7. *, id(7) as schemanode from T 3, R 7 where R 7. parentcode=3 and R 7. parentid=T 3. id • Qinit = Q 1 Q 2 April 5, 2005 Advanced Database Systems, Jochen Stoesser 18

SQLGen: Algorithm (6), recursion • edges within the recursive component c 2: (7, 9),

SQLGen: Algorithm (6), recursion • edges within the recursive component c 2: (7, 9), (8, 7), (8, 9), (9, 8) • construct recursive query for each of these edges, e. g. Qe 1=(7, 9) = select R 9. *, id(9) as schemanode from TR, R 9 where TR. schemanode=id(7) and R 9. parentid=TR. id and R 9. parentcode=1 • recursive part TR = Qinit QR • for each n c 2: temporary relation T(n) = select * from TR where schemanode=id(n) (>> example) April 5, 2005 Advanced Database Systems, Jochen Stoesser 19

SQLGen: Final Query • result of SQLGen stage: § temporary relations - T 2,

SQLGen: Final Query • result of SQLGen stage: § temporary relations - T 2, T 3, T 6 for c 1 - T 7, T 8, T 9, and TR for c 2 - T 10 and T 11 for c 3 § Final query: § /E 0//E 10 algorithm select elemid from T 11 April 5, 2005 Advanced Database Systems, Jochen Stoesser 20

Empirical Evaluation (1) • XMark Benchmark schema • Translation of query fragments that appear

Empirical Evaluation (1) • XMark Benchmark schema • Translation of query fragments that appear in query suite • XML-to-Relational mapping schema has 101 nodes • Size of cross-product schema in all cases < 100 nodes • Result: Translation processes for each query < 6 ms April 5, 2005 Advanced Database Systems, Jochen Stoesser 21

Empirical Evaluation (2) • test under extreme conditions • all transitions on single label

Empirical Evaluation (2) • test under extreme conditions • all transitions on single label x • query //x//x//x • mapping schema complete graph (clique) of n nodes • runtime of translation algorithm: Source: Krishnamurthy et al. (2004) April 5, 2005 Advanced Database Systems, Jochen Stoesser 22

Contributions • Claim: „first to present a generic algorithm that translates path expression queries

Contributions • Claim: „first to present a generic algorithm that translates path expression queries to SQL in the presence of recursion in the schema in the context of schema-based XML storage shredding of XML into relations“ • Algorithm translates a path expression query into a single SQL query, irrespective of the XML schema‘s complexity • SQL query‘s size polynomial in size of the input XML-to. Relational mapping and the XML query April 5, 2005 Advanced Database Systems, Jochen Stoesser 23

Limitations • High complexity of SQL query even for relatively easy XML queries •

Limitations • High complexity of SQL query even for relatively easy XML queries • Although running time may be small, memory requirements may be high due to many temporary relations • Furthermore, although authors indicate solutions for XPath semantics and branching path expression queries (e. g. p 1[p 2 op value]), there is no proposition about increase in complexity regarding runtime, memory requirements etc. April 5, 2005 Advanced Database Systems, Jochen Stoesser 24

Q&A ? April 5, 2005 Advanced Database Systems, Jochen Stoesser 25

Q&A ? April 5, 2005 Advanced Database Systems, Jochen Stoesser 25

References • Krishnamurthy, R. (2004) XML-to-SQL Query Translation. Dissertation at the University of Wisconsin-Madison.

References • Krishnamurthy, R. (2004) XML-to-SQL Query Translation. Dissertation at the University of Wisconsin-Madison. Retrieved at March 18 from http: //www. cs. wisc. edu/~sekar/research/main. pdf • XPath v 1. 0. W 3 C. Retrieved at March 18 from http: //www. w 3. org/TR/1999/REC-xpath-19991116. html • Tian, F. , De. Witt, D. J. , Chen, J. , & Zhang, C. The Design and Performance Evaluation of Alternative XML Storage Strategies. Retrieved at April 4 from http: //www. cs. wisc. edu/~czhang/doc/publications/feng 6 page. pdf April 5, 2005 Advanced Database Systems, Jochen Stoesser 26

Appendix: Recursive XML Schema <? xml version="1. 0" encoding="UTF-8"? > <xs: schema xmlns: xs="http:

Appendix: Recursive XML Schema <? xml version="1. 0" encoding="UTF-8"? > <xs: schema xmlns: xs="http: //www. w 3. org/2001/XMLSchema" version="1. 0"> <xs: element name="element"> <xs: complex. Type> <xs: sequence> <xs: any process. Contents="skip" min. Occurs="0" /> <xs: element ref="element" min. Occurs="0" /> </xs: sequence> * </xs: complex. Type> </xs: element> element </xs: schema> << April 5, 2005 Advanced Database Systems, Jochen Stoesser 27

Appendix: SQL(path p) << April 5, 2005 Advanced Database Systems, Jochen Stoesser 28

Appendix: SQL(path p) << April 5, 2005 Advanced Database Systems, Jochen Stoesser 28

Appendix: Path. Id(Q, S) << April 5, 2005 Advanced Database Systems, Jochen Stoesser 29

Appendix: Path. Id(Q, S) << April 5, 2005 Advanced Database Systems, Jochen Stoesser 29

Appendix: SQLGen(SSQ) << April 5, 2005 Advanced Database Systems, Jochen Stoesser 30

Appendix: SQLGen(SSQ) << April 5, 2005 Advanced Database Systems, Jochen Stoesser 30

Appendix: SQLFrom. DAG(c) << April 5, 2005 Advanced Database Systems, Jochen Stoesser 31

Appendix: SQLFrom. DAG(c) << April 5, 2005 Advanced Database Systems, Jochen Stoesser 31

Appendix: SQLFrom. Recursive(c) (1) April 5, 2005 Advanced Database Systems, Jochen Stoesser 32

Appendix: SQLFrom. Recursive(c) (1) April 5, 2005 Advanced Database Systems, Jochen Stoesser 32

Appendix: SQLFrom. Recursive(c) (2) << April 5, 2005 Advanced Database Systems, Jochen Stoesser 33

Appendix: SQLFrom. Recursive(c) (2) << April 5, 2005 Advanced Database Systems, Jochen Stoesser 33

Appendix: TR example << April 5, 2005 Advanced Database Systems, Jochen Stoesser 34

Appendix: TR example << April 5, 2005 Advanced Database Systems, Jochen Stoesser 34