A Framework for Using Materialized XPath Views in XML Query Processing
Dapeng He, Wei Jin

Introduction
o XML languages such as XQuery, XSLT, and SQL/XML employ XPath as their search and extraction language. XPath expressions often define complicated navigation, which makes query processing expensive, especially over large collections of documents. Optimizing XPath expressions is therefore vital to processing XML queries efficiently.
o This paper proposes a framework for exploiting materialized XPath views to process XML queries. It develops an XPath matching algorithm that determines when such views can be used to answer a user query containing XPath expressions.

Introduction
o Two main problems arise when answering XML queries using materialized XPath views. First, an XPath query containment check is required to ensure that a view can be used to answer a query. Second, a compensation expression must be constructed that computes the query result from the information available in the view.

Introduction
o We address the XPath query containment problem with an XPath matching algorithm. The containment problem was shown to be coNP-complete for a restricted subset of XPath.
o We propose an efficient polynomial-time matching algorithm that is sound and works in most practical cases. It is based on the observation that a total node mapping from view nodes to query nodes implies containment for conjunctive XPath expressions. We build on this observation and extend it to a more functional subset of XPath that includes value predicates, disjunction, and the axes allowed in XQuery.

XPath Matching Algorithm
o Here we present an algorithm that decides whether a given XPath view can be used in a user query. The algorithm finds tree mappings between the view and the query expression trees and records them in a match structure. If a mapping exists, the view can potentially be used to evaluate the XPath expression in the user query.
o In the remainder of this presentation, we first introduce our XPath representation, then describe the basic algorithm, and finally extend it to handle comparison predicates.

XPath Representation
o We represent XPath expressions as labeled binary trees, called XPS trees. An XPS node is labeled with its axis and test. The axis is either the special "root" or one of the six axes allowed in XQuery: "child", "descendant", "self", "attribute", "descendant-or-self", or "parent". The test is a name test, a wildcard test, or a kind test.
o The first child of an XPS node is called predicate. It can be a conjunction (and), a disjunction (or), a comparison operator (<, ≤, >, ≥, =, ≠, eq, ne, lt, le, gt, ge), a constant, or an XPath Step (XPS) node. The second child, called next, points to the next step and is always an XPS node.
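The node layout described above can be sketched in Java roughly as follows. This is a minimal illustration, not the presenters' actual class: the names XpsNode, withPredicate, withNext, size, and XpsNodeDemo are assumptions made for this example, and operator and constant nodes are represented simply by putting their text in the test field.

```java
/** Illustrative XPS node: axis and test labels plus the two children named on the slide. */
final class XpsNode {
    final String axis;      // "root", "child", "descendant", "self", "attribute", ...
    final String test;      // name test, "*" wildcard, or a kind test such as "text()"
    XpsNode predicate;      // first child: and/or, comparison, constant, or another step
    XpsNode next;           // second child: the next step, always an XPS node

    XpsNode(String axis, String test) {
        this.axis = axis;
        this.test = test;
    }

    XpsNode withPredicate(XpsNode p) { this.predicate = p; return this; }

    XpsNode withNext(XpsNode n) { this.next = n; return this; }

    /** Counts the nodes of the subtree rooted at this node. */
    int size() {
        int count = 1;
        if (predicate != null) count += predicate.size();
        if (next != null) count += next.size();
        return count;
    }
}

final class XpsNodeDemo {
    /** Builds the tree for //order/lineitem[@price] by hand (hypothetical example). */
    static XpsNode example() {
        XpsNode price = new XpsNode("attribute", "price");
        XpsNode lineitem = new XpsNode("child", "lineitem").withPredicate(price);
        XpsNode order = new XpsNode("descendant", "order").withNext(lineitem);
        return new XpsNode("root", "").withNext(order);
    }

    public static void main(String[] args) {
        // root, order, lineitem, and @price: four nodes in total
        System.out.println(example().size());
    }
}
```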

Examples of XPath and XPS Tree

XPS Tree Construction
o To meet the specific needs of XPS tree construction, we defined the XPS node structure, with Axis, Test, and Sequence Number fields, in Java from scratch, without any auxiliary tools. The same node structure also represents predicate elements: a conjunction (and), a disjunction (or), a comparison operator (<, ≤, >, ≥, =, ≠, eq, ne, lt, le, gt, ge), or a constant.
o To cope with complex XPath expressions, we parse the expression recursively and build subtrees for complicated predicate conditions. For example, the predicate of an XPath step may contain a nested XPath expression, or several conjunction, disjunction, or comparison operators.
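A hedged sketch of the recursive idea, limited to flat predicates: the class PredicateParser and its methods are hypothetical, tokens are assumed to be separated by whitespace, and nested bracketed XPath expressions are deliberately not handled. The parser returns a prefix-notation string rather than real XPS nodes, and it nests repeated "and" operators to the right.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical recursive-descent parser for a whitespace-tokenized predicate. */
final class PredicateParser {
    private static final List<String> COMPARISONS =
            Arrays.asList("<", "<=", ">", ">=", "=", "!=", "eq", "ne", "lt", "le", "gt", "ge");

    private final String[] tokens;
    private int pos = 0;

    private PredicateParser(String predicate) {
        this.tokens = predicate.trim().split("\\s+");
    }

    /** Parses the whole predicate and returns the tree in prefix notation. */
    static String parse(String predicate) {
        PredicateParser p = new PredicateParser(predicate);
        String tree = p.parseOr();
        if (p.pos != p.tokens.length) {
            throw new IllegalArgumentException("unexpected token: " + p.tokens[p.pos]);
        }
        return tree;
    }

    // or-expression := and-expression [ "or" or-expression ]
    private String parseOr() {
        String left = parseAnd();
        if (peekIs("or")) {
            pos++;
            return "or(" + left + ", " + parseOr() + ")";
        }
        return left;
    }

    // and-expression := comparison [ "and" and-expression ]
    private String parseAnd() {
        String left = parseComparison();
        if (peekIs("and")) {
            pos++;
            return "and(" + left + ", " + parseAnd() + ")";
        }
        return left;
    }

    // comparison := operand [ operator operand ]
    private String parseComparison() {
        String left = nextOperand();
        if (pos < tokens.length && COMPARISONS.contains(tokens[pos])) {
            String op = tokens[pos++];
            return op + "(" + left + ", " + nextOperand() + ")";
        }
        return left;
    }

    private String nextOperand() {
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("operand expected at end of input");
        }
        return tokens[pos++];
    }

    private boolean peekIs(String keyword) {
        return pos < tokens.length && tokens[pos].equals(keyword);
    }
}
```

For instance, PredicateParser.parse("@count > 100 and itemNum = 10") yields and(>(@count, 100), =(itemNum, 10)). A full implementation would recurse into bracketed sub-expressions at the operand level as well.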

Example of XPS Tree Structure
o View = //order[lineitem/@price>130 and @count>100 and itemNum=10]
o Each node is listed as axis, test, and sequence number:
root 1
  descendant order 2
    predicate AND 0
      predicate > 0
        child lineitem 5
          attribute price 6
        predicate 130 0
      predicate AND 0
        predicate > 0
          attribute count 11
          predicate 100 0
        predicate = 0
          child itemNum 15
          predicate 10 0
o This example handles multiple conjunctions and a step predicate that contains a nested XPath expression.

Example of XPS Tree Structure
o View = "//order[@price>150 and discount[count>10 and itemNum=100] and ordeNum=101]"
root 1
  descendant order 2
    predicate AND 0
      predicate > 0
        attribute price 5
        predicate 150 0
      predicate AND 0
        child discount 9
          predicate AND 0
            predicate > 0
              child count 12
              predicate 10 0
            predicate = 0
              child itemNum 16
              predicate 100 0
        predicate = 0
          child ordeNum 21
          predicate 101 0
o This example handles a nested predicate condition and multiple "and" operators.

Basic Matching Algorithm
o The algorithm described here traverses both the view and the query expression trees and computes all possible mappings from XPS nodes of the view to XPS nodes of the query expression, in a single top-down pass over the view tree.
o The table below summarizes the basic algorithm in terms of the four functions used. Every function in the table evaluates to a Boolean. The algorithm is invoked by the initial call matchStep(v.root, q.root), and a match exists if this call evaluates to true. For each function, the first rule whose condition is satisfied is fired.

Basic Matching Algorithm

Basic Matching Algorithm
o Using this algorithm, we can handle the case where the query expression is more restrictive than the view definition.
o For example, the view V = //*[@*], which contains all XML element nodes that have an attribute, can be used to evaluate Q1 = //order/lineitem[@price and discount], as shown in the figure. Dotted lines denote the mapping.
o Rule 1.2 says that if a node v maps to a node in one disjunct of pred, then v must also map to some node in the other disjunct of Q. For example, the same V cannot be used to evaluate the expression Q = //order/lineitem[@price or price], which asks for lineitem nodes that have either a price attribute or a price element: a lineitem with only a price element may have no attribute and so may be absent from V.

Basic Matching Algorithm
o When the view node has a "descendant" axis, we need to keep looking for matches further down the tree, even if the current query expression node matches (rule 1.3). For example, in Figure 2, we will try to map XPS2 (//*) to XPS5 (//order), XPS6 (/lineitem), and XPS9 (/discount).
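The rule table itself is not reproduced in this transcript, so the following is only a simplified sketch of the mapping idea for linear paths without predicates. It assumes two axes ("child" and "descendant") and name or wildcard tests; the class LinearPathMatch and its methods are hypothetical. It shows the "keep looking further down" behaviour for a descendant step in the view.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified sketch: does a total mapping from view steps to query steps exist? */
final class LinearPathMatch {
    /** One location step: axis is "child" or "descendant"; test is a name or "*". */
    static final class Step {
        final String axis;
        final String test;
        Step(String axis, String test) { this.axis = axis; this.test = test; }
    }

    /** Parses predicate-free paths such as "//order/lineitem" into steps. */
    static Step[] parse(String path) {
        List<Step> steps = new ArrayList<>();
        int i = 0;
        while (i < path.length()) {
            String axis = "child";
            if (path.startsWith("//", i)) {
                axis = "descendant";
                i += 2;
            } else if (path.charAt(i) == '/') {
                i += 1;
            } else {
                throw new IllegalArgumentException("expected '/' at position " + i);
            }
            int end = path.indexOf('/', i);
            if (end < 0) end = path.length();
            steps.add(new Step(axis, path.substring(i, end)));
            i = end;
        }
        return steps.toArray(new Step[0]);
    }

    static boolean matches(Step[] view, Step[] query) {
        return matchFrom(view, 0, query, 0);
    }

    // Maps view[vi..] into query[qi..]; qi is the first query step below the previous image.
    private static boolean matchFrom(Step[] view, int vi, Step[] query, int qi) {
        if (vi == view.length) return true;   // every view step has been mapped
        if (qi >= query.length) return false; // view steps remain but the query is exhausted
        Step v = view[vi];
        if (v.axis.equals("child")) {
            // A child step in the view must map to the very next query step, also a child step.
            Step q = query[qi];
            return q.axis.equals("child") && testMatches(v, q)
                    && matchFrom(view, vi + 1, query, qi + 1);
        }
        // Descendant step: try the current query step and every step further down.
        for (int j = qi; j < query.length; j++) {
            if (testMatches(v, query[j]) && matchFrom(view, vi + 1, query, j + 1)) {
                return true;
            }
        }
        return false;
    }

    private static boolean testMatches(Step v, Step q) {
        return v.test.equals("*") || v.test.equals(q.test);
    }
}
```

With this sketch, the view //* maps into //order/lineitem/discount, while the view /order/lineitem does not map into //order/lineitem, because the query may return lineitem nodes that the view does not contain.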

Basic Matching Algorithm

Recording the Match
o Why do we need to record matches?
   - The basic matching algorithm may generate an exponential number of tree mappings. Example: the view //a//a…//a and the query /a/a…/a can have many distinct tree mappings.
   - Redundant work: the matchStep() function would be called multiple times with the same parameters.
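To illustrate the blow-up, the sketch below counts the order-preserving mappings of a view with m descendant steps //a into a query with n child steps /a, and caches each (view step, query step) pair so it is evaluated only once. The class MappingCount and its fields are hypothetical names introduced for this example; the counting recurrence is an illustration of the redundancy, not the paper's algorithm.

```java
import java.util.HashMap;
import java.util.Map;

/** Counts mappings of //a//a...//a (viewSteps times) into /a/a.../a (querySteps times). */
final class MappingCount {
    private final int viewSteps;
    private final int querySteps;
    private final Map<Long, Long> memo = new HashMap<>();
    long calls = 0; // number of distinct (vi, qi) pairs actually evaluated

    MappingCount(int viewSteps, int querySteps) {
        this.viewSteps = viewSteps;
        this.querySteps = querySteps;
    }

    /** Number of ways to map view steps vi.. onto query steps at positions >= qi, in order. */
    long count(int vi, int qi) {
        if (vi == viewSteps) return 1;   // all view steps are mapped
        if (qi == querySteps) return 0;  // ran out of query steps
        long key = (long) vi * (querySteps + 1) + qi;
        Long cached = memo.get(key);
        if (cached != null) return cached;
        calls++;
        // Either view step vi maps to query step qi, or it maps somewhere further down.
        long result = count(vi + 1, qi + 1) + count(vi, qi + 1);
        memo.put(key, result);
        return result;
    }
}
```

For a view of 10 steps and a query of 20 steps, count(0, 0) returns 184756 (the binomial coefficient "20 choose 10"), while the number of evaluated pairs stays at or below 10 * 20 = 200.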

Recording the Match
o What to record?
   - Match matrix structure
        row: XPS nodes of the query
        column: XPS nodes of the view
        cell: a pair of view and query XPS tree nodes; possible values: "empty", "true", "false"
   - Directed edges between cells
        Meaning: they represent the context of a mapping.
        Explanation: an edge (i, j) -> (k, l) means that matchStep(v_k, q_l) was called from matchStep(v_i, q_j).
        The result is a DAG (directed acyclic graph), because matching proceeds top-down.
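One possible in-memory layout for this structure, as a sketch only: the class MatchMatrix, the enum Cell, and the methods record, get, addEdge, and edgeCount are assumed names, and nodes are identified by zero-based indexes rather than by the sequence numbers used on the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical match matrix: one cell per (query node, view node) pair, plus context edges. */
final class MatchMatrix {
    enum Cell { EMPTY, TRUE, FALSE }

    /** Edge from cell (fromView, fromQuery) to cell (toView, toQuery). */
    static final class Edge {
        final int fromView;
        final int fromQuery;
        final int toView;
        final int toQuery;

        Edge(int fromView, int fromQuery, int toView, int toQuery) {
            this.fromView = fromView;
            this.fromQuery = fromQuery;
            this.toView = toView;
            this.toQuery = toQuery;
        }
    }

    private final Cell[][] cells;                    // rows: query nodes, columns: view nodes
    private final List<Edge> edges = new ArrayList<>();

    MatchMatrix(int queryNodes, int viewNodes) {
        cells = new Cell[queryNodes][viewNodes];
        for (Cell[] row : cells) {
            Arrays.fill(row, Cell.EMPTY);            // nothing has been tried yet
        }
    }

    Cell get(int view, int query) {
        return cells[query][view];
    }

    /** Stores the outcome of matching one view node against one query node. */
    void record(int view, int query, boolean matched) {
        cells[query][view] = matched ? Cell.TRUE : Cell.FALSE;
    }

    /** Remembers that the match of (toView, toQuery) was attempted from (fromView, fromQuery). */
    void addEdge(int fromView, int fromQuery, int toView, int toQuery) {
        edges.add(new Edge(fromView, fromQuery, toView, toQuery));
    }

    int edgeCount() {
        return edges.size();
    }
}
```

A matching routine would consult get(view, query) before recursing and return the stored value whenever the cell is not EMPTY, which is what avoids repeated calls with the same parameters.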

Recording the Match
o Benefits:
   - Reduces the run time to polynomial
   - Makes it possible to handle comparison predicates

Example of Match Matrix
o view = //order/*[@price]
o query = //order[LineItem/@price]
o view tree: root 1, //order 2, /* 3, @price 4
o query tree: root 5, //order 6, /LineItem 7, @price 8

Example of Match Matrix
o V (columns): root 1, //order 2, /* 3, @price 4
o Q (rows): root 5, //order 6, /LineItem 7, @price 8
o Cell values shown: True, False, True

Handling Comparison Predicates
o Comparison predicate format
   - L op R
   - L and R can each be either an XPS node or a constant.
   - op can be <, <=, >, >=, or =
o Some logical constraints
   - V = //order/*[@price > 60]
   - Q = //order[lineitem/@price > 30]
   - The view cannot be used to answer the query: it omits elements whose price lies between 30 and 60, which the query may need.

Handling Comparison Predicates
o Example
   - V = //order/*[@price > 60]
   - view tree: root 1, //order 2, /* 3, > 4, @price 5, 60 6

Handling Comparison Predicates
o Two types of comparisons
   - Local predicate: n op constant, e.g. (@price > 60)
   - Intra-document join: n op m, e.g. (@price > @salary)
o Normalization
   - Local predicates: replace the comparison operator with the subtree of n, and add the comparison to the filter list.
   - Intra-document joins: replace the comparison operator with "and".

Handling Comparison Predicates
o Example of a local predicate: V = //order/*[@price > 60]
   - view tree: root 1, //order 2, /* 3, > 4, @price 5, 60 6
   - Filter: "5", ">", "60"

Handling Comparison Predicates
o Example of an intra-document join: V = //order/*[@price > @salary]
   - view tree: root 1, //order 2, /* 3, AND 4 (replacing > 4), @price 5, @salary 6
   - Filter: "5", ">", "6"

Handling Comparison Predicates
o Restriction check for local predicates
   - V: …@price>60…
   - Q: …@price>40…
   - Fails the "restriction check"
o Restriction check for intra-document joins
   - V: …salary <= bonus[christmas]
   - Q: …salary and bonus[christmas]
   - Fails the "restriction check"
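One plausible reading of the restriction check for local predicates is an implication test: every value that satisfies the query's comparison must also satisfy the view's comparison on the same node. The sketch below assumes numeric constants over a dense domain and only the operators <, <=, >, >=, and =; the class RestrictionCheck and the method implies are hypothetical names, not taken from the paper.

```java
/** Sketch: does "x qOp qConst" (query) imply "x vOp vConst" (view) for every number x? */
final class RestrictionCheck {
    static boolean implies(String qOp, double qConst, String vOp, double vConst) {
        switch (vOp) {
            case ">":
                // x > cQ implies x > cV when cQ >= cV; for >= and = the bound must be strict.
                return (qOp.equals(">") && qConst >= vConst)
                        || ((qOp.equals(">=") || qOp.equals("=")) && qConst > vConst);
            case ">=":
                return (qOp.equals(">") || qOp.equals(">=") || qOp.equals("="))
                        && qConst >= vConst;
            case "<":
                return (qOp.equals("<") && qConst <= vConst)
                        || ((qOp.equals("<=") || qOp.equals("=")) && qConst < vConst);
            case "<=":
                return (qOp.equals("<") || qOp.equals("<=") || qOp.equals("="))
                        && qConst <= vConst;
            case "=":
                // Only an identical equality is at least as restrictive as an equality.
                return qOp.equals("=") && qConst == vConst;
            default:
                throw new IllegalArgumentException("unsupported operator: " + vOp);
        }
    }
}
```

With the slide's example, implies(">", 40, ">", 60) is false, so the view with @price>60 fails the check for a query with @price>40, whereas a query with @price>80 would pass.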

Matching Intradocument Joins
o Clean-up for intra-document joins
   - Remove all dangling edges, i.e., edges whose source or target matrix cell is not set to true.
   - Remove orphan node matches: matrix cells with value true that have no incoming edge are set to false.
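The two clean-up rules can be sketched as a loop that repeats until nothing changes, since removing an orphan can leave new dangling edges behind. This is an illustration under stated assumptions: the class MatchCleanup is hypothetical, cells are stored as matched[view][query] booleans, and the cell (0, 0) for the two root nodes is exempted from the orphan rule because it never has an incoming edge.

```java
import java.util.List;

/** Sketch of the clean-up pass over a match matrix and its edges. */
final class MatchCleanup {
    /** Edge from cell (fromV, fromQ) to cell (toV, toQ). */
    static final class Edge {
        final int fromV;
        final int fromQ;
        final int toV;
        final int toQ;

        Edge(int fromV, int fromQ, int toV, int toQ) {
            this.fromV = fromV;
            this.fromQ = fromQ;
            this.toV = toV;
            this.toQ = toQ;
        }
    }

    /** Mutates both arguments; edges must be a modifiable list. */
    static void cleanUp(boolean[][] matched, List<Edge> edges) {
        boolean changed = true;
        while (changed) {
            // Rule 1: drop edges whose source or target cell is not true.
            changed = edges.removeIf(
                    e -> !matched[e.fromV][e.fromQ] || !matched[e.toV][e.toQ]);

            // Rule 2: a true cell without an incoming edge is an orphan and becomes false.
            boolean[][] hasIncoming = new boolean[matched.length][];
            for (int v = 0; v < matched.length; v++) {
                hasIncoming[v] = new boolean[matched[v].length];
            }
            for (Edge e : edges) {
                hasIncoming[e.toV][e.toQ] = true;
            }
            for (int v = 0; v < matched.length; v++) {
                for (int q = 0; q < matched[v].length; q++) {
                    boolean isRootCell = (v == 0 && q == 0); // assumed exemption
                    if (matched[v][q] && !hasIncoming[v][q] && !isRootCell) {
                        matched[v][q] = false;
                        changed = true;
                    }
                }
            }
        }
    }
}
```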

Matching Intradocument Joins
o Clean-up example: V = //a[@b > @c]; Q = //a[@b > @c]/a[@b and @c]
   - view tree: root 1, //a 2, AND 3, @b 4, @c 5; Filter: 4, >, 5
   - query tree: root 6, //a 7, AND 8, @b 9, @c 10, /a 11, AND 12, @b 13, @c 14; Filter: 9, >, 10

Matching Intradocument Joins
o Clean-up example, continued: matching matrix
   - Q: root 6, //a 7, @b 9, @c 10, /a 11, @b 13, @c 14
   - V: root 1, //a 2, @b 3, @c 4
   - Cell values shown: T, T, T, T

Complexity of the Algorithm
o The size of the match matrix is O(|V| * |Q|).
   - |V| and |Q| are the numbers of XPS nodes in the view and query expressions, respectively.
o The number of edges in the DAG is O(|V| * |Q|^2).
   - Each matrix cell can have at most |Q| incoming edges: by construction, an edge (i, j) -> (l, k) may exist only if v_i is the parent of v_l. With |V| * |Q| cells, the number of edges in the DAG is therefore O(|V| * |Q|^2).

Complexity of the Algorithm
o The cost of constructing the matrix is also polynomial.
   - The matchStep function has only |V| * |Q| distinct sets of parameters.
   - By definition of the match matrix, the same pair of nodes cannot be matched more than once.
   - In the worst case (rule 1.3), a function call may expand into |Q| function calls.
   - Thus the algorithm runs in O(|V| * |Q|^2) time.
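As a worked instance of these bounds, take hypothetical sizes |V| = 4 and |Q| = 8. The numbers are illustrative only and do not come from the paper.

```latex
\begin{align*}
\text{matrix cells} &= |V| \cdot |Q| = 4 \cdot 8 = 32, \\
\text{DAG edges} &\le \underbrace{|V| \cdot |Q|}_{\text{cells}} \cdot \underbrace{|Q|}_{\text{incoming edges per cell}} = 4 \cdot 8^{2} = 256, \\
\text{function calls} &\le \underbrace{|V| \cdot |Q|}_{\text{distinct parameter sets}} \cdot \underbrace{|Q|}_{\text{expansion per call}} = 256 .
\end{align*}
```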

References
o A. Balmin, F. Özcan, K. S. Beyer, R. J. Cochrane, and H. Pirahesh. A Framework for Using Materialized XPath Views in XML Query Processing. IBM Almaden Research Center, San Jose, CA.
o S. Chaudhuri, R. Krishnamurthy, S. Potamianos, and K. Shim. Optimizing queries with materialized views. In Proceedings of ICDE, pages 190-200, 1995.
o A. Deutsch and V. Tannen. Containment and integrity constraints for XPath. In Proceedings of KRDB, 2001.
o J. Goldstein and P. Larson. Optimizing queries using materialized views: A practical, scalable solution. In Proceedings of SIGMOD, Santa Barbara, CA, 2001.

References
o A. Y. Levy, A. O. Mendelzon, Y. Sagiv, and D. Srivastava. Answering queries using views. In Proceedings of PODS, pages 95-104, 1995.
o G. Miklau and D. Suciu. Containment and equivalence for an XPath fragment. In Proceedings of PODS, pages 65-76, 2002.

Questions? & Thank you