TU/e eindhoven university of technology

XAL - An XML ALgebra for Query Optimization

Flavius Frasincar, Geert-Jan Houben, Cristian Pau
Databases & Hypermedia Group, Division of Computer Science
/department of mathematics and computer science

January 29, 2002, ADC 2002

Contents
1. Motivation
2. XML Query Algebra Goals
3. XML Query Algebras
4. XAL
5. XAL Optimization Laws
6. XAL Heuristic Optimization Algorithm
7. XAL Query Example
8. Conclusion and Future Work

1. Motivation
• Hera project: automatic hypermedia presentation of data residing in the heterogeneous ‘deep’ web
• Use XML technologies for querying, transforming, and integrating large amounts of Web data
• Optimization of XML queries is important: this calls for an XML algebra for query optimization

2. XML Query Algebra Goals
• Based on W3C XML Query Data Model
• Genericity
  – logical operators independent of the underlying storage representation
• Optimizability
  – support query optimizations
• Expressivity
  – express a large class of queries
• Composability
  – operators are closed on the same data type
• Flexibility
  – support various data types

3. XML Query Algebras
• XOM (Zhang & Dong); Lore (Stanford)
  – complete and closed, no optimization support
  – specific set of logical operators
• Beech et al. (industry); SAL (Beeri & Tzaban)
  – focus on semistructured data, limited optimization support
  – logical model, no optimization strategies
• YATL (INRIA); XQuery (W3C)
  – weak support for optimization (unordered forests)
  – specific data model, focus on data integration
…

4. XAL
• Based on W3C XML Query Data Model
• Reduces the impedance mismatch between databases and XML (query languages) by allowing a mix of ordered/unordered operators
• Support for optimization (reuse the query optimization heuristics from relational systems)
• Fine-grained algebra of vertices and edges (Genericity)
• Composability, Flexibility, XQuery Compatibility

4.1. XAL Data Model
• Rooted connected directed graph with a partial order relation on edges
  – Acyclic (lexical view)
  – Cyclic (semantic view)
• Formally, …

Properties for Vertex

Basic Property     Element Vertex Result     Simple Vertex Result
value              identifier                value (e.g. “Dali”)
type               element                   type of value (e.g. string)

Derived Property   Result
name               name of the incoming E edge
parent             parent vertex (via E edge)
parentedge         incoming E edge
childelements      outgoing E edges
attributes         outgoing A edges
references         outgoing R edges

Properties for Edge

Basic Property     Result
name               element name (E), attribute name (A), ID attribute name (R), “Data” (D)
type               E, A, R, D
parent             source vertex of the edge
child              target vertex of the edge

Derived Property   Result
next               following sibling edge
previous           preceding sibling edge

Note: derived properties apply to E and D edges

4.2. XAL Operators
• All operators have the following form: o[f](x1, x2, …, xn: expression)
• Unary operators evaluate the input to a collection of vertices and use the implicit map operation to evaluate the result
• Closedness: all operators are closed on collections (this supports composability)

Operator Semantics

o[f](x: expression)

Variable x is bound to each vertex in the input collection. For each such binding, f(x) is evaluated.
The semantics of the operator o defines how the partial result (resulting from one variable binding) is computed from f(x).
The operator result is built by concatenating all the partial results.
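
The three sentences above can be read as a small evaluation loop. The following Python sketch is only an illustration under stated assumptions: collections are modelled as plain lists, and the names apply_operator, map_partial, and select_partial are hypothetical helpers that do not appear in the slides.

```python
from typing import Any, Callable, Iterable, List


def apply_operator(
    f: Callable[[Any], Any],
    partial: Callable[[Any, Any], List[Any]],
    collection: Iterable[Any],
) -> List[Any]:
    """Evaluate o[f](x: collection); `partial` encodes the semantics of o."""
    result: List[Any] = []
    for x in collection:               # x is bound to each element in turn
        fx = f(x)                      # f(x) is evaluated for this binding
        result.extend(partial(x, fx))  # partial results are concatenated
    return result


# Two hypothetical operator semantics (assumptions for illustration):
def map_partial(x: Any, fx: Any) -> List[Any]:
    return [fx]                        # map keeps f(x) itself


def select_partial(x: Any, fx: Any) -> List[Any]:
    return [x] if fx else []           # selection keeps x when f(x) holds


prices = [900_000, 1_500_000, 2_000_000]
doubled = apply_operator(lambda p: p * 2, map_partial, prices)
expensive = apply_operator(lambda p: p > 1_000_000, select_partial, prices)
```

Only the partial function differs between the two operators; the binding loop and the concatenation are shared, which is the point the slide makes.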

Collection
• Generalization of list and set (collections have a boolean order property)
• Similar to the mathematician’s monad and the functional programmer’s (list) comprehension
  – Monad<M>, where M is a type, is a triplet of functions (map<M>, unit<M>, join<M>)
  – XAL has map and join (called union) but no unit operator (the singleton collection is written as the singleton itself)
• Collections have elements of arbitrary types
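
A collection with a boolean order property can be sketched in Python as follows. This is a minimal illustration, not the XAL definition: the class Collection and the helpers cmap, union, unorder, and same are hypothetical names, and the rule that a union is ordered only when both operands are ordered is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Collection:
    items: Tuple[Any, ...]
    ordered: bool = True   # the boolean order property


def cmap(f: Callable[[Any], Any], c: Collection) -> Collection:
    """map: apply f to every element, keeping the order property."""
    return Collection(tuple(f(x) for x in c.items), c.ordered)


def union(a: Collection, b: Collection) -> Collection:
    """join (called union): concatenate; ordered only if both inputs are (assumption)."""
    return Collection(a.items + b.items, a.ordered and b.ordered)


def unorder(c: Collection) -> Collection:
    """Drop the order property so the collection behaves like a bag."""
    return Collection(c.items, False)


def same(a: Collection, b: Collection) -> bool:
    """Compare as lists when both are ordered, otherwise ignore element order."""
    if a.ordered and b.ordered:
        return a.items == b.items
    return sorted(a.items, key=repr) == sorted(b.items, key=repr)


names = Collection(("Rembrandt", "Dali"))
reversed_names = Collection(("Dali", "Rembrandt"))
```

In this sketch, same(names, reversed_names) is false, while same(unorder(names), reversed_names) is true, which is the distinction the order property is meant to capture.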

Operator Types
• Extraction operators
  – retrieve the needed information from XML documents
• Meta-operators
  – control the evaluation of expressions
• Construction operators
  – build new XML documents from the extracted data
Note: two vertices are equal if they have the same value

Extraction Operators
• Projection: π[type, name](e: expr)
• Selection: σ[condition](e: expr)
• Unorder: (e: expr)
• Join: (x: expr) ⋈[condition] (y: expr)
• Cartesian Product: (x: expr) × (y: expr)
• Union: (x: expr) ∪ (y: expr)
• Difference: (x: expr) − (y: expr)
• Intersection: (x: expr) ∩ (y: expr)
Note (Flexibility): x and y do not have to be “union compatible” as in relational algebra

Projection

π[type, name](e: expression)

type = E, A, R, D or disjunctions (|) of these
name = regular expression over strings

Example. π[E, (P|p)ainter[s]#](e) produces all the target vertices of element containment (E) edges that have names starting with Painter, painter, Painters, or painters, and that originate from the vertices in e
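
The projection example can be imitated with Python's re module. This is a sketch under several assumptions: edges are modelled as (type, name, parent, child) tuples with string vertex ids, the helper project is hypothetical, and the XAL pattern is translated to the Python regular expression (P|p)ainters?.* on the assumption that [s] marks an optional s and # matches any string.

```python
import re
from typing import List, NamedTuple


class Edge(NamedTuple):
    type: str    # one of "E", "A", "R", "D"
    name: str
    parent: str  # source vertex (a plain string id in this sketch)
    child: str   # target vertex


def project(types: str, name_pattern: str,
            vertices: List[str], edges: List[Edge]) -> List[str]:
    """Return the targets of matching edges that originate from `vertices`."""
    allowed = set(types.split("|"))          # e.g. "E|A|D" -> {"E", "A", "D"}
    pattern = re.compile(name_pattern)
    result = []
    for v in vertices:                       # implicit map over the input
        for e in edges:
            if e.parent == v and e.type in allowed and pattern.fullmatch(e.name):
                result.append(e.child)
    return result


edges = [
    Edge("E", "painters", "root", "v1"),
    Edge("E", "Painter", "root", "v2"),
    Edge("E", "paintings", "root", "v3"),   # name does not match the pattern
    Edge("A", "painter", "root", "v4"),     # matches the name but is not an E edge
]
targets = project("E", r"(P|p)ainters?.*", ["root"], edges)
```

Both the edge type and the edge name have to match, so only the first two edges contribute a target vertex here.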

Meta-operators & Construction Operators
• Map: map[f](e: expression)
• Kleene Star: *[f](e: expression)
  Note: e is included in the result
• Create vertex: vertex[type](value)
  Note: for element vertices the value (identifier) is given by the system
• Create edge: edge[type, name, parent](child)
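
One way to picture the Kleene star operator is as a closure computation: f is applied repeatedly, starting from e, until no new elements appear, and e itself is part of the result. The Python sketch below rests on that reading, which is an assumption; the function name kleene_star and the dictionary-based graph are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List


def kleene_star(f: Callable[[Hashable], Iterable[Hashable]],
                start: Iterable[Hashable]) -> List[Hashable]:
    """Collect `start` plus everything reachable by repeatedly applying f."""
    result = list(start)        # e is included in the result
    seen = set(result)
    frontier = list(result)
    while frontier:
        next_frontier = []
        for x in frontier:
            for y in f(x):
                if y not in seen:   # guards against cycles (semantic view)
                    seen.add(y)
                    result.append(y)
                    next_frontier.append(y)
        frontier = next_frontier
    return result


children = {
    "painters": ["painter"],
    "painter": ["name", "description"],
    "name": [],
    "description": [],
}
reachable = kleene_star(lambda v: children.get(v, []), ["painters"])
```

The seen set is what makes the loop terminate on cyclic graphs; without it, a reference edge pointing back to an ancestor would be followed forever.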

An Example
• Copy a complete graph starting from the vertex v

  map[edge[type(e), name(e),
           vertex[type(parent(e))](value(parent(e)))
      ](vertex[type(child(e))](value(child(e))))
  ](e)

  where

  e = *[parentedge(π[E|A|D, #](child(x)))](x: parentedge(π[E|A|D, #](v)))

5. XAL Optimization Laws
• The main factor in the execution cost of algebra expressions is the iteration (explicit or implicit map operator) over collections
• The proposed set of optimization laws aims at reducing iteration size for the data extraction expressions
• The laws are inspired by monad laws and relational algebraic optimization rules

• Law 1 (Left unit)
  If e1 is of unit type (singleton collection), then e2(e1) = e2(v := e1)
• Law 2 (Right unit)
  If e2 is the identity function, i.e. e2(v) = v, then e2(e1) = e1
• Law 3 (Associativity)
  (e1 o e2) o e3 = e1 o (e2 o e3)
• Law 4 (Empty collection)
  If e2 is the empty function, i.e. e2(v) = (), then e2(e1) = ()
• Law 5 (Decomposition of join)
  e1 ⋈[condition] e2 = σ[condition](e1 × e2)
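
Laws 1, 2, 4, and 5 can be checked on small lists. In the Python sketch below, e2(e1) is modelled by a hypothetical helper bind that applies e2 to every element of e1 and concatenates the results; select, cartesian, and join are likewise illustrative helpers, and the sample data is invented for the check.

```python
from itertools import product


def bind(e2, e1):
    """Model e2(e1): apply e2 to every element of e1 and concatenate."""
    return [y for v in e1 for y in e2(v)]


def select(cond, coll):
    return [x for x in coll if cond(x)]


def cartesian(a, b):
    return list(product(a, b))


def join(cond, a, b):
    return [(x, y) for x in a for y in b if cond((x, y))]


painters = ["Rembrandt", "Dali"]
paintings = [("Painting_ID01", "Rembrandt"),
             ("Painting_ID02", "Dali"),
             ("Painting_ID03", "Rembrandt")]


def ids_of(painter):
    return [pid for pid, author in paintings if author == painter]


def same_author(pair):
    painter, painting = pair
    return painting[1] == painter


law1 = bind(ids_of, ["Rembrandt"]) == ids_of("Rembrandt")   # left unit
law2 = bind(lambda v: [v], painters) == painters            # right unit
law4 = bind(lambda v: [], painters) == []                   # empty collection
law5 = (join(same_author, painters, paintings)
        == select(same_author, cartesian(painters, paintings)))
```

Law 5 holds here because the nested loops in join and itertools.product enumerate the pairs in the same order, so filtering afterwards gives the same list.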

• Law 6 (Decomposition of projection)
  If name is a regular expression that can be decomposed into several regular expressions n1, n2, …, nn and e is an unordered collection, then
  π[name](e) = π[n1](e) ∪ π[n2](e) ∪ … ∪ π[nn](e)
• Law 7 (Cascading of selection)
  σ[c1 ∧ c2 ∧ … ∧ cn](e) = σ[c1](σ[c2](…(σ[cn](e))…))
• Law 8 (Commutativity of selection)
  σ[c1](σ[c2](e)) = σ[c2](σ[c1](e))
• Law 9 (Commutativity of selection with projection)
  If the condition c involves solely vertices that have incoming edges named by the regular expression name, then
  π[name](σ[c(π[name])](e)) = σ[c](π[name](e))
• Law 10 (Commutativity of selection with cartesian product)
  If the condition c involves solely vertices from e1, then
  σ[c](e1 × e2) = σ[c](e1) × e2
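
The selection laws can be tried out on lists, and Law 10 also shows why pushing a selection below a cartesian product reduces iteration. The Python sketch below is illustrative only: select and CountingCond are hypothetical helpers, and the item data is invented for the check.

```python
from itertools import product


def select(cond, coll):
    return [x for x in coll if cond(x)]


class CountingCond:
    """Wrap a condition and count how often it is evaluated."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.fn(x)


items = [("Painting_ID01", 1_500_000),
         ("Painting_ID02", 800_000),
         ("Painting_ID03", 2_000_000)]
painters = ["Rembrandt", "Dali"]


def expensive(item):
    return item[1] > 1_000_000


def odd_id(item):
    return item[0].endswith(("1", "3"))


# Law 7: a conjunction equals a cascade of selections.
conjunction = select(lambda i: expensive(i) and odd_id(i), items)
cascade = select(expensive, select(odd_id, items))
# Law 8: the order of the cascade does not matter.
swapped = select(odd_id, select(expensive, items))

# Law 10: the condition only looks at the item, so it can move below the product.
late = CountingCond(lambda pair: expensive(pair[0]))
early = CountingCond(expensive)
after_product = select(late, list(product(items, painters)))
before_product = list(product(select(early, items), painters))
```

Both sides of Law 10 produce the same pairs, but the condition is evaluated six times when applied after the product and three times when applied before it.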

• Law 11 (Commutativity of selection with binary operators)
  If θ is one of the set operators ∪, −, or ∩, then
  σ[c](e1 θ e2) = σ[c](e1) θ σ[c](e2)
• Law 12 (Commutativity of binary operators)
  If θ is one of the set operators ×, ∪, or ∩ and e1 and e2 are unordered collections, then
  e1 θ e2 = e2 θ e1
• Law 13 (Commutativity of projection with cartesian product)
  If name is a regular expression that can be decomposed into two regular expressions name1 and name2, name1 involves solely vertices in e1, and name2 involves solely vertices in e2, then
  π[name](e1 × e2) = π[name1](e1) × π[name2](e2)
• Law 14 (Commutativity of projection with union)
  π[name](e1 ∪ e2) = π[name](e1) ∪ π[name](e2)
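
Law 12 requires unordered collections, and a small check shows why. The Python sketch below takes union as the binary operator and models it as list concatenation; both choices are assumptions for illustration, and same_unordered is a hypothetical helper that compares two lists as bags.

```python
from collections import Counter


def union(a, b):
    """Union modelled as concatenation (an assumption of this sketch)."""
    return a + b


def same_unordered(a, b):
    """Compare two collections while ignoring element order."""
    return Counter(a) == Counter(b)


e1 = ["Rembrandt", "Dali"]
e2 = ["Vermeer"]

ordered_equal = union(e1, e2) == union(e2, e1)
unordered_equal = same_unordered(union(e1, e2), union(e2, e1))
```

As ordered lists the two unions differ, so swapping the operands is only safe once order has been declared irrelevant.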

6. XAL Heuristic Optimization Algorithm

S1. Eliminate unnecessary iterations (use Laws 1, 2, and 4).
    After each following step, S1 is applied again.
S2. Unorder collections (use the unorder operator).
    Collections for which order is not relevant are unordered.
S3. Decompose joins (use Law 5).
S4. Decompose selections (use Law 7).
    Break down selections into a cascade of selections. This enables moving select operations down in the query tree.
S5. Move selections down as far as possible (use Laws 8, 9, 10, and 11).
    Based on the commutativity of selection with other operators, move selections down in the query tree as far as the selection condition permits.

S6. Apply the most restrictive selections first (use Laws 3 and 12).
    Based on the commutativity and associativity of binary operators, rearrange the leaf vertices so that the most restrictive selections apply first.
    Note: As a selectivity criterion one can use the size of the collection. The most restrictive selections are the selections that produce collections with the fewest elements.
S7. Decompose projections (use Law 6).
    Break down projections into a union of projections. This enables moving the project operations down in the query tree.
S8. Move projections down as far as possible (use Laws 13 and 14).
    Based on the commutativity of projection with other operators, move projections down in the query tree as far as possible.
S9. Identify combined operations (use composition laws).
    Identify subtrees that group operations that can be executed by a single program.
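
Steps S4 and S5 can be sketched as a rewrite over a small expression tree. The Python code below is a simplified illustration and not the authors' implementation: expressions are plain tuples, the names Cond, scan, product, sources, push, and optimize are hypothetical, and only selections over cartesian products are handled.

```python
from typing import FrozenSet, NamedTuple


class Cond(NamedTuple):
    text: str
    uses: FrozenSet[str]   # names of the inputs the condition refers to


def scan(name):
    return ("scan", name)


def product(left, right):
    return ("product", left, right)


def sources(e):
    """Names of all inputs below the expression e."""
    if e[0] == "scan":
        return {e[1]}
    if e[0] == "select":
        return sources(e[2])
    return sources(e[1]) | sources(e[2])


def push(cond, e):
    """Place cond as low in e as its inputs allow."""
    if e[0] == "select":                    # Law 8: selections commute
        return ("select", e[1], push(cond, e[2]))
    if e[0] == "product":
        left, right = e[1], e[2]
        if cond.uses <= sources(left):      # Law 10 on the left operand
            return ("product", push(cond, left), right)
        if cond.uses <= sources(right):     # Law 10 with the operands mirrored
            return ("product", left, push(cond, right))
    return ("select", cond, e)              # cannot move any further down


def optimize(conds, e):
    """S4 and S5: treat the conjunction as a cascade (Law 7), push each part down."""
    for c in conds:
        e = push(c, e)
    return e


c1 = Cond("author = name", frozenset({"painter", "painting"}))
c2 = Cond("paintingid = id", frozenset({"painting", "item"}))
c3 = Cond("price > 1000000", frozenset({"item"}))
tree = product(product(scan("painter"), scan("painting")), scan("item"))
plan = optimize([c1, c2, c3], tree)
```

In the resulting plan the price condition sits directly on the item input, the author condition sits on the product of painter and painting, and only the condition that spans painting and item stays at the top.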

7. XAL Query Example
• XML repository with three documents:

  painters.xml

  <painters>
    <painter>
      <name>Rembrandt</name>
      <description>Dutch painter</description>
    </painter>
    …
  </painters>

  paintings.xml

  <paintings>
    <painting>
      <id>Painting_ID01</id>
      <name>The Stone Bridge</name>
      <author>Rembrandt</author>
    </painting>
    …
  </paintings>

  catalogue.xml

  <items>
    <item>
      <paintingid>Painting_ID01</paintingid>
      <price>1500000</price>
    </item>
    …
  </items>

• Query: Return in alphabetical order the names of the painters that have a painting priced over $1 000 000 (the name of a painter will appear in the <result> element as many times as the number of their paintings that fulfill the above condition)
• XQuery 1.0:

  <result>
  {
    FOR $i IN document("painters.xml")/painters/painter,
        $j IN document("paintings.xml")/paintings/painting[author = $i/name],
        $k IN document("catalogue.xml")/items/item[paintingid = $j/id]
    WHERE $k/price/data() > 1000000
    RETURN $i/name
    SORTBY ./data()
  }
  </result>

• Input:
  – painters.xml: 3 painters (1, 2, 3)
  – paintings.xml: 100 paintings for painter 1, 150 paintings for painter 2, 100 paintings for painter 3
  – catalogue.xml: only painter 1 has 20 paintings more expensive than $1 000 000; all the other paintings are below $1 000 000

• Initial Query Tree
  [Figure: initial query tree, relating the XQUERY clauses FOR, WHERE, and SORTBY to XAL operators]
  – Output is alphabetically ordered!
  § Cartesian Product: 3 x 350 x 350 = 367 500 elements

• Optimization I
  – Step 2: Unorder collections (commutativity of XAL binary operators)
  – Step 4: Decompose selections
  – Step 5: Move selections down as far as possible
  § Cartesian Product: 3 x 350 + 350 x 20 = 8 050 elements

• Optimization II
  – Step 6: Apply the most restrictive selections first (switch positions of painter and item)
  § Cartesian Product: 20 x 350 + 20 x 3 = 7 060 elements
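
The element counts for the three query plans follow from the input sizes. The short Python calculation below reproduces them, assuming that catalogue.xml holds one item per painting (350 items), which the slides imply but do not state; all variable names are illustrative.

```python
painters = 3
paintings = 100 + 150 + 100        # paintings of painters 1, 2, and 3
items = paintings                  # assumption: one catalogue item per painting
expensive_items = 20               # items priced above $1 000 000

# Initial plan: one cartesian product over all three inputs.
initial = painters * paintings * items

# Optimization I: painter x painting, then the matched paintings
# x the already filtered items.
step_one = painters * paintings + paintings * expensive_items

# Optimization II: start from the filtered items, then bring in the painters.
step_two = expensive_items * paintings + expensive_items * painters
```

Pushing the selections down accounts for almost all of the saving (367 500 to 8 050 elements); reordering the inputs removes a further 990 elements.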

8. Conclusion and Future Work
• XAL provides an elegant way (by applying the ‘unorder’ operator) to reuse the heuristic optimization algorithm from relational queries
• Investigate new optimization laws that take advantage of XML-specific features (e.g. tree structure, internal references)
• Build a translation scheme from XQuery to XAL, exploring the expressive power of XAL