Query Compilation Parsing Logical Query Plan Source our

  • Slides: 40
Download presentation
Query Compilation Parsing Logical Query Plan Source: our textbook, slides by Hector Garcia-Molina 1

Query Compilation Parsing Logical Query Plan Source: our textbook, slides by Hector Garcia-Molina 1

SQL query parse tree convert answer logical query plan apply laws “improved” l. q.

SQL query parse tree convert answer logical query plan apply laws “improved” l. q. p estimate result sizes l. q. p. +sizes consider physical plans execute statistics Pi pick best {(P 1, C 1), (P 2, C 2). . . } estimate costs {P 1, P 2, …. . } 2

Outline u. Convert SQL query to a parse tree w Semantic checking: attributes, relation

Outline u. Convert SQL query to a parse tree w Semantic checking: attributes, relation names, types u. Convert to a logical query plan (relational algebra expression) w deal with subqueries u. Improve the logical query plan w use algebraic transformations w group together certain operators w evaluate logical plan based on estimated size of relations u. Convert to a physical query plan w search the space of physical plans w choose order of operations w complete the physical query plan 3

Parsing u. Goal is to convert a text string containing a query into a

Parsing u. Goal is to convert a text string containing a query into a parse tree data structure: w leaves form the text string (broken into lexical elements) w internal nodes are syntactic categories u. Uses standard algorithmic techniques from compilers w given a grammar for the language (e. g. , SQL), process the string and build the tree 4

Example: SQL query SELECT title FROM Stars. In WHERE star. Name IN ( SELECT

Example: SQL query SELECT title FROM Stars. In WHERE star. Name IN ( SELECT name FROM Movie. Star WHERE birthdate LIKE ‘%1960’ ); (Find the movies with stars born in 1960) Assume we have a simplified grammar for SQL. 5

Example: Parse Tree <Query> <SFW> SELECT <Sel. List> FROM <From. List> <Attribute> <Rel. Name>

Example: Parse Tree <Query> <SFW> SELECT <Sel. List> FROM <From. List> <Attribute> <Rel. Name> title Stars. In SELECT <Sel. List> FROM WHERE <Condition> <Tuple> IN <Query> <Attribute> <From. List> <Attribute> <Rel. Name> name Movie. Star ( <Query> ) star. Name <SFW> WHERE <Condition> <Attribute> LIKE <Pattern> birth. Date ‘%1960’ 6

The Preprocessor ureplaces each reference to a view with a parse (sub)-tree that describes

The Preprocessor ureplaces each reference to a view with a parse (sub)-tree that describes the view (i. e. , a query) udoes semantic checking: w are relations and views mentioned in the schema? w are attributes mentioned in the current scope? w are attribute types correct? 7

Outline u. Convert SQL query to a parse tree w Semantic checking: attributes, relation

Outline u. Convert SQL query to a parse tree w Semantic checking: attributes, relation names, types u. Convert to a logical query plan (relational algebra expression) w deal with subqueries u. Improve the logical query plan w use algebraic transformations w group together certain operators w evaluate logical plan based on estimated size of relations u. Convert to a physical query plan w search the space of physical plans w choose order of operations w complete the physical query plan 8

Convert Parse Tree to Relational Algebra u. Complete algorithm depends on specific grammar, which

Convert Parse Tree to Relational Algebra u. Complete algorithm depends on specific grammar, which determines forms of the parse trees u. Here give a flavor of the approach 9

Conversion u Suppose there are no subqueries. u SELECT att-list FROM rel-list WHERE cond

Conversion u Suppose there are no subqueries. u SELECT att-list FROM rel-list WHERE cond is converted into PROJatt-list(SELECTcond(PRODUCT(rel-list))), or att-list( cond( X (rel-list))) 10

SELECT movie. Title FROM Stars. In, Movie. Star WHERE star. Name = name AND

SELECT movie. Title FROM Stars. In, Movie. Star WHERE star. Name = name AND birthdate LIKE '%1960'; <Query> <SFW> SELECT <Sel. List> <Attribute> movie. Title FROM <From. List> WHERE <Condition> <Rel. Name> , <From. List> Stars. In AND <Condition> <Rel. Name> <Attribute> LIKE <Pattern> Movie. Star birthdate '%1960' <Condition> <Attribute> = <Attribute> star. Name name 11

Equivalent Algebraic Expression Tree movie. Title starname = name AND birthdate LIKE '%1960' X

Equivalent Algebraic Expression Tree movie. Title starname = name AND birthdate LIKE '%1960' X Stars. In Movie. Star 12

Handling Subqueries u. Recall the (equivalent) query: SELECT title FROM Stars. In WHERE star.

Handling Subqueries u. Recall the (equivalent) query: SELECT title FROM Stars. In WHERE star. Name IN ( SELECT name FROM Movie. Star WHERE birthdate LIKE ‘%1960’ ); u. Use an intermediate format called twoargument selection 13

Example: Two-Argument Selection title Stars. In <condition> <tuple> <attribute> star. Name IN name birthdate

Example: Two-Argument Selection title Stars. In <condition> <tuple> <attribute> star. Name IN name birthdate LIKE ‘%1960’ Movie. Star 14

Converting Two-Argument Selection u. To continue the conversion, we need rules for replacing two-argument

Converting Two-Argument Selection u. To continue the conversion, we need rules for replacing two-argument selection with a relational algebra expression u. Different rules depending on the nature of the subquery u. Here show example for IN operator and uncorrelated query (subquery computes a relation independent of the tuple being tested) 15

Rules for IN R C <Condition> t IN S X R S C is

Rules for IN R C <Condition> t IN S X R S C is the condition that equates attributes in t with corresponding attributes in S 16

Example: Logical Query Plan title star. Name=name Stars. In name birthdate LIKE ‘%1960’ Movie.

Example: Logical Query Plan title star. Name=name Stars. In name birthdate LIKE ‘%1960’ Movie. Star 17

What if Subquery is Correlated? u. Example is when subquery refers to the current

What if Subquery is Correlated? u. Example is when subquery refers to the current tuple of the outer scope that is being tested u. More complicated to deal with, since subquery cannot be translated in isolation u. Need to incorporate external attributes in the translation u. Some details are in textbook 18

Outline u. Convert SQL query to a parse tree w Semantic checking: attributes, relation

Outline u. Convert SQL query to a parse tree w Semantic checking: attributes, relation names, types u. Convert to a logical query plan (relational algebra expression) w deal with subqueries u. Improve the logical query plan w use algebraic transformations w group together certain operators w evaluate logical plan based on estimated size of relations u. Convert to a physical query plan w search the space of physical plans w choose order of operations w complete the physical query plan 19

Improving the Logical Query Plan u. There are numerous algebraic laws concerning relational algebra

Improving the Logical Query Plan u. There are numerous algebraic laws concerning relational algebra operations u. By applying them to a logical query plan judiciously, we can get an equivalent query plan that can be executed more efficiently u. Next we'll survey some of these laws 20

Associative and Commutative Operations uproduct unatural join uset and bag union uset and bag

Associative and Commutative Operations uproduct unatural join uset and bag union uset and bag intersection uassociative: (A op B) op C = A op (B op C) ucommutative: A op B = B op A 21

Laws Involving Selection u. Selections usually reduce the size of the relation u. Usually

Laws Involving Selection u. Selections usually reduce the size of the relation u. Usually good to do selections early, i. e. , "push them down the tree" u. Also can be helpful to break up a complex selection into parts 22

Selection Splitting u C 1 AND C 2 (R) = C 1 ( C

Selection Splitting u C 1 AND C 2 (R) = C 1 ( C 2 (R)) u C 1 OR C 2 (R) = ( C 1 (R)) Uset ( C 2 (R)) if R is a set u C 1 ( C 2 (R)) = C 2 ( C 1 (R)) 23

Selection and Binary Operators u. Must push selection to both arguments: w C (R

Selection and Binary Operators u. Must push selection to both arguments: w C (R U S) = C (R) U C (S) u. Must push to first arg, optional for 2 nd: w C (R - S) = C (R) - S w C (R - S) = C (R) - C (S) u. Push to at least one arg with all attributes mentioned in C: w product, natural join, theta join, intersection w e. g. , C (R X S) = C (R) X S, if R has all the atts in C 24

Pushing Selection Up the Tree u. Suppose we have relations w Stars. In(title, year,

Pushing Selection Up the Tree u. Suppose we have relations w Stars. In(title, year, star. Name) w Movie(title, year, len, in. Color, studio. Name) uand a view w CREATE VIEW Movies. Of 1996 AS SELECT * FROM Movie WHERE year = 1996; uand the query w SELECT star. Name, studio. Name FROM Movies. Of 1996 NATURAL JOIN Stars. In; 25

The Straightforward Tree star. Name, studio. Name year=1996 Movie Stars. In Remember the rule

The Straightforward Tree star. Name, studio. Name year=1996 Movie Stars. In Remember the rule C(R S) = C(R) S ? 26

The Improved Logical Query Plan star. Name, studio. Name year=1996 Stars. In year=1996 Movie

The Improved Logical Query Plan star. Name, studio. Name year=1996 Stars. In year=1996 Movie push selection up tree Stars. In Movie Stars. In push selection down tree 27

Laws Involving Projections u. Consider adding in additional projections u. Adding a projection lower

Laws Involving Projections u. Consider adding in additional projections u. Adding a projection lower in the tree can improve performance, since often tuple size is reduced w Usually not as helpful as pushing selections down u. If a projection is inserted in the tree, then none of the eliminated attributes can appear above this point in the tree w Ex: L(R X S) = L( M(R) X N(S)), where M (resp. N) is all attributes of R (resp. S) that are used in L u. Another example: w L(R Ubag S) = L(R) Ubag L(S) But watch out for set union! 28

Projection and Union u Example: Suppose w R(a, b) = {(1, 2)}, w S(a,

Projection and Union u Example: Suppose w R(a, b) = {(1, 2)}, w S(a, b) = {(1, 3)}, and w bag projection u a(R Ubag S) = a({(1, 2), (1, 3)}) = {1, 1} = a({(1, 2)} Ubag a({(1, 3)}) = a(R) Ubag a(S) u a(R Uset S) = a({(1, 2), (1, 3)}) = {1, 1} ≠ {1} = {1} Uset {1} = a({(1, 2)}) Uset a({(1, 3)}) = a(R) Uset a(S) 29

Push Projection Below Selection? u. Rule: L( C(R)) = L( C( M(R))) where M

Push Projection Below Selection? u. Rule: L( C(R)) = L( C( M(R))) where M is all attributes used by L or C u. But is it a good idea? SELECT star. Name FROM Stars. In WHERE movie. Year = 1996; star. Name movie. Year=1996 Stars. In extra work to scan through Stars. In twice movie. Year=1996 star. Name, movie. Year Stars. In 30

Joins and Products u. Recall from the definitions of relational algebra: w R C

Joins and Products u. Recall from the definitions of relational algebra: w R C S = C(R X S) (theta join) w. R S = L( C(R X S)) (natural join) where C equates same-name attributes in R and S, and L includes all attributes of R and S dropping duplicates u. To improve a logical query plan, replace a product followed by a selection with a join w Join algorithms are usually faster than doing product followed by selection (on large result 31 of the product)

Duplicate Elimination u. Moving down the tree is potentially beneficial as it can reduce

Duplicate Elimination u. Moving down the tree is potentially beneficial as it can reduce the size of intermediate relations u. Can be eliminated if argument has no duplicates w a relation with a primary key w a relation resulting from a grouping operator u. Legal to push through product, join, selection, and bag intersection w Ex: (R X S) = (R) X (S) u. Cannot push through bag union, bag difference or projection 32

Duplicate Elimination Pitfalls u Example: Suppose w R has two copies of tuple t

Duplicate Elimination Pitfalls u Example: Suppose w R has two copies of tuple t w S has one copy of t w T contains only (1, 2) and (1, 3) u bag union: w (R Ubag S) has one copy of t w (R) Ubag (S) has two copies of t u bag difference: w (R S) has one copy of t w (R) (S) has no copies of t u bag projection: w ( (T)) = {1} w ( (T)) = {1, 1} 33

Grouping and Aggregation u. Since produces no duplicates: w ( L(R)) = L(R) u.

Grouping and Aggregation u. Since produces no duplicates: w ( L(R)) = L(R) u. Get rid of useless attributes: w L(R) = L( M(R)) where M contains all attributes in L u. If L contains only MIN and MAX: w L(R) = L( (R)) 34

Example u. Suppose we have the relations Movie. Star(name, addr, gender, birthdate) Stars. In(title,

Example u. Suppose we have the relations Movie. Star(name, addr, gender, birthdate) Stars. In(title, year, star. Name) uand we want to find the youngest star to appear in a movie for each year: SELECT year, MAX(birthdate) FROM Movie. Star, Stars. In WHERE name = star. Name GROUP BY year; year, MAX(birthdate) name=star. Name X Movie. Stars. In 35

Example cont'd year, MAX(birthdate) name=star. Name year, birthdate X Movie. Stars. In name=star. Name

Example cont'd year, MAX(birthdate) name=star. Name year, birthdate X Movie. Stars. In name=star. Name Movie. Stars. In birthdate, name Movie. Star year, star. Name Stars. In 36

Summary of LQP Improvements u. Selections: w push down tree as far as possible

Summary of LQP Improvements u. Selections: w push down tree as far as possible w if condition is an AND, split and push separately w sometimes need to push up before pushing down u. Projections: w can be pushed down w new ones can be added (but be careful) u. Duplicate elimination: w sometimes can be removed u. Selection/product combinations: w can sometimes be replaced with join 37

Outline u. Convert SQL query to a parse tree w Semantic checking: attributes, relation

Outline u. Convert SQL query to a parse tree w Semantic checking: attributes, relation names, types u. Convert to a logical query plan (relational algebra expression) w deal with subqueries u. Improve the logical query plan w use algebraic transformations w group together certain operators w evaluate logical plan based on estimated size of relations u. Convert to a physical query plan w search the space of physical plans w choose order of operations w complete the physical query plan 38

Grouping Assoc/Comm Operators u Group together adjacent joins, adjacent unions, and adjacent intersections as

Grouping Assoc/Comm Operators u Group together adjacent joins, adjacent unions, and adjacent intersections as siblings in the tree u Sets up the logical QP for future optimization when physical QP is constructed: determine best order for doing a sequence of joins (or unions or intersections) U D A U B E F U D E F A B C C 39

Evaluating Logical Query Plans u. The transformations discussed so far intuitively seem like good

Evaluating Logical Query Plans u. The transformations discussed so far intuitively seem like good ideas u. But how can we evaluate them more scientifically? u. Estimate size of relations, also helpful in evaluating physical query plans u. Coming up next… 40