Database System Implementation CSE 507 Query Processing and
Database System Implementation CSE 507 Query Processing and Query Optimization Slides adapted from Silberschatz, Korth and Sudarshan Database System Concepts – 6 th Edition. And Elamsri and Navathe, Fundamentals of Database Systems – 6 th Edition.
Basic Steps in Query Processing 1. Parsing and translation 2. Optimization 3. Evaluation
Query Optimization: Introduction q Alternative ways of evaluating a given query q Equivalent q Different expressions algorithms for each operation Which execution plan is most likely to be more efficient? --- Your best intuition
Query Optimization: Introduction An evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the operations is coordinated.
Query Optimization: Introduction q Cost difference between evaluation plans for a query can be enormous q E. g. seconds vs. days in some cases. q Estimation of plan cost typically uses: q Statistical q Statistics q Cost information about relations. estimation for intermediate results formulae for algorithms, computed using statistics, etc….
Query Optimization: Introduction Steps in an ideal cost-based query optimizer: 1. Generate logically equivalent expressions using equivalence rules. 2. Annotate resultant expressions to get alternative query plans. 3. Choose the cheapest plan based on estimated cost.
Generating Equivalent Expressions
Transforming Relational Expressions q Two relational algebra expressions are said to be equivalent if the two expressions generate the same set of tuples on every legal database instance q Note: order of tuples is irrelevant q An equivalence rule says that expressions of two forms are equivalent q Can replace expression of first form by second, or vice versa
Equivalence Rules 1. Conjunctive selection operations can be deconstructed into a sequence of individual selections. 2. Selection operations are commutative. 3. Only the last in a sequence of projection operations is needed, the others can be omitted. 4. Selections can be combined with Cartesian products and theta joins. a. (E 1 X E 2) = E 1 b. 1(E 1 2 E 2) = E 1 1 2 E 2
Equivalence Rules 5. Theta-join operations (and natural joins) are commutative. E 1 E 2 = E 2 E 1 6. (a) Natural join operations are associative: (E 1 E 2) E 3 = E 1 (E 2 E 3) (b) Theta joins are associative in the following manner: (E 1 1 E 2) 2 3 E 3 = E 1 1 3 (E 2 2 E 3) where 2 involves attributes from only E 2 and E 3.
Pictorial Depiction of Equivalence Rules
Equivalence Rules 7. The selection operation distributes over theta join operation under the following two conditions: (a) When all the attributes in 0 involve only the attributes of one of the expressions (E 1) being joined. 0 E 1 E 2) = ( 0(E 1)) E 2 (b) When 1 involves only the attributes of E 1 and 2 involves only the attributes of E 2. 1 E 2) = ( 1(E 1)) ( (E 2))
Equivalence Rules
Examples of Equivalence Rules: Selection Query: Assume the following query: customer_name( branch_city = “New. Delhi”(branch (account Transformation using rule 7 a. customer_name (( branch_city =“New. Delhi”(branch)) (account depositor))) depositor)) § Performing the selection as early as possible reduces the size of the relation to be joined!
Examples of Equivalence Rules: Projection customer_name(( branch_city = “New. Delhi” (branch When we compute ( branch_city = “New. Delhi” (branch account)) account ) we obtain a relation whose schema can have attributes like § branch_name § branch_city § Account Type (only this may be used in next join) § account_number § Balance, etc. depositor)
Examples of Equivalence Rules: Projection Push projections using equivalence rules 8 a and 8 b; Eliminate unneeded attributes from intermediate results to get: customer_name (( account_number (( branch_city = “New. Delhi” (branch account )) depositor ) § Performing the projection as early as possible reduces the size of the relation to be joined.
Examples of Equivalence Rules: Join Ordering § For all relations r 1, r 2, and r 3, (r 1 r 2) r 3 = r 1 (r 2 (Join Associativity) § If r 2 r 3 is quite large and r 1 r 3 ) r 2 is small, we choose (r 1 r 2) r 3 so that we compute and store a smaller temporary relation.
Examples of Equivalence Rules: Join Ordering customer_name(( branch_city = “New Delhi”(branch)) (account depositor) § Could compute the join between account and depositor first, and join result with branch. § Join between account and depositor is likely to be a large relation. § Only a small fraction of the bank’s customers are likely to have accounts in branches located in Delhi. § It is better to compute branch_city = “New. Delhi” (branch) account first.
Annotate resulting expressions to get execution plans: An example
Choosing the Best Execution Plan
Choosing the Best Execution Plan n Must consider the interaction of evaluation techniques l choosing the cheapest algorithm for each operation independently may not yield best overall algorithm. E. g. 4 merge-join may be costlier than hash-join, but may provide a sorted output which reduces the cost for an outer level aggregation.
Choosing the Best Execution Plan n Practical query optimizers incorporate elements of the following two broad approaches: 1. Search all the plans and choose the best plan in a cost-based fashion. 2. Uses heuristics to choose a plan.
Cost based Optimization n Practical query optimizers incorporate elements of the following two broad approaches: 1. Search all the plans and choose the best plan in a cost-based fashion (very costly) 2. Uses heuristics to choose a plan.
Heuristic Based Optimization n Cost-based optimization is expensive, even with dynamic programming. n but worthwhile for frequently used queries on large datasets n Systems may use heuristics to reduce the number of choices that must be made in a cost-based fashion. n Heuristic optimization transforms the query-tree by using a set of rules that typically (but not in all cases) improve execution performance:
Heuristic Based Optimization (Refer Navathe book) 1. Break up any select operations with conjunctive conditions into a cascade of select operations. 2. Move each select operation as far down the query tree as is permitted by the attributes involved in the select condition. 3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation. If possible execute smaller joins first. 4. Combine a Cartesian product operation with a subsequent select operation in the tree into a join operation. 5. Break down and move lists of projection attributes down the tree as far as possible by creating new project operations as needed. 6. Identify subtrees that represent groups of operations that can be executed by a single algorithm.
Heuristic Based Optimization Example SELECT FROM WHERE LNAME EMPLOYEE, WORKS_ON, PROJECT PType = ‘Chem’ AND PNMUBER=PNO AND ESSN=SSN AND BDATE > ‘ 1957 -12 -31’; Lname Ptype = “Chem” And Pnumber = Pno And Essn=SSn And Bdate>1957 -12 -31 PROJECT EMPLOYE E WORKS_O N
Heuristic Based Optimization Example 1. Break up any select operations with conjunctive conditions into a cascade of select operations. 2. Move each select operation as far down the query tree as is permitted by the attributes involved in the select condition. Lname Ptype = “Chem” And Pnumber = Pno And Essn=SSn And Bdate>1957 -12 -31 PROJECT EMPLOYE E WORKS_O N
Heuristic Based Optimization Example 1. Break up any select operations with conjunctive conditions into a cascade of select operations. 2. Move each select operation as far down the query tree as is permitted by the attributes involved in the select condition. Lname Ptype = “Chem” ( Pnumber = Pno ( Essn=SSn ( Bdate>1957 -12 -31))) PROJECT EMPLOYE E WORKS_O N
Heuristic Based Optimization Example 1. Break up any select operations with conjunctive conditions into a cascade of select operations. 2. Move each select operation as far down the query tree as is permitted by the attributes involved in the select condition. Lname Ptype = “Chem” ( Pnumber = Pno ( Essn=SSn ( Bdate>1957 -12 -31))) PROJECT EMPLOYE E WORKS_O N
Heuristic Based Optimization Example 1. Break up any select operations with conjunctive conditions into a cascade of select operations. 2. Move each select operation as far down the query tree as is permitted by the attributes involved in the select condition. Lname Ptype = “Chem” ( Pnumber = Pno ( Essn=SSn ( Bdate>1957 -12 -31))) PROJECT EMPLOYE E WORKS_O N
Heuristic Based Optimization Example 1. Break up any select operations with conjunctive conditions into a cascade of select operations. 2. Move each select operation as far down the query tree as is permitted by the attributes involved in the select condition. Lname ( Pnumber = Pno ( Essn=SSn )) Bdate>1957 -12 -31 EMPLOYE E Ptype = “Chem” WORKS_O N PROJECT
Heuristic Based Optimization Example 1. Break up any select operations with conjunctive conditions into a cascade of select operations. 2. Move each select operation as far down the query tree as is permitted by the attributes involved in the select condition. Lname Pnumber = Pno Essn=SSn Bdate>1957 -12 -31 EMPLOYE E Ptype = “Chem” WORKS_O N PROJECT
Heuristic Based Optimization Example 3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation. Lname Pnumber = Pno Essn=SSn Bdate>1957 -12 -31 EMPLOYE E Ptype = “Chem” WORKS_O N PROJECT
Heuristic Based Optimization Example 3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation. Lname Pnumber = Pno Which are the most restrictive selection conditions? Essn=SSn Bdate>1957 -12 -31 EMPLOYE E Ptype = “Chem” WORKS_O N PROJECT
Heuristic Based Optimization Example 3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation. Lname Pnumber = Pno Which are the most restrictive selection conditions? Essn=SSn Bdate>1957 -12 -31 EMPLOYE E 1 st ? WORKS_O N Ptype = “Chem” PROJECT
Heuristic Based Optimization Example 3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation. Lname Pnumber = Pno Which are the most restrictive selection conditions? Essn=SSn 2 nd ? Bdate>1957 -12 -31 EMPLOYE E 1 st ? WORKS_O N Ptype = “Chem” PROJECT
Heuristic Based Optimization Example 3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation. Lname 3 rd ? Pnumber = Pno Which are the most restrictive selection conditions? Essn=SSn 2 nd ? Bdate>1957 -12 -31 EMPLOYE E 1 st ? WORKS_O N Ptype = “Chem” PROJECT
Heuristic Based Optimization Example 3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation. 4 th ? Lname 3 rd ? Pnumber = Pno Which are the most restrictive selection conditions? Essn=SSn 2 nd ? Bdate>1957 -12 -31 EMPLOYE E 1 st ? WORKS_O N Ptype = “Chem” PROJECT
Heuristic Based Optimization Example 3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation. 4 th ? Lname 3 rd ? Pnumber = Pno Can we bring 1 st and 2 nd together if they are the most selective conditions? Essn=SSn 2 nd ? Bdate>1957 -12 -31 EMPLOYE E 1 st ? WORKS_O N Ptype = “Chem” PROJECT
Heuristic Based Optimization Example 3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation. 4 th ? Lname 3 rd ? Pnumber = Pno Cannot bring 1 st and 2 nd together as they will create a cross product Essn=SSn 2 nd ? Bdate>1957 -12 -31 EMPLOYE E 1 st ? WORKS_O N Ptype = “Chem” PROJECT
Heuristic Based Optimization Example 3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation. Also rearrange the joins to execute small first 4 th 3 rd ? ? Lname Pnumber = Pno Any other shuffling possible? Essn=SSn 2 nd ? Bdate>1957 -12 -31 EMPLOYE E 1 st ? WORKS_O N Ptype = “Chem” PROJECT
Heuristic Based Optimization Example 3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation. Also rearrange the joins to execute small first Lname Pnumber = Pno Assume the following: 1. #rec in Employee = 1 Lakh 2. #rec in Works_On = 2 Lakh; assume each emp works on exactly 2 projects. (no sharing) 3. #rec in Project = 2 Lakh Essn=SSn Bdate>1957 -12 -31 EMPLOYE E Ptype = “Chem” WORKS_O N PROJECT
Heuristic Based Optimization Example 3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation. Also rearrange the joins to execute small first Lname Further assume the following 1. #rec from Employee after selection condition = 50000 2. #rec Project after selection condition = 10 Pnumber = Pno Essn=SSn Bdate>1957 -12 -31 EMPLOYE E Ptype = “Chem” WORKS_O N PROJECT
Heuristic Based Optimization Example 3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation. Also rearrange the joins to execute small first Lname Pnumber = Pno Essn=SSn Bdate>1957 -12 -31 EMPLOYE E Therefore: 1. #rec after Emp join Works_on = 1 Lakh 2. #rec after Project join Works_on = 10 In other words, result of Project join Works_on would be smaller and therefore should be done first. This way smaller intermediate result table would be written and read back. Ptype = “Chem” WORKS_O N PROJECT
Heuristic Based Optimization Example 3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation. Also rearrange the joins to execute small first 4 th Lname Pnumber = Pno 3 rd Shuffling the joins conditions Essn=SSn Bdate>1957 -12 -31 EMPLOYE E Ptype = “Chem” WORKS_O N PROJECT
Heuristic Based Optimization Example 3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation. Also rearrange the joins to execute small first. Lname Essn=SSn Shuffling the joins conditions Pnumber = Pno Bdate>1957 -12 -31 Ptype = “Chem” PROJECT WORKS_O N EMPLOYE E
Heuristic Based Optimization Example 4. Combine a Cartesian product operation with a subsequent select operation in the tree into a join operation. Lname Essn=SSn Pnumber = Pno Bdate>1957 -12 -31 Ptype = “Chem” PROJECT WORKS_O N EMPLOYE E
Heuristic Based Optimization Example 4. Combine a Cartesian product operation with a subsequent select operation in the tree into a join operation. Lname Essn=SSn Pnumber = Pno Bdate>1957 -12 -31 Ptype = “Chem” PROJECT WORKS_O N EMPLOYE E
Heuristic Based Optimization Example 5. Break down and move lists of projection attributes down the tree as far as possible by creating new project operations as needed. Lname Essn=SSn Any Ideas? Btw Lname is an attribute of the Employee table only. Pnumber = Pno Bdate>1957 -12 -31 Ptype = “Chem” PROJECT WORKS_O N EMPLOYE E
Heuristic Based Optimization Example 5. Break down and move lists of projection attributes down the tree as far as possible by creating new project operations as needed. Lname Essn=SSn Key Tactic: (a) Move Lname towards to Employee. (b) For others keep what is bare min required. Pnumber = Pno Bdate>1957 -12 -31 Ptype = “Chem” PROJECT WORKS_O N EMPLOYE E
Heuristic Based Optimization Example 5. Break down and move lists of projection attributes down the tree as far as possible by creating new project operations as needed. Lname Essn Pnumber Ptype = “Chem” PROJECT Essn=SSn Pnumber = Pno Essn, Pno WORKS_O N Key Tactic: (a) Move Lname towards to Employee. (b) For others keep what is bare min required. Lname, Ssn Bdate>1957 -12 -31 EMPLOYE E
Cost based Optimization: Example on Join ordering n Consider finding the best join-order for r 1 r 2 . . . rn. n There are (2(n – 1))!/(n – 1)! different join orders for above expression. n No need to generate all the join orders. Using dynamic programming, the least-cost join order for any subset of {r 1, r 2, . . . rn} is computed only once and stored for future use.
Cost based Optimization: Example on Join ordering n To find best join tree for a set of n relations: l To find best plan for a set S of n relations, consider all possible plans of the form: S 1 (S – S 1) where S 1 is any non-empty subset of S. l Recursively compute costs for joining subsets of S to find the cost of each plan. l Choose l Store the cheapest of the 2 n – 1 alternatives. and reuse the cost of common sub-expressions.
Cost based Optimization: Join ordering algorithm procedure findbestplan(S) if (bestplan[S]. cost ) return bestplan[S] // else bestplan[S] has not been computed earlier, compute it now if (S contains only 1 relation) set bestplan[S]. plan and bestplan[S]. cost based on the best way of accessing S /* Using selections on S and indices on S */ else for each non-empty subset S 1 of S such that S 1 S P 1= findbestplan(S 1) P 2= findbestplan(S - S 1) A = best algorithm for joining results of P 1 and P 2 cost = P 1. cost + P 2. cost + cost of A if cost < bestplan[S]. cost = cost bestplan[S]. plan = “execute P 1. plan; execute P 2. plan; join results of P 1 and P 2 using A” return bestplan[S]
Cost based Optimization: Join ordering algorithm Preferred by most query optimizers
- Slides: 55