CS 4432 Database Systems II Query Optimizer CostBased
CS 4432: Database Systems II Query Optimizer – Cost-Based Optimization cs 4432 1
SQL query parse tree convert answer logical query plan apply laws “improved” l. q. p estimate result sizes l. q. p. +sizes consider physical plans execute statistics Pi pick best {(P 1, C 1), (P 2, C 2). . . } estimate costs {P 1, P 2, …. . } CS 4432 2
A Query (Evaluation) Plan v. An extended relational algebra tree v. Annotations at each node indicate: v access methods to use for each table. v implementation methods used for each relational operator. sname bid=100 (On-the-fly) sname rating > 5 bid=100 rating > 5 (On-the-fly) (Nested Loop Join) sid=sid Reserves Sailors
How to cost a physical plan? • We need estimated size of intermediate results (Chapter 16. 4) • Cost of each operator implementation/algorithm (Chapter 15) • Buffer available for the query (Key Cost Factor) cs 4432 4
Result of Cost-based Optimization • Good physical plan – Consider different join orderings – Consider different access methods for accessing the relations cs 4432 5
How to generate ‘good’ Physical Plan? Many alternate search algorithms are possible: 1. Exhaustive listing of all possible plans 2. Dynamic programming 3. Branch and bound 4. Greedy bottom-up plan construction NOTE: often only left-deep trees are being considered to keep search space small. cs 4432 6
Why left-deep trees? • Fundamental decision in System R (IBM): – Only left-deep join trees are considered. – Left-deep trees generate all fully pipelined plans. • Intermediate results not written to temporary files. • Not all left-deep trees are fully pipelined (e. g. , SM join). D D C A B C D A B C A B
Enumeration of Left-Deep Trees • Left-deep trees differ in : – the order of relations, – the access method for each relation, and – the join method for each join. • Number of left deep plans still exponential – n relations implies n! left-deep tree orderings
Enumeration of Left-Deep Trees Enumerated using N passes (if N relations joined): – – – Pass 1: Find best 1 -relation plan for each relation. Pass 2: Find best way to join result of each 1 -relation plan (as outer) to another relation. (All 2 -relation plans. ) Pass N: Find best way to join result of a (N-1)-relation plan (as outer) to the N’th relation. (All N-relation plans. ) For each subset of relations, retain: – – Cheapest plan overall, plus Cheapest plan for each interesting order of the tuples. Pass 1 A Pass 2 Pass 3 B C D
Enumeration Example • Read Chapter 16. 6 • If too many relations (Chapter 16. 6. 6): – Dynamic Programming expensive if too many relations (say more than 6 relations). – Use greedy (faster algorithm, but may yield plans not as good as Dynamic Programming) cs 4432 10
Operator Types • Stateful versus stateless operators – Select is stateless – Join is stateful • Blocking versus non-blocking operators – Select is non-blocking – Agg functions are blocking • Pipelined versus non-pipelined operators – Select is pipelinable – What about Join ? (see next slide) cs 4432 11
Is Join Pipelinable ? • Join P: depends on implementation strategy chosen for an operator – Iteration-join : pipelinable – Merge-sort join : blocking – Index join : pipelinable – Hash join : blocking cs 4432 12
Costing of a complete plan • We will go over an example query plan • Important: first we classify operators as pipelined or not-pipelined • If pipelined, then for stateless operators the IO cost is zero (for example, for Select or Project) cs 4432 13
Costing of a Complete Query Plan • What about a Select? How is it implemented? • If in middle of plan, pipeline it (one tuple at a time iteration) • If at leaf of plan, identify any potential index to use index-lookup to implement the Select • If index available, cost of select operator strategy is equal to cost of an index lookup cs 4432 14
Costing of a complete plan • Main idea: – Determine # of distinct values – V(R, a) – Determine physical impl. strategies per operator – Then, compute IO costs for each operator – Then, sum up all costs. – Done. cs 4432 15
- Slides: 15