Dynamic Query Optimization 2 Progressive Query Processing Transparent

  • Slides: 34
Download presentation
Dynamic Query Optimization 2 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD

Dynamic Query Optimization 2 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Problems with static optimization § § § Cost function instability: cardinality error of n-way

Problems with static optimization § § § Cost function instability: cardinality error of n-way join grows exponentially with n Unknown run-time bindings for host variables Changing environment parameters: amount of available space, concurrency rate, etc Static optimization comes in two flavours: 1. Optimize query Q, store the plan, run it whenever Q is posed 2. Every time when Q is posed, optimize it and run it 3 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Early Solutions 1. run several plans simultaneously for a short time, and then select

Early Solutions 1. run several plans simultaneously for a short time, and then select one “best” plan and run it for a long time 2. at every point in a standard query plan where the optimizer cannot accurately estimate the selectivity of an input, a choose-plan operator is inserted Select Unbound predicate Choose-Plan File Scan B-tree-scan Get-Set R 4 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Dynamic Mid-Query Reoptimization Features of the algorithm: § § § 5 Annotated query execution

Dynamic Mid-Query Reoptimization Features of the algorithm: § § § 5 Annotated query execution plan Runtime collection of statistics Dynamic resource reallocation Query plan modification Keeping overhead low Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Motivating Example select from where and and 6 avg(Rel 1. selectattr 1), avg(Rel 1.

Motivating Example select from where and and 6 avg(Rel 1. selectattr 1), avg(Rel 1. selectattr 2), Rel 1. groupattr Rel 1, Rel 2, Rel 3 Rel 1. selectatrr 1 <: value 1 Rel 1. selectatrr 2 <: value 2 Rel 1. jointatrr 2 = Rel 2. jointatrr 2 Rel 1. jointatrr 3 = Rel 3. jointatrr 3 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Collection of Statistics Limitations: § § 7 Can only limit statistics that can be

Collection of Statistics Limitations: § § 7 Can only limit statistics that can be gathered in one pass Not useful for pipelined execution Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Dynamic Resource Reallocation § § 8 Assume 8 MB memory available and 4. 2

Dynamic Resource Reallocation § § 8 Assume 8 MB memory available and 4. 2 MB necessary for each hash-join The optimizer allocates 4. 2 MB for the first hash-join and 250 KB for the second (causing it to execute in two passes) During execution, the statistics collector find out that only 7, 500 tuples produced by the filter The memory manager allocates each of the two hash-joins 2. 05 MB Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Query Plan Modification § Once the statistics are available, modify the plan on the

Query Plan Modification § Once the statistics are available, modify the plan on the fly – Hard to implement! Original plan 9 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Modified plan – optimal solution Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Query Plan Modification: practical solution • Store a partially computed query to disk •

Query Plan Modification: practical solution • Store a partially computed query to disk • Submit a new query using the partial results select avg(Temp 1. selectattr 1), avg(Temp 1. selectattr 2), Temp 1. groupattr from Temp 1, Rel 3 where Temp 1. joinatrr 3=Rel 3. joinattr 3 group by Temp 1. groupattr 10 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Robust Query Processing through Progressive Optimization 11 Progressive Query Processing Transparent Access to Grid|

Robust Query Processing through Progressive Optimization 11 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Motivation § Estimation errors in query optimization – Due to correlations in data SELECT

Motivation § Estimation errors in query optimization – Due to correlations in data SELECT count(*) from cars, accidents, owners WHERE c. id = a. cid and c. id=o. cid and c. make=‘Honda’ and c. model=‘Accord’ – Over-specified queries SELECT * from customers where SSN=blah and name=blah’ – Mis-estimated single-predicate selectivity SELECT count(*) from cars where c. make=? – Out-of-date statistics § Can cause bad plans § Leads to unpredictable performance 12 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Traditional Query Processing SQL Compilation Statistics Optimizer Best Plan Execution 13 Progressive Query Processing

Traditional Query Processing SQL Compilation Statistics Optimizer Best Plan Execution 13 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

LEO: DB 2’s Learning Optimizer Statistics SQL Compilation Optimizer 4. Exploit 3. Feedback Best

LEO: DB 2’s Learning Optimizer Statistics SQL Compilation Optimizer 4. Exploit 3. Feedback Best Plan Adjustments 2. Analyze Plan Execution 1. Monitor 14 Estimated Cardinalities Actual Cardinalities Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Use feedback from cardinality errors to improve future plans Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Progressive Optimization (POP) Statistics SQL Compilation Optimizer 3 4 New Best Plan “MQT”with Actual

Progressive Optimization (POP) Statistics SQL Compilation Optimizer 3 4 New Best Plan “MQT”with Actual Cardinality Best Plan With CHECK 5 2 New knl Plan Execution 6 15 Plan Execution with CHECK Partial Results Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM 1 Confidential Re-optimize If CHECK fails Use feedback from cardinality errors to improve current plan © 2004 IBM Corporation © 2003 IBM Corporation

Outline § § 16 Progressive Optimization – Solution overview – Checkpoint placement – Validity

Outline § § 16 Progressive Optimization – Solution overview – Checkpoint placement – Validity range computation Performance Results Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Progressive Optimization § § Why wait till query is finished to correct problem? –

Progressive Optimization § § Why wait till query is finished to correct problem? – Can detect problem early! – Correct the plan dynamically before we waste any more time! May never execute this exact query again – Parameter markers – Rare correlations – Complex predicates § Long-running query won’t notice re-optimization overhead Result: Plan more robust to optimizer mis-estimates 17 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Solution Overview § § 18 Add CHECKpoints to Query Execution Plans – Check Estimated

Solution Overview § § 18 Add CHECKpoints to Query Execution Plans – Check Estimated cardinalities vs. Actuals at runtime When checking fails: – Treat already computed (intermediate) results as materialized views – Correct the cardinality estimates based on the actual cardinalities – Re-optimize the query, possibly exploiting already performed work § Questions: – Where to add checkpoints? – When is an error big enough to be worth reoptimizing? § Tradeoff between opportunity (# reoptimization points) and risk (performance regression) Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

CHECK Placement (1) § Three constraints § Must not have performed side-effects – Given

CHECK Placement (1) § Three constraints § Must not have performed side-effects – Given out results to application – Performed updates § Want to reuse as much as possible § Don’t reoptimize if the plan is almost finished 19 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

CHECK Placement (2) § Lazy CHECK: NLJN – Just above a dam: TEMP, SORT,

CHECK Placement (2) § Lazy CHECK: NLJN – Just above a dam: TEMP, SORT, HSJN inner – Very low risk of regression Lazy Check – Provides safeguard for hash-join, merge-join, etc. § Lazy Checking with Eager Materialization DAM – Pro-actively add dams to enable checkpointing – E. g. outer of nested-loops join Eager Check § Eager Checking – It may be too late to wait until the dam is complete – Check cardinalities before tuples are inserted into the dam § Can extrapolate to estimate final cardinality 20 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

CHECK Operator Execution § IF actual cardinality not in [low, high]): – Save as

CHECK Operator Execution § IF actual cardinality not in [low, high]): – Save as a “view match structure” whose § Definition (“matching”) was pre-computed at compile time § Cardinality is actual cardinality – Terminate execution & return special error code – Re-invoke query compiler § ELSE continue execution § How to set the [low, high] range? 21 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Outline § § 22 Progressive Query Processing – Solution overview – Checkpoint placement –

Outline § § 22 Progressive Query Processing – Solution overview – Checkpoint placement – Validity range computation Performance Results Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Validity Range Determination (1) § § § 23 At a given operator, what input

Validity Range Determination (1) § § § 23 At a given operator, what input cardinality change will cause a plan change? i. e. when is this plan valid In general, equivalent to parametric optimization – Super-exponential explosion of alternative plans to consider – Finds optimal plan for each value range, for each subset of predicates, So we focus on changes in a single operator – Local decision – E. g. NLJN HSJN – Not join order changes – Advantage: Can be tracked during original optimization – Disadvantage: Pessimistic model, since it misses reoptimization opportunities Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Validity Range Determination (2) L 2 L 1 P 2 P 1 P inner

Validity Range Determination (2) L 2 L 1 P 2 P 1 P inner outer P Q Q § Suppose P 1 and P 2 considered during optimizer pruning – cost(P 1, est_cardouter) < cost(P 2, est_cardouter) – Estimate upper and lower bounds on cardouter s. t. P 2 dominates P 1 – Use bounds to update (narrow) the validity range of outer (likewise for inner) ü Applies to arbitrary operators ü Can be applied all the way up the plan tree 24 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Example of a Cost Analysis § Lineitem × Orders query – Vary selectivity of

Example of a Cost Analysis § Lineitem × Orders query – Vary selectivity of o_orderdate < ‘date’ predicate § N 1, M 1, H 1: Orders as outer – N 1, M 1: SORT on outer – N 1: ISCAN on inner § N 2, M 2, H 2: Lineitem as outer § Optimal Plan: N 1 H 2 M 1 25 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM H 2 M 1 N 1 Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Upper Bounds from pruning M 1 with N 1 § Upper bounds vary §

Upper Bounds from pruning M 1 with N 1 § Upper bounds vary § Misses pruning with H 2 because outer/inner reversed § Still upper bounds set conservatively; no false reoptimization 26 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Lower Bounds from pruning N 1 with M 1 H 2 N 1 27

Lower Bounds from pruning N 1 with M 1 H 2 N 1 27 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Outline § § 28 Progressive Query Processing – Solution overview – Checkpoint placement –

Outline § § 28 Progressive Query Processing – Solution overview – Checkpoint placement – Validity range computation Performance Results – Parameter markers (TPCH query) – Correlations (customer workload for a motor vehicles department) – Re-optimization Opportunities with POP Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Robustness for Parameter Marker in TPC-H Query 10 4 -way Join: goes thru 5

Robustness for Parameter Marker in TPC-H Query 10 4 -way Join: goes thru 5 different optimal plans 29 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Response Time of DMV with and without POP Box: 25 th to 75 th

Response Time of DMV with and without POP Box: 25 th to 75 th percentile of queries 30 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Speed-Up (+) vs. Regression (-) of DMV with POP 31 Progressive Query Processing Transparent

Speed-Up (+) vs. Regression (-) of DMV with POP 31 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Scatter Plot of Response Times for DMV 32 Progressive Query Processing Transparent Access to

Scatter Plot of Response Times for DMV 32 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Reoptimization Opportunities with POP 33 Progressive Query Processing Transparent Access to Grid| ACM Data.

Reoptimization Opportunities with POP 33 Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Related Work § § § 34 Choose-Plans: Graefe/Cole, Redbrick, … Parametric Query Optimization Least-expected

Related Work § § § 34 Choose-Plans: Graefe/Cole, Redbrick, … Parametric Query Optimization Least-expected cost optimization Kabra/De. Witt Mid-query re-optimization, Query Scrambling Runtime Adaptation – Adaptive Operators: § DB 2/z. OS, DEC RDB, …: adaptive selection of access methods § Ingres: adaptive nested loop join § XJoin, Tukwila: adaptive hash join § Pang/Carey/Livny, Zhang/Larson: dynamic memory adjustment § … – Convergent query processing – Eddies: adaptation of join orders – Ste. Ms: adaptation of join algorithms, spanning trees, … Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation

Conclusions 35 § § POP makes plans for complex queries more robust to optimizer

Conclusions 35 § § POP makes plans for complex queries more robust to optimizer misestimates Significant performance improvement on real workloads Overhead of re-optimization is very low, scales with DB size Validity ranges tell us how risky a plan is – Can be used for many applications to act upon cardinality sensitivity § Future Work: – CHECK estimates other than cardinality § # concurrent applications § Memory available in buffer pool, sort heap § Actual run time, actual # I/Os – Avoid re-optimization too late in plan of if cost of optimization too high – Re-optimization in shared-nothing query plans – Extend validity ranges to more general plan robustness measures Progressive Query Processing Transparent Access to Grid| ACM Data. SIGMOD Objects 2004 | IBM Confidential © 2004 IBM Corporation © 2003 IBM Corporation