CostBased Query Transformation in Oracle Rafi Ahmed Allison
Cost-Based Query Transformation in Oracle Rafi Ahmed, Allison Lee, Andrew Witkowski, Dinesh Das, Hong Su, Mohamed Zait, Thierry Cruanes Oracle USA VLDB 2006
Outline q Traditional relational optimizer q Heuristic-based transformation q Cost-based transformation (CBT) q Examples of transformations in Oracle q Cost-based transformation framework q Techniques for efficient CBT q State-space search algorithms q Results VLDB ‘ 06 of performance study 2
Traditional Relational Optimizer q Logical and physical optimization phases q Query transformation based on heuristics or rules q Scope – a single query block q Access path, join orders, and join method VLDB ‘ 06 3
Heuristic-Based Transformation q Early evaluation of restriction and projection q Pruning of redundant operations q Minimizing/merging of QBs • Must not introduce, replicate, or re-position DISTINCT or GROUP-BY operators VLDB ‘ 06 4
Heuristic-Based Unnesting: Q 2 Q 3 Q 2 SELECT d. dept_name, d. budget FROM departments d WHERE EXISTS (SELECT 1 FROM employees e WHERE d. dept_id = e. dept_id and e. salary > 200000; Q 3 SELECT d. dept_name, d. budget FROM departments d, employees e WHERE d. dept_id S= e. dept_id and e. salary > 200000; VLDB ‘ 06 5
Table Elimination: Q 5 Q 6 Q 5 SELECT e. name, e. salary FROM employees e left outer join departments d on e. dept_id = d. dept_id; Q 6 SELECT e. name, e. salary FROM employees e; VLDB ‘ 06 6
Cost-Based Transformation q Certain transformations must be cost -based q Requires a systematic framework q Combines physical and logical components into a global optimizer VLDB ‘ 06 7
A Query with Subqueries: Q 1 SELECT e 1. employee_name, j. job_title FROM employees e 1, job_history j WHERE e 1. emp_id = j. emp_id and j. start_date > '19980101' and e 1. salary > (SELECT AVG (e 2. salary) FROM employees e 2 WHERE e 2. dept_id = e 1. dept_id) and e 1. dept_id IN (SELECT dept_id FROM departments d, locations l WHERE d. loc_id = l. loc_id and l. country_id = 'US'); VLDB ‘ 06 8
Subquery Unnesting: Q 10 [Q 1 Q 10] SELECT e 1. employee_name, j. job_title FROM employees e 1, job_history j, (SELECT AVG(e 2. salary) avg_sal, dept_id FROM employees e 2 GROUP BY dept_id) V WHERE e 1. emp_id = j. emp_id and j. start_date > '19980101' and e 1. dept_id = V. dept_id and e 1. salary > V. avg_sal and e 1. dept_id IN (SELECT dept_id FROM departments d, locations l WHERE d. loc_id = l. loc_id and l. country_id = 'US'); VLDB ‘ 06 9
Group-By View Merging: Q 11 [Q 10 Q 11] SELECT e 1. employee_name, j. job_title FROM employees e 1, job_history j, employees e 2 WHERE e 1. emp_id = j. emp_id and j. start_date > '19980101' and e 2. dept_id = e 1. dept_id and e 1. dept_id IN (SELECT dept_id FROM departments d, locations l WHERE d. loc_id = l. loc_id and l. country_id = 'US') GROUP BY e 2. dept_id, e 1. emp_id, j. rowid, e 1. employee_name, j. job_title, e 1. salary HAVING e 1. salary > AVG (e 2. salary); VLDB ‘ 06 10
A Query with Distinct View: Q 12 SELECT e 1. employee_name, j. job_title e 2. employee_name as mgr_name FROM employees e 1, job_history j, employees e 2, (SELECT DISTINCT dept_id FROM departments d, locations l WHERE d. loc_id = l. loc_id and l. country_id IN (‘UK’, 'US')) VD WHERE e 1. emp_id = j. emp_id and j. start_date > '19980101' and e 1. mgr_id = e 2. emp_id and e 1. dept_id = VD. dept_id; VLDB ‘ 06 11
Join Predicate Pushdown: Q 13 [Q 12 Q 13] SELECT e 1. employee_name, j. job_title e 2. employee_name as mgr_name FROM employees e 1, job_history j, employees e 2, (SELECT dept_id FROM departments d, locations l WHERE d. loc_id = l. loc_id and l. country_id IN (‘UK’, 'US') and e 1. dept_id = d. dept_id) VD WHERE e 1. emp_id = j. emp_id and j. start_date > '19980101' and e 1. mgr_id = e 2. emp_id; VLDB ‘ 06 12
A UNION-ALL Query: Q 14 SELECT e. first_name, e. last_name, job_id, d. department_name, l. city FROM employees e, departments d, locations l WHERE e. dept_id = d. dept_id and d. location_id = l. location_id UNION ALL SELECT e. first_name, e. last_name, j. job_id, d. department_name, l. city FROM employees e, job_history j, departments d, locations l WHERE e. emp_id = j. emp_id and j. dept_id = d. dept_id and d. location_id = l. location_id; VLDB ‘ 06 13
Join Factorization: Q 15 [Q 14 Q 15] SELECT V. first_name, V. last_name, V. job_id, d. department_name, l. city FROM departments d, locations l, (SELECT e. first_name, e. last_name, e. job_id, e. dept_id FROM employees e UNION ALL SELECT e. first_name, e. last_name, j. job_id, j. dept_id FROM employees e, job_history j WHERE e. emp_id = j. emp_id) V WHERE d. dept_id = V. dept_id and d. location_id = l. location_id; VLDB ‘ 06 14
Group-By Placement SELECT T 2. B 2, T 3. C 3, SUM(T 1. A 1) AS SM, COUNT(T 1. D 1) AS CN FROM T 1, T 2, T 3, T 4 WHERE T 1. B 1 = T 2. B 2 and T 2. D 2 = T 3. D 3 and T 3. C 3 = T 4. C 4 GROUP BY T 2. B 2, T 3. C 3; SELECT V 1. B 2, V 1. C 3, SUM(V 1. S 1*V 2. CN 2) AS SM, SUM (V 1. CN 1*V 2. CN 2) AS CN FROM (SELECT SUM (T 1. A 1) AS S 1, COUNT(T 1. D 1) AS CN 1, T 2. B 2 AS B 2, T 3. C 3 AS C 3 FROM T 1, T 2, T 3 WHERE T 1. B 1 = T 2. B 2 and T 2. D 2 = T 3. D 3 GROUP BY T 2. B 2, T 3. C 3) V 1, (SELECT T 4. C 4 AS C 4, COUNT(*) AS CN 2 FROM T 4 GROUP BY T 4. C 4) V 2 WHERE V 1. C 3 = v 2. C 4 GROUP BY V 1. B 2, V 1. C 3; VLDB ‘ 06 15
Transformation Interaction q Sequential order of transformations (S. O. T. ) • The general strategy: Apply a transformation to all QBs of the SQL statement before applying the next transformation q Interleaving – violates S. O. T. q Juxtaposing – violates S. O. T. • Perform GBVM after SU to decide upon SU • C (Q) = 40; C(SU(Q)) = 50; C(GBVM(SU(Q))) = 30 => do SU • Perform GBVM and JPPD separately to decide upon • GBVM C(Q) = 60; C(GBVM(Q)) = 45; C(JPPD(Q)) = 35 implies No GBVM VLDB ‘ 06 16
Oracle Query Processing Architecture Heuristic-based Transformation Front-end Cost-Based Transformation Physical Optimizer Execution VLDB ‘ 06 17
Cost-Based Transformation Framework q Various q State transformation techniques spaces for the transformations q State-space q Capability q Physical search algorithms for deep copying QBs, etc. optimizer: cost model q Transformation annotations VLDB ‘ 06 directives and cost 18
Techniques for Efficient Search q Limiting q Greedy q Cost the number of states approach cut-off q Memory management and caching q Selective q Re-use VLDB ‘ 06 interleaving/juxtaposing of sub-tree cost annotations 19
State Space Search Algorithms q Exhaustive Search (2**N) q Linear Search (N+1) • Enumerates all states in the state space of a QB • Guaranteed to provide the best solution for T • Next state is generated from the best state by applying T on the next element. • Linear search assumes transformations on different • elements are independent of each other. A state is generated by applying T on one element at a time. Accept that state as the best so far, only if it reduces the overall cost of the QB. VLDB ‘ 06 20
State Space Search Algorithms, Cont’d q Iterative Improvement Search (N+1, 2**N) • Start from an initial state and move to the next neighboring • • q state looking for a local minima by always choosing a downward move Repeat the search for another local minima starting with different initial state in the next iteration Stop the search, if there are no more new states to be generated or some terminating condition has been reached. Choose the most optimal of all the local minima as the best state Two-pass Search (2) • Choose the best state from two states, where in one state all elements are transformed and in the other none of the elements are transformed. VLDB ‘ 06 21
Optimization Times for a Query with Four Subqueries Time (Seconds) Num. of States Heuristic 0. 24 s 1 Two Pass 0. 33 s 2 Linear 0. 61 s 5 Exhaustive 0. 97 s 16 VLDB ‘ 06 22
Performance Study q Oracle Application: HR, OE, CRM, Financial, Supply Chain, etc. q Schema: 14, 000 tables q Average of 8 tables per query (1 – 159) q Workload: 241, 000 queries – mostly of SPJ types q 19, 000 – 8% – queries relevant to costbased transformation VLDB ‘ 06 23
Improvement with CBQT VLDB ‘ 06 24
Improvement with Unnesting . VLDB ‘ 06 25
Improvement with JPPD VLDB ‘ 06 26
Summary of Performance Results Total non-SPJ Queries: ~ 19 K Execution Plans Affected CBQT Average Run Degraded Time Queries Improvement Top 5% Improvement 5, 910 20 % 18 % 27 % SQU 12, 279 387 % 15 % 460 % JPPD 1, 797 23 % 11 % 15 % GBP 2, 054 21 % 12 % 34 % VLDB ‘ 06 27
Conclusion q CBQT – a novel contribution q Some new transformations q CBQT essential for complex DSS queries and other applications q Significant VLDB ‘ 06 performance gain 28
- Slides: 28