Youre Smarter than a Database Overcoming the optimizers

You’re Smarter than a Database Overcoming the optimizer’s bad cardinality estimates

About me n n Bobby Durrett US Foodservice bobbyandmarielle@yahoo. com Scripts in http: //www. geocities. com/bobbyandma rielle/sqltuning. zip

What you know

What the database knows

Before SQL n Example - mainframe Datacom/DB COBOL n n List index names Write loops read a from one index i 1 where one. c=10 while more table one rows exist get next row read b from two index i 2 where two. a = one. a while more table two rows exist get next row print one. a, two. b end while

SQL n Tell what you want, not how to get it select one. a, two. b from one, two where one. c=10 and one. a=two. a;

Pre-SQL versus SQL n n Pre-SQL code very efficient – runs in megabytes – VSE mainframe COBOL Labor intensive SQL can be inefficient – runs in gigabytes (if you are lucky!) Much more productive – do in minutes what took hours before – create tables

What the database doesn’t know n n n Optimizer has a limited set of statistics that describe the data It can miscalculate the number of rows a query will return, its cardinality A cardinality error can lead optimizer to choose a slow way to run the SQL

Example plan/Cardinality ------------------------| Id | Operation | Name | Rows | Cost ------------------------| 0 | SELECT STATEMENT | | 10 | 3 |* 1 | TABLE ACCESS FULL| TEST 1 | 10 | 3 ------------------------n n n Plan = how Oracle will run your query Rows = how many rows optimizer thinks that step will return Cost = estimate of time query will take, a function of the number of rows

How to fix cardinality problems n n Find out if it really is a cardinality issue Determine the reason it occurred n n n Single column Multiple columns Choose a strategy n n n Give the optimizer more information Override optimizer decision Change the application

Four examples n n Four examples of how the optimizer calculates cardinality Full scripts and their outputs on portal, pieces on slides – edited for simplicity

Step 1: Find out if it really is a cardinality issue n Example 1 n Data select a, count(*) from test 1 group by a; A COUNT(*) -----1 10 n Query select * from test 1 where a=1;

Step 1: Find out if it really is a cardinality issue n Get estimated cardinality from plan ---------------------| Id | Operation | Name | Rows | ---------------------| 0 | SELECT STATEMENT | | 10 | |* 1 | TABLE ACCESS FULL| TEST 1 | 10 | ---------------------n Do query for actual number of rows select count(*) from test 1 where a=1;

Step 1: Find out if it really is a cardinality issue n Plan is a tree – find cardinality and select count(*) on part of query represented by that part of plan. join table

Step 2: Understand the reason for the wrong cardinality n Unequal distribution of data: n Within a single column n Last name “Smith” or “Jones” Among multiple columns – n n Address Zipcode and State

Step 2: Understand the reason for the wrong cardinality n Example 2 - Unequal distribution of values in a single column n n 1, 000 rows with value 1 1 row with value 2 select a, count(*) from TEST 2 group by a; A COUNT(*) -----1 1000000 2 1

Step 2: Understand the reason for the wrong cardinality n SQL statement – returns one row select * from TEST 2 where a=2;

Step 2: Understand the reason for the wrong cardinality n Plan with wrong number of rows = 500, 000 n Full scan instead of range scan – 100 times slower ----------------------| Operation | Name | Rows | ----------------------| SELECT STATEMENT | | 500 K| | INDEX FAST FULL SCAN| TEST 2 INDEX | 500 K| -----------------------

Step 2: Understand the reason for the wrong cardinality n Column statistics – two distinct values LOW HIGH NUM_DISTINCT ----------1 2 2 n Table statistic – total # of rows – 1, 000, 001 NUM_ROWS -----1000001

Step 2: Understand the reason for the wrong cardinality n n n Rows in plan = (rows in table)/ (distinct values of column) 500000=1000001/2 Optimizer knew that there were only two values – assumed they had equal number of rows

Step 2: Understand the reason for the wrong cardinality n Example 3 - Combinations of column values not equally distributed n n 1, 000 rows with values 1, 1 1, 000 rows with values 2, 2 1 row with value 1, 2 ~ Equal numbers of 1’s and 2’s in each column A B COUNT(*) ----------1 1 1000000 1 2 2 1000000

Step 2: Understand the reason for the wrong cardinality n SQL statement – retrieves one row select sum(a+b) from TEST 3 where a=1 and b=2;

Step 2: Understand the reason for the wrong cardinality n n Plan with wrong number of rows = 500, 000 Inefficient full scan -----------------------| Operation | Name | Rows | -----------------------| SELECT STATEMENT | | 1 | | SORT AGGREGATE | | 1 | | INDEX FAST FULL SCAN| TEST 3 INDEX | 500 K| -----------------------

Step 2: Understand the reason for the wrong cardinality n Column statistics C LOW HIGH NUM_DISTINCT - ----------A 1 2 2 B 1 2 2 n Table statistic – total # of rows – 2, 000, 001 NUM_ROWS -----2000001

Step 2: Understand the reason for the wrong cardinality n n n Rows in plan = (rows in table)/ (distinct values A * distinct values B) 500000=2000001/(2 * 2) Optimizer assumes all four combinations (1, 1), (1, 2), (2, 1), (2, 2) equally likely

Step 2: Understand the reason for the wrong cardinality n How to tell which assumption is in play? n Select count(*) each column select a, count(*) from TEST 3 group by a; select b, count(*) from TEST 3 group by b; n count(*) each column combination select a, b, count(*) from TEST 3 group by a, b;

Step 3: Choose the best strategy for fixing the cardinality problem n Giving the optimizer more information n Overriding optimizer decisions n n Histograms SQL Profiles Hints Changing the application Try to use optimizer as much as possible to minimize development work

Step 3: Choose the best strategy for fixing the cardinality problem n Giving the optimizer more information – using histograms n n n Works for unequal distribution within a single column A histogram records the distribution of values within a column in up to 254 “buckets” Works best on columns with fewer than 255 distinct values

Step 3: Choose the best strategy for fixing the cardinality problem n Run gather_table_stats command to get histograms on the column – 254 is max number of buckets method_opt=>'FOR ALL COLUMNS SIZE 254'

Step 3: Choose the best strategy for fixing the cardinality problem n Plan for Example 2 with correct number of rows with histogram Uses range scan --------------------| Operation | Name | Rows | --------------------| SELECT STATEMENT | | 1 | | INDEX RANGE SCAN| TEST 2 INDEX | 1 | --------------------n

Step 3: Choose the best strategy for fixing the cardinality problem n Column statistics – two buckets LOW HIGH NUM_DISTINCT NUM_BUCKETS ---------- -----1 2 2 2 n Table statistic – unchanged NUM_ROWS -----1000001

Step 3: Choose the best strategy for fixing the cardinality problem n Time without histograms (1 second): Elapsed: 00: 01. 00 n Time with histograms(1/100 th second): Elapsed: 00: 00. 01

Step 3: Choose the best strategy for fixing the cardinality problem n Giving the optimizer more information – using SQL Profiles n n Works for unequal distribution among multiple columns Includes information about the relationship between columns in the SQL – correlated columns or predicates

Step 3: Choose the best strategy for fixing the cardinality problem n SQL Tuning Advisor gathers statistics on the columns . . . DBMS_SQLTUNE. CREATE_TUNING_TASK(. . . DBMS_SQLTUNE. EXECUTE_TUNING_TASK(. . . n Accept the SQL Profile it creates to use the new statistics . . . DBMS_SQLTUNE. ACCEPT_SQL_PROFILE (. . .

Step 3: Choose the best strategy for fixing the cardinality problem n Example 3 plan with correct number of rows = 1 using SQL profile -------------------------| Operation | Name | Rows | Bytes | -------------------------| SELECT STATEMENT | | 1 | 6 | | SORT AGGREGATE | | 1 | 6 | | INDEX RANGE SCAN| TEST 3 INDEX | 1 | 6 | -------------------------|

Step 3: Choose the best strategy for fixing the cardinality problem n Time without a profile (1 second): Elapsed: 00: 01. 09 n Time with a profile(1/100 th second): Elapsed: 00: 00. 01

Step 3: Choose the best strategy for fixing the cardinality problem n Overriding optimizer decisions – using hints n n n Example 4 has unequal distribution of column values across two tables – histograms and SQL Profiles don’t work Hint forces index range scan Small amount of additional code – not like Cobol on mainframe

Step 3: Choose the best strategy for fixing the cardinality problem n Example 4 - SMALL table n n MANY relates to 1 – there are many rows with value 1 FEW relates to 2 – there are few with value 2 insert into SMALL values ('MANY', 1); insert into SMALL values ('FEW', 2);

Step 3: Choose the best strategy for fixing the cardinality problem n Example 4 - LARGE table: 1, 000 rows with value 1 n 1 row with value 2 NUM COUNT(*) -----1 1000000 2 1 n

Step 3: Choose the best strategy for fixing the cardinality problem n SQL statement – returns one row select B. NUM from SMALL A, LARGE B where A. NUM=B. NUM and A. NAME='FEW';

Step 3: Choose the best strategy for fixing the cardinality problem n Plan with wrong number of rows = 125, 000 -----------------------| Operation | Name | Rows | -----------------------| SELECT STATEMENT | | 125 K| | HASH JOIN | | 125 K| | TABLE ACCESS FULL | SMALL | 1 | | INDEX FAST FULL SCAN| LARGEINDEX | 1000 K| -----------------------

Step 3: Choose the best strategy for fixing the cardinality problem n Column statistics – two buckets on all columns – using histograms LOW HIGH NUM_DISTINCT NUM_BUCKETS ---------- -----1 2 2 2 LOW HIGH NUM_DISTINCT NUM_BUCKETS ------ -----FEW MANY 2 2

Step 3: Choose the best strategy for fixing the cardinality problem n Table statistics – SMALL has 2 rows, LARGE 1000001 NUM_ROWS -----2 NUM_ROWS -----1000001

Step 3: Choose the best strategy for fixing the cardinality problem n n 125000=1000001/8 Optimizer appears to assume all eight combinations of the three columns’ values are equally likely Can’t verify formula – references don’t include formula with histograms Even worse without histograms – cardinality is 500000

Step 3: Choose the best strategy for fixing the cardinality problem n No SQL profile from SQL Tuning Advisor: There are no recommendations to improve the statement. n Neither histograms nor SQL profiles help example 4

Step 3: Choose the best strategy for fixing the cardinality problem n Statement with hints: n n Use index Don’t do full scan select /*+ INDEX(B LARGEINDEX) NO_INDEX_FFS(B LARGEINDEX) B. NUM from SMALL A, LARGE B where a. NUM=B. NUM and A. NAME='FEW'; */

Step 3: Choose the best strategy for fixing the cardinality problem n Time without a hint (1 second): Elapsed: 00: 01. 03 n Time with a hint (1/100 th second): Elapsed: 00: 00. 01

Step 3: Choose the best strategy for fixing the cardinality problem n Changing the application n n Change your tables so that the optimizer gets your SQL’s cardinality right Requires more work designing tables, but keeps productivity benefits of SQL

Step 3: Choose the best strategy for fixing the cardinality problem n Example 4 – moved NAME column to LARGE table and split table in two n n n One million (‘MANY’, 1) rows in LARGEA One (‘FEW’, 2) row in LARGEB Query: select NUM from (select * from largea union select * from largeb) where NAME='FEW';

Step 3: Choose the best strategy for fixing the cardinality problem n Plan is just as efficient as with hint: n n Number of rows = 2 (reality is 1) Range Scan -------------------------------| Id | Operation | Name | Rows | -------------------------------| 0 | SELECT STATEMENT | | 2 | | 1 | VIEW | | 2 | SORT UNIQUE | | 2 | | 3 | UNION-ALL | | 4 | TABLE ACCESS BY INDEX ROWID| LARGEA | 1 | |* 5 | INDEX RANGE SCAN | LARGEAINDEX | 1 | | 6 | TABLE ACCESS BY INDEX ROWID| LARGEB | 1 | |* 7 | INDEX RANGE SCAN | LARGEBINDEX | 1 | -------------------------------

Step 3: Choose the best strategy for fixing the cardinality problem n Time without table change (1 second): Elapsed: 00: 01. 03 n Time with table change (1/100 th second): Elapsed: 00: 00. 01

Conclusion n SQL improves productivity, optimizer has limits n n Identify cases where cardinality is wrong Understand why the database got it wrong n n n One column Multiple columns Choose best strategy to fix n n n Give optimizer more info Override optimizer’s choices Redesign tables

References n n Cost Based Optimizer Fundamentals, Jonathan Lewis Metalink Note: 212809. 1, Limitations of the Oracle Cost Based Optimizer Metalink Note: 68992. 1, Predicate Selectivity Histograms – Myths and Facts, Wolfgang Breitling Select Journal, Volume 13, Number 3