SQL Server Query Optimizer Cost Formulas Joe Chang

  • Slides: 65
Download presentation
SQL Server Query Optimizer Cost Formulas Joe Chang jchang 6@yahoo. com www. qdpma. com

SQL Server Query Optimizer Cost Formulas Joe Chang jchang 6@yahoo. com www. qdpma. com , www. solidq. com

Scope - Query Optimizer Parse SQL Execution Plans Cost Model Rows and pages in

Scope - Query Optimizer Parse SQL Execution Plans Cost Model Rows and pages in each operation Data Distribution Statistics Estimate rows and pages Sources – David Dewitt, Conor Cunningham

Query Optimizer References Conor Cunningham Chapter in Inside SQL Server Conor vs. SQL http:

Query Optimizer References Conor Cunningham Chapter in Inside SQL Server Conor vs. SQL http: //blogs. msdn. com/b/conor_cunningham_msft/ David Dewitt PASS 2010 Summit Keynote Search: Microsoft Jim Gray Systems Lab http: //www. slideshare. net/Gray. Systems. Lab/pass-summit-2010 -keynote-david-dewitt http: //www. slideshare. net/Gray. Systems. Lab/pass-summit-2010 -keynote-daviddewitt/download

My material http: //www. qdpma. com/CBO/SQLServer. Co st. Based. Optimizer. html http: //www. qdpma.

My material http: //www. qdpma. com/CBO/SQLServer. Co st. Based. Optimizer. html http: //www. qdpma. com/z. Presentations. ht ml

Paul White – Page Free Space http: //sqlblog. com/blogs/paul_white/default. aspx Inside the Optimiser: Constructing

Paul White – Page Free Space http: //sqlblog. com/blogs/paul_white/default. aspx Inside the Optimiser: Constructing a Plan – Part 4 http: //sqlblog. com/blogs/paul_white/archive/2010/07/31/insid e-the-optimiser-constructing-a-plan-part-4. aspx DBCC RULEON/RULEOFF Inside the Optimizer: Plan Costing http: //sqlblog. com/blogs/paul_white/archive/2010/0 9/01/inside-the-optimizer-plan-costing. aspx DBCC TRACEON (3604); -- Show DBCC output DBCC SETCPUWEIGHT(1 E 0); -- Default CPU weight DBCC SETIOWEIGHT(0. 6 E 0); -- I/O multiplier = 0. 6 DBCC SHOWWEIGHTS; -- Show the settings

Execution Plan Cost Model Index Seek + Key Lookup – Table Scan Joins –

Execution Plan Cost Model Index Seek + Key Lookup – Table Scan Joins – Loop, Hash, Merge Updates (Includes Insert & Delete) Really complicated, not covered here Parallel Execution Plans

Why this is Useful? When does the QO use: an index versus table scan

Why this is Useful? When does the QO use: an index versus table scan Loop Join or Hash/Merge with Scan? Is there a difference between the Cost Model and True Cost Structure? Should I use query hints Parallel Execution Strategy Modern servers – 64+ cores

SQL Server Books Online Query Governor Cost Limit Query cost refers to the estimated

SQL Server Books Online Query Governor Cost Limit Query cost refers to the estimated elapsed time, in seconds, required to complete a query on a specific hardware configuration. Cost Threshold for Parallelism The cost refers to an estimated elapsed time in seconds required to run the serial plan on a specific hardware configuration. … SQL Server creates and runs a parallel plan for a query only when the estimated cost to run a serial plan for the same query is higher than the value set in cost threshold for parallelism.

Adventure Works Example

Adventure Works Example

Estimated Execution Plan

Estimated Execution Plan

Clustered Index Scan

Clustered Index Scan

Index Seek

Index Seek

Index Seek + Key Lookup

Index Seek + Key Lookup

Heap Table

Heap Table

Heap Operations

Heap Operations

The Formula – Seek, Scan (Clustered) Index Scan, Table Scan, Index Seek IO Cost

The Formula – Seek, Scan (Clustered) Index Scan, Table Scan, Index Seek IO Cost 0. 003125 + 0. 00074074 per page CPU Cost 0. 0001581 + 0. 0000011 per row

Key Lookup (& Loop Join) Key/RID Lookup, Nested Loops Join IO Cost 0. 003125

Key Lookup (& Loop Join) Key/RID Lookup, Nested Loops Join IO Cost 0. 003125 x % that require Lookup CPU Cost 0. 0001581 per Lookup 0. 0000011 per additional rows

IO Cost Model Sequential - Random Cost is elapsed time in seconds Random Sequential

IO Cost Model Sequential - Random Cost is elapsed time in seconds Random Sequential 0. 003125 = 1/320 0. 00074074… = 1/1350 Random: 320 IOPS Sequential 1350 pages/sec, or 10. 8 MB/s

Key Lookup – Scan Cross over Key Lookup rows to pages scanned ratio 1

Key Lookup – Scan Cross over Key Lookup rows to pages scanned ratio 1 Key Lookup cost approximately 4 pages in scan operation Non-parallel plan, with other costs Cross-over approx 3. 5 pages per KL row Parallel Plan Closer to 4 pages per Key Lookup row

Loop, Hash and Merge Joins

Loop, Hash and Merge Joins

LHM

LHM

Sort

Sort

Loop Hash and Merge Cost Fixed Loop ~0. 0 + seek cost: 0. 003125

Loop Hash and Merge Cost Fixed Loop ~0. 0 + seek cost: 0. 003125 IO, Incremental 0. 00000418 + Seek 0. 0001581 CPU Hash ~0. 017780 Merge ~0. 005607 Many-to-Many Merge Sort ~0. 011261 0. 00001526* 0. 00000238† 0. 00004738 * Hash incremental cost depends on inner/outer source size † Merge join incremental is per IS & OS row? Merge + Sort fixed cost approx same as Hash fixed cost

Loop, Hash, Merge Cost Fixed Incremental Loop Zero High Hash High Medium Merge Medium

Loop, Hash, Merge Cost Fixed Incremental Loop Zero High Hash High Medium Merge Medium Low Merge Join requires both source rows in index sorted order. Regular Merge only for 1 -1 or 1 -many Many-to-many merge join is more expensive

Plan Cross-over Theory Cost Index Seek + Key Lookup Table Scan Rows

Plan Cross-over Theory Cost Index Seek + Key Lookup Table Scan Rows

Theory & Actual? Cost KL Actual! KL Theory Table Scan KL alternate reality? Rows

Theory & Actual? Cost KL Actual! KL Theory Table Scan KL alternate reality? Rows

Plan and Actual IO Random Sequential Ratio Plan 320 1, 350 (10. 8 M/s)

Plan and Actual IO Random Sequential Ratio Plan 320 1, 350 (10. 8 M/s) Current HD 200* 12, 800 (100 MB/s) 64* SAN 200 1, 280 (10 MB/s) 6. 4 SSD 20, 000 25, 000 *Note: original slide incorrectly listed 640: 1 4. 2187 ~1

Loop, Hash & Merge

Loop, Hash & Merge

Loop Join

Loop Join

Merge Join

Merge Join

Hash Join

Hash Join

Insert, Update & Delete Really complicated See material from Conor For large number of

Insert, Update & Delete Really complicated See material from Conor For large number of rows (25%? ) Consider dropping indexes

Delete Rows Index foreign keys when: Deletes from primary table are frequent

Delete Rows Index foreign keys when: Deletes from primary table are frequent

Parallel Execution Plans Parallel Execution Parallelism Gather, Repartition, Distribute Streams, Partitions

Parallel Execution Plans Parallel Execution Parallelism Gather, Repartition, Distribute Streams, Partitions

Parallel Execution Plan

Parallel Execution Plan

Parallel Operations Distribute Streams Non-parallel source, parallel destination Repartition Streams Parallel source and destination

Parallel Operations Distribute Streams Non-parallel source, parallel destination Repartition Streams Parallel source and destination Gather Streams Destination is non-parallel Bitmap

Scan

Scan

2 X DOP 1 DOP 2 IO Cost same CPU reduce by degree of

2 X DOP 1 DOP 2 IO Cost same CPU reduce by degree of parallelism, except no reduction for DOP 16 8 X 4 X DOP 4 DOP 8 IO contributes most of cost!

DOP 8 DOP 16

DOP 8 DOP 16

IO Cost is the same CPU cost reduced in proportion to degree of parallelism,

IO Cost is the same CPU cost reduced in proportion to degree of parallelism, last 2 X excluded? On a weak storage system, a single thread can saturate the IO channel, Additional threads will not increase IO (reduce IO duration). A very powerful storage system can provide IO proportional to the number of threads. It might be nice if this was optimizer option? The IO component can be a very large portion of the overall plan cost Not reducing IO cost in parallel plan may inhibit generating favorable plan, i. e. , not sufficient to offset the contribution from the Parallelism operations. A parallel execution plan is more likely on larger systems (-P to fake it? )

Partitioned Tables Regular Table Partitioned Tables No Repartition Streams operations!

Partitioned Tables Regular Table Partitioned Tables No Repartition Streams operations!

Parallel Execution: Super Scaling Suppose at DOP 1, a query runs for 100 seconds,

Parallel Execution: Super Scaling Suppose at DOP 1, a query runs for 100 seconds, with one CPU fully pegged CPU time = 100 sec, elapse time = 100 sec What is best case for DOP 2? Assuming nearly zero Repartition Threads cost CPU time = 100 sec, elapsed time = 50? Super Scaling: CPU time decreases going from Non. Parallel to Parallel plan! No, I have not been drinking, today, yet

Super Scaling CPU normalized to DOP 1 CPU-sec goes down from DOP 1 to

Super Scaling CPU normalized to DOP 1 CPU-sec goes down from DOP 1 to 2 and higher (typically 8) Speed up relative to DOP 1 3. 5 X speedup from DOP 1 to 2 (Normalized to DOP 1)

Most probable cause Bitmap Operator in Parallel Plan Bitmap Filters are great, Question for

Most probable cause Bitmap Operator in Parallel Plan Bitmap Filters are great, Question for Microsoft: Can I use Bitmap Filters in OLTP systems with non-parallel plans?

Negative Scaling Query time “Speedup”

Negative Scaling Query time “Speedup”

CPU

CPU

Small Queries – Plan Cost vs. Act Plan Cost Query 3 and 16 have

Small Queries – Plan Cost vs. Act Plan Cost Query 3 and 16 have lower plan cost than Q 17, but not included Q 4, 6, 17 great scaling to DOP 4, then weak Query time Negative scaling also occurs

CPU time Speed up What did I get for all that extra CPU? ,

CPU time Speed up What did I get for all that extra CPU? , Interpretation: sharp jump in CPU means poor scaling, disproportionate means negative scaling Query 2 negative at DOP 2, Q 4 is good, Q 6 get speedup, but at CPU premium, Q 17 and 20 negative after DOP 8

Parallel Exec – Small Queries Why? Almost No value OLTP with 32, 64+ cores

Parallel Exec – Small Queries Why? Almost No value OLTP with 32, 64+ cores Parallelism good if super-scaling Default max degree of parallelism 0 Seriously bad news, especially for small Q Increase cost threshold for parallelism?

Parallel Settings - Strategy Mostly for OLTP Cost Threshold for Parallelism Default: Plan Cost

Parallel Settings - Strategy Mostly for OLTP Cost Threshold for Parallelism Default: Plan Cost > 5: Proposed: 20 -100 In 1997, Pentium Pro 200 MHz, ~5 sec for 50 MB table (index range) scan Today, Xeon 5680, 3. 3 GHz, ~ 30 X faster Parallel plan could run 10 -100 milli-sec

Parallel Settings - Strategy Mostly for OLTP Cost Threshold for Parallelism Default: Plan Cost

Parallel Settings - Strategy Mostly for OLTP Cost Threshold for Parallelism Default: Plan Cost > 5: Proposed: 20 -100 In 1997, Pentium Pro 200 MHz, ~5 sec for 50 MB table (index range) scan Today, Xeon 5680, 3. 3 GHz, ~ 30 X faster Parallel plan could run 10 -100 milli-sec

Parallel Settings - Strategy Cost Threshold for Parallelism Default: Plan Cost > 5: Proposed:

Parallel Settings - Strategy Cost Threshold for Parallelism Default: Plan Cost > 5: Proposed: 20 -100 In 1997, Pentium Pro 200 MHz, ~5 sec for 50 MB table (index range) scan Today, Xeon 5680, 3. 3 GHz, ~ 30 X faster Parallel plan could run 10 -100 milli-sec Max Degree of Parallelism (OLTP) Default: 0, unrestricted, Proposed: 2 -4 Use OPTION (MAXDOP n)

Too Many Indexes Complicates Query Optimization Too many possible execution plan Large Updates –

Too Many Indexes Complicates Query Optimization Too many possible execution plan Large Updates – Maintenance Consider dropping indexes

Parameters and Variables Unknown, remote source Remote Scan: 10, 000 rows Remote Seek xxx

Parameters and Variables Unknown, remote source Remote Scan: 10, 000 rows Remote Seek xxx rows Unknown >, <, BETWEEN > or <: 30% of rows BETWEEN: 1/10 of rows

Temp tables and Table Variables

Temp tables and Table Variables