Modern Performance SQL Server Joe Chang www qdpma

About Joe • • • SQL Server consultant since 1999 Query Optimizer execution plan

Overview • General SQL Server Performance • Why performance is still important today? –

Not in this session • List of rules to be followed blindly • without

Common Themes? • execution plan – Very large (multiple order of magnitude) error in

CPU & Memory 2001 versus 2014 QPI C 4 C 3 C 2 LLC

CPU & Memory 2001 versus 2012 QPI PCI-E QPI MI PCI-E LLC MI C

Intel E 5 & E 7 v 2 (Ivy-Bridge) E 3 v 3 GFX

Microprocessor Pipeline 3 GHz 0. 33 ns clock 1 st BP IF ID RAT

CPU Access Times Logical 0 Logical 1 Core – 3. 33 GHz 1 CPU

Latency Orders of Magnitude Core – 3. 33 GHz 1 CPU cycle = 0.

Westmere-EX 8 -Socket System QPI C 4 C 3 C 2 LLC QPI C

Storage 2001 versus 2012/13 QPI MCH HDD 10 Gb. E RAID SSD SSD HDD

SAN Node 1 Node 2 768 GB Node 1 1024 GB Switch 10 K

Performance Past, Present, Future • When will servers be so powerful that … –

SQL Performance SQL Tables natural keys Indexes Query Optimizer DOP Memory Parallel plans API

Factors to Consider SQL Tables Indexes Statistics Query Optimizer Storage Engine Hardware Compile Parameters

Special Topics • Data type mismatch • Multiple Optional Search Arguments (SARG) – Function

1 a. Data type mismatch DECLARE @name nvarchar(25) = N'Customer#000002760' SELECT * FROM CUSTOMER

1 b. Type Mismatch – Row Estimate SELECT * FROM CUSTOMER WHERE C_NAME LIKE

SELECT TOP + Row Estimate Error SELECT TOP 1000 [Document]. [Artifact. ID] FROM [Document]

2. Multiple Optional SARG DECLARE @Orderkey int, @Partkey int = 1 SELECT * FROM

IF block DECLARE @Orderkey int, @Partkey int = 1 These are actually the stored

Dynamically Built Parameterized SQL DECLARE @Orderkey int, @Partkey int = 1 , @SQL nvarchar(500),

2 b. Function on column SARG SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE YEAR(L_SHIPDATE) =

Estimated versus Actual Plan - rows Estimated Plan – 1 row? ? ? Actual

3 Parameter Sniffing -- first call, procedure compiles with these parameters exec p_Report @startdate

4 Statistics • Auto-recompute points • Sampling strategy – How much to sample -

Statistics Structure • Stored (mostly) in binary field Scalar values Density Vector – limit

Statistics Auto/Re-Compute • Automatically generated on query compile • Recompute at 6 rows, 500,

Statistics Sampling • Sampling theory – True random sample – Sample error - square

Row Estimate Problems (at source) • Skewed data distribution • Out of bounds •

Loop Join - Table Scan on Inner Source Estimated out from first 2 tabes

Compile Parameter Not Exists Main procedure has cursor around view_Servers First server in view_Servers

Sql. Plan Compile Parameters <? xml version="1. 0" encoding="utf-8"? > <Show. Plan. XML xmlns="http:

AND – OR, IN / NOT IN, EXISTS / NOT EXISTS COMBINATIONS

5 a Single Table OR -- Single table SELECT * FROM LINEITEM WHERE L_ORDERKEY

5 a Join 2 Tables, OR in SARG -- subsequent calls, procedure executes with

5 a UNION (ALL) instead of OR SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY

5 b AND/OR Combinations • Hash Join is good method to process many rows

Complex Queries • High Compile effort – Many joins, Many indexes – Estimated plan

Complex Query with Sub-expression • Query complexity – really high compile cost • Repeating

More Plan Details Query with joining 6 tables Each table has too many indexes

Temp Table and Table Variable • Forget what other people have said – Most

Row Estimate Error after Join IO – synchronous when estimate rows is < 25,

Parallelism • Designed for 1998 era – Cost Threshold for Parallelism: default 5 –

Parallel Execution – or not? Tables with computed columns using UDF prevent parallelism

Full-Text Search Loop Join with FT as inner Source Full Text search Potentially executed

varchar(max) stored in lob pages • Disk IO to lob pages is synchronous? –

legacy • API Server Cursors / Cursor Stored Procedures – sp_prepare / sp_prepexec, sp_execute,

Summary • Hardware today is really powerful – Storage may not be – SAN

SQL Server Edition Strategies • Enterprise Edition – per core licensing costs – Old

New Features in SQL Server • 2005 – Index included columns – Filtered index

SQL Performance General • Client-side architecture – Connection pooling – stored procedures versus SQL,

Client-side Architecture • Connection pooling: – Connection. Open, Execute, Connection. Close – Sp_reset_connection •

Database Architecture • • Normalization Cluster key Primary Key & other unique / natural

Principles Data Testing Server Storage Network

CPU & Memory 2001 versus 2014 x C 5 C 6 C 7 C

Work in progress MI LLC MI QPI C 4 C 3 C 2 LLC

Slides: 73

Download presentation

Modern Performance - SQL Server Joe Chang www. qdpma. com Jchang 6 @ yahoo

About Joe • • • SQL Server consultant since 1999 Query Optimizer execution plan cost formulas (2002) True cost structure of SQL plan operations (2003? ) Database with distribution statistics only, no data 2004 Decoding statblob/stats_stream – writing your own statistics • Disk IO cost structure • Tools for system monitoring, execution plan analysis See http: //www. qdpma. com/ Download: http: //www. qdpma. com/Exec. Stats. Zip. html Blog: http: //sqlblog. com/blogs/joe_chang/default. aspx

Overview • General SQL Server Performance • Why performance is still important today? – Brute force? • Yes, but … • Special Topics – spectacular fails • Automating data collections • SQL Server Engine – What developers/DBA need to know?

Not in this session • List of rules to be followed blindly • without consideration for the underlying reason • and whether rule actually applies in the current circumstance DBA skill: cause and effect analysis & assessment

Common Themes? • execution plan – Very large (multiple order of magnitude) error in row estimate • Single (execute) of large operation – Might still be tolerable • Multiple (executes) of large operations

CPU & Memory 2001 versus 2014 QPI C 4 C 3 C 2 LLC C 1 C 0 MI C 5 C 6 C 7 C 8 C 9 PCI-E 14 13 12 11 10 MI QPI PCI-E 14 13 12 11 10 MI PCI-E QPI C 5 C 6 C 7 C 8 C 9 PCI-E DMI x 4 x 4 PCH PCI-E GFX MC PCI-E Each core today is more than 10 x over Pentium III (700 MHz? ) PCI-E DMI 2 Xeon MP 2002 -4 PCI-E 14 13 12 11 10 MI QPI C 4 C 3 C 2 LLC C 1 C 0 MI PCI-E C 5 C 6 C 7 C 8 C 9 I 2001 – 4 sockets, 4 cores Pentium III Xeon, 900 MHz 4 -8 GB memory? QPI QP QPI C 4 C 3 C 2 LLC C 1 C 0 MI PCI-E 14 13 12 11 10 MI PCI-E MCH C 5 C 6 C 7 C 8 C 9 PCI-E FSB QPI C 4 C 3 C 2 LLC C 1 C 0 MI PCI-E P PCI-E P L 2 Xeon E 7 v 2 (Ivy Bridge), 15 cores, 3 QPI 4 x 15 = 60 cores 3 TB (96 x 32 GB) 24 DIMMs per socket 32 PCI-E gen 3 lanes + x 4 g 2 /socket Mem___2013 __ 2014 16 GB __ $191 __ $180 32 GB __ $794 __ $650 64 GB _____ __ $4510

CPU & Memory 2001 versus 2012 QPI PCI-E QPI MI PCI-E LLC MI C 4 C 5 C 6 C 7 QPI C 3 C 2 C 1 C 0 MI PCI-E C 3 C 2 C 1 C 0 QPI LLC PCI-E QPI MI C 4 C 5 C 6 C 7 MI PCI-E LLC MI C 4 C 5 C 6 C 7 MI PCI-E PCI-E PCI-E Each core today is more than 10 x over Pentium III (700 MHz? ) QPI MI C 3 C 2 C 1 C 0 DMI 2 Xeon MP 2002 -4 LLC C 4 C 5 C 6 C 7 PCI-E 2001 – 4 sockets, 4 cores Pentium III Xeon, 900 MHz 4 -8 GB memory? PCI-E MCH C 3 C 2 C 1 C 0 PCI-E FSB QPI PCI-E P PCI-E P L 2 Xeon E 5 (Sandy Bridge), 8 cores, 2 QPI 4 x 8 = 32 cores total Westmere-EX 1 TB (64 x 16 GB) (3 QPI) Sandy Bridge E 5: 768 GB (48 x 16 GB) (2 QPI) Mem___2013 __ 2014 16 GB __ $191 __ $180 32 GB __ $794 __ $650 64 GB _____ __ $4510

Intel E 5 & E 7 v 2 (Ivy-Bridge) E 3 v 3 GFX MC DMI x 4 x 4 PCH

Processor – Core

Microprocessor Pipeline 3 GHz 0. 33 ns clock 1 st BP IF ID RAT ROB Sch Exec Flags Retire 2 nd BP IF ID RAT ROB Sch Exec Flags Retire 5 ns from start to finish 200 MHz BP Microprocessor (core) is (multi-lane) assembly line Each core is superscalar Processor (socket) has multiple cores System has multiple sockets Branch Predict Instruction Fetch Decode Register Allocate & Rename Re-Ordering Buffer Schedule Execute Flags Retire

Micro-architecture Sandy-Bridge

Haswell (Xeon E 5/7 v 3)

CPU Access Times Logical 0 Logical 1 Core – 3. 33 GHz 1 CPU cycle = 0. 3 ns L 1 cache – 4 CPU clocks (1 ns) L 2 cache 12 CPU cycles (4 ns? ) L 1 I L 1 D L 2 Unified L 3 Slice DRAM L 3 cache 29+ cycles Local node memory 28 cycles + 49 ns (open page) 28 cycles + 56 ns (random page) Remote node (1 -hop) memory 28 + 100 ns 2 -hop 150 -300 ns+?

Latency Orders of Magnitude Core – 3. 33 GHz 1 CPU cycle = 0. 3 ns L 1 Cache L 1 cache – 4 CPU clocks (1 ns) L 1 Cache L 2 cache 12 CPU cycles (4 ns? ) L 3 cache 29+ cycles LLC Local node memory 28 cycles + 49 ns (open page) 28 cycles + 56 ns (random page) Remote node (1 -hop) memory 28 + 100 ns GFX MC DMI x 4 x 4 PCH 2 -hop 150 -300 ns+?

Westmere-EX 8 -Socket System QPI C 4 C 3 C 2 LLC QPI C 4 C 3 C 2 C 8 C 9 QPI MC C 1 C 0 MC MC MC QPI QPI LLC MC QPI MC C 4 C 3 C 2 C 1 C 0 MC QPI LLC C 5 C 6 C 7 C 8 C 9 MC QPI IOH 3 ESI C 8 C 9 QPI PCI-E x 4 PCI-E x 8 LLC C 5 C 6 C 7 QPI QPI MC SMB C 8 C 9 SMB C 1 C 0 C 5 C 6 C 7 This applies to the OS, SQL Server and the application SMB QPI MC MC MC C 4 C 3 C 2 C 8 C 9 C 1 C 0 C 8 C 9 QPI C 5 C 6 C 7 C 4 C 3 C 2 C 1 C 0 MC QPI LLC C 5 C 6 C 7 SMB QPI QPI SMB QPI IOH 2 MC C 4 C 3 C 2 Software developed without consideration for system architecture will likely have severe problems IOH 1 SMB QPI MC MC C 8 C 9 SMB LLC Large server systems are very complicated C 5 C 6 C 7 QPI C 5 C 6 C 7 C 8 C 9 C 1 C 0 LLC QPI C 1 C 0 C 4 C 3 C 2 QPI SMB QPI C 5 C 6 C 7 QPI C 1 C 0 IOH 0 C 4 C 3 C 2 QPI PCH

Storage 2001 versus 2012/13 QPI MCH HDD 10 Gb. E RAID SSD SSD HDD HDD HDD 2001 100 x 10 K HDD 125 IOPS each = 12. 5 K IOPS IO Bandwidth limited: 1. 3 GB/s (1/3 memory bandwidth) PCIe x 8 HDD HDD PCIe x 8 PCI RAID PCIe x 4 PCI RAID HDD 192 GB QPI IB 2013 64 SSDs, >10 K+ IOPS each, 1 M IOPS total possible 10 -20 GB/s+ IO Bandwidth easy 6. 4 GB/s on each PCIe G 3 x 8 SAN vendors – questionable BW http: //www. qdpma. com/Storage 2013. html http: //www. qdpma. com/ppt/Storage_2013. pptx

SAN Node 1 Node 2 768 GB Node 1 1024 GB Switch 10 K 7. 2 K Switch 8 Gb FC SP A SP B 24 GB x 4 SAS Auto-tier pools Hot Spares SSD 8 Gb FC or 10 Gb FCOE x 4 SAS 2 GB/s SSD x 8 24 GB x 8 SP B x 8 SSD 0. 8 GB/s SP A x 8 Switch 1024 GB x 8 HBA x 8 Switch SSD x 8 PCIe HBA Node 2 2 GB/s Data 1 Data 2 Data 3 Data 4 Data 5 Data 6 Data 7 Data 8 Data 9 Data 10 Data 11 Data 12 Data 13 Data 14 Data 15 Data 16 SSD 1 SSD 2 SSD 3 SSD 4 Log 1 Log 2 Log 3 Log 4 http: //sqlblog. com/blogs/joe_chang/archive/2013/05/10/enterprise-storage-systems-emc-vmax. aspx http: //sqlblog. com/blogs/joe_chang/archive/2013/02/25/emc-vnx 2 -and-vnx-future. aspx

Performance Past, Present, Future • When will servers be so powerful that … – Been saying this for a long time • Today – 10 to 100 X overkill – 32 -cores in 2012, 60 -cores in 2014 – Enough memory that IO is only sporadic – Unlimited IOPS with SSD • What can go wrong? Today’s topic

SQL Performance SQL Tables natural keys Indexes Query Optimizer DOP Memory Parallel plans API Server Cursors: open, prepare, execute, close? SET NO COUNT Information messages Execution Plan Storage Engine Hardware Statistics Tables and SQL combined implement business logic Compile Natural keys with unique indexes, not SQL & Compile parameters Row estimate propagation errors Recompile temp table / table variable Index & Stats Maintenance Index and Statistics maintenance policy 1 Logic may need more than one execution plan? Compile cost versus execution cost? Plan cache bloat? The Execution Plan links all the elements of performance Index tuning alone has limited value Over indexing can cause problems as well

Factors to Consider SQL Tables Indexes Statistics Query Optimizer Storage Engine Hardware Compile Parameters DOP memory

Special Topics • Data type mismatch • Multiple Optional Search Arguments (SARG) – Function on SARG • • • Parameter Sniffing versus Variables Statistics related (big topic) OR, AND/OR combinations IN/NOT IN, EXISTS Complex Query with sub-expressions Parallel Execution Not in order of priority http: //blogs. msdn. com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries. aspx

1 a. Data type mismatch DECLARE @name nvarchar(25) = N'Customer#000002760' SELECT * FROM CUSTOMER WHERE C_NAME = @name Table column is varchar Parameter/variable is nvarchar SELECT * FROM CUSTOMER WHERE C_NAME = CONVERT(varchar, @name). NET auto-parameter discovery? Unable to use index seek

1 b. Type Mismatch – Row Estimate SELECT * FROM CUSTOMER WHERE C_NAME LIKE 'Customer#00000276%' SELECT * FROM CUSTOMER WHERE C_NAME LIKE N’Customer#00000276%' Row estimate error could have severe consequences in a complex query

SELECT TOP + Row Estimate Error SELECT TOP 1000 [Document]. [Artifact. ID] FROM [Document] (NOLOCK) WHERE [Document]. [Access. Control. List. ID_D] IN (1, 1000064, 1000269) AND EXISTS ( SELECT [Document. Batch]. [Batch. Artifact. ID] FROM [Document. Batch] (NOLOCK) INNER JOIN [Batch] (NOLOCK) ON [Batch]. Artifact. ID = [Document. Batch]. [Batch. Artifact. ID] WHERE [Document. Batch]. [Document. Artifact. ID] = [Document]. [Artifact. ID] AND [Batch]. [Name] LIKE N'%Value%' ) ORDER BY [Document]. [Artifact. ID] Data type mismatch – results in estimate rows high Top clause – easy to find first 1000 rows In fact, there are few rows that match SARG Wrong plan for evaluating large number of rows http: //www. qdpma. com/CBO/Relativity. html

MULTIPLE OPTIONAL SARG

2. Multiple Optional SARG DECLARE @Orderkey int, @Partkey int = 1 SELECT * FROM LINEITEM WHERE (@Orderkey IS NULL OR L_ORDERKEY = @Orderkey) AND (@Partkey IS NULL OR L_PARTKEY = @Partkey) AND (@Partkey IS NOT NULL OR @Orderkey IS NOT NULL)

IF block DECLARE @Orderkey int, @Partkey int = 1 These are actually the stored procedure parameters IF (@Orderkey IS NOT NULL) SELECT * FROM LINEITEM WHERE (L_ORDERKEY = @Orderkey) AND (@Partkey IS NULL OR L_PARTKEY = @Partkey) ELSE IF (@Partkey IS NOT NULL) SELECT * FROM LINEITEM WHERE (L_PARTKEY = @Partkey) Need to consider impact of Parameter Sniffing, Consider the OPTIMIZER FOR hint

Dynamically Built Parameterized SQL DECLARE @Orderkey int, @Partkey int = 1 , @SQL nvarchar(500), @Param nvarchar(100) SELECT @SQL = N‘/* Comment */ SELECT * FROM LINEITEM WHERE 1=1‘ , @Param = N'@Orderkey int, @Partkey int' IF (@Orderkey IS NOT NULL) SELECT @SQL = @SQL + N' AND L_ORDERKEY = @Orderkey' IF (@Partkey IS NOT NULL) SELECT @SQL = @SQL + N' AND L_PARTKEY = @Partkey' PRINT @SQL exec sp_executesql @SQL, @Param, @Orderkey, @Partkey IF block is easier for few options Dynamically built parameterized SQL better for many options Consider /*comment*/ to help identify source of SQL

2 b. Function on column SARG SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE YEAR(L_SHIPDATE) = 1995 AND MONTH(L_SHIPDATE) = 1 SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE L_SHIPDATE BETWEEN '1995 -01 -01' AND '1995 -01 -31' DECLARE @Startdate, @Days int = 1 SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE L_SHIPDATE BETWEEN @Startdate AND DATEADD(dd, 1, @Startdate)

Estimated versus Actual Plan - rows Estimated Plan – 1 row? ? ? Actual Plan – actual rows 77, 356

3 Parameter Sniffing -- first call, procedure compiles with these parameters exec p_Report @startdate = '2011 -01 -01', @enddate = '2011 -12 -31' -- subsequent calls, procedure executes with original plan exec p_Report @startdate = '2012 -01 -01', @enddate = '2012 -01 -07' Assuming date data type Need different execution plans for narrow and wide range Options: 1) OPTIMIZE FOR – one plan for all ranges 2) WITH RECOMPILE – compile on each execute 3) main procedure calls 1 of 2 identical sub-procedures One sub-procedure is only called for narrow range Other called for wide range Skewed data distributions also important Example: Large & small customers

STATISTICS

4 Statistics • Auto-recompute points • Sampling strategy – How much to sample - theory? – Random pages versus random rows – Histogram Equal and Range Rows – Out of bounds, value does not exist – etc. Statistics Used by the Query Optimizer in SQL Server 2008 Eric N. Hanson and Yavor Angelov, Contributor: Lubor Kollar Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator Joseph Sack http: //msdn. microsoft. com/en-us/library/dd 535534. aspx

Statistics Structure • Stored (mostly) in binary field Scalar values Density Vector – limit 30, half in NC, half Cluster key Histogram Up to 200 steps Consider not blindly using IDENTITY on critical tables Example: Large customers get low ID values Small customers get high ID values http: //sqlblog. com/blogs/joe_chang/archive/2012/05/05/decoding-stats-stream. aspx

Statistics Auto/Re-Compute • Automatically generated on query compile • Recompute at 6 rows, 500, every 20%? Has this changed? 2008 R 2 Trace 2371 – lower threshold auto recomputed for large tables http: //support. microsoft. com/kb/2754171

Statistics Sampling • Sampling theory – True random sample – Sample error - square root N • Relative error 1/ N • SQL Server sampling – Random pages • But always first and last page? ? ? – All rows in selected pages

Row Estimate Problems (at source) • Skewed data distribution • Out of bounds • Value does not exist Row estimate errors at source – is classified under statistics topic

Loop Join - Table Scan on Inner Source Estimated out from first 2 tabes (at right) is zero or 1 rows. Most efficient join to third table (without index on join column) is a loop join with scan. If row count is 2 or more, then a fullscan is performed for each row from outer source Default statistics rules may lead to serious ETL issues Consider custom strategy

Compile Parameter Not Exists Main procedure has cursor around view_Servers First server in view_Servers is ’CAESIUM’ Cursor executes sub-procedure for each Server sql: SELECT MAX(ID) FROM TRepl. WS WHERE Hostname = @Server. Name But CAESIUM does not exist in TRepl. WS!

Good and Bad Plan?

Sql. Plan Compile Parameters

Sql. Plan Compile Parameters <? xml version="1. 0" encoding="utf-8"? > <Show. Plan. XML xmlns="http: //schemas. microsoft. com/sqlserver/2004/07/showplan" Version="1. 1" Build="10. 50. 2500. 0"> <Batch. Sequence> <Batch> <Statements> <Stmt. Simple Statement. Text="@Server. Name varchar(50) SELECT @maxid = ISNULL(MAX(id), 0) FROM TRepl. WS WHERE Hostname = @Server. Name" Statement. Id="1" Statement. Comp. Id="43" Statement. Type="SELECT" Statement. Sub. Tree. Cost="0. 0032843" Statement. Est. Rows="1" Statement. Optm. Level="FULL" Query. Hash="0 x 671 D 2 B 3 E 17 E 538 F 1" Query. Plan. Hash="0 x. EB 64 FB 22 C 47 E 1 CF 2" Statement. Optm. Early. Abort. Reason="Good. Enough. Plan. Found"> <Statement. Set. Options QUOTED_IDENTIFIER="true" ARITHABORT="false" CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" /> <Query. Plan Cached. Plan. Size="16" Compile. Time="1" Compile. CPU="1" Compile. Memory="168"> <Rel. Op Node. Id="0" Physical. Op="Compute Scalar" Logical. Op="Compute Scalar" Estimate. Rows="1" Estimate. IO="0" Estimate. CPU="1 e-007“ Avg. Row. Size="15" Estimated. Total. Subtree. Cost="0. 0032843" Parallel="0" Estimate. Rebinds="0" Estimate. Rewinds="0"> </Rel. Op> <Parameter. List> <Column. Reference Column="@Server. Name" Parameter. Compiled. Value="'CAESIUM'" /> </Parameter. List> </Query. Plan> </Stmt. Simple> </Statements> </Batch> </Batch. Sequence> </Show. Plan. XML> Compile parameter values at bottom of sqlplan file

AND – OR, IN / NOT IN, EXISTS / NOT EXISTS COMBINATIONS

5 a Single Table OR -- Single table SELECT * FROM LINEITEM WHERE L_ORDERKEY = 1 OR L_PARTKEY = 184826

5 a Join 2 Tables, OR in SARG -- subsequent calls, procedure executes with original plan SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY WHERE L_PARTKEY = 184826 OR O_CUSTKEY = 137099

5 a UNION (ALL) instead of OR SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY WHERE L_PARTKEY = 184826 UNION (ALL) SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY WHERE O_CUSTKEY = 137099 -- AND (L_PARTKEY <> 184826 OR L_PARTKEY IS NULL) -- Caution: select list should have keys to ensure correct rows UNION removes duplicates (with Sort operation) UNION ALL does not -- Hugo Kornelis trick --

5 b AND/OR Combinations • Hash Join is good method to process many rows – Requirement is equality join condition – SELECT xx FROM A WHERE col 1 IN (expr 1) AND col 2 NOT IN (expr 2) SELECT xx FROM A WHERE (expr 1) AND (expr 2 OR expr 3) • AND/OR, IN NOT IN, EXISTS NOT EXISTS combinations – Query optimizer may not be to determine that equality join condition exists – Execution plan will use loop join, – and attempt to force hash join will be rejected • Re-write using UNION in place of OR • And LEFT JOIN in place of NOT IN More on AND/OR combinations: http: //www. qdpma. com/CBO/Relativity 3. html

COMPLEX QUERIES

Complex Queries • High Compile effort – Many joins, Many indexes – Estimated plan cost correlation • Row estimation errors after multiple operations Row estimate errors at source – is classified under statistics topic

Complex Query with Sub-expression • Query complexity – really high compile cost • Repeating sub-expressions (including CTE) – Must be evaluated multiple times • Main Problem - Row estimate error propagation • Solution/Strategy – Get a good execution plan – Temp table when estimate is high, actual is low. When Estimate is low, and actual rows is high, need to balance temp table insert overhead versus plan benefit. Would a join hint work? More on AND/OR combinations: http: //www. qdpma. com/CBO/Relativity 4. html http: //blogs. msdn. com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries. aspx

More Plan Details Query with joining 6 tables Each table has too many indexes Row estimate is high – plan cost is high Query optimizer tries really hard to find better plan Actual rows is moderate, any plan works

Temp Table and Table Variable • Forget what other people have said – Most is cr@p • Temp Tables – subject to statistics auto/re-compile • Table variable – no statistics, assumes 1 row • Question: In each specific case: does the statistics and recompile help or not? – Yes: temp table – No: table variable Is this still true?

Row Estimate Error after Join IO – synchronous when estimate rows is < 25, asynchronous when > 25

Row Estimate 2

Parallelism • Designed for 1998 era – Cost Threshold for Parallelism: default 5 – Max Degree of Parallelism – instance level – OPTION (MAXDOP n) – query level • Today – complex system – 32 cores – Plan cost 5 query might run in 10 ms? – Some queries at DOP 4 Really need to rethink parallelism / NUMA strategies – Others at DOP 16? More on Parallelism: http: //www. qdpma. com/CBO/Parallelism. Comments. html http: //www. qdpma. com/CBO/Parallelism. Onset. html Tables with computed columns may inhibit parallelism? Number of concurrently running queries x DOP less than number of logical/physical processors?

Parallel Execution – or not? Tables with computed columns using UDF prevent parallelism

Full-Text Search Loop Join with FT as inner Source Full Text search Potentially executed many times

varchar(max) stored in lob pages • Disk IO to lob pages is synchronous? – Must access row to get 16 byte link? – Feature request: index pointer to lob SQL PASS 2013 Understanding Data Files at the Byte Level Mark Rasmussen

legacy • API Server Cursors / Cursor Stored Procedures – sp_prepare / sp_prepexec, sp_execute, sp_unprepare – sp_cursoropen, sp_cursorfetch, sp_cursorclose – sp_cursorprepare / sp_cursorprepexec, sp_cursorexecute, sp_cursorunprepare • Guess which is not called? – Symptom: sp_reset_connection http: //technet. microsoft. com/en-us/library/ms 187088(v=sql. 105). aspx http: //technet. microsoft. com/en-us/library/ms 187801(v=sql. 120). aspx API Server Cursors Cursor Stored Procedures

Summary • Hardware today is really powerful – Storage may not be – SAN vendor disconnect • Standard performance practice – Top resource consumers, index usage • But also Look for serious blunders http: //www. qdpma. com/CBO/SQLServer. Cost. Based. Optimizer. html http: //www. qdpma. com/CBO/Relativity. html http: //blogs. msdn. com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries. aspx Kevin Boles – Common TSQL Mistakes

Thank you to our sponsors

Special Topics • Data type mismatch • Multiple Optional Search Arguments (SARG) – Function on SARG • • • Parameter Sniffing versus Variables Statistics related (big topic) AND/OR Complex Query Parallel Execution

SQL Server Edition Strategies • Enterprise Edition – per core licensing costs – Old system strategy • 4 (or 2)-socket server, top processor, max memory – Today: How many cores are necessary • 2 socket system, max memory (16 GB DIMMs) • Is standard edition adequate – Low cost, but many important features disabled • BI edition – 16 cores – Limited to 64 GB for SQL Server process

New Features in SQL Server • 2005 – Index included columns – Filtered index – CLR • 2008 – Partitioning – Compression • 2012 – Column store (non-clustered) • 2014 – Column store clustered – Hekaton

General Performance GENERAL PERFORMANCE

SQL Performance General • Client-side architecture – Connection pooling – stored procedures versus SQL, parameterized • Database Architecture – Cluster key, primary key, natural keys, foreign keys • SQL – • Indexing • Indexes & Statistics Maintenance

Client-side Architecture • Connection pooling: – Connection. Open, Execute, Connection. Close – Sp_reset_connection • Stored procedures – parameterized SQL – Stored procedure name is short – Parameterized SQL may not be • Larger than 1 Ethernet packet? 2? , 8?

Database Architecture • • Normalization Cluster key Primary Key & other unique / natural keys Foreign keys

Principles Data Testing Server Storage Network

CPU & Memory 2001 versus 2014 x C 5 C 6 C 7 C 8 C 9 PCI-E 14 13 12 11 10 MI QPI PCI-E I QPI C 4 C 3 C 2 LLC C 1 C 0 MI PCI-E PCI-E DMI x 4 x 4 PCH PCI-E GFX MC PCI-E Each core today is more than 10 x over Pentium III (700 MHz? ) PCI-E DMI 2 Xeon MP 2002 -4 PCI-E 14 13 12 11 10 MI PCI-E C 5 C 6 C 7 C 8 C 9 QPI C 4 C 3 C 2 LLC C 1 C 0 MI QP 2001 – 4 sockets, 4 cores Pentium III Xeon, 900 MHz 4 -8 GB memory? QPI PCI-E QPI C 4 C 3 C 2 LLC C 1 C 0 MI PCI-E 14 13 12 11 10 MI PCI-E C 5 C 6 C 7 C 8 C 9 PCI-E MCH QPI C 4 C 3 C 2 LLC C 1 C 0 MI PCI-E FSB PCI-E P DMI 2 P L 2 Xeon E 7 v 2 (Ivy Bridge, 3 QPI) 4 x 15 = 60 cores 3 TB (96 x 32 GB) 24 DIMMs per socket 40 PCI-E gen 3 lanes + x 4 g 2 / socket Mem___2013 __ 2014 16 GB __ $191 __ $180 32 GB __ $794 __ $650

Work in progress MI LLC MI QPI C 4 C 3 C 2 LLC C 1 C 0 MI C 5 C 6 C 7 C 8 C 9 PCI-E E D C B C MI PCI-E C 4 C 5 C 6 C 7 C 8 C 9 PCI-E MI PCI-E QPI C 4 C 3 C 2 LLC C 1 C 0 MI PCI-E MI LLC C 4 C 5 C 6 C 7 QPI C 3 C 2 C 1 C 0 PCI-E QPI PCI-E PCI-E DMI 2 QPI PCI-E C 4 C 5 C 3 C 6 LLC C 2 C 7 C 1 C 8 MI MI PCI-E 14 13 12 11 10 MI