MICROSOFT SQL SERVER DATABASE ENGINE IO by Bob
MICROSOFT SQL SERVER DATABASE ENGINE I/O by Bob Dorr, Microsoft SQL Server Principle Escalation Engineer, 1994 – Present Built: Jan 2008
Areas Covered Write Ahead Logging (WAL) Protocol Synchronous vs Asynchronous I/O Scatter / Gather I/O Sector alignment, Block Alignment Latching and a page: A read walk-through SQL Server I/O Sizes Data cache maintenance PAE and AWE Read Ahead User Mode and Kernel Mode (SYSTRAP) Sparse Files and Copy On Write (COW) Pages Locked Pages Scribbler(s) and Bit flips Page Protection and Constant Pages Checksum vs Torn Stale Read Stalled I/O
WAL Protocol Write Ahead Logging ACID (Durability Property) Log records secured before data Hardened / Stable Media Log contains parity bit • Commit • Rollback • Trigger Snapshot
Synchronous vs Asynchronous I/O Sync: Wait for Completion Async: Post and Continue � � � Overlapped Event Completion Port SQL Server � 98% Async Usage � Overlapped and Has. Overlapped. Io. Completed Network Layers Use Completion Port Backup/Restore Use Sync – Sequential Patterns • dm_io_pending_io_requests • Overlapped Structure • Async Processing ~= CPU • Package vs Phone
Scatter / Gather I/O Memory Consolidates or Distributes APIs � Read. File. Scatter � Write. File. Gather Scatter Gather Increases Efficiency Used by SQL I/O Paths Used by Windows Page File Disk • Old Design: 6. x Sorting • AWE Availability • Write. Multiple • # of 8 K Pages • Forward and Backward • Buffer Pool Ramp-up
Sector Alignment Block Alignment Sector: Log Writes Block: Performance Avoid Crossovers Disk. Part/Disk. Par Utilities Discuss with your Vendor Alignment: http: //support. microsoft. com/kb/929491 To verify that an existing partition is aligned, divide the size of the stripe unit by the starting offset of the RAID disk group. Use the following syntax: ((Partition offset) * (Disk sector size)) / (Stripe unit size) Example of alignment calculations in bytes for a 256 -KB stripe unit size: (63 * 512) / 262144 = 0. 123046875 (64 * 512) / 262144 = 0. 125 (128 * 512) / 262144 = 0. 25 (256 * 512) / 262144 = 0. 5 (512 * 512) / 262144 = 1 These examples shows that the partition is not aligned correctly for a 256 -KB stripe unit size until the partition is created by using an offset of 512 sectors (512 bytes per sector). • Double Touch • Rewrites • Defragment • 4 K Sectors
Latch Memory (Data Pages) Multiple Readers (SH) One Writer (EX) Protects In-Memory Data Page � � BUF Array Latch = Physical Protection Lock = Logical Protection User Mode UMS/SQLOS Aware Optimized FIFO Ordering BUF Status Latch Database* Page. Id Hash * … • Flushed & Rollback • Latch Timeout • Sub-latch
Reading A Page 0: 000> uf Zw. Write. File mov r 10, rcx mov eax, 5 Syscall Kernel Transition ret Get Free Buffer for Read Acquire Exclusive (EX) Latch Is already in-memory/hashed? Add Entry to Page Hash Post and Record Asynchronous Read … Continue Processing …. Check Status (Scheduler Switch) Complete: Validate I/O and Release Latch kernel transition – Stuck I/O? ntdll!Zw. Write. File+0 xa kernel 32!Write. File+0 xf 6 sqlservr!Disk. Write. Async+0 xee … • Page Audits • Read retry • Stalled I/O Warnings • Error raised at Acquire • Shared (SH) waiters • PAGE_IO* vs PAGE* Latch • Writing A Page
Myth: Single Worker Per File Truth: Each Worker Issues I/O Vol #1 db. Test. MDF db. Test. NDF Vol #2 Create Database Workers Assigned by Volume ID Worke r #4 Primary = db. Test. MDF Secondary = db. Test. NDF Log = db. Test. LDF Worke r #5 Worke r #1 Serial Plan select * from db. Test. dbo. tbl. Test insert into db. Test. dbo. tbl. Test Worke r #3 Worke r #2 Parallel Plan select * from db. Test. dbo. tbl. Test insert into db. Test. dbo. tbl. Test
Data Cache Maintenance Memory Pressure: Lazy. Writer � � Per NUMA Node Time Of Last Access (TLA) Recovery Interval: Checkpoint � � � Queue I/O Targets. LDF Usage Triggers Alternate Triggers (Backup, Restore, …) Scatter/Gather Usage (Write. Multiple) • Checkpoint Assignments • By Ordinal Sweep • Stalled I/O – LW #0 • I/O Queue Depth > 2
PAE and AWE Physical Address Extensions � � � /PAE in Boot. ini Boots Kernel with 36 bit addressing Physical Memory > 4 GB Virtual Address Unchanged (/2 gb or /3 GB) Automatic for Hot Add Memory Computers Address Windows Extension � � � Windows APIs (Allocate. User. Physical. Pages) Physical Memory Allocations Un/Mapped in or out of Virtual Address Range • 32 Bit Address = 4294967295 (0 x. FFFF) 4 GB • Interlocked Instruction lock xadd dword ptr [ecx], eax • 36 Bit Address = 68719476735 (0 x. FFFFF) 64 GB • Multiple Instructions • Data Pages-Only • Locked Pages • Windows Paging • Windows 2000 Bugs
Read Ahead 128 Pages Standard SKU 1024 Pages Enterprise SKU Uses Read. File. Scatter Plan Based Decisions Power of Asynchronous I/O • Read Over Write • Ramp-up
Sparse Files – Copy On Write Usage � Online DBCC � Snapshot Databases Buffer Pool: Prepare. To. Dirty File Control Block (FCB) Chaining • Sparse Allocation • FCB Tracking • Windows Limits • New Page Allocations
Advanced Protection What is a Scribbler? Data Page Audits � � � None Torn Bits Checksum Log Block Checksum Constant Page Backup with Checksum • DBCC Page Audit • Stale Read Check • SQLIOSim
REFERENCES
Overview SQL Server Always On http: //www. microsoft. com/sql/alwayson SQL Server I/O Basics Chapter 1 http: //www. microsoft. com/technet/prodtec hnol/sql/2000/maintain/sql. IObasics. mspx SQL Server I/O Basics Chapter 2 http: //www. microsoft. com/technet/prodtec hnol/sql/2005/iobasics. mspx
Fundamentals and Requirements KB 230785 - SQL Server 7. 0, SQL Server 2000 and SQL Server 2005 logging and data storage algorithms extend data reliability KB 917047 - Microsoft SQL Server I/O subsystem requirements for the tempdb database KB 231347 - SQL Server databases not supported on compressed volumes (except 2005 read only files)
Subsystems KB 917043 - Key factors to consider when evaluating third-party file cache systems with SQL Server KB 234656 - Using disk drive caching with SQL Server KB 46091 - Using hard disk controller caching with SQL Server KB 86903 - Description of caching disk controls in SQL Server KB 304261 - Description of support for network database files in SQL Server KB 910716 (in progress) - Support for third-party Remote Mirroring solutions used with SQL Server 2000 and 2005 KB 833770 - Support for SQL Server 2000 on i. SCSI technology components (applies to SQL Server 2005)
Design and Configuration White paper - Physical Database Layout and Design KB 298402 - Understanding How to Set the SQL Server I/O Affinity Option KB 78363 - When Dirty Cache Pages are Flushed to Disk White paper - Database Mirroring in SQL Server 2005 White paper - Database Mirroring Best Practices and Performance Considerations KB 910378 - Scalable shared database are supported by SQL Server 2005 MSDN article - Read-Only Filegroups KB 156932 - Asynchronous Disk I/O Appears as Synchronous on Windows NT, Windows 2000, and Windows XP
Diagnostics KB 826433 - Additional SQL Server Diagnostics Added to Detect Unreported I/O Problems KB 897284 - SQL Server 2000 SP 4 diagnostics help detect stalled and stuck I/O operations (applies to SQL Server 2005) KB 828339 - Error message 823 may indicate hardware problems or system problems in SQL Server KB 167711 - Understanding Bufwait and Writelog Timeout Messages KB 815436 - Use Trace Flag 3505 to Control SQL Server Checkpoint Behavior KB 906121 - Checkpoint resumes behavior that it exhibited before you installed SQL Server 2000 SP 3 when you enable trace flag 828 Web. Cast- Data Recovery in SQL Server 2005
Certification Policy KB 913945 - Microsoft does not certify that third-party products will work with Microsoft SQL Server KB 841696 - Overview of the Microsoft third-party storage software solutions support policy KB 231619 - How to use the SQLIOStress utility to stress a disk subsystem such as SQL Server
Utilities Download - SQLIO Disk Subsystem Benchmark Tool Download - SQLIOStress utility to stress disk subsystem (applies to SQL Server 7. 0, 2000, and 2005 - replaced with SQLIOSim and SQL Server 2008 installed in BINN)
Blog Content SQL Server Urban Legends Discussed http: //blogs. msdn. com/psssql/archive/2007/02/21/sql-server-urban-legends-discussed. aspx How It Works: SQL Server Checkpoint (Flush. Cache) Outstanding I/O Target http: //blogs. msdn. com/psssql/archive/2008/04/11/how-it-works-sql-server-checkpoint-flushcache-outstanding-i-o-target. aspx How It Works: SQL Server Page Allocations http: //blogs. msdn. com/psssql/archive/2008/04/08/how-it-works-sql-server-page-allocations. aspx How It Works: Shapshot Database (Replica) Dirty Page Copy Behavior (New. Page) http: //blogs. msdn. com/psssql/archive/2008/03/24/how-it-works-shapshot-database-replica-dirty-page-copy-behavior-newpage. aspx How It Works: SQL Server 2005 I/O Affinity and NUMA Don't Always Mix http: //blogs. msdn. com/psssql/archive/2008/03/18/how-it-works-sql-server-2005 -i-o-affinity-and-numa-don-t-always-mix. aspx How It Works: Debugging SQL Server Stalled or Stuck I/O Problems - Root Cause http: //blogs. msdn. com/psssql/archive/2008/03/03/how-it-works-debugging-sql-server-stalled-or-stuck-i-o-problems-root-cause. aspx How It Works: SQL Server 2005 Database Snapshots (Replica) http: //blogs. msdn. com/psssql/archive/2008/02/07/how-it-works-sql-server-2005 -database-snapshots-replica. aspx How It Works: File Stream the Before and After Image of a File http: //blogs. msdn. com/psssql/archive/2008/01/15/how-it-works-file-stream-the-before-and-after-image-of-a-file. aspx Using SQLIOSim to Diagnose SQL Server Reported Checksum (Error 824/823) Failures http: //blogs. msdn. com/psssql/archive/2008/12/19/using-sqliosim-to-diagnose-sql-server-reported-checksum-error-824 -823 -failures. aspx How to use the SQLIOSim utility to simulate SQL Server activity on a disk subsystem http: //support. microsoft. com/kb/231619 Should I run SQLIOSim? - An e-mail follow-up from SQL PASS 2008 http: //blogs. msdn. com/psssql/archive/2008/11/24/should-i-run-sqliosim-an-e-mail-follow-up-from-sql-pass-2008. aspx What do I need to know about SQL Server database engine I/O? http: //blogs. msdn. com/psssql/archive/2006/11/27/what-do-i-need-to-know-about-sql-server-database-engine-i-o. aspx SQLIOSim is "NOT" an I/O Performance Tuning Tool http: //blogs. msdn. com/psssql/archive/2008/04/05/sqliosim-is-not-an-i-o-performance-tuning-tool. aspx How It Works: SQLIOSim - Running Average, Target Duration, Discarded Buffers. . . http: //blogs. msdn. com/psssql/archive/2008/11/12/how-it-works-sqliosim-running-average-target-duration-discarded-buffers. aspx How It Works: SQLIOSim [Audit Users] and. INI Control File Sections with User Count Options http: //blogs. msdn. com/psssql/archive/2008/08/19/how-it-works-sqliosim-audit-users-and-ini-control-file-sections-with-user-countoptions. aspx Understanding SQLIOSIM Output
Additional Learning Resources Inside SQL Server 7. 0 and Inside SQL Server 2000 Written by Kalen Delaney – her husband is Paul Randle who wrote the core dbcc checks for SQL 7. 0, 2000 and 2005 The Guru’s Guide to SQL Server Architecture and Internals – ISBN 0201 -70047 -6 Written by Ken after he joined Microsoft SQL Server Support Many chapters reviewed by developers and folks like myself SQL Server 2005 Practical Troubleshooting ISBN 0 -321 -44774 -3 – Ken Henderson Authors of this book were key developers or support team members Cesar – QP developer and leader of the QP Red. Zone with Keithelm and Jackli Sameert – Developer of UMS and SQLOS Scheduler Santeriv – Developer of the lock manager Slavao – Developer of the SOS memory managers and engine architect Wei Xiao – Engine developer Bart Duncan – long time SQL EE and now developer of the Microsoft Data Warehouse – performance focused Bob Ward – SQL Server Support Senior EE Advanced Windows Debugging – ISBN 0 -321 -37446 Written by Microsoft developers – excellent resource Applications for Windows – Jeffrey Richter
- Slides: 24