SESSION CODE: WSV202
Larry Mead – CTO Platform Modernization Team – Microsoft
Rob Shiveley – Data Center – Intel
Scott Rosenbloom – Platform Strategy – Microsoft

Mission Critical*
Vital function (such as production and sales) without which a firm cannot operate or remain viable. If a critical business function is interrupted, a firm could suffer serious financial, legal, or other damages or penalties.
System attributes to support Mission Critical
* Business Dictionary: http://www.businessdictionary.com/definition/critical-business-function.html

Power Management
  • Timer coalescing
  • Tick skipping
  • Core parking
  • Report power consumption to OS via ACPI
  • Accessible via WMI (reading/writing of power plans – active plan can be changed remotely)
Virtualization
  • SLAT
  • VMQ
  • Jumbo Frames
  • Intel VT
Scalability
  • 256 Logical Processors
  • Turbo Boost
  • QuickPath
  • 16 MB L3 Cache (7400)
  • Multi-site manageability
RAS
  • Memory Mirroring – writes to 2 locations to compensate for DRAM failure
  • Memory Sparing – predicts a failing DIMM and copies data to a spare DIMM
  • I/O Hot plug
  • MCA Recovery
  • WHEA – root cause
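The two memory RAS features listed above can be pictured with a small toy model. This is only an illustrative sketch, not how the hardware or Windows implements them: the `Dimm`, `MirroredMemory`, and `spare_if_failing` names are hypothetical, and a corrected-error threshold stands in for whatever prediction logic the platform really uses.

```python
from dataclasses import dataclass, field


@dataclass
class Dimm:
    """Toy DIMM: an address -> value map plus health information (hypothetical)."""
    name: str
    cells: dict = field(default_factory=dict)
    failed: bool = False
    corrected_errors: int = 0


class MirroredMemory:
    """Memory mirroring: every write goes to two locations."""

    def __init__(self, primary: Dimm, mirror: Dimm):
        self.primary = primary
        self.mirror = mirror

    def write(self, addr: int, value: int) -> None:
        # Store the value in every DIMM that is still healthy.
        for dimm in (self.primary, self.mirror):
            if not dimm.failed:
                dimm.cells[addr] = value

    def read(self, addr: int) -> int:
        # Serve the read from the first healthy copy.
        for dimm in (self.primary, self.mirror):
            if not dimm.failed:
                return dimm.cells[addr]
        raise RuntimeError("both copies lost")


def spare_if_failing(active: Dimm, spare: Dimm, threshold: int) -> Dimm:
    """Memory sparing: once corrected errors reach the (assumed) threshold,
    copy the data to the spare and return the DIMM that should be used."""
    if active.corrected_errors >= threshold:
        spare.cells = dict(active.cells)
        return spare
    return active
```

A read still succeeds after the primary DIMM is marked failed, because the mirror holds the same data; sparing differs in that the copy happens before the failure, triggered by a rising corrected-error count.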

Built-In Redundancy & Failover Throughout the Platform
Socket Redundancy & Failover
  • Dynamic OS Assisted Processor Socket Migration*
  • Electronically Isolated (Static) Partitioning
Memory Redundancy & Failover
  • Inter-socket Memory Mirroring
  • Intra-socket Memory Mirroring
  • Intel® SMI Lane Failover
  • Intel® SMI Clock Fail Over
  • Intel® SMI Packet Retry
  • Memory DIMM and Rank Sparing
  • Dynamic Memory Migration
  • Fail Over from Single DRAM Device Failure (SDDC)
  • Recovery from Single DRAM Device Failure (SDDC) plus random bit error
Intel® QPI Redundancy & Failover
  • QPI Self-Healing
  • QPI Clock Fail Over
  • Intel QPI Packet Retry
[Diagram labels: Memory, NHM-EX, Intel® QPI, IOH, ICH10, PCI Express* 2.0]
Intel® QPI = Intel® QuickPath Interconnect
Intel® SMI = Intel® Scalable Memory Interconnect

Machine Check Architecture Recovery
First Machine Check Recovery in Xeon®-based Systems
Previously seen only in RISC, mainframe, and Itanium-based systems
Allows Recovery From Otherwise Fatal System Errors
[Diagram: DDR3 DRAM modules behind SMI buffers, shown in each of the states below]
  • Normal Status, Error Prevention: System Patrol Scrubber scans memory for errors
  • Error Detected*
  • HW Correctable Errors: Error Corrected
  • Un-correctable Error: Error Contained; error information passed to OS / VMM
  • Bad memory location flagged so data will not be used by OS or applications
  • Recovery with OS: System works in conjunction with OS or VMM to recover or restart processes and continue normal operation
*Errors detected using Patrol Scrub or Explicit Write-back from cache
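The flow on this slide (scrub, correct what hardware can correct, contain and report the rest) can be sketched as a toy simulation. Everything here is an assumption made for illustration: the page states, the `patrol_scrub` function, and the "restart the owning process" policy are hypothetical stand-ins, not the actual firmware or Windows behavior.

```python
def patrol_scrub(pages, owners):
    """Scan every page once and apply the slide's recovery flow.

    pages  : dict page_number -> "ok" | "correctable" | "uncorrectable"
    owners : dict page_number -> name of the process using that page
    Returns (corrected_pages, offlined_pages, restarted_processes).
    """
    corrected, offlined, restarted = [], [], []
    for page, state in sorted(pages.items()):
        if state == "correctable":
            # Hardware-correctable error: fixed in place, nothing else to do.
            pages[page] = "ok"
            corrected.append(page)
        elif state == "uncorrectable":
            # Contain the error: flag the location so it is never used again.
            pages[page] = "bad"
            offlined.append(page)
            # Error information goes to the OS/VMM, which (in this sketch)
            # restarts the process that owned the page.
            owner = owners.pop(page, None)
            if owner is not None and owner not in restarted:
                restarted.append(owner)
    return corrected, offlined, restarted
```

The point of the sketch is the asymmetry: a correctable error is invisible to software, while an uncorrectable one costs only the affected page and process instead of the whole system.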

Introduced in Windows Server 2008*
  • Better root cause analysis
    – Error reporting via common error record format, richer data content (e.g. FRU info)
    – Platform and the OS flows are well integrated which allows both to contribute information to the log
  • Better support for hardware error recovery
    – Built in infrastructure for error injection
    – Platform Specific Hardware Error Driver (PSHED) Plugins allow for platform participation in error recovery
  • Error avoidance with health monitoring
    – Allows for applications to register for hardware error event notification
    – PFA apps can be used to monitor platform health
  • WHEA enhancements on Intel® Architecture in Windows Server 2008* R2
    – Support for Nehalem-EX MCA recoverable errors
    – Corrected Machine Check Interrupt (CMCI) error handling support
Intel® server processors codename Nehalem-EX

[Diagram labels: CPU Cores (Core 0 to Core 7), Uncore, LLC, Memory Controller, Memory, Link, New Data, WB Data, EWB, Poison Tag]
  • Error detected
  • Data stored with poison bit
  • Broadcast MCE to all threads
  • Log the error
      MCi_Status.Valid = 1
      MCi_Status.EN = 1 (Error enabled)
      MCi_Status.UC (uncorrected error) = 1
      MCi_Status.PCC (Processor context corrupt) = 0
      MCi_Status.OVER (overflow) = 0
      MCi_Status.MCA_error_codes indicates which error is detected
      MCG_Status.RIPV = 1
      MCi_Status.ADDRV = 1
      MCi_Status.MISCV = 1
      MCi_Status.MSCOD = poison (model specific)
  • System Software recovers the error
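The status flags listed on this slide can be checked in software with a few bit tests. The sketch below assumes the IA32_MCi_STATUS and IA32_MCG_STATUS bit positions as documented in the Intel Software Developer's Manual (VAL=63, OVER=62, UC=61, EN=60, MISCV=59, ADDRV=58, PCC=57; RIPV is bit 0 of MCG_STATUS); verify them against the manual for your processor. The function names and the sample register value are hypothetical, and a real handler also looks at further fields that the slide does not list.

```python
# Assumed bit positions in IA32_MCi_STATUS (check the Intel SDM).
VAL, OVER, UC, EN, MISCV, ADDRV, PCC = 63, 62, 61, 60, 59, 58, 57


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into the fields named on the slide."""
    def bit(n: int) -> int:
        return (status >> n) & 1

    return {
        "VAL": bit(VAL),
        "OVER": bit(OVER),
        "UC": bit(UC),
        "EN": bit(EN),
        "MISCV": bit(MISCV),
        "ADDRV": bit(ADDRV),
        "PCC": bit(PCC),
        "MSCOD": (status >> 16) & 0xFFFF,   # model-specific error code
        "MCACOD": status & 0xFFFF,          # architectural MCA error code
    }


def matches_slide_pattern(status: int, mcg_status: int) -> bool:
    """True when the flags match the slide: a valid, enabled, uncorrected
    error with no overflow, intact processor context, and RIPV set."""
    f = decode_mci_status(status)
    ripv = mcg_status & 1
    return bool(
        f["VAL"] and f["EN"] and f["UC"]
        and not f["PCC"] and not f["OVER"] and ripv
    )


# Hypothetical register value with the slide's flags set and a made-up code.
sample = (1 << VAL) | (1 << UC) | (1 << EN) | (1 << MISCV) | (1 << ADDRV) | 0x9F
```

With `sample` and an MCG_STATUS of 1, `matches_slide_pattern` returns True; setting the PCC bit in the same value makes it return False, which corresponds to the case where the processor state can no longer be trusted.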

MCA Predictive Failure Notification
Example: OS Initiates Fail-over to Spares
  1 Memory Error is Detected And Corrected
  2 Corrected Error Count is Incremented
  3 Error Count Exceeds Threshold
  4 Uncore Issues CMCI to the OS Handler
[Diagram labels: CPU Cores (Core 0 to Core 7), Uncore, LLC, Memory Controller, Memory, Link, New Data, Memory Error detected]
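The four numbered steps amount to a counter with a threshold and a callback. A minimal sketch, assuming a software model of that behavior: `CorrectedErrorCounter`, its `on_cmci` callback, and the DIMM label are hypothetical names, and resetting the count after signaling is a choice made here, not something the slide states.

```python
class CorrectedErrorCounter:
    """Toy model of steps 1 to 4: count corrected errors, signal past a threshold."""

    def __init__(self, threshold, on_cmci):
        self.threshold = threshold
        self.on_cmci = on_cmci      # stands in for the OS CMCI handler
        self.count = 0

    def record_corrected_error(self, dimm):
        # Steps 1 and 2: the error was already corrected; bump the count.
        self.count += 1
        # Step 3: compare against the threshold ("exceeds", as on the slide).
        if self.count > self.threshold:
            # Step 4: notify the OS handler, which may fail over to a spare.
            self.on_cmci(dimm, self.count)
            self.count = 0          # assumption: start counting afresh


failovers = []
counter = CorrectedErrorCounter(threshold=3,
                                on_cmci=lambda dimm, n: failovers.append((dimm, n)))
for _ in range(4):
    counter.record_corrected_error("DIMM_A1")
```

After the loop, `failovers` holds one entry for `DIMM_A1`: the first three corrected errors stay below the threshold, and the fourth triggers the notification that lets the OS move data to a spare before the DIMM fails outright.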

Mission Critical Applications
[Diagram labels: Business Applications, Microsoft Server Applications, Line Of Business (LOB), Custom Applications, Enterprise Applications, Database, Collaboration, Management, Platform, Communication, Virtualization Platform, Hyper-V™]
Microsoft Virtualization = Windows Server 2008 R2 Hyper-V + System Center

The Virtual / Process view
[Diagram labels: Virtual Machine 1, Virtual Machine 3, Virtual Machine 1, Hypervisor, Operating System]
The Physical / real view
[Diagram label: Physical Memory Pages]

Scale Up Configuration
SAN for SQL and Files
[Diagram labels: SAN, Fiber Optic channel to SAN, LOB Apps, Windows Server AppFabric, SQL Server 2008 R2, Windows Server 2008 R2]

Scale Out Configuration
SAN based for SQL and Files
[Diagram labels: SAN; CICS COBOL apps, Micro Focus Server EE, Windows Server 2008 (shown twice); LOB Apps, Windows Server AppFabric, Windows Server 2008 R2, HP BL465; SQL Server 2008 R2, Windows Server 2008 R2, HP BL465; Fiber Optic channel to SAN; Dual Gigabit Ethernet on PCIe bus]

Scale Out Virtualized
SAN based SQL and files
[Diagram labels: SAN; LOB Apps, App Server, Windows Server 2008 R2 (three instances); SQL Server 2008 R2; Hyper-V Virtualization Server; Fiber Optic channel to SAN; Dual Gigabit Ethernet on PCIe bus]

  • Backup & Disaster Recovery
  • Performance and Health Monitoring
  • Mobile Device Management
  • Hardware Provisioning
  • Deployment, Patching and State Mgmt
  • Virtual Workload Provisioning

SunGard
  • 1024-Core Computing Grid running Windows Server 2008 and SQL Server 2008
  • Asset Liability Management (ALM)
  • Near Linear scalability
bwin
  • 30,000 Transactions per Second at peak
  • 1 Million bets per day
  • 100 Terabytes of data
Siemens
  • Siemens PLM system supports 5,000 concurrent users
  • Gained 50% of space through compression
SunGard - http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000006391
bwin - http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000004138
Siemens - http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000004826

  • Windows Server 2008 R2 and SQL Server 2008 R2 are mission critical
  • Hardware partners provide scale-up and resilient platform
  • Windows Server + Intel Xeon 7500 can detect and recover from hardware errors
Democratizing Mission Critical

  • Deploying, Virtualizing, and Managing Linux and UNIX with Hyper-V
  • Manage Your Enterprise from a Single Seat: Windows PowerShell Remoting
  • Next Generation VDI with Microsoft RemoteFX
  • Lighting Up Nehalem EX with Windows Server 2008 R2
  • Implementing High Availability Windows Server 2008 R2 Failover Clustering

www.microsoft.com/teched
www.microsoft.com/learning
http://microsoft.com/technet
http://microsoft.com/msdn


Sign up for Tech·Ed 2011 and save $500 starting June 8 – June 31st
http://northamerica.msteched.com/registration
You can also register at the North America 2011 kiosk located at registration
Join us in Atlanta next year