DBI 331 SQL Server Warehousing Fast Track 4
DBI 331 SQL Server Warehousing (Fast Track 4. 0 & PDW) Stephen Strong, Matthew Winter
Fast Track Data Warehouse SQL Server 2012
OLTP – Tune your application (DW less control) Business Case - Budget Time to value
Step 1: Buy new shiny SAN array Step 2: Buy big server Step 3: Install SQL Server 2012 Step 4: Bulk load 30 TB DW with SSIS 30 TB Step 5: Grant access Users
Infrastructure Cost Large scans of huge fact tables Audience Survey – SAN throughput
DATA WAREHOUSING OFFERINGS WHERE FAST TRACK FITS IN Self Build Business Data Warehouse Fast Track Reference Architectures Parallel Data Warehouse Half Rack Parallel Data Warehouse Full Rack Effort to Build Variable Very low Moderate Very low Capacity Variable 5 TB 14 TB – 95 TB 80 TB 500 TB + Concurrency Variable Light – high Very high Query Complexity Variable Medium – high Very high Architecture SMP SMP MPP Form factor Software only Appliance Reference Architecture Appliance
Workload specific Software & System designs that help Customers accelerate their SQL Server 2012 projects. Software: • SQL Server 2012 Enterprise • Windows Server 2008 R 2 Database Configuration: • Workload Specific • Database Architecture • SQL Server settings • Windows Server settings • Performance Guidance Hardware System Design: • Tight specifications for servers, storage and networking • Resource Balanced & Validated • Latest generation Servers & Storage
Still Windows Server & SQL Server for admins & mgmt Can select from a range of HW vendors Engineered by expert from HW vendor and Microsoft Pre-tested for predictable results Faster implementation Great performance/price ratio Low SQL Server license count
Sequential Reads Parallel reads across multiple disk sets Balanced architecture Difficult to achieve in large server / share SAN environment Database layout matches hardware layout Matching throughput for each component LUNs mapped to Storage Processors, network paths, CPUs
Storage Processor 3 GB/sec 400 MB/sec each RAID 10 Storage Processor 3 GB/sec SQL Server 2012 1. 6 GB/sec per port Storage Enclosure Storage Processor 3 GB/sec Windows Server 2008 R 2 RAID 10 2 socket x 8 core 16 -cores total 5. 8 GB/sec total CPU HBA HBA Fiber Switch RAID 10 Storage Processor 3 GB/sec RAID 10 Storage Enclosure Storage Processor 3 GB/sec RAID 10 Storage Processor 3 GB/sec RAID 10 Storage Enclosure Server 6. 4 GB/sec total PCIe 3. 0 HBA Storage Processor 3 GB/sec RAID 10 Storage Processor 3 GB/sec RAID 10 Storage Enclosure 1. 6 GB/sec each 6. 4 GB/sec total
Take DL 380 FT 3. 0 hardware – 20 TB solution SQL Server 2008 R 2 – 2. 4 GB/sec benchmark queries SQL Server 2012 – 5 GB/sec with Column. Store Indexes No hardware changes No Solid State Drives 10 x – 100 x query improvement with Column. Store
Trace Flags DB file placement Tempdb: 8 -12 files max Index light – optimise for sequential scan throughput Capacity is based on Page and Column. Store compression Support for Always. On Failover Clustering
Cubes (SSAS) SSRS Source Systems Excel ETL (SSIS) Fast Track SQL Server 2012 SP 2010 Power View 3 rd Party
Use SQLIO for validation Aim for consistent MB/sec above published #s Take care with DB file placement On 30 TB+ configuration pay attention to SP-LUN mappings Simulate FT LUN layout in Dev environment Take advantage of Column. Store for extra performance Set MAXDOP to number of physical cores Use Resource Governor to limit memory/query to 10 -15%
Fast Track Data Warehouse SQL Server 2012 SQL Server Parallel Data Warehouse
What is Parallel Data Warehouse Scalable • • Scales from 10’s to 100’s of TB of Data Implemented as Massively Parallel Processing (MPP) System Standards Based • • Leverages Commodity Hardware Speaks SQL Server Language (T-SQL) Flexible • • Offers Hardware of Choice (HP/Dell) Supports Multiple Hardware Architectures Cost Effective • Low Price/TB Ratio
SQL Server Parallel Data Warehouse • Symmetric Multiprocessing (SMP) System • Massively Parallel Processing (MPP) System • ‘Shared Everything’ Architecture • ‘Shared Nothing’ Architecture • High Contention for System Resources (e. g. I/O, CPU) • Balanced Resource Utilization (e. g. I/O, CPU, Memory) • Extremely High Concurrency for Some Workloads • Higher Concurrency for Very Complex Workloads • Limited Scalability (up to 95 TB) • Excellent Scalability (up to 1 PB) • Requires Building High Availability • • Rich Functionality for OLTP and Data Warehousing Built-in High Availability and Redundancy • Tuned for Data Warehousing
Control Rack Data Rack Compute Nodes Storage Nodes Control Nodes SQL Active / Passive SQL Client Drivers SQL Support / Patching Landing Zone ETL Load Interface SQL SQL Backup Node Corporate Backup Solution Dual Infiniband Management Servers SQL Spare Compute Node Dual Fiber Channel SQL
Client Drivers SQL
Client Connections Always Go Through The Control Node Contains No Persistent User Data PDW “Secret Sauce” • • • Processes SQL requests Prepares execution plan Orchestrates distributed execution Local SQL Server • Final Query Plan Processing • Result Aggregation Drivers • • • TDS Protocol (SQL Native 10) Wire Protocol (Se. Quel Link) ODBC, OLE-DB, ADO. NET and JDBC Client Drivers for both 32 -bit and 64 -bit
Management Node Support / Patching
Management Node Runs the Windows Domain Controller (Active Directory) Used for deploying patches to all nodes in the appliance Holds images in case a node needs reimaging
ETL Load Interface
Landing Zone Provides High-Capacity Storage for Data Files from ETL processes Integration Services available on the Landing Zone Connected to Internal Network Available as Sandbox for other Applications and Scripts that Run on Internal Network Source DWLOADER or SSIS Landing Zone Files Data Loader Compute Nodes
Corp. Backup Solution
Coordinated Backup across Nodes Database-Level Backup • • Full or Differential Metadata Backup Can Restore to a Larger Appliance Optional Item – 1 Size per Configuration • • Up to 654 TB of Capacity Orderable in Different Capacities
“PDW Node” Compute Node Storage node
Each MPP Node is a Highly Tuned SMP Node with Standard Interfaces Dedicated Hardware, Database, and Storage Running SQL Server 2008 Enterprise Edition SQL as Primary Interface
Compute Node Hardware Enterprise-Class DBMS Dual Multi-Core Processors Temp. DB Workspace CPU DUAL 4 Gb FC RAM CPU Dual Infini. Band Pre-Configured for Each SQL Server Instance on Each Compute Node. • • Drives Configured as RAID 1 to Avoid Appliance Failover for a Single-Drive Failure Configured with 3 LUNs (i. e. 3 RAID 1 Pairs) Temp. DB Used for the Following Purposes: • • • Sort-Work Area for Data Loading into Clustered Index Tables Spill Area for Hash Joins Not Fitting into Memory Temporary PDW Tables
Dual 8 GB Fiber Storage Processor Hot Spare Data/Log Drives (RAID 1) - LFF Dual Fiber Channel Controllers • • Active/Active Provides Fault Tolerance
User Database Small Form Factor Drives (SFF) (24) – 8 RAID 1 groups containing a single LUN Large Form Factor Drives (LFF) (11) – 4 RAID 1 groups containing 2 LUNs each LUN 1 LUN 2 LUN 3 LUN 4 LUN 5 LUN 6 LUN 7 LUN 8 LUN 9 Log LUN 9 SFF – 6 disk in a RAID 1 + 0 group LFF – 2 disk in a RAID 1 group SFF - 2 Hot Spares LFF – 1 Hot Spare (SFF)
User Database Storage Node – Physical File Layout LUN 1 LUN 2 LUN 3 LUN 4 LUN 5 LUN 6 LUN 7 LUN 8 FG UDB_Dist_A FG UDB_Dist_B FG UDB_Dist_C FG UDB_Dist_D FG UDB_Dist_E FG UDB_Dist_F FG UDB_Dist_G FG UDB_Dist_H UDBDist. ndf UDBDist. ndf UDBRepl. ndf UDB Replicated FG UDBRepl. ndf LUN 9 Log LUN 9 UDBRepl. ndf 8 Distribution Filegroups 1 Replicated Filegroup with 8 Files LUN 9 1 Log Filegroup
PDW Benefits – Massive Parallel Processing Control Rack Data Rack Storage Nodes Compute Nodes Active / Passive Query 1 Management Node Landing Zone Backup Node Dual Infiniband SQL ? ? ? ? ? SQL SQL SQL Spare Compute Node Dual Fiber Channel Control Nodes Query 1 is Submitted to SQL Server via the Control Node. Query is Executed on All 10 Nodes. Results are Sent Back to the Client.
Data Volume • • • 80 TB data warehouse analyzing data from exchanges Existing system based on SQL SMP farm 2 different clusters of 6 servers each Requirement • • • Linear scalability with additional hardware Support hourly loads with SSIS – 300 GB/day BI Integration: SSRS, SSAS and Power. Pivot Portal Reports ETL Dashboards AU 3 Feedback • • SP and increased T-SQL support was great Migrating SMP SSRS to PDW was painless 142 x for scan heavy queries & no summary tables Enabled queries that do not run on existing system Operational DB’s PDW Scorecards
Data Volume • • Currently 11 TB data warehouse analyzing data from transactional and clickstream sources (2 years data) Business need to expand to 7 year data window, i. e. ~39 TB. Requirement • • • Scalability - growing data volume does not affect performance Performance and ad-hoc analysis for interactive querying by users BI Integration with Microsoft BI stack - SSAS and SSRS AU 3 Feedback • • SSAS cubes worked ‘out-of-box’ Performance an order of magnitude faster than existing system (~30 x on an expanded data set) Click. Stream Nielsen PDW SSAS OLTP system
Demo Parallel Data Warehouse Loading Data Breaking the Marketing Numbers Matthew Winter Architect PDW & Appliance Co. E
10 X Performance Best-in-class Manageability Low Entry Cost More BI Tools, Any Language
Breakthrough Performance 10 X Faster Query Performance Drives Greater Productivity Support More Users with High Concurrency 6 X Improvement in Data Movement in the appliance Faster Throughput for Mixed Workloads
Best-in-Class Manageability Reduced operational cost through unified management of appliances and other SQL instances PDW Integration with Reduced cost of Upgrading from Scale Up SQL Server to MPP Cost
Low Entry Cost New Lower Cost Half Rack Solution Order From Your Preferred Vendor HP DELL
More BI Tools, Any Language Greater Support for Non. Microsoft BI & ETL Tools Use PDW on Data in Any Language SAS Hadoop Informatica Microstrategy SAP Business Objects International Data Support
Session Objectives • Scaling Microsoft SQL Server to manage the trend to analyse and report on increasing data volumes. Key Takeaways • • The Fast Track Reference Architecture provides a cost-effective approach to accelerate your data warehouse strategy. Providing a guide to building a reliable, scalable solution that will help you reduce your risk and save time. Parallel Data Warehouse is a preconfigured appliance based upon the proven SQL Server database technology. PDW is capable of scaling out as your data volumes grow, reducing the maintenance overhead to a minimum and provides the raw performance an Enterprise Data Warehouse requires.
Q&A Data Warehousing with SQL Server (Fast Track & PDW) Stephen Strong, Matthew Winter
- Slides: 46