Introduction to Azure SQL Data Warehouse SQLHammer Derik
- Slides: 27
Introduction to Azure SQL Data Warehouse @SQLHammer Derik Hammer www. sqlhammer. com 1
Data Warehouse Basics @SQLHammer www. sqlhammer. com 2
Data warehouse A data warehouse is not A centralized repository of data from dispersed systems A system where rows are frequently updated A system with loose RPOs and RTOs Optimized for singleton reads and writes Optimized for data ingestion Meant to interact with consumers through small transactions Optimized for large queries Capable of support hundreds of thousands of connections A system with low concurrency requirements Used for real-time interaction with data Used to build physical data marts and cubes Used for data mining Used for analytics @SQLHammer www. sqlhammer. com 3
Azure SQL DW is an MPP system What is MPP? • Massively parallel processing architecture • Data distributed into slices • Share nothing design is ideal • Compute and storage can be coupled or de-coupled • High speed network connection between cluster nodes • Each compute/storage node is like a complete symmetric multi-processing system (SMP) @SQLHammer www. sqlhammer. com 4
Symmetric Multi-Processing (SMP) @SQLHammer www. sqlhammer. com 5
MPP Architecture @SQLHammer www. sqlhammer. com 6
Azure SQL DW @SQLHammer www. sqlhammer. com 7
SQL DW vision • Any size • Pay for what you store • Any performance • Pay for what you use • No data warehouse workload is too small @SQLHammer www. sqlhammer. com 8
What is SQL DW? “Azure SQL Data Warehouse is the SQL analytics platform that lets you scale compute and storage elastically and independently, with a massively parallel processing architecture. ” - Microsoft • De-couples compute and storage to provide dynamic work load management • Static distribution (60 slices at all times) • Platform as a Service (Paa. S) cloud offering • Pay for linearly scalable throughput (DWU), not hardware • Can query relational and non-relational data • Can bypass control node and ingest data directly into the nodes, in parallel with Poly. Base @SQLHammer www. sqlhammer. com 9
SQL DW use cases (cont. ) • Recommended reading: Is Azure SQL Data Warehouse A Good Fit? – By Melissa Coates & Blue. Granite • 15 question flow chart @SQLHammer www. sqlhammer. com 10
SQL DW use cases • Large amounts of data: TBs to PBs • Query workloads which are heavy on aggregations and large table/range scans • You need to ingest relational and non-relational data • Your data is already in Azure • You have burstable or predictably schedulable workload peaks • You do not have strict RPO & RTO requirements • Low concurrency needs • You have a lot of money! • All MPP systems on the market are quite expensive @SQLHammer www. sqlhammer. com 11
Azure SQL DW Architecture @SQLHammer www. sqlhammer. com 12
SQL DW physical architecture @SQLHammer www. sqlhammer. com 13
SQL DW systems architecture @SQLHammer www. sqlhammer. com 14
Elastic Scale @SQLHammer www. sqlhammer. com 15
SQL DW Scaling @SQLHammer www. sqlhammer. com 16
SQL DW Scaling (cont. ) @SQLHammer www. sqlhammer. com 17
Tables and Storage Structures @SQLHammer www. sqlhammer. com 18
SQL DW storage structures • Clustered columnstore indexes (CCI) • Recommended primary table structure for most tables • Non-clustered rowstore indexes (NCI) • Best when layered on top of CCIs • Used for singleton row retrievals • Recommended to limit the number to 1 or 2 per table, maximum • Clustered rowstore indexes (CI) • Best used before CCIs supported NCIs • Duplicate the table, one with CCI, and one with CI • Hard to maintain • Heaps @SQLHammer www. sqlhammer. com 19
SQL DW storage structures (cont. ) • Resides on Azure premium storage (SSDs) • Always divided into 60 slices • Hash distribution • Ideal and optimized for queries and joins • Incurs some write overhead due to hash function • Can cause data skew across slices • Round-robin distribution • Ideal for fast loads into a staging schema • Not optimized well for queries • Causes data shuffling when joining tables • Replicated • Ideal for small dimension tables (<150 k rows) • Table is cached on the first distribution of each compute node @SQLHammer www. sqlhammer. com 20
Performance Tiers and Workload Management @SQLHammer www. sqlhammer. com 21
SQL DW performance tiers • Optimized for elasticity • Supports DWUs 100 through 6, 000 • Optimized for compute • Newer hardware supporting the compute nodes • NVMe SSD cache to keep data closer to the CPUs • Supports c. DWUs 1, 000 through 30, 000 • Microsoft aims to provide a linear increase in performance as DWUs are increased • DWUs directly relate to the maximum concurrency, memory allocations, and Poly. Base readers/writers @SQLHammer www. sqlhammer. com 22
Concurrency & work load management • Maximum of 1024 connections • Concurrent queries range from 4 to 32 concurrent queries, based on performance level • for now • Concurrency slots range from 4 to 240 • 4 dynamic resource classes (smallrc, mediumrc, largerc, and xlargerc) • 8 static resource classes (staticrc 10, staticrc 20, staticrc 30, staticrc 40, staticrc 50, staticrc 60, staticrc 70, staticrc 80) • Finer grain control @SQLHammer www. sqlhammer. com 23
Concurrency & work load management (cont. ) • Examples: @SQLHammer www. sqlhammer. com 24
Concurrency & work load management (cont. ) • Examples: @SQLHammer www. sqlhammer. com 25
Questions? @SQLHammer www. sqlhammer. com 26
Materials My Contact Information: @SQLHammer derik@sqlhammer. com All materials are available at: www. sqlhammer. com http: //www. sqlhammer. com/go/community/ Help me improve… Online feedback form: http: //www. sqlhammer. com/go/feedback This material has already been posted. When I update the material, the most recent updates will be available. 27
- Azure sql data warehouse
- Azure sql data warehouse mpp
- Azure sql analytics
- Azure sql data warehouse smp
- Derik kran
- Sql threat protection
- Microsoft sql server 2012 parallel data warehouse
- New commerce experience
- Operational and informational data store in data warehouse
- Introduction to data mining and data warehousing
- Introduction of data warehouse
- Cdc azure
- Azure dtu calculator
- Azure sql database sharding
- Azure sql db sla
- Azure sql sla
- Raspberry pi mssql server
- Sql azure reporting
- Visualfox pro
- Logical server
- Azure clour
- Azue sql server stretch monitoring
- Sql
- Azure ml normalize data
- Azure cosmos db: sql api deep dive online courses
- Azure database security
- Azure sql graph
- Azure sql dtu limits