Introduction to Azure SQL Data Warehouse SQLHammer Derik
- Slides: 21
Introduction to Azure SQL Data Warehouse @SQLHammer Derik Hammer www. sqlhammer. com 1
What is a data warehouse? • Very large amount of data • Centralizes data from dispersed systems • Loose RPO and RTOs • Optimized for data ingestion and large queries • Very low concurrency requirements • Used to build cubes and data marts • Used for data mining and analytics @SQLHammer www. sqlhammer. com 2
What is a data warehouse NOT? • Rows are frequently updated • Small RPO & RTO • Optimized for singleton reads & writes • Interacts with consumers through a series of small transactions • Supports hundreds or thousands of simultaneous connections • Used for real-time interaction with data @SQLHammer www. sqlhammer. com 3
SQL Data Warehouse is MPP What is MPP? • • • Massively parallel processing architecture Data distributed into slices Share nothing design is ideal Compute and storage can be coupled or de-coupled High speed network connection between cluster nodes • Each compute/storage node is like a complete symmetric multi-processing system (SMP) @SQLHammer www. sqlhammer. com 4
Symmetric Multi-Processing (SMP) @SQLHammer www. sqlhammer. com 5
MPP Architecture @SQLHammer www. sqlhammer. com 6
What is ADW? “Azure SQL Data Warehouse is the SQL analytics platform that lets you scale compute and storage elastically and independently, with a massively parallel processing architecture. ” - Microsoft • De-couples compute and storage to provide dynamic work load management • Platform as a Service (Paa. S) cloud offering • Pay for throughput (DWU), not hardware • Can query relational and non-relational data • Static distribution (60 slices at all times) • Can bypass control node and ingest data directly into the nodes, in parallel with Poly. Base @SQLHammer www. sqlhammer. com 7
ADW physical architecture @SQLHammer www. sqlhammer. com 8
ADW use cases • Large amounts of data: TBs to PBs • Query workloads which are heavy on aggregations and large table/range scans • You need to ingest relational and non-relational data • Your data is already in Azure • You have burstable or predictably schedulable workload peaks • You do not have strict RPO & RTO requirements • Low concurrency needs • You have a lot of money! • All MPP systems on the market are quite expensive @SQLHammer www. sqlhammer. com 9
ADW use cases (cont. ) • Recommended reading: Is Azure SQL Data Warehouse A Good Fit? – By Melissa Coates & Blue. Granite • 15 question flow chart @SQLHammer www. sqlhammer. com 10
ADW systems architecture @SQLHammer www. sqlhammer. com 11
ADW Scaling @SQLHammer www. sqlhammer. com 12
ADW Scaling (cont. ) @SQLHammer www. sqlhammer. com 13
ADW storage structures • Clustered columnstore indexes (CCI) • Recommended primary table structure for most tables • Non-clustered rowstore indexes (NCI) • Best when layered on top of CCIs • Used for singleton row retrievals • Recommended to limit the number to 1 or 2 per table, maximum • Clustered rowstore indexes (CI) • Best used before CCIs supported NCIs • Duplicate the table, one with CCI, and one with CI • Hard to maintain • Heaps @SQLHammer www. sqlhammer. com 14
ADW storage structures (cont. ) • Resides on Azure premium storage (SSDs) • Always divided into 60 slices • Hash distribution • Ideal and optimized for queries and joins • Incurs some write overhead due to hash function • Can cause data skew across slices • Round-robin distribution • Ideal for fast loads into a staging schema • Not optimized well for queries • Causes data shuffling when joining tables • Replicated • Ideal for small dimension tables (<150 k rows) • Table is cached on the first distribution of each compute node @SQLHammer www. sqlhammer. com 15
ADW performance tiers • Optimized for elasticity • Supports DWUs 100 through 6, 000 • Optimized for compute • Newer hardware supporting the compute nodes • NVMe SSD cache to keep data closer to the CPUs • Supports c. DWUs 1, 000 through 30, 000 • Microsoft aims to provide a linear increase in performance as DWUs are increased • DWUs directly relate to the maximum concurrency, memory allocations, and Poly. Base readers/writers @SQLHammer www. sqlhammer. com 16
Concurrency & work load management • Maximum of 1024 connections • Concurrent queries range from 4 to 32 concurrent queries, based on performance level • for now • Concurrency slots range from 4 to 240 • 4 dynamic resource classes (smallrc, mediumrc, largerc, and xlargerc) • 8 static resource classes (staticrc 10, staticrc 20, staticrc 30, staticrc 40, staticrc 50, staticrc 60, staticrc 70, staticrc 80) • Finer grain control @SQLHammer www. sqlhammer. com 17
Concurrency & work load management (cont. ) • Examples: @SQLHammer www. sqlhammer. com 18
Concurrency & work load management (cont. ) • Examples: @SQLHammer www. sqlhammer. com 19
Questions? @SQLHammer www. sqlhammer. com 20
Materials My Contact Information: @SQLHammer derik@sqlhammer. com All materials are available at: www. sqlhammer. com http: //www. sqlhammer. com/go/community/ Help me improve… Online feedback form: http: //www. sqlhammer. com/go/feedback This material has already been posted. When I update the material, the most recent updates will be available. 21
- Azure sql data warehouse
- Sqlhammer
- Azure sql data warehouse
- Azure sql data warehouse smp
- Koturače za dizanje tereta
- Azure sql advanced threat protection
- Parallel data warehouse sql server 2012
- Csp azure plan
- Introduction to data warehousing
- Introduction to data warehouse
- Introduction to data warehousing
- Cdc azure sql
- Azure sql database benchmark
- Azure horizontal scaling
- Azure sql server sla
- Azure sql sla
- Microsoft sql server raspberry pi
- Sql azure reporting
- Visualfox pro
- Logical server
- Azure clour
- Azure sql server stretch database