Introduction to Azure SQL Data Warehouse SQLHammer Derik

  • Slides: 21
Download presentation
Introduction to Azure SQL Data Warehouse @SQLHammer Derik Hammer www. sqlhammer. com 1

Introduction to Azure SQL Data Warehouse @SQLHammer Derik Hammer www. sqlhammer. com 1

What is a data warehouse? • Very large amount of data • Centralizes data

What is a data warehouse? • Very large amount of data • Centralizes data from dispersed systems • Loose RPO and RTOs • Optimized for data ingestion and large queries • Very low concurrency requirements • Used to build cubes and data marts • Used for data mining and analytics @SQLHammer www. sqlhammer. com 2

What is a data warehouse NOT? • Rows are frequently updated • Small RPO

What is a data warehouse NOT? • Rows are frequently updated • Small RPO & RTO • Optimized for singleton reads & writes • Interacts with consumers through a series of small transactions • Supports hundreds or thousands of simultaneous connections • Used for real-time interaction with data @SQLHammer www. sqlhammer. com 3

SQL Data Warehouse is MPP What is MPP? • • • Massively parallel processing

SQL Data Warehouse is MPP What is MPP? • • • Massively parallel processing architecture Data distributed into slices Share nothing design is ideal Compute and storage can be coupled or de-coupled High speed network connection between cluster nodes • Each compute/storage node is like a complete symmetric multi-processing system (SMP) @SQLHammer www. sqlhammer. com 4

Symmetric Multi-Processing (SMP) @SQLHammer www. sqlhammer. com 5

Symmetric Multi-Processing (SMP) @SQLHammer www. sqlhammer. com 5

MPP Architecture @SQLHammer www. sqlhammer. com 6

MPP Architecture @SQLHammer www. sqlhammer. com 6

What is ADW? “Azure SQL Data Warehouse is the SQL analytics platform that lets

What is ADW? “Azure SQL Data Warehouse is the SQL analytics platform that lets you scale compute and storage elastically and independently, with a massively parallel processing architecture. ” - Microsoft • De-couples compute and storage to provide dynamic work load management • Platform as a Service (Paa. S) cloud offering • Pay for throughput (DWU), not hardware • Can query relational and non-relational data • Static distribution (60 slices at all times) • Can bypass control node and ingest data directly into the nodes, in parallel with Poly. Base @SQLHammer www. sqlhammer. com 7

ADW physical architecture @SQLHammer www. sqlhammer. com 8

ADW physical architecture @SQLHammer www. sqlhammer. com 8

ADW use cases • Large amounts of data: TBs to PBs • Query workloads

ADW use cases • Large amounts of data: TBs to PBs • Query workloads which are heavy on aggregations and large table/range scans • You need to ingest relational and non-relational data • Your data is already in Azure • You have burstable or predictably schedulable workload peaks • You do not have strict RPO & RTO requirements • Low concurrency needs • You have a lot of money! • All MPP systems on the market are quite expensive @SQLHammer www. sqlhammer. com 9

ADW use cases (cont. ) • Recommended reading: Is Azure SQL Data Warehouse A

ADW use cases (cont. ) • Recommended reading: Is Azure SQL Data Warehouse A Good Fit? – By Melissa Coates & Blue. Granite • 15 question flow chart @SQLHammer www. sqlhammer. com 10

ADW systems architecture @SQLHammer www. sqlhammer. com 11

ADW systems architecture @SQLHammer www. sqlhammer. com 11

ADW Scaling @SQLHammer www. sqlhammer. com 12

ADW Scaling @SQLHammer www. sqlhammer. com 12

ADW Scaling (cont. ) @SQLHammer www. sqlhammer. com 13

ADW Scaling (cont. ) @SQLHammer www. sqlhammer. com 13

ADW storage structures • Clustered columnstore indexes (CCI) • Recommended primary table structure for

ADW storage structures • Clustered columnstore indexes (CCI) • Recommended primary table structure for most tables • Non-clustered rowstore indexes (NCI) • Best when layered on top of CCIs • Used for singleton row retrievals • Recommended to limit the number to 1 or 2 per table, maximum • Clustered rowstore indexes (CI) • Best used before CCIs supported NCIs • Duplicate the table, one with CCI, and one with CI • Hard to maintain • Heaps @SQLHammer www. sqlhammer. com 14

ADW storage structures (cont. ) • Resides on Azure premium storage (SSDs) • Always

ADW storage structures (cont. ) • Resides on Azure premium storage (SSDs) • Always divided into 60 slices • Hash distribution • Ideal and optimized for queries and joins • Incurs some write overhead due to hash function • Can cause data skew across slices • Round-robin distribution • Ideal for fast loads into a staging schema • Not optimized well for queries • Causes data shuffling when joining tables • Replicated • Ideal for small dimension tables (<150 k rows) • Table is cached on the first distribution of each compute node @SQLHammer www. sqlhammer. com 15

ADW performance tiers • Optimized for elasticity • Supports DWUs 100 through 6, 000

ADW performance tiers • Optimized for elasticity • Supports DWUs 100 through 6, 000 • Optimized for compute • Newer hardware supporting the compute nodes • NVMe SSD cache to keep data closer to the CPUs • Supports c. DWUs 1, 000 through 30, 000 • Microsoft aims to provide a linear increase in performance as DWUs are increased • DWUs directly relate to the maximum concurrency, memory allocations, and Poly. Base readers/writers @SQLHammer www. sqlhammer. com 16

Concurrency & work load management • Maximum of 1024 connections • Concurrent queries range

Concurrency & work load management • Maximum of 1024 connections • Concurrent queries range from 4 to 32 concurrent queries, based on performance level • for now • Concurrency slots range from 4 to 240 • 4 dynamic resource classes (smallrc, mediumrc, largerc, and xlargerc) • 8 static resource classes (staticrc 10, staticrc 20, staticrc 30, staticrc 40, staticrc 50, staticrc 60, staticrc 70, staticrc 80) • Finer grain control @SQLHammer www. sqlhammer. com 17

Concurrency & work load management (cont. ) • Examples: @SQLHammer www. sqlhammer. com 18

Concurrency & work load management (cont. ) • Examples: @SQLHammer www. sqlhammer. com 18

Concurrency & work load management (cont. ) • Examples: @SQLHammer www. sqlhammer. com 19

Concurrency & work load management (cont. ) • Examples: @SQLHammer www. sqlhammer. com 19

Questions? @SQLHammer www. sqlhammer. com 20

Questions? @SQLHammer www. sqlhammer. com 20

Materials My Contact Information: @SQLHammer derik@sqlhammer. com All materials are available at: www. sqlhammer.

Materials My Contact Information: @SQLHammer derik@sqlhammer. com All materials are available at: www. sqlhammer. com http: //www. sqlhammer. com/go/community/ Help me improve… Online feedback form: http: //www. sqlhammer. com/go/feedback This material has already been posted. When I update the material, the most recent updates will be available. 21