Introduction to Azure SQL Data Warehouse SQLHammer Derik

  • Slides: 27
Download presentation
Introduction to Azure SQL Data Warehouse @SQLHammer Derik Hammer www. sqlhammer. com 1

Introduction to Azure SQL Data Warehouse @SQLHammer Derik Hammer www. sqlhammer. com 1

Data Warehouse Basics @SQLHammer www. sqlhammer. com 2

Data Warehouse Basics @SQLHammer www. sqlhammer. com 2

Data warehouse A data warehouse is not A centralized repository of data from dispersed

Data warehouse A data warehouse is not A centralized repository of data from dispersed systems A system where rows are frequently updated A system with loose RPOs and RTOs Optimized for singleton reads and writes Optimized for data ingestion Meant to interact with consumers through small transactions Optimized for large queries Capable of support hundreds of thousands of connections A system with low concurrency requirements Used for real-time interaction with data Used to build physical data marts and cubes Used for data mining Used for analytics @SQLHammer www. sqlhammer. com 3

Azure SQL DW is an MPP system What is MPP? • Massively parallel processing

Azure SQL DW is an MPP system What is MPP? • Massively parallel processing architecture • Data distributed into slices • Share nothing design is ideal • Compute and storage can be coupled or de-coupled • High speed network connection between cluster nodes • Each compute/storage node is like a complete symmetric multi-processing system (SMP) @SQLHammer www. sqlhammer. com 4

Symmetric Multi-Processing (SMP) @SQLHammer www. sqlhammer. com 5

Symmetric Multi-Processing (SMP) @SQLHammer www. sqlhammer. com 5

MPP Architecture @SQLHammer www. sqlhammer. com 6

MPP Architecture @SQLHammer www. sqlhammer. com 6

Azure SQL DW @SQLHammer www. sqlhammer. com 7

Azure SQL DW @SQLHammer www. sqlhammer. com 7

SQL DW vision • Any size • Pay for what you store • Any

SQL DW vision • Any size • Pay for what you store • Any performance • Pay for what you use • No data warehouse workload is too small @SQLHammer www. sqlhammer. com 8

What is SQL DW? “Azure SQL Data Warehouse is the SQL analytics platform that

What is SQL DW? “Azure SQL Data Warehouse is the SQL analytics platform that lets you scale compute and storage elastically and independently, with a massively parallel processing architecture. ” - Microsoft • De-couples compute and storage to provide dynamic work load management • Static distribution (60 slices at all times) • Platform as a Service (Paa. S) cloud offering • Pay for linearly scalable throughput (DWU), not hardware • Can query relational and non-relational data • Can bypass control node and ingest data directly into the nodes, in parallel with Poly. Base @SQLHammer www. sqlhammer. com 9

SQL DW use cases (cont. ) • Recommended reading: Is Azure SQL Data Warehouse

SQL DW use cases (cont. ) • Recommended reading: Is Azure SQL Data Warehouse A Good Fit? – By Melissa Coates & Blue. Granite • 15 question flow chart @SQLHammer www. sqlhammer. com 10

SQL DW use cases • Large amounts of data: TBs to PBs • Query

SQL DW use cases • Large amounts of data: TBs to PBs • Query workloads which are heavy on aggregations and large table/range scans • You need to ingest relational and non-relational data • Your data is already in Azure • You have burstable or predictably schedulable workload peaks • You do not have strict RPO & RTO requirements • Low concurrency needs • You have a lot of money! • All MPP systems on the market are quite expensive @SQLHammer www. sqlhammer. com 11

Azure SQL DW Architecture @SQLHammer www. sqlhammer. com 12

Azure SQL DW Architecture @SQLHammer www. sqlhammer. com 12

SQL DW physical architecture @SQLHammer www. sqlhammer. com 13

SQL DW physical architecture @SQLHammer www. sqlhammer. com 13

SQL DW systems architecture @SQLHammer www. sqlhammer. com 14

SQL DW systems architecture @SQLHammer www. sqlhammer. com 14

Elastic Scale @SQLHammer www. sqlhammer. com 15

Elastic Scale @SQLHammer www. sqlhammer. com 15

SQL DW Scaling @SQLHammer www. sqlhammer. com 16

SQL DW Scaling @SQLHammer www. sqlhammer. com 16

SQL DW Scaling (cont. ) @SQLHammer www. sqlhammer. com 17

SQL DW Scaling (cont. ) @SQLHammer www. sqlhammer. com 17

Tables and Storage Structures @SQLHammer www. sqlhammer. com 18

Tables and Storage Structures @SQLHammer www. sqlhammer. com 18

SQL DW storage structures • Clustered columnstore indexes (CCI) • Recommended primary table structure

SQL DW storage structures • Clustered columnstore indexes (CCI) • Recommended primary table structure for most tables • Non-clustered rowstore indexes (NCI) • Best when layered on top of CCIs • Used for singleton row retrievals • Recommended to limit the number to 1 or 2 per table, maximum • Clustered rowstore indexes (CI) • Best used before CCIs supported NCIs • Duplicate the table, one with CCI, and one with CI • Hard to maintain • Heaps @SQLHammer www. sqlhammer. com 19

SQL DW storage structures (cont. ) • Resides on Azure premium storage (SSDs) •

SQL DW storage structures (cont. ) • Resides on Azure premium storage (SSDs) • Always divided into 60 slices • Hash distribution • Ideal and optimized for queries and joins • Incurs some write overhead due to hash function • Can cause data skew across slices • Round-robin distribution • Ideal for fast loads into a staging schema • Not optimized well for queries • Causes data shuffling when joining tables • Replicated • Ideal for small dimension tables (<150 k rows) • Table is cached on the first distribution of each compute node @SQLHammer www. sqlhammer. com 20

Performance Tiers and Workload Management @SQLHammer www. sqlhammer. com 21

Performance Tiers and Workload Management @SQLHammer www. sqlhammer. com 21

SQL DW performance tiers • Optimized for elasticity • Supports DWUs 100 through 6,

SQL DW performance tiers • Optimized for elasticity • Supports DWUs 100 through 6, 000 • Optimized for compute • Newer hardware supporting the compute nodes • NVMe SSD cache to keep data closer to the CPUs • Supports c. DWUs 1, 000 through 30, 000 • Microsoft aims to provide a linear increase in performance as DWUs are increased • DWUs directly relate to the maximum concurrency, memory allocations, and Poly. Base readers/writers @SQLHammer www. sqlhammer. com 22

Concurrency & work load management • Maximum of 1024 connections • Concurrent queries range

Concurrency & work load management • Maximum of 1024 connections • Concurrent queries range from 4 to 32 concurrent queries, based on performance level • for now • Concurrency slots range from 4 to 240 • 4 dynamic resource classes (smallrc, mediumrc, largerc, and xlargerc) • 8 static resource classes (staticrc 10, staticrc 20, staticrc 30, staticrc 40, staticrc 50, staticrc 60, staticrc 70, staticrc 80) • Finer grain control @SQLHammer www. sqlhammer. com 23

Concurrency & work load management (cont. ) • Examples: @SQLHammer www. sqlhammer. com 24

Concurrency & work load management (cont. ) • Examples: @SQLHammer www. sqlhammer. com 24

Concurrency & work load management (cont. ) • Examples: @SQLHammer www. sqlhammer. com 25

Concurrency & work load management (cont. ) • Examples: @SQLHammer www. sqlhammer. com 25

Questions? @SQLHammer www. sqlhammer. com 26

Questions? @SQLHammer www. sqlhammer. com 26

Materials My Contact Information: @SQLHammer derik@sqlhammer. com All materials are available at: www. sqlhammer.

Materials My Contact Information: @SQLHammer derik@sqlhammer. com All materials are available at: www. sqlhammer. com http: //www. sqlhammer. com/go/community/ Help me improve… Online feedback form: http: //www. sqlhammer. com/go/feedback This material has already been posted. When I update the material, the most recent updates will be available. 27