Azure SQL Data Warehouse What Where and How

  • Slides: 34
Download presentation
Azure SQL Data Warehouse: What, Where and How Sergey Lunyakin Sergey. Lunyakin@gmail. com @slunyakin

Azure SQL Data Warehouse: What, Where and How Sergey Lunyakin Sergey. Lunyakin@gmail. com @slunyakin

Sponsors

Sponsors

About me

About me

Agenda • • • What is Azure SQL Data Warehouse? Scale Up vs. Scale

Agenda • • • What is Azure SQL Data Warehouse? Scale Up vs. Scale Out (SMP vs. MPP) Architecture of Azure SQL DW Appropriate workloads Loading data Pricing and Limitation

What is Azure SQL Data Warehouse? • • A Platform as a Service in

What is Azure SQL Data Warehouse? • • A Platform as a Service in Microsoft Azure cloud It’s a Massively Parallel Processing system (MPP) Distributed Compute and Distributed Storage Scale up and down in couple minutes Pause compute resources Supports a subset of T-SQL Join with external data in Azure Blob Storage

Scale Up vs. Scale Out (SMP vs. MPP) SMP - Symmetric Multiprocessing MPP -

Scale Up vs. Scale Out (SMP vs. MPP) SMP - Symmetric Multiprocessing MPP - Massively Parallel Processing • • Multiple CPUs used to complete individual processes simultaneously All CPUs share the same memory, disks, and network controllers (scale-up) All SQL Server implementations up until now have been SMP Mostly, the solution is housed on a shared SAN • Uses many separate CPUs running in parallel to execute a single program • Shared Nothing: Each CPU has its own memory and disk (scale-out) • Segments communicate using high-speed network between nodes

Architecture of Azure SQL DW

Architecture of Azure SQL DW

Architecture Azure SQL DW … … Dist_DB_60 … https: //azure. microsoft. com/enus/documentation/articles/sql-datawarehouse-overview-what-is Dist_DB_30 Dist_DB_46

Architecture Azure SQL DW … … Dist_DB_60 … https: //azure. microsoft. com/enus/documentation/articles/sql-datawarehouse-overview-what-is Dist_DB_30 Dist_DB_46 Dist_DB_47 … … Dist_DB_15 Dist_DB_16 Dist_DB_17 … Dist_DB_1 Dist_DB_2

Storage Compute Logical Overview Control

Storage Compute Logical Overview Control

Distributions • Distribution – SQL Database which stores one or more distributed table •

Distributions • Distribution – SQL Database which stores one or more distributed table • Splits data table to 60 buckets through compute nodes • Hash distributed table * • Round-Robin distributed table * * Selecting the right distribution method is key for good performance

Hash distributed • • Data divided across nodes based on hashing algorithm Same value

Hash distributed • • Data divided across nodes based on hashing algorithm Same value will always hash to same distribution Optimal for large fact tables Data Skew can be an issue when distributing on high frequency values. Use this to look for skew: DBCC PDW_SHOWSPACEUSED('dbo. Your. Table. Name');

Round-Robin distributed • Data distributed evenly across nodes • Easy place to start, don’t

Round-Robin distributed • Data distributed evenly across nodes • Easy place to start, don’t need to know anything about the data • Useful for dimension tables and tables without a good hash column • Will incur more data movement at query time

Storage Compute Distributed queries Control

Storage Compute Distributed queries Control

Data Warehouse Units (DWUs) • DWUs are measured of resources like CPU, Memory and

Data Warehouse Units (DWUs) • DWUs are measured of resources like CPU, Memory and IOPS, which are allocated to Azure SQL DW instance. • DWU 100 - 6000 • How many compute nodes your instance uses • How many concurrent queries your instance can run. • How many parallel readers you can have for data loads • The size limit of transactions

DEMO

DEMO

Appropriate workloads

Appropriate workloads

Analytical workloads • Store large volumes of data. Supports a maximum compressed size 240

Analytical workloads • Store large volumes of data. Supports a maximum compressed size 240 TB, that potentially is a Petabyte uncompressed data. • Consolidate disparate data into a single location • Shape, model, transform and aggregate data • Perform batch processing query across large datasets • Ad-hoc reporting across large data volume

Unsuitable workloads OLTP workloads • High frequency reads&writes • Large numbers of singleton selects

Unsuitable workloads OLTP workloads • High frequency reads&writes • Large numbers of singleton selects • High volumes of single row inserts Procedural ETL • Row by row processing needs • Incompatible formats (JSON, XML)

Compute Consumption

Compute Consumption

Azure Data Lake & SQL DW

Azure Data Lake & SQL DW

Source for SSAS Polybase

Source for SSAS Polybase

Loading data

Loading data

Loading data tools • • • SSIS BCP Azure Data Factory SQL Bulk. Copy

Loading data tools • • • SSIS BCP Azure Data Factory SQL Bulk. Copy Poly. Base 3 rd Party tools

Loading data not Polybase https: //blogs. msdn. microsoft. c om/sqlcat/2016/02/06/azuresql-data-warehouse-loadingpatterns-and-strategies/

Loading data not Polybase https: //blogs. msdn. microsoft. c om/sqlcat/2016/02/06/azuresql-data-warehouse-loadingpatterns-and-strategies/

Loading data via Polybase https: //blogs. msdn. microsoft. com /sqlcat/2016/02/06/azure-sqldata-warehouse-loading-patterns -and-strategies/

Loading data via Polybase https: //blogs. msdn. microsoft. com /sqlcat/2016/02/06/azure-sqldata-warehouse-loading-patterns -and-strategies/

Migration data tools • Data Warehouse Migration tool (Microsoft) • Data Platform Studio (Red.

Migration data tools • Data Warehouse Migration tool (Microsoft) • Data Platform Studio (Red. Gate)

DEMO

DEMO

Pricing • Compute - From $1. 21/hour to $72. 58/hour • Storage - $0.

Pricing • Compute - From $1. 21/hour to $72. 58/hour • Storage - $0. 17/1 TB/hour (Premium storage) • Data storage includes the size of your data warehouse and 7 days of incremental snapshot • Storage transactions are not billed • Geo-redundant copies is billed at Standard Disk RAGRSs of $0. 12/GB/month.

Limitation • Only 32 concurrent queries • How do you get around this? Create

Limitation • Only 32 concurrent queries • How do you get around this? Create more instances. • Max compressed size is 240 TB • Potentially 1 PB uncompressed • No support for the following • Primary/Foreign Keys • Identity • Computed Columns • Triggers • Sequences • Cursors • MERGE https: //azure. microsoft. com/en-us/documentation/articles/sql-datawarehouse-migrate-code/

Start using § https: //www. visualstudio. com/dev-essentials/ - $25 § https: //azure. microsoft. com/en-us/offers/ms-azr-0044

Start using § https: //www. visualstudio. com/dev-essentials/ - $25 § https: //azure. microsoft. com/en-us/offers/ms-azr-0044 p/ -$200 § https: //azure. microsoft. com/en-us/services/sql-datawarehouse/extended-trial/ § PASSSUMMITROCKS – 30 days, DWU 200

Summary • • MPP Paa. S Service in Azure Cloud Storing and processing huge

Summary • • MPP Paa. S Service in Azure Cloud Storing and processing huge amount of structure data Flexible Compute Scale, DWUs 100 -6000 Hash/Round-Robin table distributions Batch processing – Yes, Row processing – No T-SQL with limitations Use Poly. Base for data loading No CAPEX, Low OPEX

Links • https: //azure. microsoft. com/en-us/documentation/services/sqldata-warehouse/ • https: //azure. microsoft. com/en-us/pricing/details/sql-datawarehouse/ • https: //azure.

Links • https: //azure. microsoft. com/en-us/documentation/services/sqldata-warehouse/ • https: //azure. microsoft. com/en-us/pricing/details/sql-datawarehouse/ • https: //azure. microsoft. com/en-us/documentation/articles/sqldata-warehouse-migrate-code/ • https: //blogs. msdn. microsoft. com/sqlcat/2016/08/18/migratingdata-to-azure-sql-data-warehouse-in-practice/ • https: //blogs. msdn. microsoft. com/sqlcat/2016/02/06/azure-sqldata-warehouse-loading-patterns-and-strategies/

Sponsors

Sponsors

Thank you! Sergey. Lunyakin@gmail. com

Thank you! Sergey. Lunyakin@gmail. com