Introducing Parallel Data Warehouse The project formerly known

Agenda The Typical problem with data warehouses MPP vs SMP SQL Server Parallel Data

Introducing Parallel Data Warehouse The Typical Problem with Data Warehouses 3

Microsoft DW Solutions Microsoft & Partner Services SSRS SSAS SSIS 11

Symmetric Multi-Processing vs. Massively Parallel Processing SMP MPP OLTP, Transactional, Data Warehousing (esp. VLDB,

PDW: No Assembly Required Software Servers Storage arrays Network switches Cables Licenses Power distribution

Basic Building Blocks Compute Nodes Handles the CPU cycles required to answer queries Storage

Anatomy of a Compute Node Pre-configured For Each SQL Server Instance On Each Compute

Anatomy of a Storage Node Pre-configured 4 RAID 10 Pairs for Primary User Data

More Node Types Backup node: Stores backup files from the appliance Can be logged

Putting It All Together - PDW Control Node Failover Protection: • Redundant Control Node

Software Architecture Internet Explorer Query Tool Control Node IIS Admin Console Data Movement Service

Create Database CREATE DATABASE database_name WITH ( AUTOGROW = ON , REPLICATED_SIZE = 1024

Distribution and Replication Database Distributed & Replicated Tables D Date Dim Customer C-CUSTOMER _SK

$Table Creation CREATE TABLE table_name [ ( { <column_definition> } [ , . .$

Create Table – Behind the Scenes Create Table store_sales with distribute_on (ss_item_sk) partition_on(ss_sold_date_sk )

Physical File Layout (Per Compute Node) 24

MPP Query Processing Control Node ODBC/JDBC SQL 92 with Analytical Extensions Select location, year

MPP Execution Plans The MPP engine creates parallel execution plans from client SQL The

Example Schema Sales table distributed on customer Date Dim Customer C-CUSTOMER _SK C_CUSTOMER _ID

Distribution Compatible Query SELECT Customer. Id, SUM(Amount) AS Total. Sales, SUM(Quantity) AS Total. Units.

MPP Query Plan Step 1 – On each compute node: SELECT s. [customerid], sum(s.

Query 1 Processing Flow Query Tool Control Node Compute Node 1 Parse SQL Data

Reshuffling the data SELECT Sale. Date, SUM(Amount) AS Total. Sales, SUM(Quantity) AS Total. Units.

MPP Query Plan Step 1 – Create temp table on control node CREATE TABLE

MPP Query Plan continued Step 3: SELECT [saledate], sum([totalsales]) AS totalsales, sum([totalunitssold]) AS totalunitssold

Reshuffling – Query Processing Flow Query Tool Compute Node Control Node Parse SQL Data

Data Loading Control Node Tables Are Hash Distributed Or Replicated Landing Zone Node Text

Data Loader Process Insert-Select Load File Sort each BATCH in memory or Temp. DB

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows Vista and other product names

Slides: 30

Download presentation

Introducing Parallel Data Warehouse (The project formerly known as Madison) Thomas Kejser Senior Program Manager Microsoft Corp.

Agenda The Typical problem with data warehouses MPP vs SMP SQL Server Parallel Data Warehouse Hardware architecture Query Processing Data Loading My email: tkejser@microsoft. com 2

Introducing Parallel Data Warehouse The Typical Problem with Data Warehouses 3

Microsoft DW Solutions Microsoft & Partner Services SSRS SSAS SSIS 11

Symmetric Multi-Processing vs. Massively Parallel Processing SMP MPP OLTP, Transactional, Data Warehousing (esp. VLDB, complex workloads) HW advancements increasing ability to scale-up HW advancements increasing ability to scaleout But scaling limited by design Scaling to 1 PB+ High end SMP very expensive Scale out is relatively low cost Extremely high concurrency for simple workloads Relatively high concurrency for complex workloads Less than 1 -2 TB of data SMP will almost always be better. > 2 TB up to 1 PB for DW workloads At higher sizes - depends 12

PDW: No Assembly Required Software Servers Storage arrays Network switches Cables Licenses Power distribution units Racks Comes fully assembled Software is installed at the factory Fully configured 13

Basic Building Blocks Compute Nodes Handles the CPU cycles required to answer queries Storage Nodes Stores data using Fiber Attached Disks. Scaled to support CPU with enough throughput Other nodes More about those later 14

Anatomy of a Compute Node Pre-configured For Each SQL Server Instance On Each Compute Node. Drives Configured As RAID 1 To Avoid Appliance Failover for a Single Drive Failure IBM Compute Nodes Will Have 1 Lun (1 RAID 1 Pair) Dell Compute Nodes Will Have 2 Lun’s (2 RAID 1 Pairs) HP Compute Nodes Will Have 3 Luns’s (3 RAID 1 Pairs) Temp. DB: Sort-work Area For Data Loading Into Clustered Index Tables Work Area for PDW Temporary Work Files Spill Area For Hash Joins Not Fitting Into Memory 15

Anatomy of a Storage Node Pre-configured 4 RAID 10 Pairs for Primary User Data 1 RAID 10 Pair for Database Logs 2 LUN’s Are Spread Across Each RAID Pair User Databases are Separate Physical SQL Server Databases Staging Database (Optional) Used for Loading & to Minimize Fragmentation 16

More Node Types Backup node: Stores backup files from the appliance Can be logged into by authorized Windows users Can be augmented with 3 rd party H/W and S/W Landing Zone: Used as a holding place for data to be loaded Can be logged into by authorized Windows users Can be augmented with 3 rd party H/W and S/W Management node: Runs the Windows domain controller (Active Directory) Used for deploying patches to all nodes in the appliance Holds images in case a node needs reimaging 17

Putting It All Together - PDW Control Node Failover Protection: • Redundant Control Node • Redundant Compute Node • Cluster Failover • Redundante Array of Inexpensive Databases Spare Node 18

Software Architecture Internet Explorer Query Tool Control Node IIS Admin Console Data Movement Service MS BI (AS, RS) DWSQL Other 3 rd Party Tools OLEDB, ODBC, ADO. Net, JDBC MPP Engine Compute Nodes Data Movement Service User Data SQL Server Landing Zone Node DW Authentication DW Configuration DW Schema Data Movement Service Temp. DB SQL Server 19

Create Database CREATE DATABASE database_name WITH ( AUTOGROW = ON , REPLICATED_SIZE = 1024 , DISTRIBUTED_SIZE = 16384 , LOG_SIZE = 300 ) 20

Distribution and Replication Database Distributed & Replicated Tables D Date Dim Customer C-CUSTOMER _SK C_CUSTOMER _ID C_CURRENT _ADDR … D_DATE _SK D_DATE _ID D_DATE D_MONTH … C I SS CD P S Item D C I_ITEM _SK I_ITEM _ID I_REC CD _START _D ATE I_ITEM _DESC Store Sales Ss_sold_date_sk Ss_item_sk Ss_customer_sk Ss_cdemo_sk Ss_store_sk Ss_promo_sk Ss_quantity … Customer Demographics CD_DEMO _SK CD_GENDER CD_MARITAL _STATUS CD_EDUCATION … … C I SS CD P S D C I SS Promotion SK S_STORE_ID S_REC_START_D ATE S_REC_END_DAT E S_STORE_NAME … P S D CD P_PROMO _SK P_PROMO _ID P_START _DATE _ Store I SS P_END _DATE _SK … P S D C I SS CD P S 21

$Table Creation CREATE TABLE table_name [ ( { <column_definition> } [ , . .$

Table Creation CREATE TABLE table_name [ ( { <column_definition> } [ , . . . n ] ) [ AS SELECT select_criteria ] [ WITH ( <table_option> ) ] [; ] <column_definition> : : = column_name <data_type> [ NULL | NOT NULL ] <data type> : : = type_name [ ( precision [ , scale ] ) ] <table_option> : : = { [ CLUSTER_ON ( column_name [ , . . . n ] ) ] , [ DISTRIBUTE_ON ( column_name ) ] | [ REPLICATE ] , [ PARTITION_ON column_name ( RANGE { LEFT | RIGHT } FOR VALUES { [ boundary_value [, . . . n] ] ) ) ] } Type Class Types Supported Integers tinyint, smallint, bigint Floating point float, real Character char, varchar, nvarchar Date & time date, time, dateime 2, datetimeoffset, timestamp, smalldatetime Fixed point decimal, money, smallmoney Binary binary, varbinary (8192) Other uniqueidentifier (? ) 22

Create Table – Behind the Scenes Create Table store_sales with distribute_on (ss_item_sk) partition_on(ss_sold_date_sk ) cluster_on (ss_sold_date_sk) 8 Filegroups (one per core) - 1 Table per Filegroup 12 Partitions (ss_sold_date_sk) 8 K 8 K 8 K N-number of Pages Row 23

Physical File Layout (Per Compute Node) 24

MPP Query Processing Control Node ODBC/JDBC SQL 92 with Analytical Extensions Select location, year sum(b. sales_amt) from customer a, sales b where b. sales > 500 and a. custid = b. custid group by 2, 1 order by 1, 2 Query Rewritten Into Steps That Run Efficiently On Compute Nodes Distribution-incompatible Joins Resolved Using High Speed Dynamic Re-distribution 25

MPP Execution Plans The MPP engine creates parallel execution plans from client SQL The plans can include the following types of operations: SQL operations: used to pass SQL directly to SQL Server on 1 or more nodes. DMS operations: used to move data among the nodes in an appliance for further processing. Temp tables operations: used to stage data for further processing. Return operations: push data back to the client. Simple plans may include just one type of operation. Complex plans may include all of these operations. Plans are executed serially, one step at a time. 26

Example Schema Sales table distributed on customer Date Dim Customer C-CUSTOMER _SK C_CUSTOMER _ID C_CURRENT _ADDR … D_DATE _SK D_DATE _ID D_DATE D_MONTH … . . . And partitioned by time Item I_ITEM _SK I_ITEM _ID I_REC _START _D ATE I_ITEM _DESC Store Sales Ss_sold_date_sk Ss_item_sk Ss_customer_sk Ss_cdemo_sk Ss_store_sk Ss_promo_sk Ss_quantity … Customer Demographics CD_DEMO _SK CD_GENDER CD_MARITAL _STATUS CD_EDUCATION … … Promotion P_PROMO _SK P_PROMO _ID P_START _DATE _ SK Store P_END _DATE _SK … S_STORE_SK S_STORE_ID S_REC_START_D ATE S_REC_END_DAT E S_STORE_NAME … 27

Distribution Compatible Query SELECT Customer. Id, SUM(Amount) AS Total. Sales, SUM(Quantity) AS Total. Units. Sold FROM Sales s JOIN Item i ON s. Item. Id = i. Item. Id WHERE Sale. Date BETWEEN '2009 -08 -01' AND '2009 -08 -31‘ AND Description LIKE '%gadgets%' GROUP BY Customer. Id ORDER BY Customer. Id; 28

MPP Query Plan Step 1 – On each compute node: SELECT s. [customerid], sum(s. [amount]) AS totalsales, sum(s. [quantity]) AS totalunitssold FROM [tpch_3]. [dbo]. [h_sales_34] s JOIN [tpch_3]. [dbo]. item_37 I ON (s. [itemid] = i. [itemid]) WHERE (s. [saledate] BETWEEN '2009 -08 -01' AND '2009 -08 -31' and i. [description] like '%gadgets%') GROUP BY s. [customerid] ORDER BY s. [customerid]; 29

Query 1 Processing Flow Query Tool Control Node Compute Node 1 Parse SQL Data Movement Service Validate & Authorize Build MPP Plan Data Movement Service User Data Execute Plan SQL Server Return Data to Client MPP Engine Compute Node N Data Movement Service DW Authentication DW Configuration DW Schema Temp. DB SQL Server User Data SQL Server 30

Reshuffling the data SELECT Sale. Date, SUM(Amount) AS Total. Sales, SUM(Quantity) AS Total. Units. Sold FROM Sales s JOIN Item i ON s. Item. Id = i. Item. Id WHERE Sale. Date BETWEEN '2009 -08 -01' AND '2009 -08 -31' AND Description LIKE '%gadgets%‘ GROUP BY Sale. Date ORDER BY Sale. Date; 31

MPP Query Plan Step 1 – Create temp table on control node CREATE TABLE [tempdb]. [dbo]. Q_[TEMP_ID_6760] ( saledate DATE, totalsales DECIMAL(38, 2), totalunitssold INTEGER ) WITH (DATA_COMPRESSION = PAGE); Step 2 – Run on each compute node SELECT s. [saledate], sum(s. [amount]) AS totalsales, sum(s. [quantity]) AS totalunitssold FROM [tpch_3]. [dbo]. [h_sales_34] s JOIN [tpch_3]. [dbo]. item_37 i ON (s. [itemid] = i. [itemid]) WHERE (s. [saledate] BETWEEN '2009 -08 -01' AND '2009 -08 -31' and i. [description] like '%gadgets%’) GROUP BY s. [saledate] 32

MPP Query Plan continued Step 3: SELECT [saledate], sum([totalsales]) AS totalsales, sum([totalunitssold]) AS totalunitssold FROM [tempdb]. [dbo]. Q_[TEMP_ID_6760] GROUP BY [saledate] ORDER BY [saledate] Step 4: DROP TABLE [tempdb]. [dbo]. Q_[TEMP_ID_6760]; 33

Reshuffling – Query Processing Flow Query Tool Compute Node Control Node Parse SQL Data Movement Service Validate & Authorize Build MPP Plan Data Movement Service Execute Plan Return Data to Client MPP Engine DW Authentication DW Configuration User Data DW Schema Compute Node Data Movement Service User Data Temp. DB SQL Server 34

Data Loading Control Node Tables Are Hash Distributed Or Replicated Landing Zone Node Text File Spare Node 35

Data Loader Process Insert-Select Load File Sort each BATCH in memory or Temp. DB Bulk Insert Phase Sort each partition In memory or Temp. DB Partitioned Staging Table (Heap) Partitioned Final Table (CIDX) Insert-Select Phase Trace Flags None Trace Flags 610 per NUMA Session BATCHSIZE Calculated MAXDOP 1 Per NUMA Session TABLOCK ON TABLOCK OFF Temp. DB Entire BATCHSIZE for Sort Temp. DB Entire PARTITION for sort Temp. DB Log Minimal Stage. DB Log Minimal User. DB Log Twice Data File Size ROLLBACK Commits per BATCHSIZE Rollback to last BATCH Only ROLLBACK Commits Full TRANSACTION Rollback Full TRANSACTION 36

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U. S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION. 37