Loading Data Warehouses with Column Store Table Partitioning

Loading Data Warehouses with Column. Store, Table Partitioning, Merge, and SSIS Dr Greg Low SQL Down Under greg@sqldownunder. com

Who am I? • • • CEO & Principal Mentor of SQL Down Under Host of SQL Down Under Podcast Microsoft RD and MVP for SQL Server PASS Regional Mentor & Chapter Leader Author (books and whitepapers) Long-term SQL Server Junkie

Agenda • • MERGE Review Table Partitioning Review Column. Store Index Concepts Loading Data Warehouses

Why MERGE? • Large number of requests over many years for an UPSERT statement: IF EXISTS (code to locate a row) UPDATE the row ELSE INSERT the row

MERGE Statement • Multiple set operations in a single SQL statement MERGE target USING source ON. . . • Operations can be INSERT, UPDATE, DELETE but only on the target WHEN MATCHED WHEN [TARGET] NOT MATCHED WHEN SOURCE NOT MATCHED • ANSI SQL 2006 compliant - with extensions

MERGE and Determinism • UPDATE using a JOIN is non-deterministic – If more than one row in source matches ON clause • MERGE is deterministic – It’s an error

Using MERGE • Supports wide variety of table sources • Multiple WHEN clauses possible – For MATCHED and SOURCE NOT MATCHED – Only one WHEN clause for TARGET NOT MATCHED • MERGE statement causes triggers to be fired once • ROWCOUNT = total rows affected by all clauses • Can reference a $action column – Used with OUTPUT clause

MERGE Performance • MERGE statement is atomic • One Pass Through Tables – At most a full outer join – Inner join = WHEN MATCHED – Left-outer join = WHEN NOT MATCHED BY TARGET – Right-outer join = WHEN NOT MATCHED BY SOURCE

Demonstration 1 MERGE Statement

Agenda • • MERGE Review Table Partitioning Review Column. Store Index Concepts Loading Data Warehouses

Detail on Table Partitioning http: //msdn. microsoft. com/en-us/library/dd 578580. aspx

Why Partition Tables? • Some operations very slow (or impractical) on large tables – Load new data – Remove old data – Maintain indexes • Often small portion of table is active • Partitioning allows operations to be performed on part of a table

Partitioned Views • Horizontal Partitioning – Subsets of rows placed in different tables – View created over all the tables – View is queried rather than the table • Need to manually apply constraints and logic – Can get complex

Vertical Partitioning • Columns spread across multiple tables – Same number of rows in each table – Different columns in each table except for primary key – View created across the tables (usually many joins) – Query view rather than table • No built-in SQL Server support – But not too difficult to set up – Sparse columns might provide a better outcome

Partitioned Tables • Special kind of table – Partition function determines partition for a row – Partition scheme maps partitions to filegroups – Table/Indexes created on the partition scheme • Advantages – Loading, deleting, archiving operations can be fast – Queries with filter predicates that align with table partitions use partition elimination

Lock Escalation • Can escalate locks to the partition level • Escalation mode can be controlled ALTER TABLE <table_name> SET (LOCK_ESCALATION = [TABLE | AUTO |DISABLE]);

Read-Only Filegroups • If a single partition is active, other partitions can be placed on read-only filegroups – No need to apply logs to recover a read-only filegroup – Older data protected from accidental mods • Locks still taken – Inactive data safe to read with NOLOCK

Table Compression • Compression options applied at partition level • Example: – Page compression for older data – Row compression for current data

Partition Switching • Many "whole of table data " operations become fast metadata operations – Switching in a new partition already holding data – Switching out an entire partition of data

Using Partitioned Tables • SQL Server does most of the work – Manages placement of data – Applications see the table as a standard table – Queries that have filters that match the partition structure can be much faster

Demonstration 2 • Table Partitioning

Agenda • • MERGE Review Table Partitioning Review Column. Store Index Concepts Loading Data Warehouses

Column. Store Indexes • Introduced in SQL Server 2012 – Implemented as a new type of nonclustered index – New batch query processing mode – Optimizer still makes a cost-based decision – Based on similar Vertipaq technology to Power. Pivot • Aim: Provide near instant query response times even with ad hoc queries on large data sets – Avoid need for summary tables, indexed views, cubes – Fewer indexes to design, create, and maintain – Reduce need to manually tune queries

Column. Store Index Structure Traditional Indexes Column. Store Indexes Col 1 Col 2 Segments -> Row Group -> I/O is reduced by only returning required columns – compressed segments are retrieved Parallelism is increased Col 3 Col 4 Col 5 Col 6

Concepts & Current Limitations • Segments stored in LOB storage • Only nonclustered Column. Store indexes permitted – – Underlying table structure cannot be Column. Store Clustered index or heap still required Cannot be created on an indexed view Cannot be filtered • Only single Column. Store index per table – Other nonclustered indexes are permitted – Often all columns in the table added to Column. Store index – Column. Store index must be partition-aligned

Datatype Restrictions • Limited data types for columns in Column. Store index – – – – – char and varchar nchar and nvarchar (except varchar(max) and nvarchar(max)) decimal and numeric (precision up to 18 digits only) int, bigint, smallint, and tinyint float and real bit money and smallmoney All date and time data types datetimeoffset with scale up to 2

Biggest Limitation While a columnstore index is present on the table, the data in the table cannot be altered

So How Do I Update My Table? • Option 1 – Disable Column. Store. Index – Load data – Rebuild Column. Store. Index • Option 2 – Switch in/out table partitions • Option 3 – Union small table with larger table with Column. Store index

Demonstration 3 • Column. Store Indexes

Agenda • • MERGE Review Table Partitioning Review Column. Store Index Concepts Loading Data Warehouses

Typical Uses for Column. Store in DW • Large fact tables – Might make sense on some very large dimensions • Read-mostly workload – Updates mostly appending new data – Rarely updating existing data • Queries should benefit from Column. Store index – Typical star join queries – Scanning and aggregating large amounts of data

Most Common Case – Sliding Windows • Table structured by date – Usually one active partition • Process – Switch out active partition – Disable Column. Store index on staging table – MERGE new data into staging table – Rebuild Column. Store index on staging table – Switch staging table back in as active partition

Demonstration 4 • Loading Data Warehouses

Automation We've seen what needs to be done Q: What's the best way to do this? A: SSIS !

SQL Mug ! Chuck Norris gets NULL. Nothing compares to him either. That's not a database. This is a database. ON, or WHERE, that is the question ALTER MUG SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DENY CONTROL ON OBJECT: : MUG TO public; knock who's there? sp_who? spid 1, spid 2, spid 3, spid 4. . . § ALTER DATABASE Critical. DB SET Chuck. Norris. Mode = ON Clever Geeky Message Here WITH NOWAIT; § § §

Session Summary • Column. Store Index has big potential • Need a plan to work around limitations Thanks! greg@sqldownunder. com http: //www. sqldownunder. com

Questions? Please complete an evaluation form for this session …and thanks again to our awesome sponsors!