SQL Server 2012 Columnstore Indexes Flipping the DW

  • Slides: 45
Download presentation
SQL Server 2012: Columnstore Indexes Flipping the DW /faster Bit: A Fast-Paced Introduction to

SQL Server 2012: Columnstore Indexes Flipping the DW /faster Bit: A Fast-Paced Introduction to an Exciting New Feature Jimmy May, MCM MSIT Principal Architect: Database jimmymay@microsoft. com http: //blogs. msdn. com/jimmymay @aspiringgeek 9/6/2021 1

Bio Jimmy May, MCM MSIT Principal Architect: Database Formerly: Sr. Program Manager, SQL CAT

Bio Jimmy May, MCM MSIT Principal Architect: Database Formerly: Sr. Program Manager, SQL CAT SQL Server Customer Advisory Team Microsoft Certified Master: SQL Server (2009) MS IT Gold Star Recipient (2008) Microsoft Oracle Center of Excellence (2008) SQL Server MVP Nominee (2006) Indiana Windows User Group www. iwug. net Founder & Board of Directors Indianapolis Professional Association for SQL Server www. indypass. org Founder & Member of Executive Committee SQL Server Pros: Founder & Visionary-in-Chief SQL Innovator Award Recipient (2006) Contest sponsored in part by Microsoft Formerly Chief Database Architect for high-throughput OLTP VLDB at Exact. Target (recently IPO) Senior Database Administrator for Open. Globe/Escient jimmymay@microsoft. com www. twitter. com/aspiringgeek http: //blogs. msdn. com/jimmymay

Agenda 1. Introduction 2. SQL Server 2012: New, Improved Features! 3. Columnstore Indexes 1.

Agenda 1. Introduction 2. SQL Server 2012: New, Improved Features! 3. Columnstore Indexes 1. Overview 2. Architecture 3. Scenarios 4. More Info 3

C: >sqlservr. exe /faster 4

C: >sqlservr. exe /faster 4

Agenda 1. Introduction 2. SQL Server 2012: New, Improved Features! 3. Columnstore Indexes 1.

Agenda 1. Introduction 2. SQL Server 2012: New, Improved Features! 3. Columnstore Indexes 1. Overview 2. Architecture 3. Scenarios 4. More Info 5

6 SQL Server 2012 New Features q Always. On § § q q q

6 SQL Server 2012 New Features q Always. On § § q q q Always. On Availability Groups Several other features: http: //msdn. microsoft. com/en-us/library/cc 645581%28 v=SQL. 110%29. aspx Indirect Checkpoints § Smooth out I/O spikes from checkpoints § Server-level metadata stored in the database (e. g. , users) § § § 1/hour > 625 days 1/day > 40 years 1/month > 1, 250 years § More in a moment… Contained Databases Support for 15, 000 partitions Columnstore Indexes SQL Server 2012 What’s New Whitepaper http: //download. microsoft. com/download/E/9/D/E 9 DBB 1 BB-89 AE-4 C 70 -AF 02 AAFC 29451 A 85/SQL_Server_2012_Whats_New_White_Paper. pdf

Agenda 1. Introduction 2. SQL Server 2012: New, Improved Features! 3. Columnstore Indexes 1.

Agenda 1. Introduction 2. SQL Server 2012: New, Improved Features! 3. Columnstore Indexes 1. Overview 2. Architecture 3. Scenarios 4. More Info 7

What is a SQL Columnstore Index? q Codename Apollo q Part of “x. Velocity”

What is a SQL Columnstore Index? q Codename Apollo q Part of “x. Velocity” in-memory family of technologies Common codepath with Verti. Paq § Power. Pivot, Power. View, SSAS § q Contrasted with traditional row stores in which data is physically stored row-by-row q Columnstores values for all rows for a given column 8

Columnstore High-Level Characteristics q Highly compressed q Vector-based query execution q Aggressive readahead q

Columnstore High-Level Characteristics q Highly compressed q Vector-based query execution q Aggressive readahead q Processes data in units called "batches“ q In-memory structures q Query Optimizer automatically considers columnstore indexes during compilation q Numerous deep engine modifications § I/O, Memory, & Caching 9

Why Use Columnstore Indexes? q Designed to optimize access to large DWs (vs. OLTP)

Why Use Columnstore Indexes? q Designed to optimize access to large DWs (vs. OLTP) § Star schema, large fact tables (esp. integer keys), aggregations, scans, reporting q Faster, interactive query response time q Transparent to the application § Most things “just work” Backup and restore § Mirroring, log shipping § SSMS, etc. § There are some gotchas (stay tuned…) § 10

Why Use Columnstore? (cont. ) q Reduced physical DB design effort § Fewer conventional

Why Use Columnstore? (cont. ) q Reduced physical DB design effort § Fewer conventional indexes § Reduced need for summary aggregates and indexed views § May eliminate need for OLAP cubes q Lower TCO § Yes, I had to say this! 11

Star Schema Fact. Sales (Customer. Key int, Product. Key int, Employee. Key int, Store.

Star Schema Fact. Sales (Customer. Key int, Product. Key int, Employee. Key int, Store. Key int, Dim. Employee Order. Date. Key int, Sales. Amount money) --note: lots of ints in fact tables Dim. Customer (Customer. Key int, First. Name nvarchar(50), Last. Name nvarchar(50), Birthdate, Email. Address nvarchar(50)) Dim. Product… Dim. Store Dim. Date Dim. Customer

Gotchas q Some queries, even the schema might have to be modified to fully

Gotchas q Some queries, even the schema might have to be modified to fully leverage q Read only, not writable § For now q Work-around: “Trickle-loading” § Large fact table § New data is loaded in real-time in smaller staging table § Query via UNION ALL q Enterprise Edition only q Not yet available in Azure 13

Syntax Similar to Existing DML CREATE NONCLUSTERED COLUMNSTORE INDEX ix_cs_My. DWTable ON dbo. My.

Syntax Similar to Existing DML CREATE NONCLUSTERED COLUMNSTORE INDEX ix_cs_My. DWTable ON dbo. My. DWTable (col 1 , col 2 , . . . , coln); DROP INDEX dbo. My. DWTable. ix_cs_My. Table; ALTER INDEX dbo. My. DWTable. ix_cs_My. Table DISABLE; ALTER INDEX dbo. My. DWTable. ix_cs_My. Table ENABLE; 14

Hints to Force / Prevent Usage q Force the Optimizer to use: …FROM dbo.

Hints to Force / Prevent Usage q Force the Optimizer to use: …FROM dbo. My. DWTable WITH (INDEX (ix_cs_My. Table))… q Prevent the Optimizer from using: SELECT… FROM dbo. My. DWTable WHERE… GROUP BY… ORDER BY… OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX); 15

Agenda 1. Introduction 2. SQL Server 2012: New, Improved Features! 3. Columnstore Indexes 1.

Agenda 1. Introduction 2. SQL Server 2012: New, Improved Features! 3. Columnstore Indexes 1. Overview 2. Architecture 3. Scenarios 4. More Info 16

17 How Do Columnstore Indexes Optimize Perf? … C 1 C 2 C 3

17 How Do Columnstore Indexes Optimize Perf? … C 1 C 2 C 3 C 4 q Heaps, B-trees store data row-wise q Columnstore indexes store data column-wise § Each page stores data from a single column q Highly compressed § § About 2 x better than PAGE compression More data fits in memory q Each column accessed independently § § Fetch only needed columns Can dramatically decrease I/O

18 Columnstore Index Architecture Segment C 1 C 2 C 3 C 4 C

18 Columnstore Index Architecture Segment C 1 C 2 C 3 C 4 C 5 C 6 q Row Group § 1 million logically contiguous rows q Column Segment § § Row Group § § § Segment contains values from one column for a set of rows Segments for the same set of rows comprise a row group Segments are compressed Each segment stored in a separate LOB Segment is unit of transfer between disk and memory

19 Columnstore Index Example Order. Date. Key Product. Key Store. Key Region. Key Quantity

19 Columnstore Index Example Order. Date. Key Product. Key Store. Key Region. Key Quantity Sales. Amount 20101107 106 01 1 6 30. 00 20101107 103 04 2 1 17. 00 20101107 109 04 2 2 20. 00 20101107 103 03 2 1 17. 00 20101107 106 05 3 4 20. 00 20101108 106 02 1 5 25. 00 20101108 102 02 1 1 14. 00 20101108 106 03 2 5 25. 00 20101108 109 01 1 1 10. 00 20101109 106 04 2 4 20. 00 20101109 106 04 2 5 25. 00 20101109 103 01 1 1 17. 00

20 Horizontally Partition (Row Groups) Order. Date. Key Product. Key Store. Key Region. Key

20 Horizontally Partition (Row Groups) Order. Date. Key Product. Key Store. Key Region. Key Quantity Sales. Amount 20101107 106 01 1 6 30. 00 20101107 103 04 2 1 17. 00 20101107 109 04 2 2 20. 00 20101107 103 03 2 1 17. 00 20101107 106 05 3 4 20. 00 20101108 106 02 1 5 25. 00 Order. Date. Key Product. Key Store. Key Region. Key Quantity Sales. Amount 20101108 102 02 1 1 14. 00 20101108 106 03 2 5 25. 00 20101108 109 01 1 1 10. 00 20101109 106 04 2 4 20. 00 20101109 106 04 2 5 25. 00 20101109 103 01 1 1 17. 00

21 Vertically Partition (Segments) Order. Date. Key Product. Key Store. Key 20101107 106 01

21 Vertically Partition (Segments) Order. Date. Key Product. Key Store. Key 20101107 106 01 20101107 103 04 20101107 109 04 20101107 103 03 20101107 106 05 20101108 106 02 Order. Date. Key Product. Key Store. Key 20101108 102 02 20101108 106 03 20101108 109 01 20101109 106 04 20101109 103 01 Region. Ke y Quantity 1 1 2 2 2 3 1 Region. Ke y 1 2 2 1 6 2 1 4 5 Quantity 1 5 1 4 5 1 Sales. Amoun t 30. 00 17. 00 20. 00 25. 00 Sales. Amoun t 14. 00 25. 00 10. 00 25. 00 17. 00

22 Compress Each Segment* Order. Date. Key Product. Key Store. Key Region. Key Quantity

22 Compress Each Segment* Order. Date. Key Product. Key Store. Key Region. Key Quantity Sales. Amount 20101107 106 01 1 6 30. 00 04 2 1 17. 00 20101108 103 109 03 2 05 4 5 02 20. 00 25. 00 Order. Date. Key Product. Key Store. Key Region. Key Quantity Sales. Amount 20101108 102 02 1 1 14. 00 03 2 5 25. 00 20101109 106 109 01 103 04 4 10. 00 25. 00 17. 00 Some segments will compress more than others *Encoding and reordering not shown

23 Fetch only needed columns SELECT Product. Key, SUM (Sales. Amount) FROM Sales. Table

23 Fetch only needed columns SELECT Product. Key, SUM (Sales. Amount) FROM Sales. Table WHERE Order. Date. Key < 20101108 GROUP BY Product. Key Store. Key Region. Key Quantity 01 1 6 04 2 1 3 2 03 Order. Date. Key Product. Key Sales. Amount 20101107 106 30. 00 103 17. 00 109 20. 00 20101108 25. 00 4 05 5 02 Store. Key Region. Key Quantity Order. Date. Key Product. Key 02 1 1 20101108 102 03 2 5 20101109 106 01 04 4 109 103 Sales. Amount 14. 00 25. 00 10. 00 25. 00 17. 00

Agenda 1. Introduction 2. SQL Server 2012: New, Improved Features! 3. Columnstore Indexes 1.

Agenda 1. Introduction 2. SQL Server 2012: New, Improved Features! 3. Columnstore Indexes 1. Overview 2. Architecture 3. Scenarios 4. More Info 24

Scenarios q Columnstore Index DDL Demo q Motricity Case Study q MSIT SONAR App:

Scenarios q Columnstore Index DDL Demo q Motricity Case Study q MSIT SONAR App: Aggregations (booyah!) q MSIT Problem Management App (double- booyah!) q Room for improvement… 25

Columnstore Index DDL: Demo 26

Columnstore Index DDL: Demo 26

SQL CAT Customer: Motricity q Sybase customer q Demographic data & app required columnstore

SQL CAT Customer: Motricity q Sybase customer q Demographic data & app required columnstore capabilities q Proved out technology on Denali bits q Numerous learnings discovered & incorporated into SQL Server 2012 RTM q Production implementation leveraged partitioning & “trickle loading” (& Always. On Availability Groups) q Case Study: http: //www. microsoft. com/casestudies/Microsoft-SQL-Server -2012 -Enterprise/Motricity/Mobile-Advertiser-Makes-Gainswith-Easy-Migration-of-Sybase-Database-to. Microsoft/710000000170 27

MSIT SONAR App Po. C Mechanism Columnstore Conventional Table & Indexes Logical Reads (8

MSIT SONAR App Po. C Mechanism Columnstore Conventional Table & Indexes Logical Reads (8 K pages) 160, 323 CPU (ms) 20, 360 Durn (ms) 9, 786 9, 053, 423 549, 608 193, 903 x 56 x 27 x 20 Δ 60 10 000 9 000 Columnstore 50 8 000 Conventional Table & Indexes 7 000 6 000 Δ. . . 40 30 5 000 4 000 20 3 000 2 000 10 1 000 - Logical Reads CPU (ms) Durn (ms) 28

MSIT Problem Management App q Interactive app for which queries returned results in unpredictable

MSIT Problem Management App q Interactive app for which queries returned results in unpredictable time frame: “ 2 – 7 minutes” q Wide table: 151 columns all commonly subject to interrogation § § DB Dev troubleshooting time & expertise scarce Indexing strategery difficult/impossible q Table stats 3. 1 M rows § 15 GB data (barely fits in memory; lots of churn) § 1. 7 M data pages q Columnstore index: 478 pages (memory resident) § 29

MSIT Problem Management q How to index a table with these columns? Ticket. Folder.

MSIT Problem Management q How to index a table with these columns? Ticket. Folder. ID, Module. Name, Module. Sub. Type, Ticket. ID, Created. Date, Created. By. Alias, Created. By. Full. Name, Created. By. Location. Country, Created. By. Location. City, Created. By. Location. Building, Source, Contact. Type, Contact. Alias, First. Name, Last. Name, Email, Business. Customer, Contact. Location. Country, Contact. Location. City, Contact. Location. Building, Contact. Location. Alt. Country, Contact. Location. Alt. City, Contact. Location. Alt. Building, Asset. Type, Asset. Service, Service. Category, Asset. Name, Asset. Model 1, Asset. Description, Asset. Manufacturer, Asset. Make 1, State, Status, Summary, Impact, Urgency, Priority, Contact. Reason, Details, Environment. Specific, Scheduled. Date, Current. L 1, Current. L 2, Current. L 3, Current. Group. ID, Current. Group, Current. Sub. Group. ID, Current. Sub. Group, Current. Team. ID, Current. Team, Current. Individual, Assigned. To. Name, SLADeadline, Resolved. At. Date, Response. Deadline, Responded. At. Date, Affected. Service, Requested. Service, Resolution. Category, Close. Details, Final. Status, Failure. Impact, Service. Outage, CIOutage, KBArticle. Action, KBArticle. Title, KBArticle. ID, Resolved. By. Alias, Resolved. By. Full. Name, Resolved. By. Location. Country, Resolved. By. Location. City, Resolved. By. Location. Building, Resolved. By. Employee. Type, Resolved. By. Company, Major. Incident. Review. Date, Major. Incident. Review. Outcome, Addl. Contacts, On. Behalf. Of, Original. Priority, Original. Service. Category, Original. Asset. Type, Original. Asset. Service, Original. Asset. Name, Original. Asset. Model, Original. Asset. Description, Original. Asset. Manufacturer, Original. Contact. Reason, Original. L 1, Original. L 2, Original. L 3, Original. Group. ID, Original. Group, Original. Subgroup. ID, Original. Subgroup, Original. Team. ID, Original. Team, Original. Individual, Alt. Telephone, Alt. Location, Location. Type, Ticket. Location, Reactivation. Reason, Closed. Date, Email. Sent. To. Queue. Date, First. Non. Auto. Response. Act. Type, First. Escalation. L 1 to. L 2 Flag, First. Escalation. L 1 to. L 2 Date, L 2 Escalation. Group. ID, L 2 Escalation. Group, L 2 Escalation. Subgroup. ID, L 2 Escalation. Subgroup, L 2 Escalation. Team. ID, L 2 Escalation. Team, L 2 First. Non. Auto. Response. Date, L 2 First. Non. Auto. Response. Activity, First. Escalation. L 2 to. L 3 Flag, First. Escalation. L 2 to. L 3 Date, L 3 Escalation. Group. ID, L 3 Escalation. Group, L 3 Escalation. Subgroup. ID, L 3 Escalation. Subgroup, L 3 Escalation. Team. ID, L 3 Escalation. Team, L 3 First. Non. Auto. Response. Date, L 3 First. Non. Auto. Response. Activity, Third. Party. Pend. Status. Duration, Third. Party. Reference, Last. Updated. Date, Pending. Closure, Third. Party. Name, Third. Party. Other, Sub. Group. Transfer. Flag, Sub. Group. Response. Deadline, Sub. Group. Response. Date, Primary. Config. Folder. ID, Service. Config. Folder. ID, KBArticle. Details, Cluster. Nodes, Resolvable. By. Previous. Team, Resolve. Comments, Client. Pending. Status. Duration. Min, Parts. Pending. Status. Duration. Min, Contact. Job. Title, KBArticle. Category, KBType, KBNotes, Met. Communication, Avg. Communication. Time 30

Typical Query --Current. Group, Current. Sub. Group, Module. Name SELECT TOP 10 Current. Group,

Typical Query --Current. Group, Current. Sub. Group, Module. Name SELECT TOP 10 Current. Group, Current. Sub. Group, KBArticle. Title, COUNT(*) as Row. Cnt FROM dbo. ITSM 01 ISRM WHERE Current. Group = 'Ex. D Services' AND Created. Date > dateadd(dd, -30, getdate()) AND Current. Sub. Group = 'SAP' AND Status = 'Resolved' GROUP BY Current. Group, Current. Sub. Group, Module. Name ORDER BY COUNT(*) DESC --OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX); 31

Query Duration used to be Unpredictable But Columnstore renders consistent subsecond results Duration 10:

Query Duration used to be Unpredictable But Columnstore renders consistent subsecond results Duration 10: 00 8: 37 9: 00 8: 00 7: 00 6: 00 5: 00 4: 00 3: 00 2: 33 2: 00 1: 00 0: 00 2: 27 1: 06 1: 37 0: 35 0: 00 Columnstore B-Tree Test #1 B-Tree Test #2 B-Tree Test #3 B-Tree Test #4 B-Tree Test #5 B-Tree Test #6 32

y tle Ti le tic Ar KB or eg at n am t. T

y tle Ti le tic Ar KB or eg at n am t. T e en lu tio n. C so Re rr Cu so ea t. R nt ac y rit io Pr n tio l Em ai ip cr es t. D Co se As ce ur So e am e. N ul od M Conventional vs. CS Perf: Duration 250 000 Duration (ms) 200 000 150 000 100 000 Conventional CS 50 000 0 33

ry tle Ti le tic Ar KB go te n am t. T e

ry tle Ti le tic Ar KB go te n am t. T e en n. C a lu tio so Re rr Cu y rit so ea t. R nt ac Co io Pr l n tio ip cr es t. D se As 1 000 Em ai ce ur So e e. N am ul od M Conventional vs. CS Perf: Logarithmic Duration (ms) 100 000 1 000 Conventional CS 100 10 34

ry tle Ti le tic Ar KB go te n am t. T e

ry tle Ti le tic Ar KB go te n am t. T e en n. C a lu tio so Re rr Cu y rit so ea t. R nt ac Co io Pr l n tio ip cr es t. D se As 1 000 Em ai ce ur So e e. N am ul od M Conventional vs. CS Perf: I/O (8 KB data pages) 100 000 10 000 Conventional 1 000 CS 100 10 35

or y tle Ti le tic Ar KB eg at am n so t.

or y tle Ti le tic Ar KB eg at am n so t. T e en lu tio n. C so Re rr Cu y rit io ea t. R nt ac Co Pr n tio l Em ai 100 000 ip cr es t. D se As ce ur So e am e. N ul od M Conventional vs. CS Perf: CPU (ms) 10 000 1 000 CPU (ms) Conventional 100 CPU (ms) CS 10 36

Columnstore Indexes: Room for Improvement… 37

Columnstore Indexes: Room for Improvement… 37

Key Learnings q Read-only—for now (see references for Trickle Loading implementation) q Query performance

Key Learnings q Read-only—for now (see references for Trickle Loading implementation) q Query performance of columnstore indexes is good, even great—once you have the “right” plan q Able to meet/beat Sybase. IQ in terms of performance however in some cases this took significant query rewrite and schema changes q Some specific considerations: § Filtering optimizations in the storage engine are limited to numeric data types § Using OR statements in predicates results in less efficient plan vs. what is generated for row based indexes § Some OUTER JOINs § Large number of joins results in inability to use batch mode processing § Joining on string data types does not push bitmap filtering down into the storage engine

Key Learnings q To get full parallelism of column store index builds may require

Key Learnings q To get full parallelism of column store index builds may require significant memory grants. This can be problematic on wide or string heavy tables. § q Parallelism on index build is not support if table has < 1 million rows. Always. On availability groups work well for many DW workloads § Able to keep up with expected bulk load rates and columnstore index builds § Observed some upper limits on the “bytes per second” to the replica in the 40 MB/s*. This caused the replica to fall behind for regular index build scenarios and more intensive loads. *Still under discussion with development team

Best Practices q Create CS indexes on “large” fact tables q Leverage “star joins”

Best Practices q Create CS indexes on “large” fact tables q Leverage “star joins” § Joins on integer keys q Inner Join, Group By, Aggregations q Include all columns of eligible datatypes q Leverage Parallelism q Provide sufficient memory q Use in conjunction with partitioned tables 40

41 Loading New Data q Table with columnstore index can be read, but not

41 Loading New Data q Table with columnstore index can be read, but not updated Partition switching is allowed § INSERT, UPDATE, DELETE, and MERGE not allowed § q Methods for loading data § Disable, update, rebuild § Partition switching § UNION ALL between large table with columnstore index and smaller updateable table

Agenda 1. Introduction 2. SQL Server 2012: New, Improved Features! 3. Columnstore Indexes 1.

Agenda 1. Introduction 2. SQL Server 2012: New, Improved Features! 3. Columnstore Indexes 1. Overview 2. Architecture 3. Scenarios 4. More Info 42

Columnstore Indexes: More Info The following is a must-read: SQL Server Columnstore Index FAQ

Columnstore Indexes: More Info The following is a must-read: SQL Server Columnstore Index FAQ Eric Hanson, Susan Price, etc. http: //social. technet. microsoft. com/wiki/contents/articles/3540. sql-server-columnstoreindex-faq-en-us. aspx Forthcoming: My blog http: //blogs. msdn. com/jimmymay Other references: Columnstore Indexes http: //msdn. microsoft. com/en-us/library/gg 492088(SQL. 110). aspx SQL Server Columnstore Performance Tuning http: //social. technet. microsoft. com/wiki/contents/articles/4995. sql-server-columnstore-performance-tuning. aspx Trickle Loading with Columnstore Indexes http: //social. technet. microsoft. com/wiki/contents/articles/trickle-loading-with-columnstore-indexes. aspx How do Column Stores Work? Thomas Kejser http: //blog. kejser. org/2012/07/04/how-do-column-stores-work This is Columnstore – Part 1 Gavin Payne, MCM http: //gavinpayneuk. com/2012/07/22/this-is-columnstore-part-1 Saturday Keynote: Inside SQL Server 2012’s Columnstore Index http: //sqlbits. com/Sessions/Event 10/Saturday_Keynote 43

Q-&-A 44

Q-&-A 44

Acknowledgements q Susan Price, SQL Server PM (former) q Eric Hansen, SQL Server PM

Acknowledgements q Susan Price, SQL Server PM (former) q Eric Hansen, SQL Server PM q Gus Apostol, SQL Server PM q Thomas Kejser, Fusion-io CTO (former SQL CAT PM) q Mike Ruthruff, Engineer, Bungie Studios (former SQL CAT PM) q Gavin Payne, MCM, Coeo Consultant q Steve Fisher, MSIT Senior Service Engineer 45