Columnstore Indexes in SQL Server 2012 Conor Cunningham

What This Talk Covers • SQL Server’s upcoming “Denali” release contains a new feature

Who am I? • I’ve worked at Microsoft for the SQL Core Engine team

Agenda • • Data Warehouse Introduction New Feature and Demo How the Feature Works

Data Warehouse Introduction • Data Warehouses support reporting and business intelligence operations in organizations

6 Data Warehouse Challenges • These kinds of databases become difficult once they get

Opportunity • What If… – We made DW queries 10+ times faster? • Example

9 How Does It Work? • New Index Type – Column. Store • New

Index Storage Design • Column-Orientation – Store data vertically instead of per-row – String

11 Speedup from the Index • If the IO required is cut in half…

12 What takes time in a CPU? • Memory IO takes time – Cache

13 Query Execution Row Mode Changes • Each operator calls child for each row

Batch Format • Column-Oriented • Sized to fit within L 2 cache • Multiple

15 SQL 2012 Restrictions • Create index: – Only on common business data types

16 Using Apollo: Loading new data • Table with columnstore index can be read,

17 Query performance issues • Not all operators are batch-mode enabled – Scan, Filter,

18 Revisiting Our Example Scenario • For SQL Server 2012, our customer will be

19 Summary • New Index and Execution Algorithms for DW • Significant speedup for

Openness and Interoperability @Microsoft and Open Source gateway for deeper exploration of open source

Please rate this lecture and WIN HTC MOZART! Help us choose the best Sinergija

Slides: 22

Download presentation

Columnstore Indexes in SQL Server 2012 Conor Cunningham Principal Architect, Microsoft SQL Server conorc@microsoft. com Representing Microsoft Development Center Serbia

What This Talk Covers • SQL Server’s upcoming “Denali” release contains a new feature for Data Warehouses to speed up Data Warehouse queries • This talk provides an overview of the new surface area and some details about how it works

Who am I? • I’ve worked at Microsoft for the SQL Core Engine team as an Architect for many years • I work mostly on Query Processors • I wrote the SQL 2008 Internals book on how the Query Optimizer works • I blog at “Conor vs. SQL” • I like to talk to customers about how they use the product so that I improve things in future releases

Agenda • • Data Warehouse Introduction New Feature and Demo How the Feature Works Restrictions in this release

Data Warehouse Introduction • Data Warehouses support reporting and business intelligence operations in organizations • Store facts that can be aggregated over different dimensions • They often store lots and lots of facts (rows) – This leads to a design pattern called a star schema where fact tables are “over”-normalized to reduce row width – Dimension tables are frequently joined – Results are very often aggregated • Example: Show me the sales totals for each department by month for the past 3 years

6 Data Warehouse Challenges • These kinds of databases become difficult once they get big. – – – Query latency ETL load times Backup time and size Index rebuilding Finding time to load new data Query plan selection issues/limitations

Opportunity • What If… – We made DW queries 10+ times faster? • Example – Business Analyst does ROLAP reports against SQL Server 2008 – – Click to drill down into a report Go get some coffee Click again Go get more coffee • We aim to make that experience interactive – (However, coffee shop profits may plunge!)

8 Demo

9 How Does It Work? • New Index Type – Column. Store • New Query Execution Algorithms – “Batch” mode • Specifically Target Star Join Queries ▫ ▫ Not all queries are faster in the initial release Customers will want to consider this in their application design Supported Pattern: SELECT SUM(…), cols FROM Fact. Tbl JOIN Dim. Tbl 1 JOIN Dim. Tbl 2 … WHERE … GROUP BY cols

Index Storage Design • Column-Orientation – Store data vertically instead of per-row – String Dictionaries for variable-length data • • Segment data into groups (1 million rows/group) Benefits – – DW queries usually pick only a subset of columns You can do the IO only for those columns We can also compress that data effectively since it often has lots of duplicates Space savings of 1. 5 x-2 x vs. a row-based page-compressed equivalent IO Patterns for (CI Scan, Column-based scan of 3 cols, Column-based w/Compression)

11 Speedup from the Index • If the IO required is cut in half… – We don’t get to 10 times faster (yet) – We need to improve the memory utilization and CPU utilization to get the rest of the speedup • So how do we improve Query Execution 10 x? ? ?

12 What takes time in a CPU? • Memory IO takes time – Cache Misses stall the CPU – L 2 cache misses stall the CPU even more – So we reduced cache misses • Instructions take time – Instructions also go through the caches – So we reduced instructions • Disk Access takes time – So we biased the memory policies for this index to work best when in memory • Over time, CPU speed has increased faster than memory speed, making all of these worse

13 Query Execution Row Mode Changes • Each operator calls child for each row • This works fine for smaller numbers of rows, poorly for batches • In bigger queries, CPU cycles instructions in and out of the CPU (L 2 cache misses) • So this model suffers in DW with too many instructions, too many cache misses

Batch Format • Column-Oriented • Sized to fit within L 2 cache • Multiple Operators work on a batch sequentially • Goal: Reduce avg. per-tuple cost – Compression – Reducing L 2 data and instruction cache misses – Probabilistic data representations – Probabilistic operator execution algorithms • This gets us to 10 x faster (avg)

15 SQL 2012 Restrictions • Create index: – Only on common business data types Yes int, real, string, money, datetime, decimal <= 18 digits No decimal > 18 digits, binary, BLOB, CLR, (n)varchar(max), uniqueidentifier, datetimeoffset with precision > 2 • Maintain table: limited operations – Can read but not update the data – However: One can switch partitions in and out • Process queries: all read-only T-SQL queries run – Some queries are accelerated more than others

16 Using Apollo: Loading new data • Table with columnstore index can be read, not updated – Partition switching is allowed – INSERT, UPDATE, DELETE, and MERGE not allowed • Three possible methods for loading data – Disable, update, rebuild – Partition switching – UNION ALL between large table with columnstore and smaller updateable table

17 Query performance issues • Not all operators are batch-mode enabled – Scan, Filter, Project – Local hash partial aggregation – Hash inner join, hash table build • Only parallel queries can use batch mode • If hash tables don’t fit into memory, fall back to row-mode processing – Memory grant request depends on cardinality est. – Falling back to row-mode is slow

18 Revisiting Our Example Scenario • For SQL Server 2012, our customer will be able to: – Have specific queries go very fast (with less coffee) – DW Application developers • Must design their code to load/unload data online • Can use hints to control user experience for the fast and slow cases – Hint index – if it fails to get a plan, then you can present UI to the user to “maybe go get coffee” and then run in row mode • This story will continue to improve as we add more capabilities to Batch processing

19 Summary • New Index and Execution Algorithms for DW • Significant speedup for conforming applications • Opportunities for customers who can build their code to leverage the benefits

20 Thank You! • Questions?

Openness and Interoperability @Microsoft and Open Source gateway for deeper exploration of open source engagements Port 25 Codeplex blogs from the platform community and the OSS Lab teams resources for developers and consumers of open source projects http: //Port 25. technet. com http: //www. codeplex. com http: //www. microsoft. com/ openness Interoperability Bridges technical collaborative works http: //www. interoperability bridges. com Open Up Shared Source cross-Industry Interoperability and Standards activities portal for programmatically sharing code http: //www. microsoft. com/i nterop/openup http: //www. microsoft. com/ sharedsource OData Open Spec Biz. Spark open source starter kit for Internet publishing of Government datasets using the Open Data protocols, file formats, standards, technical specifications Program for Start-Up companies from both commercial and open source backgrounds http: //ogdisdk. cloudapp. net http: //www. microsoft. com/ openspecifications http: //www. microsoft. com/ bizspark How can I receive up-to-date Openness announcements from Microsoft? In addition to the websites above, you can receive regular updates to Microsoft’s openness, interoperability and standards efforts via the following channels: • http: //blogs. technet. com/b/openness/ • http: //blogs. msdn. com/b/interoperability/ • http: //twitter. com/Open. At. Microsoft • http: //port 25. technet. com • http: //channel 9. msdn. com/Blogs/Interoperability

Please rate this lecture and WIN HTC MOZART! Help us choose the best Sinergija lecturer! Telekom Srbija and Microsoft will award you – at the conference end, we’ll give one HTC Mozart WP 7 phone to someone from the audience – randomly. Go to www. mssinergija. net, log in and cast your votes. You can rate only lectures that you were present at, just once. More lectures you rate, more chances you have. Please use computers at the front of this room, or rate lecture from your phone or home computer, at Sinergija portal. This prize contest will end at Thursday, October 20 th at 9 PM. Winner will be announced at the official Sinergija web portal, www. mssinergija. net is a friend of Sinergija 2011 Conference and Imagine Cup student competition in Serbia.