C-Store: A Column-oriented DBMS
By New England Database Group

Current DBMS Gold Standard
- Store fields in one record contiguously on disk
- Use B-tree indexing
- Use small (e.g. 4K) disk blocks
- Align fields on byte or word boundaries
- Conventional (row-oriented) query optimizer and executor (technology from 1979)
- Aries-style transactions
M.I.T.

Terminology -- “Row Store”
[Figure: Record 1, Record 2, Record 3, Record 4 stored one after another]
E.g. DB2, Oracle, Sybase, SQLServer, …

Row Stores are Write Optimized
- Can insert and delete a record in one physical write
- Good for on-line transaction processing (OLTP)
- But not for read-mostly applications
  - Data warehouses
  - CRM

Elephants Have Extended Row Stores
- With bitmap indices
- Better sequential read
- Integration of “datacube” products
- Materialized views
But there may be a better idea…

Column Stores

At 100K Feet…
- Ad-hoc queries read 2 columns out of 20
- In a very large warehouse, the fact table is rarely clustered correctly
- Column store reads 10% of what a row store reads
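
The 10% figure follows directly from reading 2 columns out of 20. A minimal sketch of that arithmetic, assuming equal-width columns; the row count and value width are made-up illustration values, not numbers from the talk:

```python
# Hypothetical warehouse: 20 equal-width columns, a query touching 2 of them.
NUM_COLUMNS = 20
COLUMNS_READ = 2
BYTES_PER_VALUE = 4      # assumed fixed width, for illustration only
NUM_ROWS = 1_000_000     # assumed table size


def bytes_read_row_store(num_rows, num_columns, width):
    # A row store fetches whole records, so every column is read.
    return num_rows * num_columns * width


def bytes_read_column_store(num_rows, columns_read, width):
    # A column store fetches only the columns the query references.
    return num_rows * columns_read * width


row_bytes = bytes_read_row_store(NUM_ROWS, NUM_COLUMNS, BYTES_PER_VALUE)
col_bytes = bytes_read_column_store(NUM_ROWS, COLUMNS_READ, BYTES_PER_VALUE)
fraction = col_bytes / row_bytes
print(f"column store reads {fraction:.0%} of what the row store reads")
```

The ratio depends only on the share of columns touched, so it holds for any row count under the equal-width assumption.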

C-Store (Column Store) Project
- Brandeis/Brown/MIT/UMass-Boston project
  - Usual suspects participating
  - Enough coded to get performance numbers for some queries
  - Complete status later

We Build on Previous Pioneering Work…
- Sybase IQ (early ’90s)
- Monet (see CIDR ’05 for the most recent description)

C-Store Technical Ideas
- Code the columns to save space
- No alignment
- Big disk blocks
- Only materialized views (perhaps many)
- Focus on sorting not indexing
- Automatic physical DBMS design

C-Store (Column Store) Technical Ideas
- Optimize for grid computing
- Innovative redundancy
- Xacts – but no need for Mohan
- Data ordered on anything, not just time
- Column optimizer and executor

How to Evaluate This Paper…
- None of the ideas in isolation merit publication
- Judge the complete system by its (hopefully intelligent) choice of
  - Small collection of inter-related powerful ideas
  - That together put performance in a new sandbox

Code the Columns
- Work hard to shrink space
  - Use extra space for multiple orders
- Fundamentally easier than in a row store
  - E.g. RLE works well
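
To illustrate why run-length encoding (RLE) suits a column kept in sort order, here is a minimal sketch. The function names and the sample column are hypothetical; this is not the C-Store storage format.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    decoded = []
    for value, count in pairs:
        decoded.extend([value] * count)
    return decoded


# A column stored in sort order has long runs, so it compresses well.
dept_sorted = ["eng"] * 4 + ["ops"] * 2 + ["sales"] * 3
encoded = rle_encode(dept_sorted)
print(encoded)
```

Nine stored values shrink to three pairs here. A row store interleaves values of different fields, so runs like these rarely appear.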

No Alignment
- Densepack columns
  - E.g. a 5 bit field takes 5 bits
- Current CPU speed going up faster than disk bandwidth
  - Faster to shift data in CPU than to waste disk bandwidth
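
A minimal sketch of dense packing, assuming unsigned integer values and a fixed bit width per column. `pack` and `unpack` are hypothetical helper names; the shifting and masking in `unpack` is the CPU work traded for disk bandwidth.

```python
def pack(values, bits):
    """Pack non-negative integers into bytes using exactly `bits` bits each."""
    buffer = 0
    for i, value in enumerate(values):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
        buffer |= value << (i * bits)
    total_bits = len(values) * bits
    return buffer.to_bytes((total_bits + 7) // 8, "little")


def unpack(data, bits, count):
    """Recover `count` values of `bits` bits each by shifting and masking."""
    buffer = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(buffer >> (i * bits)) & mask for i in range(count)]


values = [3, 17, 31, 0, 8, 22, 5, 1]
packed = pack(values, bits=5)
print(len(packed), "bytes for", len(values), "five-bit values")
```

Eight 5-bit values occupy 40 bits, which is 5 bytes, against 8 bytes if each value were aligned to a byte boundary.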

Big Disk Blocks
- Tunable
- Big (minimum size is 64K)

Only Materialized Views
- Projection (materialized view) is some number of columns from a fact table
- Plus columns in a dimension table – with a 1-n join between Fact and Dimension table
- Stored in order of a storage key(s)
- Several may be stored!!!!!
- With a permutation, if necessary, to map between them

Only Materialized Views
- Table (as the user specified it and sees it) is not stored!
- No secondary indexes (they are a one column sorted MV plus a permutation, if you really want one)
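
The remark that a secondary index is just a one-column sorted MV plus a permutation can be sketched as follows. The `salary` data and the helper name `positions_equal` are hypothetical; the point is only that a sorted copy plus a position map answers lookups by value.

```python
import bisect

# Hypothetical salary column, in the storage order of some other projection.
salary = [52, 31, 78, 31, 64]

# permutation[i] is the position in `salary` of the i-th smallest value.
permutation = sorted(range(len(salary)), key=lambda pos: salary[pos])
sorted_salary = [salary[pos] for pos in permutation]


def positions_equal(value):
    """Find rows with the given salary via the sorted copy and the permutation."""
    lo = bisect.bisect_left(sorted_salary, value)
    hi = bisect.bisect_right(sorted_salary, value)
    return sorted(permutation[lo:hi])


print(positions_equal(31))
```

The binary search runs on the sorted copy, and the permutation maps the hits back to positions in the original storage order.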

Example
User view:
  EMP (name, age, salary, dept)
  Dept (dname, floor)
Possible set of MVs:
  MV-1 (name, dept, floor) in floor order
  MV-2 (salary, age) in age order
  MV-3 (dname, salary, name) in salary order
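
The three MVs in this example can be derived mechanically from the user view. The sketch below assumes made-up rows and a hypothetical `project` helper; it shows the idea of storing column-wise projections in sort-key order, not the C-Store implementation.

```python
EMP = [  # (name, age, salary, dept); rows are made up for illustration
    ("alice", 41, 90, "db"),
    ("bob", 29, 60, "os"),
    ("carol", 35, 75, "db"),
]
DEPT = {"db": 2, "os": 1}  # dname -> floor


def project(rows, columns, sort_by):
    """Return a column-wise projection of dict rows, ordered by `sort_by`."""
    ordered = sorted(rows, key=lambda row: row[sort_by])
    return {col: [row[col] for row in ordered] for col in columns}


# Join EMP to Dept once (each EMP row matches one Dept row),
# then cut the projections out of the joined rows.
joined = [
    {"name": n, "age": a, "salary": s, "dept": d, "dname": d, "floor": DEPT[d]}
    for (n, a, s, d) in EMP
]
mv1 = project(joined, ["name", "dept", "floor"], sort_by="floor")
mv2 = project(joined, ["salary", "age"], sort_by="age")
mv3 = project(joined, ["dname", "salary", "name"], sort_by="salary")
print(mv1)
```

Note that neither EMP nor Dept is kept in its original form; only the projections remain.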

Different Indexing
- Sequential, few values: RLE encoded; conventional B-tree at the value level
- Sequential, many values: Delta encoded; conventional B-tree at the block level
- Non sequential, few values: Bitmap per value
- Non sequential, many values: Conventional Gzip; conventional B-tree at the block level
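
Two of the encodings named on this slide, delta encoding and a bitmap per value, can be sketched in a few lines. The function names are hypothetical and Python integers stand in for bitmaps; block layout and the B-trees are left out.

```python
def delta_encode(sorted_values):
    """Store the first value, then the gap to each following value."""
    if not sorted_values:
        return []
    deltas = [sorted_values[0]]
    for prev, cur in zip(sorted_values, sorted_values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the values by accumulating the gaps."""
    values, running = [], 0
    for d in deltas:
        running += d
        values.append(running)
    return values


def bitmap_encode(values):
    """One bitmap (a Python int) per distinct value; bit i set means row i holds it."""
    bitmaps = {}
    for position, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << position)
    return bitmaps


print(delta_encode([1000, 1003, 1004, 1010]))
print(bitmap_encode(["a", "b", "a", "c"]))
```

Delta encoding pays off when a sorted column has many distinct values with small gaps. Bitmaps pay off when an unsorted column has few distinct values.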

Automatic Physical DBMS Design
- Not enough 4-star wizards to go around
- Accept a “training set” of queries and a space budget
- Choose the MVs auto-magically
- Re-optimize periodically based on a log of the interactions
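
The slide does not say how the MVs are chosen. Purely as an illustration of the inputs and output (a training set of queries, a space budget, a chosen set of MVs), here is a greedy heuristic. The heuristic, the candidate sizes, and the query names are all assumptions and should not be read as the C-Store designer's algorithm.

```python
def choose_projections(candidates, training_queries, space_budget):
    """Greedy pick: repeatedly take the candidate that covers the most
    still-uncovered training queries per unit of space, while it fits."""
    chosen, uncovered, remaining = [], set(training_queries), space_budget
    while uncovered:
        best, best_score = None, 0.0
        for name, (size, serves) in candidates.items():
            if name in chosen or size > remaining:
                continue
            score = len(uncovered & serves) / size
            if score > best_score:
                best, best_score = name, score
        if best is None:
            break  # nothing useful fits in the remaining budget
        size, serves = candidates[best]
        chosen.append(best)
        uncovered -= serves
        remaining -= size
    return chosen, uncovered


# name -> (size, set of training queries the candidate can answer); all made up
candidates = {
    "MV-1": (40, {"q1", "q2"}),
    "MV-2": (30, {"q3"}),
    "MV-3": (50, {"q2", "q3", "q4"}),
}
chosen, uncovered = choose_projections(
    candidates, ["q1", "q2", "q3", "q4"], space_budget=90
)
print(chosen, uncovered)
```

Periodic re-optimization would amount to rerunning such a procedure with the query log as the new training set.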

Optimize for Grid Computing
- I.e. shared-nothing
  - Dewitt (Gamma) was right
- Horizontal partitioning and intra-query parallelism as in Gamma
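
A minimal sketch of horizontal partitioning with intra-query parallelism, assuming hash partitioning on a `dept` key and three nodes. Threads stand in for shared-nothing nodes here; the node count, the key choice, and the function names are hypothetical.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 3  # assumed cluster size


def node_for(key):
    # crc32 is stable across runs, unlike the built-in hash() on strings.
    return zlib.crc32(key.encode()) % NUM_NODES


def partition(rows):
    """Assign each (dept, salary) row to exactly one node by hashing dept."""
    nodes = [[] for _ in range(NUM_NODES)]
    for dept, salary in rows:
        nodes[node_for(dept)].append((dept, salary))
    return nodes


def local_sum(rows):
    """Work done independently on one node: partial SUM(salary) GROUP BY dept."""
    totals = Counter()
    for dept, salary in rows:
        totals[dept] += salary
    return totals


def parallel_sum_by_dept(rows):
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        partials = list(pool.map(local_sum, partition(rows)))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the partial sums
    return dict(merged)


rows = [("db", 90), ("os", 60), ("db", 75), ("ui", 10)]
print(parallel_sum_by_dept(rows))
```

Because each node owns a disjoint slice of the rows, the partial results can be merged without coordination during the scan.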

Innovative Redundancy
- Hardly any warehouse is recovered by a redo from the log
  - Takes too long!
- Store enough MVs at enough places to ensure K-safety
- Rebuild dead objects from elsewhere in the network
- K-safety is a DBMS-design problem!
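
K-safety means the data survives the loss of any K sites. The sketch below checks a simplified version of that property: every column must still be stored somewhere after any K site failures. The placement, the site names, and the reduction of K-safety to column coverage are assumptions made for illustration.

```python
from itertools import combinations


def is_k_safe(placement, all_columns, k):
    """True if every column is still available after any k sites fail.

    placement maps a site name to the set of columns stored at that site.
    """
    sites = list(placement)
    for failed in combinations(sites, k):
        surviving = set()
        for site in sites:
            if site not in failed:
                surviving |= placement[site]
        if not set(all_columns) <= surviving:
            return False
    return True


columns = {"name", "age", "salary", "dept", "floor"}
placement = {
    "site1": {"name", "dept", "floor"},
    "site2": {"salary", "age", "name"},
    "site3": {"dept", "floor", "salary", "age"},
}
print(is_k_safe(placement, columns, k=1), is_k_safe(placement, columns, k=2))
```

Which MVs to create and where to put them determines the outcome of this check, which is why the slide calls K-safety a design problem.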

XACTS – No Mohan
- Undo from a log (that does not need to be persistent)
- Redo by rebuild from elsewhere in the network

XACTS – No Mohan
- Snapshot isolation (run queries as of a tunable time in the recent past)
  - To solve read-write conflicts
- Distributed Xacts
  - Without a prepare message (no 2 phase commit)
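
Running a query "as of" a past time can be sketched by tagging each record with the epoch at which it was inserted and deleted. The class below is a toy with hypothetical names, with an epoch counter standing in for time; it says nothing about how distributed commit avoids the prepare message.

```python
class VersionedColumnStore:
    """Toy store: each record carries the epoch it was inserted and deleted."""

    def __init__(self):
        self.epoch = 0
        self.records = []  # [value, inserted_epoch, deleted_epoch or None]

    def insert(self, value):
        self.epoch += 1
        self.records.append([value, self.epoch, None])
        return self.epoch

    def delete(self, value):
        self.epoch += 1
        for record in self.records:
            if record[0] == value and record[2] is None:
                record[2] = self.epoch
        return self.epoch

    def read_as_of(self, epoch):
        """Records visible to a query running as of `epoch`."""
        return [
            value
            for value, inserted, deleted in self.records
            if inserted <= epoch and (deleted is None or deleted > epoch)
        ]


store = VersionedColumnStore()
store.insert("a")   # epoch 1
store.insert("b")   # epoch 2
store.delete("a")   # epoch 3
store.insert("c")   # epoch 4
print(store.read_as_of(2), store.read_as_of(4))
```

A reader pinned to epoch 2 never sees the later delete or insert, so it does not conflict with the writers.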

Storage (sort) Key(s) is not Necessarily Time
- That would be too limiting
- So how to do fast updates to densepack column storage that is not in entry sequence?

Solution – a Hybrid Store
- Write-optimized column store (much like Monet)
- Tuple mover (batch rebuilder)
- Read-optimized column store (what we have been talking about so far)
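
The three components can be sketched for a single column: inserts append to a write-optimized part, a batch step merges them into the sorted read-optimized part, and queries consult both. The class and method names are hypothetical, and a full re-sort stands in for the real batch rebuild.

```python
import bisect


class HybridColumn:
    """Toy single-column store with a write-optimized and a read-optimized part."""

    def __init__(self):
        self.ros = []  # read-optimized: kept in sort-key order
        self.wos = []  # write-optimized: arrival order, cheap to append

    def insert(self, value):
        self.wos.append(value)

    def move_tuples(self):
        # Batch step: merge pending writes into the sorted store.
        self.ros = sorted(self.ros + self.wos)
        self.wos = []

    def count_in_range(self, low, high):
        """Count values v with low <= v <= high, reading both stores."""
        lo = bisect.bisect_left(self.ros, low)
        hi = bisect.bisect_right(self.ros, high)
        return (hi - lo) + sum(1 for v in self.wos if low <= v <= high)


col = HybridColumn()
for v in (5, 1, 9):
    col.insert(v)
before_move = col.count_in_range(1, 5)
col.move_tuples()
col.insert(3)
after_move = col.count_in_range(1, 5)
print(before_move, after_move, col.ros, col.wos)
```

Inserts never touch the sorted data directly, which is how updates stay fast even though the sort key is not entry sequence.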

Column Executor
- Column operations – not row operations
- Columns remain coded – if possible
- Late materialization of columns
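
A minimal sketch of both ideas, assuming an RLE-coded `dept` column and an uncoded `salary` column in the same row order. The predicate is evaluated on the runs without decoding them, and the second column is touched only at the matching positions. Function names and data are hypothetical.

```python
def rle_positions_where(runs, predicate):
    """Scan (value, run_length) runs and return matching row positions
    without expanding the runs into individual values."""
    positions, start = [], 0
    for value, length in runs:
        if predicate(value):
            positions.extend(range(start, start + length))
        start += length
    return positions


def fetch(column, positions):
    """Late materialization: read another column only at the chosen positions."""
    return [column[p] for p in positions]


dept_runs = [("db", 2), ("os", 3), ("ui", 1)]  # RLE-coded, sorted on dept
salary = [90, 75, 60, 55, 70, 40]              # same row order as dept
positions = rle_positions_where(dept_runs, lambda d: d == "os")
os_salaries = fetch(salary, positions)
print(positions, os_salaries)
```

The predicate is called once per run rather than once per row, and no full row is ever assembled.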

Column Optimizer
- Chooses MVs on which to run the query
  - Most important task
- Build in snowflake schemas
  - Which are simple to optimize without exhaustive search
- Looking at extensions
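
Choosing an MV on which to run a query can be illustrated with the three MVs from the earlier example. The rule used here (the projection must cover the needed columns; prefer one sorted on the predicate column, then the narrowest) is an assumption for illustration, not the optimizer described in the talk.

```python
PROJECTIONS = {
    "MV-1": {"columns": ["name", "dept", "floor"], "sort_key": "floor"},
    "MV-2": {"columns": ["salary", "age"], "sort_key": "age"},
    "MV-3": {"columns": ["dname", "salary", "name"], "sort_key": "salary"},
}


def choose_projection(needed_columns, predicate_column, projections=PROJECTIONS):
    """Return the name of a projection covering the query, or None.

    Prefers a projection sorted on the predicate column, then fewer columns.
    """
    covering = [
        (info["sort_key"] != predicate_column, len(info["columns"]), name)
        for name, info in projections.items()
        if set(needed_columns) <= set(info["columns"])
    ]
    return min(covering)[2] if covering else None


print(choose_projection({"salary", "age"}, "age"))
print(choose_projection({"name", "salary"}, "salary"))
```

A query whose columns no single projection covers returns `None` here; a real system would have to combine projections, which this sketch does not attempt.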

Current Performance
- 100X popular row store in 40% of the space
- 10X popular column store in 70% of the space
- 7X popular row store in 1/6th of the space
- Code available with BSD license

Structure Going Forward
- Vertica
  - Very well financed start-up to commercialize C-Store
  - Doing the heavy lifting
- University Research
  - Funded by Vertica

Vertica
- Complete alpha system in December ’05
  - Everything, including DBMS designer
  - With current performance!
  - Looking for early customers to work with (see me if you are interested)

University Research
- Extension of algorithms to non-snowflake schemas
- Study of L2 cache performance
- Study of coding strategies
- Study of executor options
- Study of recovery tactics
- Non-cursor interface
- Study of optimizer primitives