Positional Update Handling in Column Stores Sndor Hman

Positional Update Handling in Column Stores Sándor Héman Marcin Zukowski Niels Nes Lefteris Sidirourgos

UPDATE IN PLACE: A Poison Apple? Jim Gray, 1981 “. . for performance reasons,

Problem: Column Store Updates • I/O proportional to number of attributes – I/O blocks

Solution: Differential Structure • Maintain updates (INS/DEL/MOD) in a differential structure – Merge with

Naïve Approach: Delta Tables • For each table, maintain two update friendly row-store tables:

Naïve Approach: Delta Tables • Rewrite table scans: Merge. Union[store, prod](Scan(INS), Merge. Diff[store, prod](

The Idea: Positional Updates • Remember the position of an update rather than its

SID/RID Example SID STORE PROD NEW QTY RID 0 London chair N 30 0

SIDs and RIDs • RID(t) • ∆(t) = SID(t) + ∆(t) = #inserts before

PDT Example SID STORE PROD NEW QTY RID 0 London chair N 30 0

PDT Example SID STORE PROD NEW QTY RID 0 Berlin chair Y 5 0

PDT Example INSERT INTO inventory VALUES (‘London’, ‘rack’, Y, 4) INSERT INTO inventory VALUES

Stacking PDTs • Arbitrary number of layers: “deltas on. . ” – RID domain

Stacking for Isolation • ‘lock’ PDT down for further updates – Immutable read-PDT BIG:

Snapshot Isolation Transaction State Propagate() Trans PDT Copy Write-PDT TABLEx Write-PDT Read-PDT Stable Table

Optimistic Concurrency Control • Two concurrent transactions Trans A Trans PDT Trans B Trans

Optimistic Concurrency Control • Two concurrent transactions • A commits before B Trans A

Optimistic Concurrency Control Trans A Trans PDT Trans B Trans ) PDT e( t

Optimistic Concurrency Control Trans PDT Trans A Trans PDT Consecutive! Trans B Trans PDT

Optimistic Concurrency Control Trans B Trans PDT Serialize() Pr o pa ga te ()

Concluding. . • PDTs speed-up differential update merging – Reduced I/O volume – Reduced

Slides: 29

Download presentation

Positional Update Handling in Column Stores Sándor Héman Marcin Zukowski Niels Nes Lefteris Sidirourgos Peter Boncz

UPDATE IN PLACE: A Poison Apple? Jim Gray, 1981 “. . for performance reasons, most disc-based systems have been seduced into updating the data in place. ” 30 years of hardware improvements in sequential/throughput beating random/latency…. in-place less feasible every year. alternative: differential approach. In column stores, in-place updating is by now clearly infeasible

Problem: Column Store Updates • I/O proportional to number of attributes – I/O blocks large and compressed – Sometimes even replicated – Read-Optimized Update-Unfriendly • Table often kept ordered on sort-key (SK) attributes – Uniform update load scattered write access

Solution: Differential Structure • Maintain updates (INS/DEL/MOD) in a differential structure – Merge with base table during scan

Solution: Differential Structure • Maintain updates (INS/DEL/MOD) in a differential structure – Merge with base table during scan • Challenges: – Efficiently maintainable data-structure – Minimize Merge impact for read-only queries

Naïve Approach: Delta Tables • For each table, maintain two update friendly row-store tables: – INS(C 1. . Cn) – DEL(SK 1. . SKm) – MOD = DEL + INS Base table: inventory Sort-Key (SK): [store, prod] store prod new qty Berlin chair Y 5 Berlin cloth Y 20 qty London stool N 10 London table N 20 Paris rug N 1 Paris stool N 5 Inserts table: INS store new Deletes table: DEL store Paris prod rug

Naïve Approach: Delta Tables • Rewrite table scans: Merge. Union[store, prod](Scan(INS), Merge. Diff[store, prod]( Scan(Inventory), Scan(DEL))) Base table: inventory Sort-Key (SK): [store, prod] store prod new qty Berlin chair Y 5 Berlin cloth Y 20 qty London stool N 10 London table N 20 Paris rug N 1 Paris stool N 5 Inserts table: INS store new Deletes table: DEL store Paris prod rug

Naïve Approach: Delta Tables • Rewrite table scans: Merge. Union[store, prod](Scan(INS), Merge. Diff[store, prod]( Scan(Inventory), Scan(DEL))) for up-to-date image • Expensive! Actual table: inventory Sort-Key (SK): [store, prod] store prod new qty Berlin chair Y 5 Berlin cloth Y 20 London stool N 10 – I/O to scan SK ‘merge’ London table N columns; also if query Paris stool N does not need SK cols – Each query pays CPU effort to locate the same change positions over and over again 20 5

The Idea: Positional Updates • Remember the position of an update rather than its SK values – Merge once at write Read-Optimized approach – No need to scan SK columns – Scan skip less CPU overhead Notation: • TABLEx state of TABLE at time x • SID(t): Stable. ID – Position of tuple t in immutable base TABLE 0 Stable • RIDx(t): Row. ID – Position of visible tuple t at time x VOLATILE! – SID(t) = RID 0(t)

SID/RID Example SID STORE PROD NEW QTY RID 0 London chair N 30 0 1 London stool N 10 1 2 London table N 20 2 3 Paris rug N 1 3 4 Paris stool N 5 4 INSERT INTO inventory VALUES(‘Berlin’, ‘table’, Y, 10) INSERT INTO inventory VALUES(‘Berlin’, ‘cloth’, Y, 20) INSERT INTO inventory VALUES(‘Berlin’, ‘chair’, Y, 5) TABLE 0 SID STORE PROD NEW QTY 0 Berlin chair Y 5 0 0 Berlin cloth Y 20 1 0 Berlin table Y 10 2 0 London chair N 30 3 1 London stool N 10 4 2 London table N 20 5 3 Paris rug N 1 6 4 Paris stool N 5 7 TABLE 1 RID

SIDs and RIDs • RID(t) • ∆(t) = SID(t) + ∆(t) = #inserts before t – #deletes before t = RID(t) – SID(t) • SID and RID are monotonically increasing – organize positional updates on SID in a counting B-Tree that keeps track cumulative deltas (∆) • Positional Delta Tree (PDT) – SIDs are stable – Only need to maintain cumulative ∆ on path root leaf

PDT Example SID STORE PROD NEW QTY RID 0 London chair N 30 0 1 London stool N 10 1 2 London table N 20 2 3 Paris rug N 1 3 4 Paris stool N 5 4 SID 0 INSERT INTO inventory VALUES(‘Berlin’, ‘table’, Y, 10) INSERT INTO inventory VALUES(‘Berlin’, ‘cloth’, Y, 20) INSERT INTO inventory VALUES(‘Berlin’, ‘chair’, Y, 5) TABLE 0 ∆ 2 Insert Value Table 1 STORE SID 0 0 SID 0 ins i 0 type ins type value i 2 i 1 value PROD NEW QTY i 0 Berlin table Y 10 i 1 Berlin cloth Y 20 i 2 Berlin chair Y 5

PDT Example SID STORE PROD NEW QTY RID 0 Berlin chair Y 5 0 0 Berlin cloth Y 20 1 0 Berlin table Y 10 2 0 London chair N 30 3 1 London stool N 10 4 2 London table N 20 5 3 Paris rug N 1 6 4 Paris stool N 5 7 SID 0 DELETE FROM inventory WHERE store = ‘Berlin’ AND prod = ‘table’ DELETE FROM inventory WHERE store = ‘Paris’ AND prod = ‘rug’ TABLE 1 ∆ 2 Insert Value Table -1 STORE SID 0 0 SID 3 del d 0 type ins type value i 2 i 1 value PROD NEW QTY i 0 Berlin table Y 10 i 1 Berlin cloth Y 20 i 2 Berlin chair Y 5

PDT Example SID STORE PROD NEW QTY RID 0 Berlin chair Y 5 0 0 Berlin cloth Y 20 1 0 London chair N 30 2 1 London stool N 10 3 2 London table N 20 4 4 Paris stool N 5 5 SID 0 INSERT INTO inventory VALUES (‘Paris’, ‘rack’, Y, 4) Insert at RID = 5 TABLE 2 ∆ 2 Insert Value Table -1 STORE RID 5 > 0 + 2 SID 0 0 SID 3 3 ins del di 00 d 0 type ins type value i 2 i 1 value PROD NEW QTY i 0 Berlin Paris table rack Y 20 4 i 1 Berlin cloth Y 5 20 i 2 Berlin chair Y 10 5

PDT Example INSERT INTO inventory VALUES (‘London’, ‘rack’, Y, 4) INSERT INTO inventory VALUES (‘Berlin’, ‘rack’, Y, 4) SID ∆ 1 3 Separator SIDs Subtree ∆ 1 ∆ 3 Running ∆ RID 4 Separator RIDs SID 0 2 1 ∆ 3 1 0 ∆ 2 ∆ 4 RID 2 RID 7 SID 0 0 SID 1 SID 3 3 type ins type ins del value i 2 i 1 value i 4 value i 3 value i 0 d 0 ∆ 0 1 ∆ 2 ∆ 3 ∆ 4 5 RID 0 1 RID 2 RID 4 RID 7 8

Stacking PDTs • Arbitrary number of layers: “deltas on. . ” – RID domain of child PDT = SID domain of parent PDT generalization: lo • PDThi contains all differences in time [lo, hi] PDT t 2 t 3 t 0 vs PDT t 1 are consecutive t 2=t 1 PDT PDT t 2 PDT t 3 t 1 PDT PDT t 2 PDT PDT t 1 Table t 0

Stacking PDTs • Arbitrary number of layers: “deltas on. . ” – RID domain of child PDT = SID domain of parent PDT generalization: lo • PDThi contains all differences in time [lo, hi] PDT t 2 t 3 t 0 vs PDT t 1 are consecutive aligned t 2=t 1 PDT t 2=t 0 “same base” overlapping [t 2, t 3] overlaps [t 0, t 1] PDT “uncomparable” / “incompatible” Table

Stacking for Isolation • ‘lock’ PDT down for further updates – Immutable read-PDT BIG: main memory resident • ‘stack’ empty PDT on top – Updateable write-PDT SMALL: L 2 cache resident – Note: PDTs are consecutive • once in a while changes are propagated – Propagate() operation • Requires consecutive PDTs TABLEx Write-PDT Propagate() Read-PDT Stable Table

Snapshot Isolation Transaction State Propagate() Trans PDT Copy Write-PDT TABLEx Write-PDT Read-PDT Stable Table • Transaction creates snapshot copy of write -PDT • Updates go into trans. PDT • On commit, Propagate() trans-PDT into write-PDT

Optimistic Concurrency Control • Two concurrent transactions Trans A Trans PDT Trans B Trans PDT Copy Write-PDT TABLEx Write-PDT Read-PDT Stable Table

Optimistic Concurrency Control • Two concurrent transactions • A commits before B Trans A Trans PDT Pr op Copy ag Write-PDT ate Trans B Trans PDT () TABLEx Write-PDT Read-PDT Stable Table Copy Write-PDT

Optimistic Concurrency Control Trans A Trans PDT Trans B Trans ) PDT e( t a ag Copy p o Pr Write-PDT TABLEx Write-PDT Read-PDT Stable Table • Two concurrent transactions • A commits before B • Can not commit B into modified write-PDT! – A changed RID enumeration

Optimistic Concurrency Control Trans PDT Trans A Trans PDT Consecutive! Trans B Trans PDT Serialize() • Two concurrent transactions • A commits before B • Can not commit B into modified write-PDT! – A changed RID enumeration • Serialize(A, B) TABLEx Write-PDT Read-PDT Stable Table – Makes aligned PDTs consecutive – MAY FAIL!! trans abort = succeeds if no conflict = write set intersection

Optimistic Concurrency Control Trans B Trans PDT Serialize() Pr o pa ga te () Trans A Trans PDT TABLEx Write-PDT Read-PDT Stable Table • Two concurrent transactions • A commits before B • Can not commit B into modified write-PDT! – A changed RID enumeration • Serialize(A, B) – Makes aligned PDTs consecutive – MAY FAIL!! trans abort = succeeds if no conflict = write set intersection Extend to any number of concurrent transactions by serializing against all PDTs of transactions that committed during its lifetime (a. k. a. backward looking OCC)

Concluding. . • PDTs speed-up differential update merging – Reduced I/O volume – Reduced CPU merge overhead • Tree structure – logarithmic lookup & maintenance of volatile RIDs – main operations: Merge(), Propagate(), Serialize() • PDTs are stackable, and capture Write-Set – Great structure for Snapshot Isolation • Formal definitions, algorithms and benchmarks in paper

Thank you!

Microbenchmarks

TPCH-30