Critical Sections: Re-emerging Concerns for DBMS
Ryan Johnson, Ippokratis Pandis, Anastasia Ailamaki
Carnegie Mellon University / École Polytechnique Fédérale de Lausanne

A Convergence of Trends
  • OLTP fits in memory [stonebraker 07]
    – 256 GB RAM = 80M TPC-C customers
    – Amazon.com had 30M customers in 2005
  • Multi-core computing
    – Dozens of HW contexts per chip today
    – Expect 2x more cores each generation
  Scalability concerns replace the I/O bottleneck

Potential Scalability Bottlenecks
  • Applications ✓
  • DBMS ?
  • Hardware ✓

DBMS Scalability Comparison
  [Chart: insert-only microbenchmark on a Sun T2000 (32 HW threads); lower is better]
  Current engines face internal scalability challenges

Contributions
  • Evaluate common synchronization approaches, identify the most useful ones
  • Highlight the performance impact of tuning database engines for scalability

Outline
  • Introduction
  • Scalability and Critical Sections
  • Synchronization Techniques
  • Shore-MT
  • Conclusion

Sources of Serialization
  • Concurrency control
    – Serialize conflicting logical transactions
    – Enforce consistency and isolation
  • Latching
    – Serialize physical page updates
    – Protect page integrity
  • Critical sections (semaphores)
    – Serialize internal data structure updates
    – Protect data structure integrity
  Critical sections largely overlooked

Critical Sections in DBMS
  • Fine-grained parallelism vital to scalability
  • Transactions require tight synchronization
    – Many short critical sections
    – Most uncontended, some heavily contended
  • TPC-C Payment
    – Accesses 4-6 records
    – Enters ~100 critical sections in Shore
  Cannot ignore critical section performance

Example: Index Probe
  [Timeline diagram: locks, latches, and critical sections entered over the course of one index probe]
  Many short critical sections per operation

Outline
  • Introduction
  • Scalability and Critical Sections
  • Synchronization Techniques
  • Shore-MT
  • Conclusion

Related Work
  • Concurrency control
    – Locking [got 92]
    – B-trees [moh 90]
    – Often assumes a single hardware context
  • Latching
    – “Solved problem” [agr 87]
    – Table-based latching [got 92]
  • Synchronization techniques

Lock-based Synchronization
  • Blocking mutex: ✓ simple to use, ✗ overhead, unscalable
  • Test-and-set spinlock: ✓ efficient, ✗ unscalable
  • Queue-based spinlock: ✓ scalable, ✗ memory management
  • Reader-writer lock: ✓ concurrent readers, ✗ overhead
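  For reference, a minimal test-and-test-and-set (TATAS) spinlock sketched with C11 atomics; this is a generic illustration, not the exact primitive evaluated in the paper:

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Spin on plain reads until the lock looks free, then try to grab it
     * with an atomic exchange.  Cheap when uncontended; under contention
     * every waiter hammers the same cache line, which is why it stops
     * scaling. */
    typedef struct { atomic_bool held; } tatas_lock;

    static void tatas_acquire(tatas_lock *lk) {
        for (;;) {
            while (atomic_load_explicit(&lk->held, memory_order_relaxed))
                ;   /* read-only spin: no cache-line invalidations while waiting */
            if (!atomic_exchange_explicit(&lk->held, true, memory_order_acquire))
                return;   /* exchange returned false: we took a free lock */
        }
    }

    static void tatas_release(tatas_lock *lk) {
        atomic_store_explicit(&lk->held, false, memory_order_release);
    }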

Lock-free Synchronization
  • Optimistic concurrency control (OCC): ✓ no read overhead, ✗ writes cause livelock
  • Atomic updates: ✓ efficient, ✗ limited applicability
  • Lock-free algorithms: ✓ scalable, ✗ special-purpose algorithms
  • Hardware approaches (e.g. transactional memory): ✓ efficient, scalable, ✗ not widely available
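  An "atomic update" here typically means a compare-and-swap retry loop on a single word of shared state, which is why its applicability is limited to small structures. A sketch using C11 atomics (the high-water-mark example is illustrative, not from the paper):

    #include <stdatomic.h>
    #include <stdint.h>

    /* Read the current value, compute the replacement, and publish it only
     * if nobody changed it in between; otherwise retry with the freshly
     * observed value. */
    static void raise_high_water_mark(_Atomic uint64_t *hwm, uint64_t candidate) {
        uint64_t cur = atomic_load_explicit(hwm, memory_order_relaxed);
        while (candidate > cur &&
               !atomic_compare_exchange_weak_explicit(hwm, &cur, candidate,
                                                      memory_order_release,
                                                      memory_order_relaxed)) {
            /* 'cur' now holds the value another thread installed;
             * the loop condition re-checks whether we still need to update. */
        }
    }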

Experimental Setup
  • Hardware
    – Sun T2000 “Niagara” server
    – 8 cores, 4 threads each (32 total)
  • Microbenchmark:
      while (!timeout_flag) {
          delay_ns(t_out);
          acquire();
          delay_ns(t_in);
          release();
      }
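  A fleshed-out worker for this loop might look as follows; it is only a sketch, using pthreads with a pthread_mutex standing in for whichever primitive is under test, and the worker_arg struct and sleep-based delay_ns are assumptions rather than the paper's actual harness:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdint.h>
    #include <time.h>

    static atomic_bool timeout_flag;                     /* set by the driver */
    static pthread_mutex_t cs = PTHREAD_MUTEX_INITIALIZER;

    struct worker_arg { uint64_t t_out, t_in, iterations; };

    static void delay_ns(uint64_t ns) {
        /* Stand-in delay; a real harness would busy-wait on a cycle
         * counter so the thread stays on-core instead of sleeping. */
        struct timespec ts = { 0, (long)ns };
        nanosleep(&ts, NULL);
    }

    static void *worker(void *p) {
        struct worker_arg *a = p;
        while (!atomic_load_explicit(&timeout_flag, memory_order_relaxed)) {
            delay_ns(a->t_out);            /* work outside the critical section */
            pthread_mutex_lock(&cs);       /* acquire() */
            delay_ns(a->t_in);             /* work inside the critical section */
            pthread_mutex_unlock(&cs);     /* release() */
            a->iterations++;               /* summed by the driver for throughput */
        }
        return NULL;
    }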

Critical Section Overhead
  [Chart: t_out = t_in, 16 threads; region covering DBMS critical sections marked]
  Critical sections are 60-90% overhead

Scalability Under Contention
  [Chart: t_out = t_in = 0 ns]
  TATAS or MCS best depending on contention
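  The queue-based (MCS) spinlock scales under contention because each waiter spins on its own queue node rather than on one shared flag. A minimal sketch with C11 atomics, for illustration only (not Shore-MT's implementation):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct mcs_node {
        _Atomic(struct mcs_node *) next;
        atomic_bool locked;
    } mcs_node;

    typedef struct {
        _Atomic(mcs_node *) tail;        /* last node in the waiter queue (NULL = free) */
    } mcs_lock;

    static void mcs_acquire(mcs_lock *lk, mcs_node *me) {
        atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
        atomic_store_explicit(&me->locked, true, memory_order_relaxed);
        /* Append ourselves to the tail of the queue. */
        mcs_node *prev = atomic_exchange_explicit(&lk->tail, me, memory_order_acq_rel);
        if (prev == NULL)
            return;                      /* queue was empty: lock acquired immediately */
        /* Link behind the previous holder and spin on our local flag only. */
        atomic_store_explicit(&prev->next, me, memory_order_release);
        while (atomic_load_explicit(&me->locked, memory_order_acquire))
            ;
    }

    static void mcs_release(mcs_lock *lk, mcs_node *me) {
        mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
        if (succ == NULL) {
            /* No known successor: try to swing the tail back to NULL. */
            mcs_node *expected = me;
            if (atomic_compare_exchange_strong_explicit(
                    &lk->tail, &expected, NULL,
                    memory_order_acq_rel, memory_order_acquire))
                return;                  /* queue really was empty */
            /* A new waiter is mid-way through linking in; wait for the link. */
            while ((succ = atomic_load_explicit(&me->next, memory_order_acquire)) == NULL)
                ;
        }
        /* Hand the lock directly to the successor. */
        atomic_store_explicit(&succ->locked, false, memory_order_release);
    }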

Reader-writer Performance
  [Chart: t_out = t_in = 100 ns, 16 threads; DBMS workloads span the gamut of read/write mixes]
  Reader-writer locks too expensive to be useful
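  One way to make read-mostly access nearly free when reader-writer locks cost too much is to publish immutable snapshots behind a single atomic pointer: readers pay one atomic load, writers copy-modify-publish with compare-and-swap. This is a copy-on-write sketch in the spirit of the lock-free options above, not a technique taken from the paper, and it leaves snapshot reclamation (epochs, hazard pointers, RCU) out of scope:

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdlib.h>

    typedef struct { uint64_t hits, misses; } stats_t;

    static _Atomic(stats_t *) current_stats;   /* always points at a valid snapshot */

    static void stats_init(void) {             /* call once before use */
        stats_t *s = calloc(1, sizeof *s);
        atomic_store_explicit(&current_stats, s, memory_order_release);
    }

    /* Readers: a single atomic pointer load, then a copy of the snapshot. */
    static stats_t snapshot_read(void) {
        return *atomic_load_explicit(&current_stats, memory_order_acquire);
    }

    /* Writers: copy, modify, and publish; retry if another writer won. */
    static void record_hit(void) {
        for (;;) {
            stats_t *old = atomic_load_explicit(&current_stats, memory_order_acquire);
            stats_t *fresh = malloc(sizeof *fresh);
            *fresh = *old;
            fresh->hits += 1;
            if (atomic_compare_exchange_weak_explicit(&current_stats, &old, fresh,
                    memory_order_release, memory_order_relaxed))
                return;     /* published; 'old' must be reclaimed safely elsewhere */
            free(fresh);    /* lost the race; rebuild from the newer snapshot */
        }
    }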

Selecting a Primitive
                 Uncontended    Contended
  Short          TAS            MCS
  Long           Mutex          Lock-free
  Read-mostly    OCC            OCC
  A handful of primitives covers most cases

Outline
  • Introduction
  • Scalability and Critical Sections
  • Synchronization Techniques
  • Shore-MT
  • Conclusion

Alleviating Contention
  • Modify algorithms
    – Shrink/distribute/eliminate critical sections
    – Fundamental scalability improvements
  • Tune existing critical sections
    – Reduce overheads
    – Straightforward and localized changes
  Both approaches vital for scalability
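  As an illustration of "distributing" a critical section: a global counter guarded by one lock can become per-partition counters that threads update independently, with reads aggregating across partitions. A generic sketch (the partition count and hashing by thread id are assumptions, and this is not Shore-MT code):

    #include <stdatomic.h>
    #include <stdint.h>

    #define N_PARTITIONS 64   /* assumption: sized to roughly match HW contexts */

    typedef struct {
        _Alignas(64) atomic_uint_fast64_t count;   /* pad to its own cache line */
    } partition_counter;

    static partition_counter counters[N_PARTITIONS];

    /* Updates touch only one partition, so threads rarely contend. */
    static void counter_inc(unsigned thread_id) {
        atomic_fetch_add_explicit(&counters[thread_id % N_PARTITIONS].count,
                                  1, memory_order_relaxed);
    }

    /* Reads pay the cost instead: sum across all partitions. */
    static uint64_t counter_read(void) {
        uint64_t total = 0;
        for (int i = 0; i < N_PARTITIONS; i++)
            total += atomic_load_explicit(&counters[i].count, memory_order_relaxed);
        return total;
    }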

From Shore to Shore-MT
  [Log-scale chart: throughput (tps, 1-300) vs. concurrent threads (0-32) for each optimization stage: baseline, dist-bpool, tune-locks, malloc, log2, lockm, bpool2, final, ideal]
  Tuning and algorithmic changes at each step

Conclusions
  • Only a few types of primitives are useful
  • Algorithms and tuning are both essential to performance/scalability
  • Open issues
    – Developing ever-finer-grained algorithms
    – Reducing synchronization overhead
    – Improving usability of reader-writer locks
    – Efficient lock-free algorithms
  Plenty of room for improvement by future research

Bibliography
  [agr 87] R. Agrawal, M. Carey, and M. Livny. “Concurrency Control Performance Modeling: Alternatives and Implications.” ACM TODS, 12(4):609-654, 1987.
  [car 94] M. Carey, et al. “Shoring Up Persistent Applications.” SIGMOD Record, 23(2):383-394, 1994.
  [got 92] V. Gottemukkala and T. Lehman. “Locking and Latching in a Memory-Resident Database System.” In Proc. VLDB, 1992.
  [moh 90] C. Mohan. “Commit_LSN: A Novel and Simple Method for Reducing Locking and Latching in Transaction Processing Systems.” In Proc. VLDB, 1990.

Thank You! © 2008 Ryan Johnson