Hybrid Transactional Memory
Nir Shavit, MIT and Tel-Aviv University
Joint work with Alex Matveev (and describing the work of many in this summer school)
Haswell
Transactional Memory [Herlihy, Moss 93]
Transactional Memory
• Memory transactions are collections of reads and writes executed atomically
• Should provide:
  – Disjoint-access parallelism
• Should maintain internal and external consistency:
  – External (serializability): with respect to the interleavings of other transactions
  – Internal (opacity): the transaction itself should operate on a consistent state
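External Consistency
Initially x = 0 and y = 0 in application memory.
Transaction A: Read y; Write x = 4; Return x+y
Transaction B: Read x; Write y = 4; Return x+y
A and B cannot both return 4. In either serial order, the second transaction sees the first one's write and returns 8.
This is the canonical synchronization problem that all STM/HTM implementations must solve.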
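To make the requirement concrete, the illustrative Python sketch below runs the slide's two transactions in both serial orders, then in one non-atomic interleaving where both reads happen before either write. The function names are hypothetical. Only the interleaving produces the forbidden outcome in which both transactions return 4.

```python
def txn_a(mem):
    y = mem["y"]           # A: Read y
    mem["x"] = 4           # A: Write x = 4
    return mem["x"] + y    # A: Return x + y

def txn_b(mem):
    x = mem["x"]           # B: Read x
    mem["y"] = 4           # B: Write y = 4
    return x + mem["y"]    # B: Return x + y

def run_serial(first, second):
    """Run two whole transactions one after the other; returns (first, second) results."""
    mem = {"x": 0, "y": 0}
    r1 = first(mem)
    r2 = second(mem)
    return r1, r2

def run_interleaved():
    """Both reads happen before either write: not equivalent to any serial order."""
    mem = {"x": 0, "y": 0}
    a_y = mem["y"]         # A: Read y -> 0
    b_x = mem["x"]         # B: Read x -> 0
    mem["x"] = 4           # A: Write x = 4
    mem["y"] = 4           # B: Write y = 4
    return mem["x"] + a_y, b_x + mem["y"]   # (A's result, B's result)

print(run_serial(txn_a, txn_b))   # (4, 8): A first returns 4, B second returns 8
print(run_serial(txn_b, txn_a))   # (4, 8): B first returns 4, A second returns 8
print(run_interleaved())          # (4, 4): the non-serializable outcome
```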
Locking STMs
Map application memory to an array of versioned write-locks, each holding a version number (V#).
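One common way to build this mapping is lock striping: each word address is hashed into a fixed-size table of lock words. The sketch below rests on several assumptions. It uses a hypothetical lock-word encoding, with bit 0 as the lock bit and the remaining bits as the version. The table size and the 8-byte word granularity are also illustrative choices, not taken from the slides.

```python
LOCK_TABLE_BITS = 10                       # hypothetical size: 1024 lock words
LOCK_TABLE_SIZE = 1 << LOCK_TABLE_BITS
lock_table = [0] * LOCK_TABLE_SIZE         # entry = (version << 1) | lock_bit

def lock_index(addr: int) -> int:
    """Drop the offset within an 8-byte word, then keep the low bits as the slot."""
    return (addr >> 3) & (LOCK_TABLE_SIZE - 1)

def lock_word(addr: int) -> int:
    return lock_table[lock_index(addr)]

def make_word(version: int, locked: bool) -> int:
    return (version << 1) | int(locked)

def is_locked(word: int) -> bool:
    return (word & 1) == 1

def version_of(word: int) -> int:
    return word >> 1

# Bytes of one 8-byte word share a lock; adjacent words get different locks,
# but addresses LOCK_TABLE_SIZE words apart alias (a possible false conflict).
assert lock_index(0x1000) == lock_index(0x1007)
assert lock_index(0x1000) != lock_index(0x1008)
assert lock_index(0x1000) == lock_index(0x1000 + 8 * LOCK_TABLE_SIZE)
```

Striping bounds the meta-data size, but unrelated addresses that alias to the same lock can cause false conflicts.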
Commit-Time Locking (Write Buffering)
1. To read/write: check that the location is unlocked, then add it to the read/write set
2. Acquire locks
3. Validate that the read/write v#'s are unchanged
4. Write values
5. Release each lock with v#+1
Phases: Read/Write → Lock → Validate → Write → Unlock
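The five steps can be sketched as a single-threaded Python model. Lock ownership is a plain field, and striping is a hypothetical `addr % n_locks`. A real implementation would acquire locks with compare-and-swap and would re-check the lock word after reading each value. All class and method names are illustrative.

```python
class Abort(Exception):
    """Raised when the transaction must be retried."""

class VLock:
    def __init__(self):
        self.version = 0
        self.owner = None                  # None means unlocked

class Store:
    def __init__(self, n_locks=64):
        self.mem = {}
        self.locks = [VLock() for _ in range(n_locks)]

    def lock_for(self, addr):
        return self.locks[addr % len(self.locks)]   # hypothetical striping

class Txn:
    def __init__(self, store):
        self.s = store
        self.reads = {}                    # VLock -> version observed
        self.writes = {}                   # addr -> buffered value

    def read(self, addr):
        if addr in self.writes:            # read-after-write hits the buffer
            return self.writes[addr]
        lk = self.s.lock_for(addr)
        if lk.owner is not None:           # 1. check unlocked
            raise Abort("location locked")
        seen = self.reads.setdefault(lk, lk.version)   # 1. add to read set
        if seen != lk.version:
            raise Abort("stripe changed since first read")
        return self.s.mem.get(addr, 0)

    def write(self, addr, value):
        if self.s.lock_for(addr).owner is not None:     # 1. check unlocked
            raise Abort("location locked")
        self.writes[addr] = value          # 1. add to write set (buffered)

    def commit(self):
        held = []
        try:
            for lk in {self.s.lock_for(a) for a in self.writes}:
                if lk.owner is not None:   # 2. acquire locks
                    raise Abort("lock busy")
                lk.owner = self
                held.append(lk)
            for lk, v in self.reads.items():            # 3. validate v#'s
                if lk.version != v or lk.owner not in (None, self):
                    raise Abort("validation failed")
            for a, value in self.writes.items():        # 4. write values
                self.s.mem[a] = value
            for lk in held:                # 5. release with v#+1
                lk.version += 1
        finally:
            for lk in held:
                lk.owner = None
```

On abort, the `finally` clause releases the acquired locks without bumping their versions, because no values were written.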
Internal Inconsistency (Opacity) [Guerraoui, Kapalka 07]
Initially x = 8 and y = 4.
Transaction A: Write x = 4; Write y = 2
Transaction B: Read x; Read y; Compute z = 1/(x-y)
If B reads A's new x (4) but the old y (4), it computes 1/(4-4): DIV by 0 ERROR!
TL2/TinySTM's Global Clock [Dice, Shalev, Shavit 06 / Riegel, Felber, Fetzer 06]
• Have a shared global version clock
• Incremented by writing transactions (as infrequently as possible)
• Read by all transactions
• Used to validate that the state viewed by a transaction is always consistent (opacity)
TL2-Style STM
1. Read VClock
2. Read/Write: if the location is unlocked and its v# is no greater than the clock value read in step 1, add it to the read/write set
3. Acquire locks
4. Increment clock
5. Validate that each read v# is still no greater than the clock value read in step 1
6. Write values
7. Release locks with v# = new clock
Phases: Read Clock → Read/Write → Lock → Inc → Validate → Write → Unlock
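The seven steps can be followed in a single-threaded Python sketch. It assumes one versioned lock per address (no striping) and uses Python sets in place of atomic lock words. Names are hypothetical. The check in `read` against the clock value sampled at start (`rv`) is what rules out the divide-by-zero scenario from the opacity example.

```python
class Abort(Exception):
    pass

class TL2:
    """Shared state: global version clock plus one versioned write-lock per address."""
    def __init__(self):
        self.clock = 0
        self.mem = {}
        self.ver = {}          # addr -> v# (clock value of the last committed write)
        self.locked = set()    # addrs whose write-lock is currently held

class Txn:
    def __init__(self, tm):
        self.tm = tm
        self.rv = tm.clock     # 1. read the global clock
        self.rset = set()
        self.wbuf = {}

    def read(self, addr):
        if addr in self.wbuf:
            return self.wbuf[addr]
        # 2. accept only unlocked locations with v# <= rv, so every state the
        #    transaction observes is consistent (opacity)
        if addr in self.tm.locked or self.tm.ver.get(addr, 0) > self.rv:
            raise Abort("location changed after this transaction started")
        self.rset.add(addr)
        return self.tm.mem.get(addr, 0)

    def write(self, addr, value):
        self.wbuf[addr] = value                  # buffered until commit

    def commit(self):
        if not self.wbuf:
            return             # read-only: every read was already validated
        tm, held = self.tm, []
        try:
            for a in self.wbuf:                  # 3. acquire locks
                if a in tm.locked:
                    raise Abort("lock busy")
                tm.locked.add(a)
                held.append(a)
            tm.clock += 1                        # 4. increment clock
            wv = tm.clock
            for a in self.rset:                  # 5. validate read set
                if tm.ver.get(a, 0) > self.rv or (a in tm.locked and a not in held):
                    raise Abort("read set invalidated")
            for a, value in self.wbuf.items():   # 6. write values
                tm.mem[a] = value
                tm.ver[a] = wv                   # 7. v# = new clock on release
        finally:
            for a in held:
                tm.locked.discard(a)
```

Suppose transaction B reads the old y and then A commits x = 4, y = 2. B's next read of x sees a v# newer than its `rv` and aborts, so B never computes 1/(4-4).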
TL2-Style STM
• Advantages
  – Great disjoint-access parallelism
• Disadvantages
  – Accessing meta-data is expensive
  – Progress guarantee is only deadlock freedom
NOrec STM [Dalessandro, Spear, Scott 10]
• Use the shared global clock as a seqlock
• Validate on every read if a seqlock change is detected
• Value-based validation: no need for meta-data (local timestamps or locks)
NOrec STM
1. Read/Write: log the values read in the R/W set (with value-based validation if the seqlock changed)
2. Lock: if the seqlock is not odd, lock it (set odd), with validation if the seqlock changed
3. Write values
4. Unlock the seqlock (set even)
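The NOrec protocol above can be modelled in a single-threaded Python sketch. The only shared meta-data is one seqlock, and validation compares logged values. The spin-wait and the compare-and-swap on the seqlock are simulated. Class and method names are hypothetical.

```python
class Abort(Exception):
    pass

class NOrec:
    def __init__(self):
        self.seqlock = 0              # even = free, odd = a writer is committing
        self.mem = {}

class Txn:
    def __init__(self, tm):
        self.tm = tm
        self.snapshot = self._wait_even()
        self.reads = {}               # addr -> value seen (value-based validation)
        self.writes = {}

    def _wait_even(self):
        # A real implementation spins here while a writer commits; in this
        # single-threaded model the seqlock is never odd at this point.
        while self.tm.seqlock & 1:
            pass
        return self.tm.seqlock

    def _validate(self):
        while True:
            t = self._wait_even()
            if any(self.tm.mem.get(a, 0) != v for a, v in self.reads.items()):
                raise Abort("a value we read has changed")
            if t == self.tm.seqlock:  # nobody committed during validation
                return t

    def read(self, addr):
        if addr in self.writes:
            return self.writes[addr]
        val = self.tm.mem.get(addr, 0)
        while self.snapshot != self.tm.seqlock:   # seqlock moved: revalidate
            self.snapshot = self._validate()
            val = self.tm.mem.get(addr, 0)
        self.reads[addr] = val
        return val

    def write(self, addr, value):
        self.writes[addr] = value

    def commit(self):
        if not self.writes:
            return                    # read-only: already consistent
        tm = self.tm
        # Simulates: while not CAS(seqlock, snapshot, snapshot + 1): revalidate
        while tm.seqlock != self.snapshot:
            self.snapshot = self._validate()
        tm.seqlock += 1               # lock (set odd)
        for a, v in self.writes.items():
            tm.mem[a] = v             # write back
        tm.seqlock += 1               # unlock (set even)
```

Because validation compares values, a commit that touched unrelated locations only costs a revalidation, not an abort. The price is that every writer serializes on the one seqlock.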
NOrec STM
• Advantages
  – No expensive meta-data
• Disadvantages
  – Poor disjoint-access parallelism (all writes are serialized by the clock)
  – Progress guarantee is only starvation freedom
Hardware TM [Herlihy, Moss 93; IBM/Intel 13]
• Advantages
  – Everything in hardware, no meta-data
  – Great disjoint-access parallelism
• Disadvantages
  – No progress guarantee; transactions can fail because of:
    • Unsupported instructions: system or protected instructions
    • Exceptions: page faults and similar
    • Capacity limits: too many accessed locations
Hybrid TM [Moir, Damron et al., Kumar et al.]
• Fast path: execute the transaction using best-effort HTM
  – If it aborts because of special instructions or because the transaction is too large, then…
• Slow path: execute the transaction using STM
Goal: the performance of HTM with the progress guarantee of STM
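The fast-path/slow-path split can be sketched as a dispatcher. On Intel RTM, `_xbegin()` returns an abort status whose bits include `_XABORT_CONFLICT`, `_XABORT_CAPACITY` and `_XABORT_RETRY`. The sketch below mimics that with a hypothetical `HTMAbort.reason` string. The retry budget of 3 and the choice to retry only conflicts are illustrative assumptions, not a policy from the slides.

```python
class HTMAbort(Exception):
    """Stand-in for a best-effort HTM abort; `reason` mimics the hardware status."""
    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason

# A transient conflict is worth retrying in hardware. Capacity overflows,
# unsupported instructions and exceptions would recur, so go straight to STM.
RETRYABLE = {"conflict"}

def run_hybrid(htm_path, stm_path, max_htm_attempts=3):
    """Try the HTM fast path a few times, then fall back to the STM slow path."""
    for _ in range(max_htm_attempts):
        try:
            return "htm", htm_path()
        except HTMAbort as abort:
            if abort.reason not in RETRYABLE:
                break
    return "stm", stm_path()
```

The slow path is what supplies the progress guarantee. However many times hardware fails, the transaction eventually runs under the STM.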
Traditional Hybrid TM [Damron, Fedorova, Lev, Luchangco, Moir, Nussbaum 06]
• Hardware transaction: test the versioned write-lock on every read/write; update it on every write
• Software transaction: update (acquire and release) the versioned write-locks
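The instrumentation the hardware path pays for can be sketched as follows. The sketch assumes a per-address lock word with bit 0 as the lock bit and the version in the upper bits. The globals `mem` and `locks` are hypothetical, and the HTM itself is not modelled. In real hardware the extra load also puts the lock word into the transaction's read set, so a later software lock acquisition aborts the hardware transaction.

```python
class HTMAbort(Exception):
    pass

# Toy shared state: one versioned write-lock per address (assumed encoding:
# bit 0 = locked, remaining bits = version).
mem = {"x": 0}
locks = {"x": 0}

def hw_read(addr):
    # The extra load and branch paid on EVERY fast-path read.
    if locks[addr] & 1:
        raise HTMAbort("locked by a software transaction")  # like an explicit xabort
    return mem[addr]

def hw_write(addr, value):
    if locks[addr] & 1:
        raise HTMAbort("locked by a software transaction")
    mem[addr] = value
    locks[addr] += 2          # bump the version so software readers notice
```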
Traditional Hybrid TM
• Advantages
  – Progress guarantee of STM
• Disadvantages
  – HTM must access meta-data
  – Fast path is actually slow because of an extra load and branch on every read
Traditional Hybrid TM
Phased TM [Lev, Moir, Nussbaum 07]
• Two modes: all hardware or all software
• Shared global mode indicator
• If some hardware transaction aborts, switch to software mode
• Eventually the mode reverts back to hardware
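The mode logic can be sketched as a toy state machine. Both the `sw_phase_len` knob (how many software transactions run before reverting) and the `fits_in_htm` flag are hypothetical. The flag stands in for "the HTM attempt would commit". How long a software phase should last is exactly the tuning question the slides leave open.

```python
class PhasedTM:
    """Toy phase manager: every transaction runs in the current global mode."""

    def __init__(self, sw_phase_len=3):
        self.mode = "HW"                  # shared global mode indicator
        self.sw_phase_len = sw_phase_len  # hypothetical: software txns per phase
        self.sw_done = 0

    def execute(self, fits_in_htm):
        """Return the mode the transaction actually ran in."""
        if self.mode == "HW":
            # Real hardware transactions read the mode indicator inside the
            # transaction, so flipping it aborts all of them at once.
            if fits_in_htm:
                return "HW"
            self.mode, self.sw_done = "SW", 0   # one failure switches everyone
        self.sw_done += 1
        if self.sw_done >= self.sw_phase_len:
            self.mode = "HW"                    # eventually revert to hardware
        return "SW"
```

With `sw_phase_len=2`, a small transaction that arrives right after an oversized one still runs in software. A single failing transaction drags everyone onto the slow path.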
Phased TM
• Advantages
  – Fast path is pure HTM: no meta-data accesses
• Disadvantages
  – A single software transaction causes all HTM transactions to switch to the STM slow path
  – Not clear how to tune to avoid frequent mode transitions…
Hybrid NOrec (1st Attempt)
• Software (NOrec): Read/Write (with seqlock validation) → Lock: seqlock not odd? set odd → Validate → Write → Unlock (set even)
• Hardware: Read/Write (no validation) → Write → at commit: seqlock not odd? seqlock + 2
• A hardware commit adds 2 to the seqlock, so a concurrent software transaction will fail seqlock validation
• A hardware transaction that reaches its commit while a software commit holds the seqlock (odd) will fail the seqlock check
• Together these checks guarantee external consistency
• Problem: hardware opacity
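Internal Inconsistency (Opacity) [Guerraoui, Kapalka 07]
Initially x = 8 and y = 4.
Software A: Lock seqlock (+1); Write x = 4; Write y = 2; Unlock seqlock (+1)
Hardware B: Read x; Read y; Compute z = 1/(x-y); …; at commit: seqlock odd?
B is not instrumented, so it can read x = 4 and y = 4 in the middle of A's write-back and divide by zero before it ever reaches the seqlock check: DIV by 0 ERROR!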
Hybrid NOrec (2nd Attempt)
Guarantee hardware opacity:
• Software (NOrec): Read/Write (with seqlock validation) → Lock: seqlock not odd? set odd → Validate → Write → Unlock (set even)
• Hardware: at start, seqlock not odd? → Read/Write (no validation) → Write → seqlock + 2
• Because the hardware transaction reads the seqlock first, it will detect any seqlock invalidation as a conflict and abort
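The 2nd attempt can be modelled with a deliberately simplified best-effort HTM. `ToyHTM` is hypothetical: it tracks the version of every word it reads, buffers its writes, and aborts on its next access once a tracked word has changed. Software-side read validation is omitted. Because `hw_begin` reads the seqlock first, a software write-back aborts the hardware transaction before it can see a half-written state. The cost is that the seqlock stays in the tracking set for the whole transaction.

```python
class HTMAbort(Exception):
    pass

class Memory:
    """Shared words; every store bumps a per-word version (for conflict detection)."""
    def __init__(self):
        self.val, self.ver = {}, {}
    def load(self, a):
        return self.val.get(a, 0)
    def store(self, a, v):
        self.val[a] = v
        self.ver[a] = self.ver.get(a, 0) + 1

class ToyHTM:
    """Toy best-effort HTM: tracks words it reads, buffers writes, aborts when a
    tracked word was changed by someone else; commit publishes all writes at once."""
    def __init__(self, mem):
        self.mem, self.rset, self.wbuf = mem, {}, {}
    def _check(self):
        if any(self.mem.ver.get(a, 0) != v for a, v in self.rset.items()):
            raise HTMAbort("conflict")
    def read(self, a):
        if a in self.wbuf:
            return self.wbuf[a]
        self.rset.setdefault(a, self.mem.ver.get(a, 0))
        self._check()
        return self.mem.load(a)
    def write(self, a, v):
        self.wbuf[a] = v
    def commit(self):
        self._check()
        for a, v in self.wbuf.items():
            self.mem.store(a, v)

SEQ = "seqlock"

def hw_begin(mem):
    """Fast path (2nd attempt): subscribe to the seqlock before anything else."""
    htm = ToyHTM(mem)
    if htm.read(SEQ) & 1:          # a software transaction is writing back
        raise HTMAbort("seqlock held")
    return htm

def hw_commit(htm):
    if htm.wbuf:                   # hardware writers: seqlock + 2 (stays even)
        htm.write(SEQ, htm.read(SEQ) + 2)
    htm.commit()

def sw_writeback(mem, writes):
    """Software NOrec write-back only (its read validation is omitted here)."""
    s = mem.load(SEQ)
    mem.store(SEQ, s + 1)          # lock (set odd)
    for a, v in writes.items():
        mem.store(a, v)
    mem.store(SEQ, s + 2)          # unlock (set even)
```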
Hybrid NOrec
• Advantages
  – Fast-path HTM: no meta-data accesses
• Disadvantages
  – Limited disjoint-access parallelism
  – The seqlock is in the hardware tracking set throughout the HTM transaction
  – Major sequential bottleneck
Possible Solutions
• Forget opacity, use sandboxing [Dalessandro, Carouge, White, Lev, Moir, Scott, Spear 2011]
• Hybrid NOrec 2 [Riegel, Marlier, Nowack, Felber, Fetzer 11]: use non-transactional operations in a hardware transaction to read the seqlock and validate that it has not changed after every read
But sandboxing is complex… and non-transactional ops are only available in the AMD proposal, not in actual IBM or Intel hardware…
Reduced Hardware Approach to HyTM [Matveev, Shavit 13]
• Use short hardware transactions in the software slow path
• I.e., create a new "mixed" software/hardware path
• Not in order to make the slow path faster
  – But rather in order to remove meta-data accesses from the fast path
• Default to all-software if the mixed path fails
Transactional Writes Imply Hardware Opacity
Initially x = 8 and y = 4.
Trans A: Write x = 4; Write y = 2
Hardware B: Read x; Read y; Compute z = 1/(x-y) → DIV by 0 ERROR!
If A's writes are performed inside a hardware transaction, this cannot happen: B sees either both writes or neither.
Reduced Hardware NOrec [Matveev, Shavit 13]
• In the slow-path commit, use a small hardware transaction to:
  – Write all values
  – Check that the seqlock has not changed
  – Write seqlock+1
• In the fast path:
  – Move the seqlock test to the end; reads and writes are un-instrumented
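The two Reduced Hardware NOrec paths can be sketched in the same toy setting. The hypothetical `ToyHTM` tracks the words it reads and publishes its writes atomically at commit, and everything runs single-threaded. The all-software fallback is omitted, so the seqlock is never held and a plain +1 marks each commit. The fast path does not read the seqlock until its very end. It is still protected because the slow path publishes all its writes in one short hardware transaction.

```python
class HTMAbort(Exception):
    pass

class Abort(Exception):
    pass

class Memory:
    """Shared words; every store bumps a per-word version (for conflict detection)."""
    def __init__(self):
        self.val, self.ver = {}, {}
    def load(self, a):
        return self.val.get(a, 0)
    def store(self, a, v):
        self.val[a] = v
        self.ver[a] = self.ver.get(a, 0) + 1

class ToyHTM:
    """Toy best-effort HTM: tracks words it reads, buffers writes, aborts when a
    tracked word was changed by someone else; commit publishes all writes at once."""
    def __init__(self, mem):
        self.mem, self.rset, self.wbuf = mem, {}, {}
    def _check(self):
        if any(self.mem.ver.get(a, 0) != v for a, v in self.rset.items()):
            raise HTMAbort("conflict")
    def read(self, a):
        if a in self.wbuf:
            return self.wbuf[a]
        self.rset.setdefault(a, self.mem.ver.get(a, 0))
        self._check()
        return self.mem.load(a)
    def write(self, a, v):
        self.wbuf[a] = v
    def commit(self):
        self._check()
        for a, v in self.wbuf.items():
            self.mem.store(a, v)

SEQ = "seqlock"

def fast_path(mem, body):
    """Pure HTM fast path: no instrumentation; the seqlock is touched only at the end."""
    htm = ToyHTM(mem)
    result = body(htm)
    if htm.wbuf:                                 # let slow-path readers see our writes
        htm.write(SEQ, htm.read(SEQ) + 1)
    htm.commit()
    return result

class SlowTxn:
    """Mixed slow path: NOrec-style software reads, commit in one short HTM."""
    def __init__(self, mem):
        self.mem, self.snapshot = mem, mem.load(SEQ)
        self.reads, self.writes = {}, {}
    def _validate(self):
        t = self.mem.load(SEQ)
        if any(self.mem.load(a) != v for a, v in self.reads.items()):
            raise Abort("value changed")
        return t
    def read(self, a):
        if a in self.writes:
            return self.writes[a]
        v = self.mem.load(a)
        while self.snapshot != self.mem.load(SEQ):   # seqlock moved: revalidate by value
            self.snapshot = self._validate()
            v = self.mem.load(a)
        self.reads[a] = v
        return v
    def write(self, a, v):
        self.writes[a] = v
    def commit(self):
        if not self.writes:
            return
        htm = ToyHTM(self.mem)                   # short HTM: write set + seqlock
        if htm.read(SEQ) != self.snapshot:       # seqlock unchanged?
            raise Abort("seqlock changed")       # real code would revalidate and retry
        for a, v in self.writes.items():
            htm.write(a, v)                      # write all values
        htm.write(SEQ, self.snapshot + 1)        # write seqlock + 1
        htm.commit()
```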
Reduced Hardware NOrec
Guarantee fast-path opacity without keeping the seqlock in the HTM tracking set for long.
• Software (mixed path): Read/Write (with seqlock validation) → in an HTM transaction: seqlock changed? → Write values → Write seqlock+1
• All-software fallback: Lock seqlock (set odd) → Validate → Write values → Unlock (set even)
• Hardware (fast path): Read/Write (no instrumentation) → Read seqlock → Write seqlock+1
• Hardware will detect the write conflict without the seqlock!
Reduced Hardware NOrec
• Properties
  – Fast path: no meta-data; no instrumentation of reads or writes
  – Slow path:
    • Short hardware transaction: the size of the write set
    • Can repeatedly attempt the short hardware transaction in commit
Reduced Hardware NOrec
• Advantages
  – Hardware disjoint-access parallelism
  – Seqlock accessed only at the end of the HTM transaction
  – Surprise: 1st HyTM that is obstruction-free and privatizing
• Disadvantages
  – Still a window of possible abort due to the seqlock increment
Reduced Hardware NOrec
Reduced Hardware TL2 Style
• Software (TL2 style): Read clock → Read/Write (validate) → Validate → in an HTM transaction: Write values
• Hardware: Read/Write (no validation) → Read clock → Write values with clock + 1
• Hardware will see the software writes atomically and detect a write conflict
Reduced Hardware TL2 Style
Problem: if another transaction commits between the validation and the hardware write, the result can be inconsistent.
Solution: combine validation and writes in a single HTM transaction.
• Software (TL2 style): Read clock → Read/Write (validate) → in one HTM transaction: Validate and Write values
• Hardware: Read/Write (no validation) → Read clock → Write values with clock + 1
• Hardware will detect the write conflict
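The mixed path's commit can be sketched in Python with a GV6-style clock, which is advanced only when a transaction aborts. `atomically` is a hypothetical stand-in for one hardware transaction; on Intel RTM that would be a region between `_xbegin()` and `_xend()`. In this single-threaded model it simply calls its argument. The point shown is structural: validation and write-back share one atomic region. Other names are illustrative.

```python
class Abort(Exception):
    pass

class Mem:
    """Shared state: GV6-style global clock plus a value and v# per address."""
    def __init__(self):
        self.clock = 0
        self.val, self.ver = {}, {}

def atomically(fn):
    """Hypothetical stand-in for ONE hardware transaction; this model just calls fn."""
    return fn()

def fast_path_write(m, writes):
    """Hardware fast path: writes are stamped with clock + 1; the clock is not bumped."""
    def body():
        wv = m.clock + 1
        for a, v in writes.items():
            m.val[a], m.ver[a] = v, wv
    atomically(body)

class MixedTxn:
    """Mixed slow path: TL2-style software reads, validate + write in one HTM."""
    def __init__(self, m):
        self.m, self.rv = m, m.clock           # read clock
        self.rset, self.wbuf = set(), {}

    def _abort(self):
        self.m.clock += 1                      # GV6: the clock advances on aborts only
        raise Abort("saw a version newer than rv")

    def read(self, a):
        if a in self.wbuf:
            return self.wbuf[a]
        if self.m.ver.get(a, 0) > self.rv:     # TL2-style validation on every read
            self._abort()
        self.rset.add(a)
        return self.m.val.get(a, 0)

    def write(self, a, v):
        self.wbuf[a] = v

    def commit(self):
        m = self.m
        def body():
            # Validation and write-back share ONE hardware transaction, so no
            # writer can commit in the gap between them.
            if any(m.ver.get(a, 0) > self.rv for a in self.rset):
                return False
            wv = m.clock + 1
            for a, v in self.wbuf.items():
                m.val[a], m.ver[a] = v, wv
            return True
        if not atomically(body):
            self._abort()
```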
Reduced Hardware TL2 Style
• Advantages
  – Complete disjoint-access parallelism
  – GV6 clock incremented on aborts only
  – Obstruction-free
• Disadvantages
  – No privatization
  – The mixed-path hardware transaction must cover the whole meta-data set
RH1: Reduced Hardware TL2 Style
HyTM: A Long Journey
• Combination of ideas:
  – hardware transactions,
  – global clocks,
  – no meta-data access,
  – mixed hardware/software paths
• And there is still room for improvement