Fence Complexity in Concurrent Algorithms
Petr Kuznetsov
TU Berlin/DT-Labs

STM is about ease of programming and efficiency.
What is “efficient” in a concurrent system?

Cost metrics
Space: used memory (cheap; advanced garbage collection)
Time: the number of reads and writes per operation, and the number of stalls

Relaxed memory models
Memory is much slower than the CPU.
Read: check the cache, then read the memory.
Write: invalidate the caches, then update the memory.
To overcome "stalled writes", operations are reordered.
Reordering may result in inconsistency.
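A toy model can make write buffering concrete. The sketch below is an illustrative assumption and does not describe how any particular processor works. Each process parks its writes in a private FIFO buffer (the hypothetical class `StoreBufferMemory`). Reads check the process's own buffer before shared memory, and `flush` drains the buffer.

```python
class StoreBufferMemory:
    """Toy model (hypothetical): shared memory plus one FIFO write buffer per process."""

    def __init__(self, nprocs):
        self.memory = {}                            # shared memory; unset variables read as 0
        self.buffers = [[] for _ in range(nprocs)]  # pending writes, one list per process

    def write(self, p, var, value):
        # The write does not stall: it is parked in p's buffer.
        self.buffers[p].append((var, value))

    def read(self, p, var):
        # A process first sees its own most recent buffered write of var...
        for buffered_var, value in reversed(self.buffers[p]):
            if buffered_var == var:
                return value
        # ...otherwise it reads shared memory, which may be stale.
        return self.memory.get(var, 0)

    def flush(self, p):
        # Drain p's buffer into shared memory in FIFO order.
        for var, value in self.buffers[p]:
            self.memory[var] = value
        self.buffers[p].clear()


m = StoreBufferMemory(nprocs=2)
m.write(0, "X", 1)     # P: Write(X, 1) is buffered
print(m.read(0, "X"))  # 1: P sees its own write
print(m.read(1, "X"))  # 0: Q still sees the old value
m.flush(0)
print(m.read(1, "X"))  # 1: visible to Q after the flush
```

Until P's buffer is drained, P and Q disagree about the value of X. That gap is what makes the reorderings on the following slides observable.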

What is inconsistency?
Process P: Write(X, 1); Read(Y)
Process Q: Write(Y, 1); Read(X)
(initially X = Y = 0)
[Figure: timelines for P, executing W(X, 1) then R(Y), and Q, executing W(Y, 1) then R(X)]

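Possible outcomes
P reads before Q writes and Q reads after P writes: P reads Y = 0, Q reads X = 1.
P reads after Q writes and Q reads before P writes: P reads Y = 1, Q reads X = 0.
Out-of-order: P reads before Q writes and Q reads before P writes, so both read 0, although no interleaving of the two programs allows this.
[Figure: timelines of P and Q for each case]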
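The outcomes can be checked by brute force under sequential consistency, that is, assuming every execution is some interleaving of the two programs. In this sketch, the names `interleavings` and `sc_outcomes` are illustrative. It enumerates all six interleavings and never produces the out-of-order result in which both reads return 0.

```python
from itertools import combinations

# The litmus test: P writes X then reads Y; Q writes Y then reads X.
P = [("W", "X"), ("R", "Y")]
Q = [("W", "Y"), ("R", "X")]

def interleavings(a, b):
    """Yield every merge of a and b that preserves each process's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):  # positions taken by P's steps
        ia, ib = iter(a), iter(b)
        yield [("P", next(ia)) if i in slots else ("Q", next(ib)) for i in range(n)]

def sc_outcomes():
    """Set of (value P reads for Y, value Q reads for X) over all interleavings."""
    results = set()
    for schedule in interleavings(P, Q):
        memory, seen = {"X": 0, "Y": 0}, {}
        for proc, (op, var) in schedule:
            if op == "W":
                memory[var] = 1
            else:
                seen[proc] = memory[var]
        results.add((seen["P"], seen["Q"]))
    return results

print(sorted(sc_outcomes()))  # [(0, 1), (1, 0), (1, 1)]
```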

Fixing out-of-order
Memory fences: read-after-write (RAW)
write(X, 1)
fence() // enforce the order
read(Y)
[Figure: P executes W(X, 1), then R(Y); Q executes W(Y, 1), then R(X)]
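Enumerating the litmus test with write buffers shows both the problem and the fix. This sketch assumes a simplified store-buffer model, and the helpers `local_orders`, `merges` and `outcomes` are hypothetical. Without a fence, a process's write may reach memory after that process's later read of another variable. With a fence between them, it may not.

```python
from itertools import combinations, product

def local_orders(write_var, read_var, fenced):
    """Orders in which one process's buffered write can reach memory relative to
    its later read of another variable. A fence forces the write out first."""
    commit_first = [("commit", write_var), ("read", read_var)]
    read_first = [("read", read_var), ("commit", write_var)]
    return [commit_first] if fenced else [commit_first, read_first]

def merges(a, b):
    """All interleavings of a and b that keep each list's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [("P", next(ia)) if i in slots else ("Q", next(ib)) for i in range(n)]

def outcomes(fenced):
    """Set of (value P reads for Y, value Q reads for X); X = Y = 0 initially."""
    results = set()
    for p_ops, q_ops in product(local_orders("X", "Y", fenced),
                                local_orders("Y", "X", fenced)):
        for schedule in merges(p_ops, q_ops):
            memory, seen = {"X": 0, "Y": 0}, {}
            for proc, (kind, var) in schedule:
                if kind == "commit":
                    memory[var] = 1  # the buffered write becomes visible
                else:
                    seen[proc] = memory[var]
            results.add((seen["P"], seen["Q"]))
    return results

print(sorted(outcomes(fenced=False)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(fenced=True)))   # [(0, 1), (1, 0), (1, 1)]
```

With the fence, the reachable outcomes coincide with those of sequential consistency.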

Fixing out-of-order
Atomic operations: atomic-write-after-read (AWAR)
atomic{ read(Y) … write(X, 1) }
E.g., CAS (compare-and-swap), TAS (test-and-set), Fetch&Add, …
RAW/AWAR fences take ~60 RMRs
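Python exposes no hardware test-and-set, so the following sketch simulates one. A `threading.Lock` stands in for the atomicity the hardware would provide. The hypothetical `TASBit` and `TASLock` classes build a spin lock in which every acquisition attempt is one AWAR.

```python
import threading
import time

class TASBit:
    """Simulated test-and-set bit; the internal lock models hardware atomicity."""

    def __init__(self):
        self._value = False
        self._atomic = threading.Lock()

    def test_and_set(self):
        # AWAR: atomic { read(bit); write(bit, True) }, returning the old value.
        with self._atomic:
            old = self._value
            self._value = True
            return old

    def clear(self):
        with self._atomic:
            self._value = False

class TASLock:
    def __init__(self):
        self._bit = TASBit()

    def acquire(self):
        # Each attempt is one AWAR; the lock is ours once the old value was False.
        while self._bit.test_and_set():
            time.sleep(0)  # yield to other threads while spinning

    def release(self):
        self._bit.clear()

counter = 0
lock = TASLock()

def worker(n):
    global counter
    for _ in range(n):
        lock.acquire()
        counter += 1  # critical section
        lock.release()

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```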

Our result
Any concurrent program in a certain class must use RAW/AWARs

What programs?
Concurrent data types (queues, counters, hash tables, trees, …) with non-commutative operations
Linearizable, solo-terminating implementations
Mutual exclusion

Non-commutative operations
Operation A is non-commutative if there exist an operation B and a state such that, applied to that state, A influences B and B influences A. Here, A influences B if executing A first changes the response of B.

Example: Queue
enq(v): add v to the end of the queue
deq(): dequeue the item at the head of the queue
Q = 1; 2
Q.deq(): 1; Q.deq(): 2 vs. Q.deq(): 2; Q.deq(): 1
Each deq() returns 1 if it goes first and 2 otherwise, so two deq() operations influence each other.
Q.enq(3): ok; Q.deq(): 1 vs. Q.deq(): 1; Q.enq(3): ok
Neither order changes a response, so enq() is commutative.
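The influence relation can be checked mechanically for a sequential queue. In this sketch, the names `step`, `influences` and `non_commutative` are illustrative, and a `deq()` on an empty queue is assumed to return `None`. It reproduces the queue example.

```python
from collections import deque

def step(state, op):
    """Apply op to a queue state (a tuple); return (new_state, response).
    op is ("enq", v) or ("deq",)."""
    q = deque(state)
    if op[0] == "enq":
        q.append(op[1])
        return tuple(q), "ok"
    item = q.popleft() if q else None  # assumed: empty deq() returns None
    return tuple(q), item

def influences(a, b, state):
    """True if executing a first changes b's response from this state."""
    _, alone = step(state, b)
    after_a, _ = step(state, a)
    _, after = step(after_a, b)
    return alone != after

def non_commutative(a, b, state):
    return influences(a, b, state) and influences(b, a, state)

Q = (1, 2)
print(non_commutative(("deq",), ("deq",), Q))    # True
print(non_commutative(("enq", 3), ("deq",), Q))  # False
```

On an empty queue, `enq(3)` does influence `deq()`. However, `deq()` never influences `enq()`, whose response is always `ok`, so the pair is never mutually influencing.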

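Proof sketch
A non-commutative operation must write.
Suppose not: on queue 1; 2, deq() returns 1 without writing anything. A later deq() cannot tell that the first one ran, so it also returns 1.
There must be a write w!
[Figure: deq(): 1 on queue 1; 2, with its write w]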

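Proof sketch
Let w be the first write of the operation.
Suppose there are no AWARs, and let A(w) be the longest atomic construct containing w.
Then w must be the first base-object event in A(w), because a read preceding w inside A(w) would form an AWAR!
[Figure: deq(): 1 on queue 1; 2, with A(w) around w]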

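Proof sketch
Suppose there are no RAWs either.
[Figure: deq(): 1 on queue 1; 2, with A(w)]
No RAW: postponing A(w) past the operation's subsequent reads makes no difference for deq()! A concurrent deq() can then run before A(w) takes effect and also return 1.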

Mutual exclusion
Lock(): acquire the lock
Unlock(): release the lock
(Mutex) No two processes hold the lock at the same time.
(Deadlock-freedom) If at least one process executes Lock() and no active process fails, at least one process acquires the lock.
Two Lock() operations influence each other!
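As an illustration not taken from the slides, Peterson's classic two-process lock shows where the RAW sits: each process writes its own flag and then reads the other's. The sketch assumes CPython with the global interpreter lock, which makes these plain reads and writes behave sequentially consistently. A C version on hardware with store buffering would need a fence at the marked point. The class name `PetersonLock` is hypothetical.

```python
import threading
import time

class PetersonLock:
    """Two-process Peterson lock; process ids are 0 and 1.
    Assumes CPython's GIL gives sequentially consistent plain reads/writes."""

    def __init__(self):
        self.flag = [False, False]
        self.victim = 0

    def lock(self, i):
        j = 1 - i
        self.flag[i] = True  # write(flag[i])
        self.victim = i      # write(victim)
        # RAW point: on hardware with store buffering, a fence is needed here,
        # or the reads below may complete before the writes above are visible.
        while self.flag[j] and self.victim == i:  # read(flag[j]), read(victim)
            time.sleep(0)    # yield while waiting

    def unlock(self, i):
        self.flag[i] = False

counter = 0
lock = PetersonLock()

def worker(i, n):
    global counter
    for _ in range(n):
        lock.lock(i)
        counter += 1  # critical section
        lock.unlock(i)

threads = [threading.Thread(target=worker, args=(i, 1000)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 2000
```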

Our result
In any implementation of mutual exclusion, or of a concurrent data type with a non-commutative operation op, a complete execution of op or Lock() contains a RAW or an AWAR.
Every successful lock acquisition incurs a RAW/AWAR fence.

Why do we care?
Hardware design: which primitives must be optimized?
API design: returned values matter, e.g., a set whose add() returns fail for an existing element vs. one whose add() always returns ok
Verification: early detection of obviously incorrect algorithms
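Applying the influence definition to a set illustrates the API-design point. In this sketch, `add_reporting`, `add_ok` and `influences` are hypothetical. Two `add(5)` calls on an empty set influence each other when add reports whether the element was new, but not when add always answers ok.

```python
from functools import partial

def add_reporting(s, x):
    """add(x) that answers True if x was new and False ("fail") otherwise."""
    return s | {x}, x not in s

def add_ok(s, x):
    """add(x) that always answers "ok"."""
    return s | {x}, "ok"

def influences(first, second, state):
    """Does executing `first` before `second` change `second`'s response?
    Each op maps a state to (new_state, response)."""
    _, alone = second(state)
    after, _ = first(state)
    _, response = second(after)
    return alone != response

empty = frozenset()
reporting = partial(add_reporting, x=5)
silent = partial(add_ok, x=5)
# The two calls are identical, so influence in one direction implies the other.
print(influences(reporting, reporting, empty))  # True: the second add(5) now fails
print(influences(silent, silent, empty))        # False: the answer is always "ok"
```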

What’s next?
Weaker primitives? Idempotent work stealing [Michael et al., PPoPP ’09]
Tight lower bounds? How many RAW/AWAR fences are incurred?
Other patterns: read-after-read, write-after-write
Multi-RAW: write(Xi, 1); collect(X1, …, Xn)

References
H. Attiya, R. Guerraoui, D. Hendler, P. Kuznetsov, M. Michael, M. Vechev. Laws of Order: Expensive Synchronization in Concurrent Algorithms Cannot be Eliminated. In POPL 2011.
Srivatsan’s talk on STM fence complexity (TR on the way)

QUESTIONS?
