Multiprocessor synchronization algorithms (20225241)
Contention in shared memory multiprocessors
• Definitions
• Lower bound for consensus
• Lower bounds for counters, stacks and queues
Lecturer: Danny Hendler

Contention in shared-memory systems
• Contention: the extent to which processes access the same memory locations simultaneously.
• When multiple processes simultaneously write to the same memory location, they are stalled.
• High contention hurts performance!

Memory Stalls & Write-Contention (Model: Dwork, Herlihy, Waarts, 93)
[Figure: processes $p_0, p_1, p_2, \ldots, p_j$ access the same variable simultaneously and incur 0, 1, 2, …, j stalls respectively.]
Write-contention is the maximum number of processes that can be enabled to perform a write or read-modify-write operation to the same memory location simultaneously.
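The definition above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each process has exactly one pending operation, operations on the same location are served in list order, and reads are ignored. The names `stalls_by_process`, `write_contention` and the tuple layout are hypothetical, not part of the model.

```python
from collections import defaultdict

# Operation kinds that count as modifying in this sketch (assumption).
MODIFYING = {"write", "rmw"}

def stalls_by_process(pending):
    """Map each modifying process to the number of stalls it incurs.

    `pending` is a list of (process, location, kind) tuples; a process
    waits for every earlier modifying operation on the same location.
    """
    served_so_far = defaultdict(int)
    stalls = {}
    for process, location, kind in pending:
        if kind in MODIFYING:
            stalls[process] = served_so_far[location]
            served_so_far[location] += 1
    return stalls

def write_contention(pending):
    """Largest number of processes poised on a write/RMW to one location."""
    counts = defaultdict(int)
    for _process, location, kind in pending:
        if kind in MODIFYING:
            counts[location] += 1
    return max(counts.values(), default=0)

pending = [
    ("p0", "x", "rmw"),
    ("p1", "x", "write"),
    ("p2", "x", "rmw"),
    ("p3", "y", "write"),
    ("p4", "x", "read"),  # not counted: only writes and RMWs are considered here
]
print(stalls_by_process(pending))
print(write_contention(pending))
```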

Recall the consensus implementation we saw…
We use a single object, C, that supports the compare&swap and read operations. Initially C = null.
Decide(v) ; code for $p_i$, i = 0, 1
  1. CAS(C, null, v)
  2. return C
What is the write-contention of this algorithm? n.
It can be shown that this is the write-contention of any wait-free consensus algorithm.
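A minimal runnable sketch of the Decide procedure, assuming a lock as a stand-in for hardware compare&swap (Python has no built-in CAS on plain objects). The class `CASRegister` and the helper `run_consensus` are illustrative names. Every thread may be poised on the CAS to the single object C at the same moment, which is why the write-contention is n.

```python
import threading

class CASRegister:
    """Register with read and compare&swap; a lock emulates atomicity."""

    def __init__(self):
        self._value = None  # plays the role of null
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def read(self):
        with self._lock:
            return self._value

def decide(C, v):
    C.compare_and_swap(None, v)  # only the first CAS succeeds
    return C.read()              # everyone returns the winner's value

def run_consensus(proposals):
    """Run one thread per (non-None) proposal and collect the decisions."""
    C = CASRegister()
    decisions = [None] * len(proposals)

    def worker(i):
        decisions[i] = decide(C, proposals[i])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(proposals))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions

proposals = ["a", "b", "c", "d"]
decisions = run_consensus(proposals)
print(decisions)
```

All threads print the same decided value, and that value is one of the proposals.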

What can we say about the worst-case contention-aware time complexity of objects such as counters, stacks and queues?

Naïve Counter Implementation
[Figure: six FAI operations, numbered 1 to 6, all applied to a single shared FAI object.]
The last processes to succeed incur Θ(n) time complexity! Can we do much better?
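The Θ(n) cost can be made concrete with a small simulation. It is a sketch under an assumed accounting: all n processes apply FAI to the single shared word at the same time, the word serves them one by one, and a process's time is its stalls plus one step for its own FAI. The function name `naive_counter_run` is illustrative.

```python
def naive_counter_run(n):
    """Simulate n simultaneous FAI operations on one shared word.

    Returns (returned_value, stalls, time) per process, in the order
    the shared word serves them.
    """
    counter = 0
    results = []
    for served in range(n):
        stalls = served      # waits for every FAI served before it
        value = counter      # FAI returns the old value...
        counter += 1         # ...and increments the word
        results.append((value, stalls, stalls + 1))
    return results

results = naive_counter_run(6)
for value, stalls, time in results:
    print(f"value={value} stalls={stalls} time={time}")
```

The last process served waits for the other n-1, so its time grows linearly with n.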

We will see a time lower bound of Ω(√n) on lock-free implementations of counters, stacks, queues… (Hendler, Shavit, 2003)
Any algorithm either (a) suffers high contention or (b) suffers high latency (step complexity).

Capture Influence between processes
Time complexity is determined by the extent to which operations by different processes “influence” each other.

Influence-level
[Figure: process p is about to apply FAI to a shared counter holding 17 and says: "Hmmm… I will soon request a value." A group of other processes answers: "Each of us may precede you and modify the value you will get!" The group is labeled "Influence level (w.r.t. p)".]

Modifying Steps
[Figure: p is about to apply FAI to a shared counter holding 17 ("Hmmm… I will soon request a value"). The other processes, among them q, say: "Each of us may precede you!" When q takes its step, the counter changes from 17 to 18.]
• There is an atomic step in which q modifies p's solo-execution response.
• We bring all the 'influencers' to be on the verge of performing a modifying step.

Space/Write-contention tradeoff
• We bring all influencers to be on the verge of a modifying step.
• Each modifying step is necessarily a write/RMW operation.
$S \ge I / C$, where S is the space complexity, I is the influence-level and C is the write-contention.

Latency/Contention tradeoff
[Figure: p is about to apply FAI to a shared counter holding 17 ("Hmmm… I will soon request a value"). The highlighted variables are the base objects on which there are outstanding modifying steps.]
Process p can be made to read all these variables in the course of its operation.
$L_R \ge I / C$, where $L_R$ is the number of base objects read, I is the influence-level and C is the write-contention.

Time lower bound
$L_R \cdot C \ge I$
Time complexity is at least $\sqrt{I}$.
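One way to see where the square root comes from, assuming an operation's time is at least the number of base objects it reads ($L_R$) and, because of stalls, can also be as large as the write-contention (C): if the product of the two is at least I, the larger of them is at least $\sqrt{I}$. The brute-force check below is only a sketch of that arithmetic; `best_tradeoff` is a hypothetical helper name.

```python
def best_tradeoff(influence):
    """Minimize max(C, L_R) over integers C >= 1 with L_R * C >= influence.

    Returns (cost, C, L_R) for the first best choice found.
    """
    best = None
    for c in range(1, influence + 1):
        l_r = -(-influence // c)   # integer ceiling of influence / c
        cost = max(c, l_r)
        if best is None or cost < best[0]:
            best = (cost, c, l_r)
    return best

for i in (16, 100, 1000):
    cost, c, l_r = best_tradeoff(i)
    print(f"I={i}: best max(C, L_R)={cost} at C={c}, L_R={l_r}")
```

No choice of C brings the cost below the square root of I, which matches the tradeoff: lowering contention forces more reads, and vice versa.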

Influence(n) Objects Class
• The above lower bound holds for Influence(n), a large class of objects that includes: stacks, queues, hash-tables, pools, linearizable counters, consensus, approximate agreement…
• It also holds for one-time implementations of these objects.
• Finding the tight bound is a challenging open question.

A linear lower bound on the number of stalls for long-lived objects (Fich, Hendler, Shavit, 2005)
The metric is slightly different: we count the total number of stalls incurred while accessing multiple objects.

An implementation is obstruction-free if every process is guaranteed to terminate its operation if it runs solo long enough.
Theorem: In any n-process obstruction-free implementation of a counter, the worst-case number of stalls incurred by a process as it performs a fetch&increment operation is at least n-1.

Worst-case stalls number ≥ n-1 for any OF counter implementation
Start from an initial state. Fix a process p about to perform a fetch&increment operation. Consider the path it takes if it runs uninterrupted, when only first accesses to shared words are considered.
[Figure: the sequence of shared words on p's solo path, revealed one word at a time.]

Worst-case stalls number ≥ n-1
Let $O_1$ be the first word along p's path that is written by some other process in some p-free execution. There must be such a word.

Worst-case stalls number ≥ n-1
Let $E_1$ be an execution that maximizes the number of processes that are about to write to $O_1$ over all p-free executions. Denote this set of processes by $G_1$ and its size by $K_1$.

Worst-case stalls number ≥ n-1
If $K_1 = n-1$ then we are done. Otherwise, we show that p must access yet another word that may be written by other processes.

Worst-case stalls number ≥ n-1
What happens if p incurs the stalls on $O_1$? The rest of its path may then change. Assume p gets value v.
[Figure: p's path after $O_1$, which may differ from its solo path.]

Worst-case stalls number ≥ n-1
• v: the value returned by p if we let it run and incur the stalls
• c: the number of fetch&increment operations completed before p starts its operation
We have $v \in \{c, \ldots, c + K_1\}$.
We select some process $q \notin G_1 \cup \{p\}$ and let q perform $K_1 + 1$ fetch&increment operations. q must write to a word read by p after $O_1$: otherwise p could not tell that q's operations took place and would still return $v \le c + K_1$, which is too small.

Worst-case stalls number ≥ n-1
Let $O_2$ be the first word accessed by p, after it incurs the $K_1$ stalls, that is written by some process not in $G_1 \cup \{p\}$.
Let $E_2$ be an execution that maximizes the number of processes that are about to write to $O_2$ over all $(G_1 \cup \{p\})$-free executions.

Worst-case stalls number ≥ n-1
Continuing with this construction we get:
[Figure: p's path through the words $O_1, O_2, \ldots, O_m$, where a group $G_i$ of size $|G_i| = K_i$ is about to write to $O_i$ (shown for $|G_2| = K_2$ and $|G_m| = K_m$).]

Conclusion: the “naïve” implementation is best possible! (In terms of worst-case execution.)
[Figure: six FAI operations, numbered 1 to 6, all applied to a single shared FAI object.]