Multiprocessor synchronization algorithms (20225241): Contention in shared memory multiprocessors

Multiprocessor synchronization algorithms (20225241)
Contention in shared memory multiprocessors
• Definitions
• Lower bound for consensus
• Lower bounds for counters, stacks and queues
Lecturer: Danny Hendler

Contention in shared-memory systems
• Contention: the extent to which processes access the same memory locations simultaneously.
• When multiple processes simultaneously write to the same memory location, they are stalled.
• High contention hurts performance!

Memory Stalls & Write-Contention
[Figure: processes p_0, p_1, p_2, …, p_j access the same variable simultaneously and incur 0, 1, 2, …, j stalls, respectively.]
Write-contention is the maximum number of processes that can be enabled to perform a write or read-modify-write operation to the same memory location simultaneously.

Recall the consensus implementation we saw…
We use a single object, C, that supports the compare&swap and read operations. Initially C = null.
Decide(v) ; code for p_i, i = 0, 1
1. CAS(C, null, v)
2. return C
What is the write-contention of this algorithm? n.
It can be shown that this is the write-contention of any consensus algorithm.
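The two-step Decide procedure can be tried out with a minimal Python sketch. Python has no hardware compare&swap, so the `CASRegister` class below is a hypothetical stand-in that uses a lock to make each operation atomic; the names `CASRegister`, `decide` and `run_consensus` are illustrative and not part of the lecture. Proposals are assumed to be non-null.

```python
import threading


class CASRegister:
    """Register with read and compare&swap; a lock simulates atomicity (assumption)."""

    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Atomically replace the value with `new` if it equals `expected`."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def decide(c, v):
    c.compare_and_swap(None, v)  # step 1: only the first CAS on null succeeds
    return c.read()              # step 2: every process returns the winning value


def run_consensus(proposals):
    """Run one decide() per proposal in its own thread and collect the decisions."""
    c = CASRegister()
    decisions = [None] * len(proposals)

    def worker(i):
        decisions[i] = decide(c, proposals[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(proposals))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

All processes apply their CAS to the same object C, which is why up to n processes can be poised on the same location at once.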

What can we say about the worst-case time complexity of objects such as counters, stacks and queues?

Naïve Counter Implementation
[Figure: processes 1 to 6 each apply FAI (fetch&increment) to a single shared FAI object.]
The last processes to succeed incur Θ(n) time complexity! Can we do much better?
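The Θ(n) cost of the naïve counter can be illustrated with a small sketch. It assumes the stall model suggested by the earlier stalls figure: when several FAI steps are enabled on the same object, they are served one at a time, and a process incurs one stall for every step served before its own. The function name and the service order used below are hypothetical.

```python
def simultaneous_fai(order, initial=0):
    """Serialize FAI steps that are all enabled at once on a single object.

    `order` lists process ids in the order the memory serves them.
    Returns {process id: (value returned by FAI, stalls incurred)}.
    """
    results = {}
    value = initial
    for stalls, pid in enumerate(order):
        # The process waits for every step served before it: `stalls` of them.
        results[pid] = (value, stalls)
        value += 1
    return results
```

For example, `simultaneous_fai([3, 1, 4, 2, 6, 5])` assigns the last-served process 5 stalls, that is, n-1 for n = 6.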

We will see a time lower bound of √n on non-blocking implementations of: counters, stacks, queues…
Any algorithm either (a) suffers high contention or (b) suffers high latency.

Capture Influence between processes
[Figure: processes 1 to 6.]
Time complexity is determined by the extent to which operations by different processes influence each other.

Influence-level
[Figure: process p is about to apply FAI to a shared counter holding 17. p: "Hmmm… I will soon request a value." The other processes: "Each of us may precede you and modify the value you will get!" Label: influence level (w.r.t. p).]

Modifying Steps
[Figure: process p is about to apply FAI to a shared counter holding 17. p: "Hmmm… I will soon request a value." The other processes: "Each of us may precede you!" When q takes its step, the counter changes from 17 to 18.]
• There is an atomic step in which q modifies p's return value.
• We bring all the "influencers" to be on the verge of performing a modifying step.

Space/Write-contention tradeoff
• We bring all influencers to be on the verge of a modifying step.
• Each modifying step is necessarily a write/RMW operation.
S ≥ I / C, where S is the space complexity, I is the influence-level and C is the write-contention.

Latency/Contention tradeoff
[Figure: process p is about to apply FAI to a shared counter holding 17 ("Hmmm… I will soon request a value"); there are outstanding modifying steps on several base objects.]
Process p can be made to read all these variables (the base objects on which there are outstanding modifying steps) in the course of its operation!
L_R ≥ I / C, where L_R is the number of base objects read, I is the influence-level and C is the write-contention.

Time lower bound
L_R · C ≥ I
Time complexity is at least √I.
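One way to read the square-root bound off the product inequality is sketched below. It assumes that the worst-case time T of p's operation is at least both L_R (p reads that many base objects) and C (p may incur that many stalls on a single object); the symbol T is introduced here only for illustration.

```latex
\[
L_R \cdot C \ge I
\;\Longrightarrow\;
\max(L_R, C)^2 \ge L_R \cdot C \ge I
\;\Longrightarrow\;
T \ge \max(L_R, C) \ge \sqrt{I}.
\]
```

For objects whose influence-level grows linearly in n, this gives the √n bound quoted earlier.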

Influence(n) Objects Class
Definition: The influence-function, I_O(n), of a generic object O is defined as follows: I_O(n) = k if the influence-level of any n-process non-blocking implementation of O is at least k.
Definition: Influence(n) is the class of generic objects whose influence-function is in Ω(n).
Influence(n) includes: stacks, queues, hash-tables, pools, linearizable counters, consensus, approximate-agreement…

Concurrent Counter is in Influence(n)
[Figure: process p is about to apply FAI to a shared counter holding 17. p: "Hmmm… I will soon request a value." The other processes: "Each of us may precede you!"]
Influence-level is (n-1): every q≠p can influence p.

Stack is in Influence(n)
[Figure: a stack holding 1, 2, 3, …, n, with n at the top. p: "Hmmm… I will soon attempt to pop a value." The other processes: "Each of us may precede you!"]
Influence-level is (n-1), e.g. if every q≠p has a pending pop operation.
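The stack example can be checked with a sequential sketch: the value returned by p's pop depends on how many of the other n-1 pending pops are scheduled ahead of it. The function names are hypothetical, and the stack is modeled as a plain Python list with the top at the end.

```python
def pop_value_seen_by_p(stack, preceding_pops):
    """Value returned by p's pop when `preceding_pops` other pops run first."""
    s = list(stack)              # copy; the top of the stack is the last element
    for _ in range(preceding_pops):
        s.pop()                  # a pending pop by some q != p takes effect first
    return s.pop()               # p's own pop


def values_p_may_get(n):
    """p's return value for 0, 1, ..., n-1 preceding pops on the stack 1..n."""
    stack = list(range(1, n + 1))  # n is on top, as in the slide
    return [pop_value_seen_by_p(stack, k) for k in range(n)]
```

With n = 5, `values_p_may_get(5)` yields `[5, 4, 3, 2, 1]`: each additional preceding pop changes the value p obtains.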

Approximate Agreement is in Influence(n)
In approximate agreement, each process proposes its value.
• Validity: Each process must decide on a value that is legal (in the range of proposed values).
• Approximate agreement: The values decided by any two processes must be no more than ε apart.
[Figure: P_1 proposes 0; P_2, …, P_n each propose 2ε.]
Influence-level is (n-1): if p_1 runs first, it must return 0. If it is preceded by an execution in which some q≠p_1 terminates, p_1 must return a value no less than ε.
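The two requirements can be written as a small checker over the proposed and decided values. This is only a sketch of the specification, not of any agreement algorithm; the function name is hypothetical, and integer values are used in the examples to avoid floating-point rounding.

```python
def check_approximate_agreement(proposed, decided, eps):
    """Return True if `decided` satisfies validity and approximate agreement."""
    lo, hi = min(proposed), max(proposed)
    # Validity: every decision lies within the range of proposed values.
    validity = all(lo <= d <= hi for d in decided)
    # Approximate agreement: any two decisions are at most eps apart.
    agreement = max(decided) - min(decided) <= eps
    return validity and agreement
```

With ε = 1 and proposals `[0, 2, 2, 2]` (p_1 proposes 0, the others 2ε), the outcome `[2, 1]` is accepted while `[2, 0]` is rejected, matching the claim that p_1 must return at least ε once some q has decided 2ε.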

The bound for Influence(n) is tight
The First-Generation Problem
• Every process calls a First operation once.
• We say an operation is in the first generation of execution E if it is not preceded in E by any other operation.
• All operations not in the first generation of the execution must return false.
• In quiescence, at least one operation from the first generation must have returned true.
Lemma: The First-Generation object is in Influence(n), and for this problem our bound is tight.
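The specification above can be expressed as a checker over a completed (quiescent) execution. The sketch assumes each operation is recorded as a hypothetical `Op(start, end, result)` triple of invocation time, response time and returned value, and that an operation precedes another when it responds before the other is invoked. It checks the specification only and is not the optimal implementation.

```python
from typing import NamedTuple


class Op(NamedTuple):
    start: int     # invocation time
    end: int       # response time
    result: bool   # value returned by First


def in_first_generation(op, ops):
    """An operation is in the first generation if no operation precedes it."""
    return not any(other.end < op.start for other in ops)


def satisfies_first_generation(ops):
    """Check the two output requirements on a quiescent execution."""
    first = [op for op in ops if in_first_generation(op, ops)]
    later = [op for op in ops if not in_first_generation(op, ops)]
    # Later operations must return false; some first-generation one returns true.
    return all(not op.result for op in later) and any(op.result for op in first)
```

Two overlapping operations that start the execution are both in the first generation, so either of them (or both) may return true.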

An Optimal Implementation for the First Generation Problem
[Figure: the processes are partitioned into groups of √n processes; the mark array consists of √n multi-reader multi-writer atomic variables.]

A linear lower bound on the number of stalls for long-lived objects
The following material is not required for the exam/assignments.

Theorem: In any n-process obstruction-free implementation of a counter, the worst-case number of stalls incurred by a process as it performs a fetch&increment operation is at least n-1.

Worst-case stalls number ≥ n-1
Start from an initial state. Fix a process p about to perform a fetch&increment operation. Consider the path it takes if it runs uninterrupted, when only first accesses to shared words are considered.
[Figure: process p and the path of shared words it accesses.]

Worst-case stalls number ≥ n-1
Let O_1 be the first word along p's path that is written by some other process in some p-free execution. There must be such a word (otherwise p's return value could not reflect increments completed by other processes).

Worst-case stalls number ≥ n-1
Let E_1 be an execution that maximizes the number of processes that are about to write to O_1, over all p-free executions. Denote this set of processes by G_1 and let K_1 = |G_1|.

Worst-case stalls number ≥ n-1
If K_1 = n-1, then we are done. Otherwise, we show that p must access yet another word that may be written by other processes.

Worst-case stalls number ≥ n-1
What happens if p incurs the stalls on O_1? Now the rest of the path may change… Assume p gets value v.

Worst-case stalls number ≥ n-1
• v: the value returned by p if we let it run and incur the stalls
• c: the number of fetch&increment operations completed before p starts its operation
We have: v ∈ {c, …, c+K_1}

Worst-case stalls number ≥ n-1
• v: the value returned by p if we let it run and incur the stalls
• c: the number of fetch&increment operations completed before p starts its operation
We select some process q ∉ G_1 ∪ {p} and let q perform K_1+1 fetch&increment operations. Then q must write to a word read by p after O_1. Otherwise p could not distinguish this execution from the previous one and would still return v ≤ c+K_1, although at least c+K_1+1 operations have completed.

Worst-case stalls number ≥ n-1
Let O_2 be the first word accessed by p after it incurs the K_1 stalls that is written by some process not in G_1 ∪ {p}.
Let E_2 be an execution that maximizes the number of processes that are about to write to O_2, over all (G_1 ∪ {p})-free executions.

Worst-case stalls number ≥ n-1
Continuing with this construction we get:
[Figure: p's path through the words O_1, O_2, …, O_m, with groups of sizes |G_1| = K_1, |G_2| = K_2, …, |G_m| = K_m about to write to them.]
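The count behind the theorem can be summarized as follows. This is a sketch that assumes the construction stops only when every process other than p belongs to some group, and that the groups G_1, …, G_m are pairwise disjoint.

```latex
\[
\text{stalls incurred by } p
\;=\; \sum_{i=1}^{m} K_i
\;=\; \sum_{i=1}^{m} |G_i|
\;=\; \Bigl|\bigcup_{i=1}^{m} G_i\Bigr|
\;=\; n-1 .
\]
```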

Conclusion: the “naïve” implementation is the best possible!
[Figure: processes 1 to 6 each apply FAI to a single shared FAI object.]