Load Balancing and Multithreaded Programming
Nir Shavit
Multiprocessor Synchronization, Spring 2003
M. Herlihy & N. Shavit (c) 2003
How to Write Parallel Apps?
• Multithreaded programming
  – Programming model
  – Programming language (Cilk)
  – Well-developed theory
  – Successful practice
9/12/2021 M. Herlihy & N. Shavit (c) 2003 2
Why We Care
• Interesting in its own right
• Scheduler
  – Ideal application for lock-free data structures
Multithreaded Fibonacci
int fib(int n) {
  if (n < 2) {
    return n;
  } else {
    int x = spawn fib(n-1);
    int y = spawn fib(n-2);
    sync();
    return x + y;
  }
}
*Cilk code (Java code in notes)
Multithreaded Fibonacci
• spawn fib(n-1), spawn fib(n-2): parallel method calls
Multithreaded Fibonacci
• sync(): wait for children to complete
Multithreaded Fibonacci
• return x + y: safe to use children's values
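The following is a rough Java analogue of the Cilk code. It assumes the standard java.util.concurrent fork/join framework and is not the Java code from the notes. Here fork() plays the role of spawn and join() the role of sync. As the Note slide allows, the scheduler is free to run the second spawned call directly in the current worker. FibTask is an illustrative name.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative sketch: fork() ~ spawn, join() ~ sync.
class FibTask extends RecursiveTask<Integer> {
    private final int n;

    FibTask(int n) { this.n = n; }

    @Override
    protected Integer compute() {
        if (n < 2) {
            return n;                          // base case: no parallelism
        }
        FibTask left = new FibTask(n - 1);
        left.fork();                           // "spawn fib(n-1)": may run on another worker
        int y = new FibTask(n - 2).compute();  // "spawn fib(n-2)", run in this worker
        int x = left.join();                   // "sync()": wait for the forked child
        return x + y;                          // safe to use children's values
    }

    static int fib(int n) {
        return ForkJoinPool.commonPool().invoke(new FibTask(n));
    }

    public static void main(String[] args) {
        System.out.println(fib(20)); // prints 6765
    }
}
```

Each task here does very little work. A practical version would switch to a serial loop below some cutoff value of n.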
Note
• Spawn & sync operators
  – Like Israeli traffic signs
  – Are purely advisory in nature
• The scheduler
  – Like the Israeli driver
  – Has complete freedom to decide
Dynamic Behavior
• A multithreaded program is
  – A directed acyclic graph (DAG)
  – That unfolds dynamically
• A thread is
  – A maximal sequence of instructions
  – Without spawn, sync, or return
Fib DAG
[Figure: DAG for fib(4), with spawn and sync edges among the calls fib(4), fib(3), fib(2), fib(1)]
Arrows Reflect Dependencies
[Figure: the same fib(4) DAG; arrows reflect dependencies]
How Parallel is That?
• Define work:
  – Total time on one processor
• Define critical-path length:
  – Longest dependency path
  – Can't beat that!
Fib Work
[Figure: the fib(4) DAG]
Fib Work
[Figure: the fib(4) DAG with its nodes numbered 1 to 17]
• Work is 17
Fib Critical Path
[Figure: the fib(4) DAG]
Fib Critical Path
[Figure: the fib(4) DAG with the nodes of its longest path numbered 1 to 8]
• Critical path length is 8
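Two recurrences reproduce the counts 17 and 8. They rest on an assumption read off the DAG rather than stated on the slides. Each call with n ≥ 2 consists of three unit-time threads: one before the first spawn, one between the spawns, and one after sync. Each leaf call is a single thread. FibDagCost is a hypothetical helper.

```java
// Hypothetical helper. Assumption: an internal fib call has three unit-time
// threads (before the first spawn, between the spawns, after sync); a leaf has one.
class FibDagCost {
    // T1: total number of threads (nodes) in the DAG of fib(n)
    static long work(int n) {
        if (n < 2) return 1;
        return 3 + work(n - 1) + work(n - 2);
    }

    // T_inf: number of nodes on the longest path through the DAG of fib(n)
    static long span(int n) {
        if (n < 2) return 1;
        long viaFirstChild = 2 + span(n - 1);   // first thread, fib(n-1), last thread
        long viaSecondChild = 3 + span(n - 2);  // first, second thread, fib(n-2), last
        return Math.max(viaFirstChild, viaSecondChild);
    }

    public static void main(String[] args) {
        System.out.println(work(4) + " " + span(4)); // prints "17 8"
    }
}
```

So the average parallelism $T_1/T_\infty$ of fib(4) is only 17/8 ≈ 2.1.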
Notation Watch
• $T_P$ = time on $P$ processors
• $T_1$ = work (time on 1 processor)
• $T_\infty$ = critical path length (time on ∞ processors)
Simple Bounds
• $T_P \ge T_1/P$
  – In one step, can't do more than $P$ work
• $T_P \ge T_\infty$
  – Can't beat infinite resources
More Notation Watch
• Speedup on $P$ processors
  – Ratio $T_1/T_P$
  – How much faster with $P$ processors
• Linear speedup
  – $T_1/T_P = \Theta(P)$
• Max speedup (average parallelism)
  – $T_1/T_\infty$
Remarks
• Graph nodes have out-degree ≤ 2
• Unique
  – Starting node
  – Ending node
Matrix Multiplication
Matrix Multiplication
• Each n-by-n matrix multiplication reduces to
  – 8 multiplications
  – 4 additions
  – Of n/2-by-n/2 submatrices
Addition
void add(Matrix C, Matrix T, int n) {
  // C = C + T
  if (n == 1) {
    C[1, 1] = C[1, 1] + T[1, 1];
  } else {
    partition C, T into half-size submatrices;
    spawn add(C11, T11, n/2);
    spawn add(C12, T12, n/2);
    spawn add(C21, T21, n/2);
    spawn add(C22, T22, n/2);
    sync();
  }
}
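Below is a minimal Java sketch of the same divide-and-conquer addition. It assumes n is a power of two and uses the fork/join framework. Instead of copying submatrices, each task reaches its quarter of C and T through row and column offsets. invokeAll forks the four tasks and waits for all of them, which matches four spawns followed by sync(). MatrixAdd is an illustrative name.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Sketch: C += T for n x n matrices, n a power of two. A task works on the
// n x n block whose top-left corner is (row, col) in both C and T.
class MatrixAdd extends RecursiveAction {
    private final double[][] c, t;
    private final int row, col, n;

    MatrixAdd(double[][] c, double[][] t, int row, int col, int n) {
        this.c = c; this.t = t; this.row = row; this.col = col; this.n = n;
    }

    @Override
    protected void compute() {
        if (n == 1) {
            c[row][col] += t[row][col];
            return;
        }
        int h = n / 2;
        // four half-size additions in parallel; invokeAll returns when all are done
        invokeAll(
            new MatrixAdd(c, t, row,     col,     h),
            new MatrixAdd(c, t, row,     col + h, h),
            new MatrixAdd(c, t, row + h, col,     h),
            new MatrixAdd(c, t, row + h, col + h, h));
    }

    static void add(double[][] c, double[][] t) {
        ForkJoinPool.commonPool().invoke(new MatrixAdd(c, t, 0, 0, c.length));
    }
}
```

The n == 1 base case mirrors the slide. A practical version would stop recursing at a larger block and add that block with a serial loop.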
Addition
• Let $A_P(n)$ be the running time
  – For an n × n matrix
  – On $P$ processors
• For example
  – $A_1(n)$ is the work
  – $A_\infty(n)$ is the critical path length
Addition
• Work is
  $A_1(n) = 4A_1(n/2) + \Theta(1)$
  – $4A_1(n/2)$: 4 spawned additions
  – $\Theta(1)$: partition, sync, etc.
Addition
• Work is
  $A_1(n) = 4A_1(n/2) + \Theta(1) = \Theta(n^2)$
  – Same as double-loop summation
Addition
• Critical path length is
  $A_\infty(n) = A_\infty(n/2) + \Theta(1)$
  – $A_\infty(n/2)$: the spawned additions run in parallel
  – $\Theta(1)$: partition, sync, etc.
Addition
• Critical path length is
  $A_\infty(n) = A_\infty(n/2) + \Theta(1) = \Theta(\log n)$
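Unrolling both recurrences for n a power of two shows where the bounds come from. In the work recurrence, depth i holds $4^i$ subproblems of constant cost each. The critical-path recurrence adds a constant at each of the $\log_2 n$ levels.

```latex
\begin{aligned}
A_1(n) &= 4A_1(n/2) + \Theta(1)
        = \sum_{i=0}^{\log_2 n} 4^i \,\Theta(1)
        = \Theta\!\left(4^{\log_2 n}\right)
        = \Theta(n^2), \\
A_\infty(n) &= A_\infty(n/2) + \Theta(1)
        = \sum_{i=0}^{\log_2 n} \Theta(1)
        = \Theta(\log n).
\end{aligned}
```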
Multiplication
void mult(Matrix C, Matrix A, Matrix B, int n) {
  // C = A·B
  if (n == 1) {
    C[1, 1] = A[1, 1]·B[1, 1];
  } else {
    allocate temporary n·n matrix T;
    partition A, B, C, T into half-size submatrices;
    …
Multiplication (cont'd)
    // C_ij = A_i1·B_1j
    spawn mult(C11, A11, B11, n/2);
    spawn mult(C12, A11, B12, n/2);
    spawn mult(C21, A21, B11, n/2);
    spawn mult(C22, A21, B12, n/2);
    // T_ij = A_i2·B_2j
    spawn mult(T11, A12, B21, n/2);
    spawn mult(T12, A12, B22, n/2);
    spawn mult(T21, A22, B21, n/2);
    spawn mult(T22, A22, B22, n/2);
    sync();
    spawn add(C, T, n);   // C = C + T
    sync();
  }
}
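Below is a Java fork/join sketch of the multiplication. It assumes n is a power of two and addresses blocks by offsets. It follows $C_{ij} = A_{i1}B_{1j} + A_{i2}B_{2j}$. Four subproducts are written into C and four into the temporary T, and a parallel addition then computes C += T. MatrixMult, Add, and sub are illustrative names.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Sketch: C = A * B for n x n matrices (n a power of two). A block is named
// by its top-left corner: (cr, cc) in C, (ar, ac) in A, (br, bc) in B.
class MatrixMult extends RecursiveAction {
    private final double[][] c, a, b;
    private final int cr, cc, ar, ac, br, bc, n;

    MatrixMult(double[][] c, int cr, int cc, double[][] a, int ar, int ac,
               double[][] b, int br, int bc, int n) {
        this.c = c; this.cr = cr; this.cc = cc;
        this.a = a; this.ar = ar; this.ac = ac;
        this.b = b; this.br = br; this.bc = bc;
        this.n = n;
    }

    // subproduct task: dst block at (dr, dc) = A block (ar, ac) * B block (br, bc)
    private MatrixMult sub(double[][] dst, int dr, int dc, int ar, int ac, int br, int bc, int h) {
        return new MatrixMult(dst, dr, dc, a, ar, ac, b, br, bc, h);
    }

    @Override
    protected void compute() {
        if (n == 1) {
            c[cr][cc] = a[ar][ac] * b[br][bc];
            return;
        }
        int h = n / 2;
        double[][] t = new double[n][n];  // temporary n x n matrix T
        invokeAll(                        // eight spawns, then sync
            sub(c, cr,     cc,     ar,     ac,     br,     bc,     h),  // C11 = A11 B11
            sub(c, cr,     cc + h, ar,     ac,     br,     bc + h, h),  // C12 = A11 B12
            sub(c, cr + h, cc,     ar + h, ac,     br,     bc,     h),  // C21 = A21 B11
            sub(c, cr + h, cc + h, ar + h, ac,     br,     bc + h, h),  // C22 = A21 B12
            sub(t, 0, 0, ar,     ac + h, br + h, bc,     h),            // T11 = A12 B21
            sub(t, 0, h, ar,     ac + h, br + h, bc + h, h),            // T12 = A12 B22
            sub(t, h, 0, ar + h, ac + h, br + h, bc,     h),            // T21 = A22 B21
            sub(t, h, h, ar + h, ac + h, br + h, bc + h, h));           // T22 = A22 B22
        new Add(c, cr, cc, t, 0, 0, n).invoke();                        // C += T
    }

    // Parallel block addition: C block at (cr, cc) += T block at (tr, tc)
    static final class Add extends RecursiveAction {
        private final double[][] c, t;
        private final int cr, cc, tr, tc, n;

        Add(double[][] c, int cr, int cc, double[][] t, int tr, int tc, int n) {
            this.c = c; this.cr = cr; this.cc = cc;
            this.t = t; this.tr = tr; this.tc = tc; this.n = n;
        }

        @Override
        protected void compute() {
            if (n == 1) { c[cr][cc] += t[tr][tc]; return; }
            int h = n / 2;
            invokeAll(new Add(c, cr,     cc,     t, tr,     tc,     h),
                      new Add(c, cr,     cc + h, t, tr,     tc + h, h),
                      new Add(c, cr + h, cc,     t, tr + h, tc,     h),
                      new Add(c, cr + h, cc + h, t, tr + h, tc + h, h));
        }
    }

    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        ForkJoinPool.commonPool().invoke(new MatrixMult(c, 0, 0, a, 0, 0, b, 0, 0, n));
        return c;
    }
}
```

For example, multiplying {{1, 2}, {3, 4}} by {{5, 6}, {7, 8}} yields {{19, 22}, {43, 50}}. Allocating a fresh T at every level is simple but memory-hungry.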
Multiplication
• Work is
  $M_1(n) = 8M_1(n/2) + A_1(n)$
  – $8M_1(n/2)$: 8 spawned multiplications
  – $A_1(n)$: final addition
Multiplication
• Work is
  $M_1(n) = 8M_1(n/2) + \Theta(n^2) = \Theta(n^3)$
  – Same as serial triple-nested loop
Multiplication
• Critical path length is
  $M_\infty(n) = M_\infty(n/2) + A_\infty(n)$
  – $M_\infty(n/2)$: half-size multiplications run in parallel
  – $A_\infty(n)$: final addition
Multiplication
• Critical path length is
  $M_\infty(n) = M_\infty(n/2) + A_\infty(n) = M_\infty(n/2) + \Theta(\log n) = \Theta(\log^2 n)$
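The multiplication recurrences unroll the same way for n a power of two. The bottom level dominates the work. The critical path accumulates a logarithmic addition cost at each of the $\log_2 n$ levels.

```latex
\begin{aligned}
M_1(n) &= 8M_1(n/2) + \Theta(n^2)
        = \sum_{i=0}^{\log_2 n} 8^i \,\Theta\!\left((n/2^i)^2\right)
        = \Theta(n^2) \sum_{i=0}^{\log_2 n} 2^i
        = \Theta(n^3), \\
M_\infty(n) &= M_\infty(n/2) + \Theta(\log n)
        = \sum_{i=0}^{\log_2 n} \Theta\!\left(1 + \log_2 (n/2^i)\right)
        = \sum_{j=0}^{\log_2 n} \Theta(1 + j)
        = \Theta(\log^2 n).
\end{aligned}
```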
Parallelism
• $M_1(n)/M_\infty(n) = \Theta(n^3/\log^2 n)$
• To multiply two 1000 × 1000 matrices
  – $\log_2 1000 \approx 10$, so the parallelism is about $1000^3/10^2 = 10^7$
• Much more than the number of processors on any real machine
Shared-Memory Multiprocessors
• Parallel applications
  – Java
  – Cilk, etc.
• Mix of other jobs
  – All run together
  – Come & go dynamically
Scheduling
• Ideally,
  – User-level scheduler
  – Maps threads to dedicated processors
• In real life,
  – User-level scheduler
    • Maps threads to a fixed number of processes
  – Kernel-level scheduler
    • Maps processes to a dynamic pool of processors
For Example
• Initially,
  – All P processors are available for the application
• Serial computation
  – Takes over one processor
  – Leaving P-1 for us
  – Waits for I/O
  – We get that processor back …
Speedup
• Map threads onto P processes
• Cannot get P-fold speedup
  – What if the kernel doesn't cooperate?
• Can try for $P_A$-fold speedup
  – $P_A$ is the time-averaged number of processors the kernel gives us
Static Load Balancing
[Figure: speedup (0 to 8) vs. number of processes (1 to 32) on an 8-processor Sun Ultra Enterprise 5000; curves: ideal, mm(1024), lu(2048), barnes(16K, 10), heat(4K, 512, 100)]
Dynamic Load Balancing
[Figure: speedup (0 to 8) vs. number of processes (1 to 32) on an 8-processor Sun Ultra Enterprise 5000; curves: ideal, mm(1024), lu(2048), barnes(16K, 10), heat(4K, 512, 100), msort(32M), ray()]
Scheduling Hierarchy
• User-level scheduler
  – Tells the kernel which processes are ready
• Kernel-level scheduler
  – Synchronous (for analysis, not correctness!)
  – Picks $p_i$ threads to schedule at step $i$
  – Time-weighted average is $P_A = \frac{1}{T}\sum_{i=1}^{T} p_i$
Greed is Good
• Greedy scheduler
  – Schedules as much as it can
  – At each time step
Theorem
• A greedy scheduler ensures actual time
  $T \le \frac{T_1}{P_A} + \frac{T_\infty (P-1)}{P_A}$
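A simulation gives a feel for the theorem. This Java sketch is an illustration under stated assumptions and is not part of the original material. It builds the fib(n) DAG with three threads per internal call and one per leaf, which gives fib(4) its 17 nodes and critical path of 8. It then runs a greedy scheduler that executes min(P, #ready) nodes per step on P dedicated processors, so $P_A = P$. GreedySim is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative simulation: the fib(n) DAG run by a greedy scheduler on
// p dedicated processors (so P_A = p). Every node takes one time step.
class GreedySim {
    private final List<List<Integer>> succ = new ArrayList<>(); // successor lists

    private int newNode() {
        succ.add(new ArrayList<>());
        return succ.size() - 1;
    }

    private void edge(int from, int to) { succ.get(from).add(to); }

    // Builds fib(n)'s sub-DAG and returns {first node, last node}. Assumption:
    // an internal call has three threads (before the first spawn, between the
    // spawns, after sync); a leaf call has one.
    private int[] build(int n) {
        if (n < 2) {
            int leaf = newNode();
            return new int[] {leaf, leaf};
        }
        int t1 = newNode(), t2 = newNode(), t3 = newNode();
        edge(t1, t2);                 // continue after spawning fib(n-1)
        edge(t2, t3);                 // continue after spawning fib(n-2)
        int[] left = build(n - 1);
        edge(t1, left[0]);            // spawn fib(n-1)
        edge(left[1], t3);            // fib(n-1) must finish before sync
        int[] right = build(n - 2);
        edge(t2, right[0]);           // spawn fib(n-2)
        edge(right[1], t3);           // fib(n-2) must finish before sync
        return new int[] {t1, t3};
    }

    static GreedySim fib(int n) {
        GreedySim g = new GreedySim();
        g.build(n);
        return g;
    }

    int work() { return succ.size(); }  // T1 = number of unit-time nodes

    // Greedy: at every step run min(p, #ready) ready nodes; returns #steps.
    int schedule(int p) {
        int[] indeg = new int[succ.size()];
        for (List<Integer> out : succ)
            for (int v : out) indeg[v]++;
        List<Integer> ready = new ArrayList<>();
        for (int v = 0; v < indeg.length; v++)
            if (indeg[v] == 0) ready.add(v);
        int steps = 0, done = 0;
        while (done < indeg.length) {
            int k = Math.min(p, ready.size());
            List<Integer> batch = new ArrayList<>(ready.subList(0, k));
            ready.subList(0, k).clear();
            for (int u : batch)
                for (int v : succ.get(u))
                    if (--indeg[v] == 0) ready.add(v);  // ready from the next step on
            done += k;
            steps++;
        }
        return steps;
    }
}
```

For fib(4), schedule(1) returns 17 = $T_1$, and a very large P gives 8 = $T_\infty$. For every P in between, the step count stays within $T_1/P + T_\infty(P-1)/P$.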
Proof Strategy
[Figure: the theorem's bound, annotated "Bound this!"]
Put Tokens in Buckets
• Thread scheduled and executed: token goes in the work bucket
• Thread scheduled but not executed: token goes in the idle bucket
At the End …
[Figure: work and idle buckets]
• Total #tokens in the work bucket = work
At the End …
[Figure: the work bucket holds $T_1$ tokens; the idle bucket holds the rest]
Must Show
• The idle bucket holds ≤ $T_\infty(P-1)$ tokens
Every Move You Make …
• Scheduler is greedy
• At least one node is ready
• Number of idle threads in one step
  – At most $p_i - 1 \le P - 1$
Every Step You Take …
• Consider the longest path in the unexecuted sub-DAG at step i
• At least one node in the path is ready
• In an idle step, all ready nodes execute, so the length of the path shrinks by at least one
• Initially, the path length is $T_\infty$
• So there are at most $T_\infty$ idle steps
Counting Tokens
• At most $P-1$ idle threads per step
• At most $T_\infty$ idle steps
• So the idle bucket contains at most
  – $T_\infty(P-1)$ tokens
• Both buckets together contain at most
  – $T_1 + T_\infty(P-1)$ tokens
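The two bucket counts combine into the theorem. Each step i contributes $p_i$ tokens, so summing over all T steps and using the time-weighted average $P_A$ gives:

```latex
\begin{aligned}
P_A &= \frac{1}{T}\sum_{i=1}^{T} p_i, \\
T \cdot P_A &= \sum_{i=1}^{T} p_i
  = (\text{work tokens}) + (\text{idle tokens})
  \le T_1 + T_\infty (P-1), \\
\Longrightarrow\quad T &\le \frac{T_1}{P_A} + \frac{T_\infty (P-1)}{P_A}.
\end{aligned}
```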
Recapitulating
Turns Out
• This bound is within a constant factor of optimal
• Finding the actual optimal schedule is NP-complete
Work Sharing
• A process that generates new threads
  – Migrates them elsewhere
  – In hopes of balancing the load
Work Stealing
• If a process runs out of work
• It steals work from another
  – If everyone is busy, no migration
  – The idle process incurs the synchronization cost
Lock-Free Work Stealing
• Each process has a pool of ready threads
• Remove a thread without synchronizing
• If you run out of threads, steal someone else's
• Choose the victim at random
Work DEQueue¹
[Figure: a DEQueue of threads, accessed at the bottom with pushBottom and popBottom]
¹ Double-Ended Queue
Obtain Work
• Obtain work (popBottom)
• Run the thread until it
  – Blocks or terminates
New Work
• Unblock node
• Spawn node
  – Either way, pushBottom the new thread
Whatcha Gonna Do When the Well Runs Dry?
[Figure: the process's DEQueue is empty: "@&%$!!"]
Steal this Thread!
[Figure: an idle process takes a thread from another process's DEQueue with popTop]
Thread DEQueue
• Methods
  – pushBottom
  – popBottom
  – popTop
• pushBottom and popBottom are called only by the owner, so they never happen concurrently
Yield
• Processes spin trying to steal, but all DEQueues are empty
• Each process yields the processor between steal attempts
• This gives victims a chance to do work
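Below is a minimal Java sketch of the worker loop described on the last few slides. It uses java.util.concurrent.ConcurrentLinkedDeque as a stand-in for the lock-free DEQueue. The owner takes work from the bottom (pollLast). An idle worker steals from the top (pollFirst) of a randomly chosen victim and calls Thread.yield() after a failed steal attempt. StealingPool and its methods are hypothetical names. The pending counter is one simple termination rule, not one taken from the slides.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of work stealing; ConcurrentLinkedDeque stands in for the DEQueue.
class StealingPool {
    private final ConcurrentLinkedDeque<Runnable>[] deques;
    private final AtomicInteger pending = new AtomicInteger(); // pushed but unfinished tasks

    @SuppressWarnings("unchecked")
    StealingPool(int workers) {
        deques = new ConcurrentLinkedDeque[workers];
        for (int i = 0; i < workers; i++) deques[i] = new ConcurrentLinkedDeque<>();
    }

    // "pushBottom" onto worker `me`'s deque (also used to seed initial work)
    void push(int me, Runnable task) {
        pending.incrementAndGet();
        deques[me].addLast(task);
    }

    private void workerLoop(int me) {
        while (pending.get() > 0) {
            Runnable task = deques[me].pollLast();                  // "popBottom"
            if (task == null) {
                int victim = ThreadLocalRandom.current().nextInt(deques.length);
                task = deques[victim].pollFirst();                  // "popTop" on a random victim
            }
            if (task == null) {
                Thread.yield();                                     // let victims do work
                continue;
            }
            task.run();
            pending.decrementAndGet();
        }
    }

    void runAll() {
        Thread[] ts = new Thread[deques.length];
        for (int i = 0; i < ts.length; i++) {
            final int me = i;
            ts[i] = new Thread(() -> workerLoop(me));
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

If every task is seeded on worker 0 before runAll(), the other workers can get a share only by stealing.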
Performance Without Yield
[Figure: speedup (0 to 8) vs. number of processes (1 to 32); curves: ideal, mm(1024), lu(2048), barnes(16K, 10), heat(4K, 512, 100), msort(32M), ray()]
Ideal
• Wait-free
• Linearizable
• Constant time
Fortune cookie: "It is better to be young, rich and beautiful, than old, poor, and ugly!"
Compromise
• Method popTop may signal abort if
  – A concurrent popTop succeeds
  – A concurrent popBottom takes the last thread
• Blame the victim!
Dreaded ABA Problem
[Figure: animation of the ABA scenario on the DEQueue's top index]
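Dreaded ABA Problem
[Figure: the thief's CAS on top succeeds ("Yes! CAS") even though the owner has meanwhile emptied and refilled the DEQueue, so that top is back to its old value ("Uh-Oh …")]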
Dreaded ABA Fix
[Figure: the top word holds a tag next to the top index; bottom is kept separately]
Code
public class DEQueue {
  longRMWregister top;  // tag & top: half index and half tag, to avoid ABA
  int bottom;           // bottom thread index
  Thread[] deq;         // array of threads
  …
}
Dreaded ABA Problem Fix
// extract tag field from top
private static final int TAG_MASK = 0xFFFF0000;
private static final int TAG_SHIFT = 16;
private int getTag(int i) {
  return ((i & TAG_MASK) >>> TAG_SHIFT);  // unsigned shift: no sign extension
}
[Figure: the word 0x00210032 split into tag (0x0021) and index (0x0032)]
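Below is a small Java sketch of packing the tag and the index into one 32-bit word. It uses the 16/16 split and the example word 0x00210032 from the slide. getIndex, setIndex, and makeTop are helper names used in the code that follows, but the bodies here are assumptions. The unsigned shift >>> keeps tags of 0x8000 and above from turning negative.

```java
// Sketch: a 16-bit tag in the high half and a 16-bit index in the low half.
final class TopWord {
    private static final int TAG_MASK = 0xFFFF0000;
    private static final int INDEX_MASK = 0x0000FFFF;
    private static final int TAG_SHIFT = 16;

    static int getTag(int word)   { return (word & TAG_MASK) >>> TAG_SHIFT; } // no sign extension
    static int getIndex(int word) { return word & INDEX_MASK; }

    static int makeTop(int tag, int index) {
        return ((tag << TAG_SHIFT) & TAG_MASK) | (index & INDEX_MASK);
    }

    // keep the tag, replace the index
    static int setIndex(int word, int index) { return makeTop(getTag(word), index); }
}
```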
Code
public class DEQueue {
  …
  void pushBottom(Thread t) {
    this.deq[this.bottom] = t;
    this.bottom++;
  }
  …
}
Code
Thread popTop() throws Abort {
  long oldTop = this.top.read();
  int bottom = this.bottom;
  if (bottom <= getIndex(oldTop))  // empty
    return null;
  Thread t = this.deq[getIndex(oldTop)];
  long newTop = setIndex(oldTop, getIndex(oldTop) + 1);
  if (this.top.CAS(oldTop, newTop))
    return t;
  throw new Abort();
}
…
Code (popTop)
• Make sure the queue is non-empty (the emptiness test of bottom against the index in oldTop)
Code (popTop)
• Get the old and new top values (oldTop, newTop)
Code (popTop)
• Install the new top value with CAS; if the CAS fails, throw Abort
Code
Thread popBottom() {
  if (this.bottom == 0)
    return null;
  this.bottom--;
  int b = this.bottom;                            // index of the slot being popped
  Thread t = this.deq[b];
  long oldTop = this.top.read();
  if (b > getIndex(oldTop))
    return t;
  long newTop = makeTop(getTag(oldTop) + 1, 0);   // new tag defeats ABA
  this.bottom = 0;
  if (b == getIndex(oldTop))
    if (this.top.CAS(oldTop, newTop))
      return t;
  this.top.write(newTop);
  return null;
}
Code (popBottom)
• Make sure the queue is non-empty
Code (popBottom)
• Grab the bottom thread
Code (popBottom)
• If not near top, we're done (no CAS needed)
Code (popBottom)
• Reset top & bottom (the queue is now empty)
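The pieces fit together in the following compact Java sketch of the bounded DEQueue. It assumes an AtomicInteger for the tag/top word, a volatile bottom, and an AtomicReferenceArray for the slots. Three details matter. popTop treats bottom <= top as empty. popBottom compares against the decremented bottom saved in a local variable. Resetting top increments the tag, so that a stale CAS by a thief cannot succeed. WorkDeque and Abort are illustrative names, and there is no overflow check.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch: the owner calls pushBottom/popBottom, thieves call popTop.
// `top` packs a 16-bit tag (high half) and a 16-bit index (low half).
class WorkDeque<T> {
    static final class Abort extends RuntimeException {}

    private final AtomicReferenceArray<T> deq;
    private final AtomicInteger top = new AtomicInteger(0); // tag & top index
    private volatile int bottom = 0;                        // next free slot

    WorkDeque(int capacity) { deq = new AtomicReferenceArray<>(capacity); }

    private static int tag(int w)   { return w >>> 16; }
    private static int index(int w) { return w & 0xFFFF; }
    private static int make(int tag, int index) { return (tag << 16) | (index & 0xFFFF); }

    // owner only
    void pushBottom(T t) {
        deq.set(bottom, t);
        bottom = bottom + 1;
    }

    // thieves: null if empty, Abort if a concurrent operation won the race
    T popTop() {
        int oldTop = top.get();
        int b = bottom;
        if (b <= index(oldTop)) return null;                  // empty
        T t = deq.get(index(oldTop));
        int newTop = make(tag(oldTop), index(oldTop) + 1);
        if (top.compareAndSet(oldTop, newTop)) return t;
        throw new Abort();
    }

    // owner only
    T popBottom() {
        int b = bottom;
        if (b == 0) return null;
        b--;
        bottom = b;
        T t = deq.get(b);
        int oldTop = top.get();
        if (b > index(oldTop)) return t;                      // not near top: no CAS
        bottom = 0;
        int newTop = make(tag(oldTop) + 1, 0);                // bump tag to defeat ABA
        if (b == index(oldTop) && top.compareAndSet(oldTop, newTop)) return t;
        top.set(newTop);                                      // a thief got the last thread
        return null;
    }
}
```

Single-threaded, after pushing "a", "b", and "c", popTop returns "a" and popBottom returns "c" and then "b". Both ends report empty after that.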
Summary so Far
• Multithreaded structures
  – Work
  – Critical path length
  – Parallelism
• Scheduling
  – Work stealing
  – Lock-free DEQueue
Lock-Free Work Stealing
• OK even if the number of processes exceeds the number of processors, or when the number of processors grows and shrinks over time
• No need for "non-commercial" operating-system support, such as gang scheduling or process control
Old English Proverb
• "May as well be hanged for stealing a sheep as a goat"
• From which we conclude
  – Stealing was punished severely
  – Sheep were worth more than goats
But Wait, There's More!
• Stealing is expensive
  – CAS
  – Only one thread taken
• What if
  – We could steal more each time?
  – Say, up to half?
Review
• Double-ended queue (DEQueue)
• Local thread
  – Removes/adds threads without CAS
  – If top and bottom are more than 1 apart
Consensus
• If top and bottom are close
  – Local thread and thief contend
  – Need consensus to resolve
• In a sequence of k pushes or pops
  – Number of CAS operations is $\Theta(1)$
Consensus
• Stealing half increases uncertainty
• Consensus on half the queue?
• In a sequence of k pushes or pops
  – Number of CAS operations is $\Theta(k)$
New Idea
• We can get down to $\Theta(\log k)$
• How: limit uncertainty to when the queue size passes a power of 2!
• Keep a "half-point" counter
  – Thief resets counter
  – Local thread changes counter at a power-of-2 boundary
The Big Picture
[Figure: a DEQueue with the previous steal range and the current steal range, each a (tag, top, last) word, and the bottom index]
• Up to $2^i$ threads can be stolen atomically (the steal range)
• At least $2^i$ threads lie outside the steal range
• Bottom is somewhere in a group of $2^{i+1}$
Steal Range
• tag: defeats the ABA problem
• top: index of topmost item in DEQueue
• stealLast: last item to be stolen
[Figure: the stealRange word, with fields tag, top, last]
When to Steal?
if (shouldBalance()) {
  Process victim = randomProcess();
  tryToSteal(victim);
}
• Steal on empty
• Steal probabilistically
  – Probability decreases as the queue size increases
• Steal when the queue size passes a threshold
Before pushBottom
[Figure: DEQueue with steal range (tag, top, last) and bottom; values shown: top = 0, last = 3, bottom = 14]
After pushBottom
[Figure: DEQueue with steal range (tag, top, last) and bottom; values shown: top = 0, last = 7, bottom = 15]
Update stealRange
boolean updateStealRange() {
  if (size is a power of two || theft occurred) {
    // Try to update the stealRange
    int newSize = Math.max(1, power of 2 closest to half);
    long oldRange = this.stealRange;
    int tag = getTag(oldRange);
    int top = getTop(oldRange);
    long newRange = makeStealRange(tag + 1, top, top + newSize - 1);
    boolean ok = this.stealRange.CAS(oldRange, newRange);
    if (ok)
      this.prevStealRange = newRange;
    return ok;
  }
  return true;
}
Update stealRange
• Readjust when the queue size is a power of two
Update stealRange
• Readjust when a thief has taken some threads
Update stealRange
• The new range size is roughly half the queue size
Update stealRange
• Try to update stealRange (with CAS) to reflect the new size
Update stealRange
• If the update succeeded, save a copy of the updated range, to identify future thefts
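Below is a Java sketch of the steal-range bookkeeping. It assumes a packed layout that the slides do not give: the tag in the upper 32 bits of a long, then 16 bits each for top and last, with 0xFFFF as a hypothetical EMPTY marker. newRangeSize picks the largest power of two not exceeding half the queue size, which is one reading of "power of 2 closest to half". A theft is detected by comparing the current range with the owner's saved copy, as the last callout suggests. Index wraparound is ignored. StealRange and its methods are illustrative names.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch with an assumed layout: tag (upper 32 bits) | top (16 bits) | last (16 bits).
final class StealRange {
    static final int EMPTY = 0xFFFF;   // hypothetical "nothing to steal" marker for last

    static long make(int tag, int top, int last) {
        return ((long) tag << 32) | ((long) (top & 0xFFFF) << 16) | (last & 0xFFFF);
    }
    static int getTag(long r)  { return (int) (r >>> 32); }
    static int getTop(long r)  { return (int) ((r >>> 16) & 0xFFFF); }
    static int getLast(long r) { return (int) (r & 0xFFFF); }

    // readjust when the size is a power of two or a thief changed the range
    static boolean shouldUpdate(int size, boolean theft) {
        boolean powerOfTwo = size > 0 && (size & (size - 1)) == 0;
        return powerOfTwo || theft;
    }

    // largest power of two not exceeding half the queue size, but at least 1
    static int newRangeSize(int size) {
        return Math.max(1, Integer.highestOneBit(size / 2));
    }

    private final AtomicLong range = new AtomicLong(make(0, 0, EMPTY));
    private long prevRange = range.get();   // owner's copy, used to detect thefts

    // owner only: try to install [top, top + newSize - 1] with a fresh tag
    boolean update(int size) {
        long old = range.get();
        boolean theft = old != prevRange;
        if (!shouldUpdate(size, theft)) return true;
        int top = getTop(old);
        long next = make(getTag(old) + 1, top, top + newRangeSize(size) - 1);
        boolean ok = range.compareAndSet(old, next);
        if (ok) prevRange = next;
        return ok;
    }

    long current() { return range.get(); }
}
```

For a queue of 16 threads starting at top 0, update(16) installs tag 1, top 0, last 7. That covers half the queue, and the other half stays outside the steal range.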
pushBottom Code
public void pushBottom(Thread t) throws Full {
  if (this.getSize() == QUEUE_SIZE)
    throw new Full();
  this.deq[this.bottom] = t;
  this.bottom = (this.bottom + 1) % QUEUE_SIZE;
  updateStealRange();
}
pushBottom Code
• Thread t: the thread to push
pushBottom Code
• getSize() == QUEUE_SIZE: are we full?
pushBottom Code
• Push the thread and advance bottom (mod QUEUE_SIZE)
pushBottom Code
• Update stealRange, if required
Before popBottom
[Figure: DEQueue with steal range (tag, top, last) and bottom; values shown: top = 0, last = 7, bottom = 15]
After popBottom
[Figure: DEQueue with steal range (tag, top, last) and bottom; values shown: top = 0, last = 3, bottom = 14]
popBottom (Part One)
public Object popBottom() throws Abort {
  if (this.getSize() == 0)
    return null;
  if (!updateStealRange())
    throw new Abort();
  if (this.bottom == 0)
    this.bottom = QUEUE_SIZE - 1;
  else
    --this.bottom;
  Object t = this.deq[this.bottom];
  …
popBottom (Part One)
• Bail if the queue is empty
popBottom (Part One)
• Panic (throw Abort) if unable to fix stealRange
popBottom (Part One)
• Tentatively pop a thread
popBottom (Part Two)
public Object popBottom() throws Abort {
  …
  long oldStealRange = this.stealRange;
  int rangeTop = getTop(oldStealRange);
  int rangeBot = getLast(oldStealRange);
  if (rangeBot == EMPTY) {
    this.bottom = 0;   // last thread already stolen
    return null;
  } else if (this.bottom != rangeBot)
    return t;          // no need to synchronize
  else {
    …
popBottom (Part Two)
• Deconstruct stealRange
popBottom (Part Two)
• If the queue is empty, start over
popBottom (Part Two)
• If the tentatively popped thread is not in stealRange, there is no need to synchronize
popBottom (Part Three)
public Object popBottom() throws Abort {
  …
  } else {
    // Try to make stealRange empty
    int rangeTag = getTag(oldStealRange);
    if (this.stealRange.CAS(oldStealRange,
                            makeStealRange(rangeTag + 1, 0, EMPTY))) {
      this.bottom = 0;
      return t;      // thread not stolen yet
    } else {
      this.bottom = 0;
      return null;   // thread stolen
    }
  }
}
popBottom (Part Three)
• The queue has at most one thread
popBottom (Part Three)
• Try to zero out the steal range
popBottom (Part Three)
• If we succeeded, the thread is ours! (And the deque is now empty.)
popBottom (Part Three)
• If we failed, our last thread was stolen
stealTop (Part One)
public int stealTop(EDEQueue victim) {
  long oldStealRange = victim.stealRange;
  int oldLast = getLast(oldStealRange);
  int oldTop = getTop(oldStealRange);
  int oldTag = getTag(oldStealRange);
  int deqBot = victim.bottom;
  if (oldLast == EMPTY)
    return 0;                                                     // nothing to steal
  int rangeLen = (oldLast - oldTop + QUEUE_SIZE) % QUEUE_SIZE + 1;  // size of steal range
  int thiefLen = this.getSize();
  int diff = 2 * rangeLen - thiefLen;
  if (diff <= 1)
    return 0;
  else {
    int numToSteal = diff / 2;
    for (int i = 0; i < numToSteal; i++)
      this.deq[(this.bottom + i) % QUEUE_SIZE] =
          victim.deq[(oldTop + i) % QUEUE_SIZE];
  }
  …
stealTop (Part One)
• victim: the victim's DEQueue
stealTop (Part One)
• numToSteal: the number of threads actually stolen
stealTop (Part One)
• Deconstruct the victim's steal range
stealTop (Part One)
• Compute the length of the victim's stealRange (the victim's DEQueue length is at least twice as much)
steal. Top (Part One) public int steal. Top(EDEQueue victim) { long old. Steal. Range = victim. steal. Range; int old. Last = get. Last(old. Steal. Range); int old. Top = get. Top(old. Steal. Range); int old. Tag = get. Tag(old. Steal. Range); int deq. Bot = victim. bot; int range. Len = old. Steal. Range. get. Size(); int diff = 2*range. Len – this. deq. length; if (diff <= 1) return 0; else { Diff is a minimal bound on the difference int num. To. Steal = diff/2 for (intlengths i = 0; i < num. To. Steal; between victimi++) and thief this. deq[this. bottom+i % QUEUE_SIZE] = victim. deq[old. Top+i % QUEUE_SIZE]; }… 9/12/2021 M. Herlihy & N. Shavit (c) 2003 in 132
steal. Top (Part One) public int steal. Top(EDEQueue victim, int thief. Len) { long old. Steal. Range = victim. steal. Range; int old. Last = get. Last(old. Steal. Range); int old. Top = get. Top(old. Steal. Range); int old. Tag = get. Tag(old. Steal. Range); int deq. Bot = victim. bot; int range. Len = old. Steal. Range. get. Size(); int diff = 2*range. Len – thief. Len; if (diff <= 1) return 0; else { int = diff/2 by stealing – don’t steal!! If num. To. Steal we can’t equalize for (int i = 0; i < num. To. Steal; i++) this. deq[this. bottom+i % QUEUE_SIZE] = victim. deq[old. Top+i % QUEUE_SIZE]; }… 9/12/2021 M. Herlihy & N. Shavit (c) 2003 133
steal. Top (Part One) public int steal. Top(EDEQueue victim) { long old. Steal. Range = victim. steal. Range; int old. Last = get. Last(old. Steal. Range); int old. Top = get. Top(old. Steal. Range); int old. Tag = get. Tag(old. Steal. Range); Try to steal half the guaranteed difference: int deq. Bot = victim. bot; threads-to-be-stolen to thief’s deque int Copy range. Len = old. Steal. Range. get. Size(); int diff = 2*range. Len – this. deq. length; if (diff <= 1) return 0; else { int num. To. Steal = diff/2 for (int i = 0; i < num. To. Steal; i++) this. deq[this. bottom+i % QUEUE_SIZE] = victim. deq[old. Top+i % QUEUE_SIZE]; }… 9/12/2021 M. Herlihy & N. Shavit (c) 2003 134
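The slides call getTag, getTop, getLast, and getSize on one packed long without defining them. Below is a minimal sketch of those helpers plus the Part One arithmetic and copy loop, assuming a hypothetical bit layout (tag in the high 32 bits, top in bits 16 to 31, last in bits 0 to 15) and a hypothetical DEQUE_SIZE of 1024. The method names numToSteal and copyStolen are illustrative only, not the authors' code.

```java
// Sketch of Part One helpers. DEQUE_SIZE, the bit layout, numToSteal and
// copyStolen are hypothetical; the slides do not define them.
final class StealRangeSketch {
    static final int DEQUE_SIZE = 1024;

    // Hypothetical layout: [ tag : 32 | top : 16 | last : 16 ]
    static long makeStealRange(int tag, int top, int last) {
        return ((long) tag << 32) | ((long) (top & 0xFFFF) << 16) | (last & 0xFFFF);
    }
    static int getTag(long range)  { return (int) (range >>> 32); }
    static int getTop(long range)  { return (int) ((range >>> 16) & 0xFFFF); }
    static int getLast(long range) { return (int) (range & 0xFFFF); }

    // Slots from top to last inclusive, allowing wrap-around (non-empty range).
    static int getSize(long range) {
        return ((getLast(range) - getTop(range) + DEQUE_SIZE) % DEQUE_SIZE) + 1;
    }

    // The victim holds at least 2*rangeLen threads, so diff is a lower
    // bound on (victimLen - thiefLen).
    static int numToSteal(int rangeLen, int thiefLen) {
        int diff = 2 * rangeLen - thiefLen;
        if (diff <= 1) return 0;  // cannot equalize by stealing
        return diff / 2;          // steal half the guaranteed difference
    }

    // Copy n entries, wrapping both indices: (bottom + i) % DEQUE_SIZE,
    // not bottom + i % DEQUE_SIZE, which would only wrap i.
    static void copyStolen(int[] thiefDeq, int thiefBottom,
                           int[] victimDeq, int oldTop, int n) {
        for (int i = 0; i < n; i++)
            thiefDeq[(thiefBottom + i) % DEQUE_SIZE] = victimDeq[(oldTop + i) % DEQUE_SIZE];
    }

    public static void main(String[] args) {
        long r = makeStealRange(7, 1020, 3);  // range wraps past the end
        System.out.println(getTag(r) + " " + getTop(r) + " " + getLast(r) + " " + getSize(r)); // 7 1020 3 8
        System.out.println(numToSteal(8, 2));  // 7
        System.out.println(numToSteal(4, 7));  // 0
    }
}
```

Packing all three fields into one long is what lets a single CAS on the victim's stealRange later change top, last, and tag atomically.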
stealTop (Part Two)
  public int stealTop(EDEQueue victim, int thiefLen) {
    …
    int newRangeLen = max(1, power of 2 closest to half the remaining threads);
    int newTop = (oldTop + numToSteal) % DEQUE_SIZE;
    int newLast = (newTop + newRangeLen - 1) % DEQUE_SIZE;
    long newRange = makeStealRange(oldTag+1, newTop, newLast);
    if (victim.stealRange.CAS(oldStealRange, newRange)) {
      this.bottom = (this.bottom + numToSteal) % DEQUE_SIZE;
      this.updateStealRange();
      return numToSteal;
    }
    return 0;
  }
• The new length of the victim's stealRange is about half the remaining number of threads
• Try to update the victim's stealRange with a CAS to reflect the theft and the new range length; the tag is incremented on every update
• If the CAS succeeded, update the thief's bottom and stealRange to include the new threads, and return the number of threads stolen; otherwise nothing was stolen, so return 0
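A minimal sketch of the Part Two commit step, assuming the packed stealRange lives in a java.util.concurrent.atomic.AtomicLong so that the slide's CAS maps onto compareAndSet. The bit layout (tag 32 bits, top and last 16 bits each), DEQUE_SIZE, and the parameter `remaining` (the victim's thread count after the theft) are hypothetical; newRangeLen rounds ties down, which the slide does not specify.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the Part Two commit. DEQUE_SIZE, the bit layout, and the
// parameter `remaining` are hypothetical; the slides do not define them.
final class StealCommitSketch {
    static final int DEQUE_SIZE = 1024;

    // Hypothetical layout: [ tag : 32 | top : 16 | last : 16 ]
    static long makeStealRange(int tag, int top, int last) {
        return ((long) tag << 32) | ((long) (top & 0xFFFF) << 16) | (last & 0xFFFF);
    }
    static int getTag(long r)  { return (int) (r >>> 32); }
    static int getTop(long r)  { return (int) ((r >>> 16) & 0xFFFF); }
    static int getLast(long r) { return (int) (r & 0xFFFF); }

    // max(1, power of 2 closest to half the remaining threads); ties round down.
    static int newRangeLen(int remaining) {
        int half = remaining / 2;
        if (half < 1) return 1;
        int lo = Integer.highestOneBit(half);  // largest power of 2 <= half
        int hi = lo << 1;                       // smallest power of 2 > half
        return (half - lo <= hi - half) ? lo : hi;
    }

    // Publish the theft by CASing the victim's packed stealRange.
    // Fails if the stealRange changed since oldStealRange was read.
    static boolean commitSteal(AtomicLong victimRange, long oldStealRange,
                               int numToSteal, int remaining) {
        int newTop = (getTop(oldStealRange) + numToSteal) % DEQUE_SIZE;
        int newLast = (newTop + newRangeLen(remaining) - 1) % DEQUE_SIZE;
        long newRange = makeStealRange(getTag(oldStealRange) + 1, newTop, newLast);
        return victimRange.compareAndSet(oldStealRange, newRange);
    }

    public static void main(String[] args) {
        AtomicLong victim = new AtomicLong(makeStealRange(0, 0, 7));
        long snapshot = victim.get();
        System.out.println(commitSteal(victim, snapshot, 7, 14)); // true
        System.out.println(commitSteal(victim, snapshot, 7, 14)); // false: stale snapshot
        long r = victim.get();
        System.out.println(getTag(r) + " " + getTop(r) + " " + getLast(r)); // 1 7 14
    }
}
```

The second call fails because the first one already bumped the tag, so a thief working from an outdated snapshot cannot overwrite a newer range.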
Details
• Works even if someone steals from the thief
  – The thief may fail to update its own stealRange
  – But it will still update bottom, so the theft takes effect
Big Picture
• This code steals as much as it can
• Would it be more sensible to
  – Split the difference?
  – The answer may depend on the stealing strategy
Vulnerability
• If the queue size hovers around a power of 2, performance will be lousy
• Extra credit
  – Can we avoid this problem?
Conclusions
• “Boutique” lock-free structures
  – Not general purpose
  – Customized for work-stealing
• Non-trivial correctness issues
Alternative: Gang Scheduling
[Figure: timeline of processors 1 to 4 over time, with each computation's processes scheduled together as a gang]
• Bad example: a 4-process computation and a 1-process computation on a 4-processor machine (while the 1-process gang runs, three processors sit idle)
• Good example: data-parallel programs with large working sets
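To put a number on the bad example, here is a small sketch assuming the gangs take equal, round-robin time slices (an assumption; the slide does not state the slice policy). Under that assumption the 4-process gang keeps all four processors busy and the 1-process gang leaves three idle, so utilization is (4 + 1) / (4 + 4) = 62.5%. The class and method names are hypothetical.

```java
// Hypothetical model: each gang gets one equal time slice per round,
// and during its slice a gang occupies min(size, processors) processors.
final class GangUtilization {
    static double utilization(int processors, int... gangSizes) {
        int busy = 0;
        for (int g : gangSizes) busy += Math.min(g, processors);
        return (double) busy / ((double) processors * gangSizes.length);
    }

    public static void main(String[] args) {
        System.out.println(utilization(4, 4, 1)); // 0.625
        System.out.println(utilization(4, 4, 4)); // 1.0
    }
}
```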
Alternative: Process Control
[Figure: timeline of processors 1 to 4 over time, marking where a process is killed and where a new process is created]
• Each computation dynamically creates and kills processes so that its number of processes equals the number of processors assigned to it