Introduction Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Moore’s Law
[Figure: transistor count still rising; clock speed flattening sharply]
Why do we care?
• Time no longer cures software bloat
  – The “free ride” is over
• When you double your program’s path length
  – You can’t just wait 6 months
  – Your software must somehow exploit twice as much concurrency
Nearly Extinct: the Uniprocessor
[Figure: a single CPU connected to memory]
The New Boss: The Multicore Processor (CMP)
[Figure: several processors, each with its own cache, connected by a bus to shared memory, all on the same chip; example: Sun T2000 Niagara]
Traditional Scaling Process
[Figure: on a traditional uniprocessor, user code gets speedups of 1.8x, 3.6x, and 7x over time thanks to Moore’s law]
Ideal Scaling Process
[Figure: on a multicore, user code would ideally see the same 1.8x, 3.6x, and 7x speedups]
Unfortunately, it is not so simple…
Actual Scaling Process
[Figure: on a multicore, user code actually achieves only 1.8x, 2x, and 2.9x speedups]
Parallelization and synchronization require great care…
Multicore Programming: Course Overview
• Fundamentals
  – Models, algorithms, impossibility
• Real-world programming
  – Architectures
  – Techniques
Sequential Computation
[Figure: one thread operating on an object in memory]
Concurrent Computation
[Figure: multiple threads operating on shared objects in memory]
Asynchrony
• Sudden, unpredictable delays
  – Cache misses (short)
  – Page faults (long)
  – Scheduling quantum used up (really long)
Model Summary
• Multiple threads
  – Sometimes called processes
• Single shared memory
• Objects live in memory
• Unpredictable asynchronous delays
Road Map
• We are going to focus on principles first, then practice
  – Start with idealized models
  – Look at simplistic problems
  – Emphasize correctness over pragmatism
  – “Correctness may be theoretical, but incorrectness has practical impact”
Concurrency Jargon
• Hardware
  – Processors
• Software
  – Threads, processes
• Sometimes it is OK to confuse them, sometimes not.
Parallel Primality Testing
• Challenge
  – Print the primes from 1 to 10^10
• Given
  – A ten-processor multiprocessor
  – One thread per processor
• Goal
  – Get a ten-fold speedup (or close)
Load Balancing
[Figure: the range 1 … 10^10 cut at 10^9, 2·10^9, …; processors P0, P1, …, P9 each get one block]
• Split the work evenly
• Each thread tests a range of 10^9 numbers
Procedure for Thread i
  void primePrint() {
    int i = ThreadID.get();        // IDs in {0..9}
    long block = 1_000_000_000L;   // 10^9 numbers per thread
    for (long j = i * block + 1; j <= (i + 1) * block; j++) {
      if (isPrime(j)) print(j);
    }
  }
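The static split above can be sketched as a runnable Java program. This is a minimal sketch under stated assumptions: the `limit` parameter replaces 10^10 so it runs quickly, `isPrime` is a plain trial-division stand-in, primes are collected into a list instead of printed, and the names `StaticPrimePrint`, `primesUpTo`, and `joinAll` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class StaticPrimePrint {
    static final int THREADS = 10; // one thread per processor, IDs 0..9

    // Simple trial-division test, standing in for the slides' isPrime.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Thread i tests i*block+1 .. (i+1)*block (assumes limit % THREADS == 0).
    static List<Long> primesUpTo(long limit) {
        long block = limit / THREADS;
        List<Long> found = Collections.synchronizedList(new ArrayList<>());
        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            final int i = t;
            workers[t] = new Thread(() -> {
                for (long j = i * block + 1; j <= (i + 1) * block; j++) {
                    if (isPrime(j)) found.add(j);
                }
            });
            workers[t].start();
        }
        joinAll(workers);
        Collections.sort(found); // threads finish in arbitrary order
        return found;
    }

    // Waits for every worker; converts interruption into an unchecked exception.
    static void joinAll(Thread[] workers) {
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Scaled down from 10^10 to 10^5 for a quick run.
        System.out.println(primesUpTo(100_000).size() + " primes found");
    }
}
```

Because each thread's block is fixed in advance, the thread holding the most expensive block determines when the whole job finishes.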
Issues
• Higher ranges have fewer primes
• Yet larger numbers are harder to test
• Thread workloads
  – Uneven
  – Hard to predict
Issues (continued)
• Verdict on the even split: rejected
• Need dynamic load balancing
Shared Counter
[Figure: a take-a-number dispenser showing 17, 18, 19]
• Each thread takes a number
Procedure for Thread i
  Counter counter = new Counter(1);   // shared by all threads
  void primePrint() {
    long j = counter.getAndIncrement();
    while (j <= 10_000_000_000L) {    // 10^10
      if (isPrime(j)) print(j);
      j = counter.getAndIncrement();
    }
  }
• counter is a single shared Counter object, used by every thread
Where Things Reside
[Figure: each processor’s cache holds the code and that thread’s local variables; the shared counter lives in shared memory, reached over the bus]
Procedure for Thread i (continued)
• The loop stops once every value up to 10^10 has been taken
• counter.getAndIncrement() returns the current value and increments the counter, so each call yields a new value
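The shared-counter scheme can be sketched as a runnable Java program. This is a minimal sketch under stated assumptions: `java.util.concurrent.atomic.AtomicLong` stands in for the shared `Counter` (the simple `Counter` on the next slides is not safe for concurrent use), the `limit` parameter replaces 10^10, `isPrime` is a plain trial-division stand-in, primes are collected rather than printed, and `DynamicPrimePrint` and `primesUpTo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class DynamicPrimePrint {
    // Simple trial-division test, standing in for the slides' isPrime.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    static List<Long> primesUpTo(long limit, int threads) {
        AtomicLong counter = new AtomicLong(1); // shared: next number to test
        List<Long> found = Collections.synchronizedList(new ArrayList<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                // Take numbers one at a time until the range is used up.
                long j = counter.getAndIncrement();
                while (j <= limit) {
                    if (isPrime(j)) found.add(j);
                    j = counter.getAndIncrement();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        Collections.sort(found); // primes arrive in nondeterministic order
        return found;
    }

    public static void main(String[] args) {
        // Scaled down from 10^10 to 10^5 for a quick run.
        System.out.println(primesUpTo(100_000, 10).size() + " primes found");
    }
}
```

A thread that draws cheap numbers simply comes back for more, so the work evens out without anyone predicting it in advance.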
Counter Implementation
  public class Counter {
    private long value;
    public long getAndIncrement() {
      return value++;
    }
  }
Counter Implementation (continued)
• This is correct for a single thread, but not for concurrent threads
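To see the problem in practice, the sketch below runs the slide's `Counter` (renamed `UnsafeCounter` here, a hypothetical name, with a hypothetical `run` harness) from several threads. The thread and iteration counts are arbitrary; the result is nondeterministic and often falls short of the expected total, though any single run is not guaranteed to lose updates.

```java
class UnsafeCounter {
    private long value;

    // Same as the slide: value++ is a read, an add, and a write, not one step.
    public long getAndIncrement() {
        return value++;
    }

    // Starts `threads` threads that each call getAndIncrement() `perThread` times,
    // then returns the final count (join() makes every thread's writes visible).
    static long run(int threads, int perThread) {
        UnsafeCounter counter = new UnsafeCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) counter.getAndIncrement();
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.value;
    }

    public static void main(String[] args) {
        long expected = 4L * 1_000_000;
        long actual = run(4, 1_000_000);
        // Often (not always) below expected: concurrent increments overwrite each other.
        System.out.println("expected " + expected + ", got " + actual);
    }
}
```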
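Why?
  int a = 0;
  void increment() {
    a += 1;
  }

  $ clang -S -c increment.c

  movl _a(%rip), %eax   # load a into a register
  addl $1, %eax         # add 1 to the register
  movl %eax, _a(%rip)   # store the register back into a

The single statement a += 1 compiles to three separate instructions: a load, an add, and a store.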
What It Means
  public class Counter {
    private long value;
    public long getAndIncrement() {
      return value++;   // shorthand for the three steps below
    }
  }

    long temp = value;
    value = temp + 1;
    return temp;
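Not so good…
[Figure: timeline of the counter’s value going 1, 2, 3, 2. One thread reads 1 and, much later, writes 2. Meanwhile another thread reads 1, writes 2, reads 2, and writes 3. The late write of 2 erases the other thread’s increments.]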
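One order of steps matching the timeline can be replayed on a single thread, which makes the outcome deterministic. This is an illustrative sketch: the class `LostUpdateReplay` and the locals `tempA` and `tempB` (the local `temp` of two hypothetical threads A and B) are not from the slides, and each line stands for one step in the figure.

```java
class LostUpdateReplay {
    static long value; // the shared counter

    static long replay() {
        value = 1;               // initial value
        long tempA = value;      // A reads 1
        long tempB = value;      // B reads 1
        value = tempB + 1;       // B writes 2
        tempB = value;           // B reads 2
        value = tempB + 1;       // B writes 3
        value = tempA + 1;       // A writes 2, overwriting B's two increments
        return value;            // three increments ran, yet the value is 2, not 4
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```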
Is this problem inherent?
[Figure: a write and a read colliding]
If we could only glue reads and writes together…
Challenge
  public class Counter {
    private long value;
    public long getAndIncrement() {
      long temp = value;
      value = temp + 1;
      return temp;
    }
  }
Challenge (continued)
• Make the steps temp = value and value = temp + 1 atomic (indivisible)
Hardware Solution
• Perform the read and the write in getAndIncrement() with a single ReadModifyWrite() instruction
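Java does not expose a `ReadModifyWrite()` instruction by that name; the closest standard tool is `java.util.concurrent.atomic.AtomicLong`, whose `compareAndSet` typically maps to an atomic hardware instruction. The sketch below builds `getAndIncrement` from a compare-and-set retry loop (one common construction; `AtomicLong` also offers `getAndIncrement` directly). `CasCounter` and its `run` harness are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;

class CasCounter {
    private final AtomicLong value = new AtomicLong(0);

    // Read, then write temp + 1 only if value still equals temp.
    // compareAndSet performs the compare and the write as one atomic step.
    public long getAndIncrement() {
        while (true) {
            long temp = value.get();
            if (value.compareAndSet(temp, temp + 1)) {
                return temp;
            }
            // Another thread changed value in between: retry with a fresh read.
        }
    }

    public long get() {
        return value.get();
    }

    // Starts `threads` threads that each call getAndIncrement() `perThread` times.
    static long run(int threads, int perThread) {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) counter.getAndIncrement();
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1_000_000)); // no lost updates: always 4000000
    }
}
```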
An Aside: Java™
  public class Counter {
    private long value;
    public long getAndIncrement() {
      long temp;
      synchronized (this) {
        temp = value;
        value = temp + 1;
      }
      return temp;
    }
  }
• The synchronized block makes the read and the write of value one indivisible step
• Mutual exclusion: at most one thread at a time can execute the synchronized block
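A runnable version of the synchronized counter, as a minimal sketch: it locks on `this`, the counter object itself, and the class name `SyncCounter` and the `run` harness are hypothetical.

```java
class SyncCounter {
    private long value;

    public long getAndIncrement() {
        long temp;
        synchronized (this) {   // only one thread at a time can hold this counter's lock
            temp = value;       // read
            value = temp + 1;   // write; no other getAndIncrement() can interleave here
        }
        return temp;
    }

    public synchronized long get() {
        return value;
    }

    // Starts `threads` threads that each call getAndIncrement() `perThread` times.
    static long run(int threads, int perThread) {
        SyncCounter counter = new SyncCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) counter.getAndIncrement();
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1_000_000)); // always 4000000: no update is lost
    }
}
```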
Why do we care?
• We want as much of the code as possible to execute concurrently (in parallel)
• A larger sequential part implies reduced performance
• Amdahl’s law: this relation is not linear…
Amdahl’s Law
Speedup is the ratio of the 1-thread execution time to the n-thread execution time:
$\text{Speedup} = \frac{\text{1-thread execution time}}{n\text{-thread execution time}} = \frac{1}{1 - p + \frac{p}{n}}$
• p: parallel fraction
• 1 − p: sequential fraction
• n: number of threads
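The formula is easy to evaluate directly. This minimal Java sketch (the class `Amdahl` and method `speedup` are hypothetical names) computes it for a parallel fraction `p` and thread count `n`; the cases in `main` are the ones worked through in the examples that follow.

```java
class Amdahl {
    // Speedup = 1 / ((1 - p) + p / n): sequential part runs alone, parallel part splits n ways.
    static double speedup(double p, int n) {
        if (p < 0 || p > 1 || n < 1) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and n >= 1");
        }
        return 1.0 / ((1 - p) + p / n);
    }

    public static void main(String[] args) {
        double[][] cases = { {0.6, 10}, {0.8, 10}, {0.9, 10}, {0.9, 100}, {0.99, 10} };
        for (double[] c : cases) {
            int n = (int) c[1];
            System.out.printf("p=%.2f n=%d speedup=%.2f%n", c[0], n, speedup(c[0], n));
        }
    }
}
```

As n grows without bound, the speedup approaches 1/(1 − p), so with 10% sequential code no number of threads yields more than a 10-fold speedup.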
Example
• Ten processors
• 60% concurrent, 40% sequential
• How close to a 10-fold speedup?
Speedup $= \frac{1}{1 - 0.6 + \frac{0.6}{10}} = \frac{1}{0.46} \approx 2.17$
Example
• Ten processors
• 80% concurrent, 20% sequential
• How close to a 10-fold speedup?
Speedup $= \frac{1}{1 - 0.8 + \frac{0.8}{10}} = \frac{1}{0.28} \approx 3.57$
Example
• Ten processors
• 90% concurrent, 10% sequential
• How close to a 10-fold speedup?
Speedup $= \frac{1}{1 - 0.9 + \frac{0.9}{10}} = \frac{1}{0.19} \approx 5.26$
Example
• A hundred processors
• 90% concurrent, 10% sequential
• How close to a 100-fold speedup?
Speedup $= \frac{1}{1 - 0.9 + \frac{0.9}{100}} = \frac{1}{0.109} \approx 9.17$
Example
• Ten processors
• 99% concurrent, 1% sequential
• How close to a 10-fold speedup?
Speedup $= \frac{1}{1 - 0.99 + \frac{0.99}{10}} = \frac{1}{0.109} \approx 9.17$
Back to Real-World Multicore Scaling
[Figure: actual multicore speedups of user code: 1.8x, 2x, 2.9x]
• The sequential % of the code is not being reduced
Shared Data Structures
[Figure: program split into 75% unshared and 25% shared code; the shared part can use coarse-grained or fine-grained synchronization]
Shared Data Structures
• Why only a 2.9x speedup?
[Figure: threads jammed at the 25% shared part under coarse-grained synchronization; the 75% unshared part runs freely]
Shared Data Structures
• Why fine-grained parallelism matters
[Figure: 25% shared, 75% unshared; coarse-grained versus fine-grained synchronization of the shared part]
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
• You are free:
  – to Share: to copy, distribute and transmit the work
  – to Remix: to adapt the work
• Under the following conditions:
  – Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
  – Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to
  – http://creativecommons.org/licenses/by-sa/3.0/.
• Any of the above conditions can be waived if you get permission from the copyright holder.
• Nothing in this license impairs or restricts the author's moral rights.