Introduction
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Moore’s Law
[Figure: transistor count still rising; clock speed flattening sharply]

Still on some of your desktops: The Uniprocessor
[Figure: cpu, memory]

In the Enterprise: The Shared Memory Multiprocessor (SMP)
[Figure: caches, bus, shared memory]

Your New Desktop: The Multicore Processor (CMP)
All on the same chip
[Figure: caches, bus, shared memory; example shown: Sun T2000 Niagara]

Multicores Are Here
• “Intel ups ante with 4-core chip. New microprocessor, due this year, will be faster, use less electricity...” [San Fran Chronicle]
• “AMD will launch a dual-core version of its Opteron server processor at an event in New York on April 21.” [PC World]
• “Sun’s Niagara…will have eight cores, each core capable of running 4 threads in parallel, for 32 concurrently running threads…” [The Inquirer]

Why do we care?
• Time no longer cures software bloat
  – The “free ride” is over
• When you double your program’s path length
  – You can’t just wait 6 months
  – Your software must somehow exploit twice as much concurrency

Traditional Scaling Process
[Figure: user code on a traditional uniprocessor; speedup of 1.8x, 3.6x, 7x over time (Moore’s law)]

Multicore Scaling Process
[Figure: user code on multicore; speedup of 1.8x, 3.6x, 7x]
Unfortunately, not so simple…

Real-World Scaling Process
[Figure: user code on multicore; speedup of 1.8x, 2x, 2.9x]
Parallelization and Synchronization require great care…

Multicore Programming: Course Overview
• Fundamentals
  – Models, algorithms, impossibility
• Real-World programming
  – Architectures
  – Techniques
We don’t necessarily want to make you experts…

Sequential Computation
[Figure: thread, memory, object]

Concurrent Computation
[Figure: threads, memory, object]

Asynchrony
• Sudden unpredictable delays
  – Cache misses (short)
  – Page faults (long)
  – Scheduling quantum used up (really long)

Model Summary
• Multiple threads
  – Sometimes called processes
• Single shared memory
• Objects live in memory
• Unpredictable asynchronous delays

Road Map
• We are going to focus on principles first, then practice
  – Start with idealized models
  – Look at simplistic problems
  – Emphasize correctness over pragmatism
  – “Correctness may be theoretical, but incorrectness has practical impact”

Concurrency Jargon
• Hardware
  – Processors
• Software
  – Threads, processes
• Sometimes OK to confuse them, sometimes not.

Parallel Primality Testing
• Challenge
  – Print primes from 1 to 10^10
• Given
  – Ten-processor multiprocessor
  – One thread per processor
• Goal
  – Get ten-fold speedup (or close)

Load Balancing
[Figure: the range 1 … 10^10 cut at 10^9, 2·10^9, … and assigned to P0, P1, …, P9]
• Split the work evenly
• Each thread tests range of 10^9

Procedure for Thread i
    void primePrint() {
      int i = ThreadID.get();        // IDs in {0..9}
      long block = 1_000_000_000L;   // 10^9 numbers per thread
      for (long j = i * block + 1; j < (i + 1) * block; j++) {
        if (isPrime(j)) print(j);
      }
    }
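The static split above can be tried out on a small range. The following is a minimal sketch, not the slides' code: `StaticPrimes`, `countPerThread`, and the trial-division `isPrime` are names assumed for illustration, the block size is shrunk from 10^9 to 10, and each thread counts primes instead of printing them. The upper bound is inclusive here so that the sketch stays correct for block sizes whose multiples may be prime.

```java
class StaticPrimes {
    // Trial division; adequate for a small demonstration range.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Thread i tests the block i*block+1 .. (i+1)*block; returns primes found per thread.
    static long[] countPerThread(int threads, long block) {
        long[] found = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int i = t;
            workers[t] = new Thread(() -> {
                for (long j = i * block + 1; j <= (i + 1) * block; j++) {
                    if (isPrime(j)) found[i]++;   // each thread writes only its own slot
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();                          // wait for every block to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        long[] found = countPerThread(10, 10);     // primes in 1..100, ten blocks of 10
        for (int i = 0; i < found.length; i++) {
            System.out.println("P" + i + ": " + found[i] + " primes");
        }
    }
}
```

Even at this scale the per-thread counts differ from block to block, which hints at the uneven workloads a fixed split produces.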

Issues
• Higher ranges have fewer primes
• Yet larger numbers harder to test
• Thread workloads
  – Uneven
  – Hard to predict
• Static split: rejected
• Need dynamic load balancing

Shared Counter
[Figure: a counter showing 17, 18, 19; each thread takes a number]

Procedure for Thread i
    Counter counter = new Counter(1);   // shared counter object

    void primePrint() {
      long j = 0;
      while (j < 10_000_000_000L) {     // 10^10
        j = counter.getAndIncrement();
        if (isPrime(j)) print(j);
      }
    }

Where Things Reside
    void primePrint() {
      int i = ThreadID.get();        // IDs in {0..9}
      long block = 1_000_000_000L;   // 10^9 numbers per thread
      for (long j = i * block + 1; j < (i + 1) * block; j++) {
        if (isPrime(j)) print(j);
      }
    }
[Figure: code and local variables; cache; bus; shared memory holding the shared counter (value 1)]

Procedure for Thread i
    Counter counter = new Counter(1);

    void primePrint() {
      long j = 0;
      while (j < 10_000_000_000L) {     // stop when every value up to 10^10 is taken
        j = counter.getAndIncrement();  // increment & return each new value
        if (isPrime(j)) print(j);
      }
    }
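A small runnable sketch of this shared-counter scheme follows. It makes several assumptions that are not in the slides: `java.util.concurrent.atomic.AtomicLong` stands in for a thread-safe `Counter`, the limit is 100 rather than 10^10, primes are counted rather than printed, and the names `DynamicPrimes` and `run` are illustrative. Unlike the slide's loop, it checks the limit right after taking a number, so no thread tests a value beyond the limit.

```java
import java.util.concurrent.atomic.AtomicLong;

class DynamicPrimes {
    // Trial division; adequate for a small demonstration range.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Returns {primes found, candidates tested}, summed over all threads.
    static long[] run(int threads, long limit) {
        AtomicLong counter = new AtomicLong(1);   // shared counter, starts at 1
        AtomicLong primes = new AtomicLong();
        AtomicLong tested = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                while (true) {
                    long j = counter.getAndIncrement();  // take the next untested number
                    if (j > limit) break;                // every value has been taken
                    tested.incrementAndGet();
                    if (isPrime(j)) primes.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return new long[] { primes.get(), tested.get() };
    }

    public static void main(String[] args) {
        long[] r = run(4, 100);
        System.out.println(r[0] + " primes among " + r[1] + " candidates");
    }
}
```

Because a thread only asks for a new number when it has finished the previous one, a thread that draws expensive candidates simply takes fewer of them; nothing has to be predicted in advance.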

Counter Implementation
    public class Counter {
      private long value;

      public long getAndIncrement() {
        return value++;
      }
    }
OK for single thread, not for concurrent threads

What It Means
    public class Counter {
      private long value;

      public long getAndIncrement() {
        return value++;
      }
    }
The statement "return value++;" stands for three steps:
    temp = value;
    value = value + 1;
    return temp;

Not so good…
[Figure: timeline of two threads calling getAndIncrement. One thread reads 1 and writes 2, then reads 2 and writes 3; the other reads 1 and later writes 2. The value goes 1, 2, 3, 2 over time.]
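The lost update in the timeline can be replayed deterministically in a single thread. The sketch below is an illustration only: it hard-codes one interleaving (my reading of the figure, labelling the two threads A and B), and `LostUpdateReplay` and `replay` are assumed names. Real threads would interleave unpredictably; fixing the order by hand makes the effect repeatable.

```java
class LostUpdateReplay {
    // Replays three getAndIncrement calls, each split into its read and write step.
    // Returns {A's first result, A's second result, B's result, final value}.
    static long[] replay() {
        long value = 1;            // the shared counter starts at 1

        long tempB = value;        // B reads 1, then is delayed
        long tempA = value;        // A reads 1
        value = tempA + 1;         // A writes 2
        long a1 = tempA;           // A's first call returns 1

        tempA = value;             // A reads 2
        value = tempA + 1;         // A writes 3
        long a2 = tempA;           // A's second call returns 2

        value = tempB + 1;         // B finally writes 2, overwriting the 3
        long b1 = tempB;           // B's call returns 1, the same number A got

        return new long[] { a1, a2, b1, value };
    }

    public static void main(String[] args) {
        long[] r = replay();
        System.out.println("A got " + r[0] + " and " + r[1]
                + ", B got " + r[2] + ", final value " + r[3]);
    }
}
```

Three calls were made, yet the counter ends at 2 instead of 4, and the number 1 was handed out twice: in the primality program, one candidate would be tested twice and others could be handed out again.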

Is this problem inherent?
[Figure: separate read and write steps]
If we could only glue reads and writes…

Challenge
    public class Counter {
      private long value;

      public long getAndIncrement() {
        long temp = value;
        value = temp + 1;
        return temp;
      }
    }
Make these steps atomic (indivisible)

Hardware Solution
    public class Counter {
      private long value;

      public long getAndIncrement() {
        long temp = value;
        value = temp + 1;
        return temp;
      }
    }
ReadModifyWrite() instruction
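In Java, one way to get an atomic read-modify-write without writing it yourself is `java.util.concurrent.atomic.AtomicLong`, whose `getAndIncrement()` is atomic and is typically backed by a hardware atomic instruction. The sketch below is an assumption-laden illustration rather than the slides' implementation: `AtomicCounter` and `hammer` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;

class AtomicCounter {
    private final AtomicLong value;

    AtomicCounter(long initial) {
        value = new AtomicLong(initial);
    }

    // Read, add one, and write back as a single indivisible step.
    long getAndIncrement() {
        return value.getAndIncrement();
    }

    // Starts `threads` workers that each call getAndIncrement `calls` times,
    // then returns the next value the counter hands out.
    static long hammer(int threads, int calls) {
        AtomicCounter counter = new AtomicCounter(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < calls; k++) counter.getAndIncrement();
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.getAndIncrement();
    }

    public static void main(String[] args) {
        // 4 threads x 10000 calls starting from 1: no increment is lost.
        System.out.println("next value: " + hammer(4, 10_000));
    }
}
```

With no lost updates, the value returned after 4 × 10000 increments starting from 1 is 40001.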

An Aside: Java™
    public class Counter {
      private long value;

      public long getAndIncrement() {
        long temp;
        synchronized (this) {   // synchronized block: mutual exclusion
          temp = value;
          value = temp + 1;
        }
        return temp;
      }
    }
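To see that mutual exclusion repairs the counter, the sketch below runs several threads against a synchronized counter and checks that every number is handed out exactly once. It is a minimal illustration under assumptions: `SyncCounter`, its constructor, and `allDistinct` are names introduced here, not part of the slides.

```java
class SyncCounter {
    private long value;

    SyncCounter(long initial) {
        value = initial;
    }

    long getAndIncrement() {
        long temp;
        synchronized (this) {   // only one thread at a time runs this block
            temp = value;
            value = temp + 1;
        }
        return temp;
    }

    // Each of `threads` workers calls getAndIncrement `calls` times and records
    // what it received. Returns true if every value from 1 to threads*calls
    // was handed out exactly once.
    static boolean allDistinct(int threads, int calls) {
        SyncCounter counter = new SyncCounter(1);
        long[][] got = new long[threads][calls];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int k = 0; k < calls; k++) got[id][k] = counter.getAndIncrement();
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        boolean[] seen = new boolean[threads * calls + 1];
        for (long[] row : got) {
            for (long v : row) {
                if (v < 1 || v > threads * calls || seen[(int) v]) return false;
                seen[(int) v] = true;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("all values distinct: " + allDistinct(4, 10_000));
    }
}
```

The price is that the code inside the block runs one thread at a time, so it counts as sequential work.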

Why do we care?
• We want as much of the code as possible to execute concurrently (in parallel)
• A larger sequential part implies reduced performance
• Amdahl’s law: this relation is not linear…

Amdahl’s Law
\[ \text{Speedup} = \frac{1}{(1 - p) + \frac{p}{n}} \]
…of computation given n CPUs instead of 1
• p: parallel fraction
• 1 − p: sequential fraction
• n: number of processors
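One consequence is worth writing out. The symbols are assumptions made here, since the slide's formula was an image: p is the parallel fraction and n the number of processors. As n grows, the parallel term vanishes and the sequential fraction alone caps the speedup:

```latex
\[
\text{Speedup}(n) = \frac{1}{(1 - p) + \frac{p}{n}}
\qquad\Longrightarrow\qquad
\lim_{n \to \infty} \text{Speedup}(n) = \frac{1}{1 - p}
\]
```

For instance, with p = 0.9 the limit is 1 / 0.1 = 10, so no number of processors yields more than a ten-fold speedup.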

Example
• Ten processors
• 60% concurrent, 40% sequential
• How close to 10-fold speedup?
\[ \text{Speedup} = 2.17 = \frac{1}{1 - 0.6 + \frac{0.6}{10}} \]

Example
• Ten processors
• 80% concurrent, 20% sequential
• How close to 10-fold speedup?
\[ \text{Speedup} = 3.57 = \frac{1}{1 - 0.8 + \frac{0.8}{10}} \]

Example
• Ten processors
• 90% concurrent, 10% sequential
• How close to 10-fold speedup?
\[ \text{Speedup} = 5.26 = \frac{1}{1 - 0.9 + \frac{0.9}{10}} \]

Example
• Ten processors
• 99% concurrent, 1% sequential
• How close to 10-fold speedup?
\[ \text{Speedup} = 9.17 = \frac{1}{1 - 0.99 + \frac{0.99}{10}} \]
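The four examples can be checked with a few lines of code. This is a small sketch of Amdahl's law, taken as speedup = 1 / ((1 − p) + p / n) with p the concurrent fraction and n the number of processors; the class and method names are assumptions for illustration.

```java
class Amdahl {
    // p: fraction of the work that can run concurrently (0..1)
    // n: number of processors
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        double[] fractions = { 0.60, 0.80, 0.90, 0.99 };
        for (double p : fractions) {
            System.out.printf("%.0f%% concurrent on 10 processors: %.2fx%n",
                    p * 100, speedup(p, 10));
        }
    }
}
```

Run on ten processors, it reproduces the speedups 2.17, 3.57, 5.26, and 9.17 from the examples: going from 90% to 99% concurrent nearly doubles the speedup, so the last few percent of sequential code matter most.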

The Moral
• Making good use of our multiple processors (cores) means
• Finding ways to effectively parallelize our code
  – Minimize sequential parts
  – Reduce idle time in which threads wait without

Multicore Programming
• This is what this course is about…
  – The % that is not easy to make concurrent yet may have a large impact on overall speedup
• Next week:
  – A more serious look at mutual exclusion

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
• You are free:
  – to Share — to copy, distribute and transmit the work
  – to Remix — to adapt the work
• Under the following conditions:
  – Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
  – Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to
  – http://creativecommons.org/licenses/by-sa/3.0/
• Any of the above conditions can be waived if you get permission from the copyright holder.
• Nothing in this license impairs or restricts the author's moral rights.