ECE 1747H: Parallel Programming
Lecture 1-2: Overview

ECE 1747H
• Meeting time: Friday 3-5 PM
• Instructor: Cristiana Amza, Associate Prof., http://www.eecg.toronto.edu/~amza@eecg.toronto.edu, office BA 4142
• TA: Arnamoy Bhattacharyya, arnamoyb@ece.utoronto.ca

Material
• Course notes
• Web material (e.g., published papers)
• No required textbook; some are recommended

Prerequisites
• Programming in C or C++
• Data structures
• Basics of machine architecture
• Basics of network programming
Please send e-mail to ecehelp@ece.toronto.edu to get an eecg account! (Include name, student ID, class, instructor.)

Other than that
• No written homework, no exams
• 10% for each small programming assignment (expect 1)
• 10% class participation
• The rest comes from the major course project

Programming Project
• Parallelizing a sequential program, or improving the performance or the functionality of a parallel program
• Project proposal and final report
• In-class project proposal and final report presentation
• "Sample" project presentation can be posted

Parallelism (1 of 2)
• Ability to execute different parts of a single program concurrently on different machines
• Goal: shorter running time
• Grain of parallelism: how big are the parts?
• Can be instruction, statement, procedure, …
• Will mainly focus on relatively coarse grain

Parallelism (2 of 2)
• Coarse-grain parallelism is mainly applicable to long-running, scientific programs
• Examples: weather prediction, prime number factorization, simulations, …

Lecture material (1 of 4)
• Parallelism
– What is parallelism?
– What can be parallelized?
– Inhibitors of parallelism: dependences

Lecture material (2 of 4)
• Standard models of parallelism
– shared memory (Pthreads)
– message passing (MPI)
– shared memory + data parallelism (OpenMP)
• Classes of applications
– scientific
– servers

Lecture material (3 of 4)
• Transaction processing
– classic programming model for databases
– now being proposed for scientific programs

Lecture material (4 of 4)
• Performance of parallel & distributed programs
– architecture-independent optimization
– architecture-dependent optimization

Course Organization
• First 2-3 weeks of semester:
– lectures on parallelism, patterns, models
– small programming assignment done in teams of 2 or 3
• Rest of the semester:
– major programming project, done individually or in a small group
– research paper discussions

Parallel vs. Distributed Programming
Parallel programming has matured:
• Few standard programming models
• Few common machine architectures
• Portability between models and architectures

Bottom Line
• Programmer can now focus on the program and use a suitable programming model
• Reasonable hope of portability
• Problem: much performance optimization is still platform-dependent
– Performance portability is a problem

ECE 1747H: Parallel Programming
Lecture 1-2: Parallelism, Dependences

Parallelism
• Ability to execute different parts of a program concurrently on different machines
• Goal: shorten execution time

Measures of Performance
• To computer scientists: speedup, execution time.
• To applications people: size of problem, accuracy of solution, etc.

Speedup of Algorithm
• Speedup of algorithm = sequential execution time / execution time on p processors (with the same data set).
[Figure: speedup plotted against the number of processors p]

Speedup on Problem
• Speedup on problem = sequential execution time of best known sequential algorithm / execution time on p processors.
• A more honest measure of performance.
• Avoids picking an easily parallelizable algorithm with poor sequential execution time.

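The two definitions can be written side by side. The symbols are assumptions introduced here, not taken from the slides: T_seq is the sequential execution time of the algorithm being parallelized, T_best is the sequential execution time of the best known sequential algorithm, and T_p is the execution time on p processors with the same data set.

```latex
S_{\text{algorithm}}(p) = \frac{T_{\text{seq}}}{T_p},
\qquad
S_{\text{problem}}(p) = \frac{T_{\text{best}}}{T_p}
```

Because T_best is at most T_seq, the speedup on the problem can never exceed the speedup of the algorithm, which is why it is the more honest of the two.
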
What Speedups Can You Get?
• Linear speedup
– Confusing term: implicitly means a 1-to-1 speedup per processor.
– (Almost always) as good as you can do.
• Sub-linear speedup: more normal due to overhead of startup, synchronization, communication, etc.

Speedup
[Figure: speedup versus number of processors p, showing the linear and the actual speedup curves]

Scalability
• There is no really precise definition.
• Roughly speaking, a program is said to scale to a certain number of processors p if going from p-1 to p processors results in some acceptable improvement in speedup (for instance, an increase of 0.5).

Super-linear Speedup?
• Due to cache/memory effects:
– Subparts fit into cache/memory of each node.
– Whole problem does not fit in cache/memory of a single node.
• Nondeterminism in search problems:
– One thread finds a near-optimal solution very quickly, which leads to drastic pruning of the search space.

Cardinal Performance Rule
• Don't leave (too) much of your code sequential!

Amdahl's Law
• If 1/s of the program is sequential, then you can never get a speedup better than s.
– (Normalized) sequential execution time = 1/s + (1 - 1/s) = 1
– Best parallel execution time on p processors = 1/s + (1 - 1/s)/p
– When p goes to infinity, parallel execution time = 1/s
– Speedup = s.

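The bound can be checked numerically with a minimal sketch. The function name `amdahl_speedup` and its parameters are assumptions made for illustration: `seq_fraction` plays the role of 1/s and `p` is the number of processors.

```c
#include <assert.h>

/* Best-case speedup under Amdahl's Law.
   seq_fraction: share of the normalized run time that stays sequential (1/s).
   p:            number of processors.
   Both names are illustrative, not from the slides. */
double amdahl_speedup(double seq_fraction, int p)
{
    assert(seq_fraction > 0.0 && seq_fraction <= 1.0 && p >= 1);
    /* Sequential part is unchanged; the rest is divided evenly over p. */
    double parallel_time = seq_fraction + (1.0 - seq_fraction) / p;
    return 1.0 / parallel_time; /* normalized sequential time is 1 */
}
```

With a quarter of the program sequential (s = 4), the result approaches 4 as p grows but never reaches it.
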
Why keep something sequential?
• Some parts of the program are not parallelizable (because of dependences).
• Some parts may be parallelizable, but the overhead dwarfs the increased speedup.

When can two statements execute in parallel?
• On one processor:
    statement 1;
    statement 2;
• On two processors:
    processor 1: statement 1;
    processor 2: statement 2;

Fundamental Assumption
• Processors execute independently: no control over order of execution between processors

When can 2 statements execute in parallel?
• Possibility 1
    Processor 1:      Processor 2:
    statement 1;
                      statement 2;
• Possibility 2
    Processor 1:      Processor 2:
                      statement 2;
    statement 1;

When can 2 statements execute in parallel?
• Their order of execution must not matter!
• In other words,
    statement 1; statement 2;
  must be equivalent to
    statement 2; statement 1;

Example 1
    a = 1;
    b = a;
• Statements cannot be executed in parallel.
• Program modifications may make it possible.

Example 2
    a = f(x);
    b = a;
• May not be wise to change the program (sequential execution would take longer).

Example 3
    a = 1;
    a = 2;
• Statements cannot be executed in parallel.

True dependence
Statements S1, S2: S2 has a true dependence on S1 iff S2 reads a value written by S1.

Anti-dependence
Statements S1, S2: S2 has an anti-dependence on S1 iff S2 writes a value read by S1.

Output Dependence
Statements S1, S2: S2 has an output dependence on S1 iff S2 writes a variable written by S1.

When can 2 statements execute in parallel?
S1 and S2 can execute in parallel iff there are no dependences between S1 and S2:
– true dependences
– anti-dependences
– output dependences
Some dependences can be removed.

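As one illustration of removing a dependence, here is a sketch in which an anti-dependence between loop iterations disappears once the values being read are copied aside first. The functions, the array size `N`, and the loop body are hypothetical and not taken from the slides. The OpenMP pragma takes effect only if the compiler has OpenMP enabled.

```c
#include <string.h>

#define N 100

/* Iteration i reads a[i+1], which iteration i+1 later overwrites:
   an anti-dependence between iterations. a must hold N + 1 elements. */
void shift_in_place(int a[N + 1])
{
    for (int i = 0; i < N; i++)
        a[i] = a[i + 1] + 1;
}

/* Same result, but every read goes to a private copy of the old values,
   so no iteration writes anything another iteration reads. */
void shift_with_copy(int a[N + 1])
{
    int old[N + 1];
    int i;
    memcpy(old, a, sizeof old);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < N; i++)
        a[i] = old[i + 1] + 1;
}
```

The price is extra memory and a copy. True dependences cannot be removed this way, because the value read really is produced by the earlier statement.
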
Example 4
• Most parallelism occurs in loops.
    for (i = 0; i < 100; i++)
        a[i] = i;
• No dependences.
• Iterations can be executed in parallel.

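A minimal sketch of how the loop in Example 4 might be parallelized with OpenMP, one of the models listed earlier in the lecture. The function name `fill_indices` is an assumption. The pragma is guarded so that the code still compiles and runs sequentially when OpenMP is not enabled.

```c
#define N 100

/* Fill a[0..N-1] with a[i] = i. No iteration reads or writes data
   touched by another iteration, so the iterations may run in any order. */
void fill_indices(int a[N])
{
    int i;
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < N; i++)
        a[i] = i;
}
```
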
Example 5
    for (i = 0; i < 100; i++) {
        a[i] = i;
        b[i] = 2*i;
    }
Iterations and statements can be executed in parallel.

Example 6
    for (i = 0; i < 100; i++)
        a[i] = i;
    for (i = 0; i < 100; i++)
        b[i] = 2*i;
Iterations and loops can be executed in parallel.

Example 7
    for (i = 0; i < 100; i++)
        a[i] = a[i] + 100;
• There is a dependence … on itself!
• Loop is still parallelizable.

Example 8
    for (i = 1; i < 100; i++)    /* start at 1 so that a[i-1] stays in bounds */
        a[i] = f(a[i-1]);
• Dependence between a[i] and a[i-1].
• Loop iterations are not parallelizable.

Loop-carried dependence
• A loop-carried dependence is a dependence that is present only if the statements are part of the execution of a loop.
• Otherwise, we call it a loop-independent dependence.
• Loop-carried dependences prevent loop iteration parallelization.

Example 9
    for (i = 0; i < 100; i++)
        for (j = 1; j < 100; j++)    /* start at 1 so that a[i][j-1] stays in bounds */
            a[i][j] = f(a[i][j-1]);
• Loop-independent dependence on i.
• Loop-carried dependence on j.
• Outer loop can be parallelized, inner loop cannot.

Example 10
    for (j = 1; j < 100; j++)    /* start at 1 so that a[i][j-1] stays in bounds */
        for (i = 0; i < 100; i++)
            a[i][j] = f(a[i][j-1]);
• Inner loop can be parallelized, outer loop cannot.
• Less desirable situation.
• Loop interchange is sometimes possible.

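A sketch of loop interchange applied to Example 10. The body of `f` is a hypothetical stand-in, and the function names are assumptions. Both loops start j at 1 so that a[i][j-1] stays in bounds. Interchange is legal here because each element depends only on its left neighbour in the same row, so the rows are independent of one another.

```c
#define N 100

/* Hypothetical stand-in for the slide's f. */
static int f(int x) { return x + 1; }

/* Example 10 order: the loop that carries the dependence (j) is outermost,
   so only the inner i loop could be parallelized. */
void sweep_j_outer(int a[N][N])
{
    for (int j = 1; j < N; j++)
        for (int i = 0; i < N; i++)
            a[i][j] = f(a[i][j - 1]);
}

/* After interchange: the independent i loop is outermost and can be
   parallelized; each thread walks its own rows sequentially in j. */
void sweep_i_outer(int a[N][N])
{
    int i;
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < N; i++)
        for (int j = 1; j < N; j++)
            a[i][j] = f(a[i][j - 1]);
}
```

Parallelizing the outer loop means one parallel region with coarse-grained work per thread, instead of starting and synchronizing a parallel inner loop on every outer iteration.
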
Level of loop-carried dependence
• Is the nesting depth of the loop that carries the dependence.
• Indicates which loops can be parallelized.

Be careful … Example 11
    printf("a");
    printf("b");
Statements have a hidden output dependence due to the output stream.

Be careful … Example 12
    a = f(x);
    b = g(x);
Statements could have a hidden dependence if f and g update the same variable. Also depends on what f and g can do to x.

Be careful … Example 13
    for (i = 0; i < 100; i++)
        a[i+10] = f(a[i]);
• Dependence between a[10], a[20], …
• Dependence between a[11], a[21], …
• …
• Some parallel execution is possible.

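One way to exploit the partial parallelism in Example 13 is to split the loop into its ten independent dependence chains. This is a sketch under assumptions: the body of `f`, the function names, and the constant `STRIDE` are hypothetical, and the array is assumed to hold 110 elements.

```c
#define N 100
#define STRIDE 10

/* Hypothetical stand-in for the slide's f. */
static int f(int x) { return x + 1; }

/* Original loop; a must hold N + STRIDE elements. */
void update_sequential(int a[N + STRIDE])
{
    for (int i = 0; i < N; i++)
        a[i + STRIDE] = f(a[i]);
}

/* Chain c touches only a[c], a[c+10], a[c+20], ... so the STRIDE chains
   are independent of each other; inside a chain the order is preserved. */
void update_by_chains(int a[N + STRIDE])
{
    int c;
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (c = 0; c < STRIDE; c++)
        for (int i = c; i < N; i += STRIDE)
            a[i + STRIDE] = f(a[i]);
}
```

The available parallelism is limited to ten chains here, regardless of how many processors there are.
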
Be careful … Example 14
    for (i = 1; i < 100; i++) {
        a[i] = …;
        … = a[i-1];
    }
• Dependence between a[i] and a[i-1].
• Complete parallel execution impossible.
• Pipelined parallel execution possible.

Be careful … Example 15
    for (i = 0; i < 100; i++)
        a[i] = f(a[indexa[i]]);
• Cannot tell for sure.
• Parallelization depends on user knowledge of values in indexa[].
• User can tell, compiler cannot.

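The kind of knowledge a user might have can sometimes be expressed as a run-time check. The sketch below tests one sufficient condition only, and the function name is hypothetical: if indexa[i] == i for every i, each iteration reads only the element it writes, as in Example 7, so the iterations are independent. Other safe index patterns exist that this check would reject.

```c
/* Returns 1 if every iteration i of
       a[i] = f(a[indexa[i]]);
   reads only the element it writes itself (indexa[i] == i), 0 otherwise.
   A conservative test: returning 0 does not prove a dependence exists
   in every case, only that this simple argument does not apply. */
int reads_only_own_element(const int indexa[], int n)
{
    for (int i = 0; i < n; i++)
        if (indexa[i] != i)
            return 0;
    return 1;
}
```
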
Optimizations: Example 16
    for (i = 0; i < 100000; i++)
        a[i + 1000] = a[i] + 1;
Cannot be parallelized as is. May be parallelized by applying certain code transformations.

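The slide does not say which transformations it has in mind. One possibility, shown here as an assumption, is to split the loop into blocks of 1000 iterations: within a block the reads (a[base] to a[base+999]) and the writes (a[base+1000] to a[base+1999]) do not overlap, so the inner loop is parallel while the blocks run in order. The function names and the constant `DIST` are hypothetical.

```c
#define N 100000
#define DIST 1000   /* dependence distance; N is a multiple of DIST */

/* Original loop; a must hold N + DIST elements. */
void shift_sequential(int a[N + DIST])
{
    for (int i = 0; i < N; i++)
        a[i + DIST] = a[i] + 1;
}

/* Blocked version: the outer loop over blocks stays sequential,
   the DIST iterations inside one block are independent. */
void shift_blocked(int a[N + DIST])
{
    for (int base = 0; base < N; base += DIST) {
        int i;
#ifdef _OPENMP
#pragma omp parallel for
#endif
        for (i = base; i < base + DIST; i++)
            a[i + DIST] = a[i] + 1;
    }
}
```
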
An aside
• Parallelizing compilers analyze program dependences to decide parallelization.
• In parallelization by hand, the user does the same analysis.
• Compiler: more convenient and more correct.
• User: more powerful, can analyze more patterns.

To remember
• Statement order must not matter.
• Statements must not have dependences.
• Some dependences can be removed.
• Some dependences may not be obvious.