Pipelined Computations Slides for Parallel Programming Techniques Applications

  • Slides: 38
Download presentation
Pipelined Computations Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel

Pipelined Computations Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 1

Pipelined Computations Problem divided into a series of tasks that have to be completed

Pipelined Computations Problem divided into a series of tasks that have to be completed one after the other (the basis of sequential programming). Each task executed by a separate process or processor. This parallelism can be viewed as a form of functional decomposition. The problem is divided into separate functions that must be performed, but in this case, the functions are performed in succession. The input data is often broken up and processed separately. Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 2

Example Add all the elements of array a to an accumulating sum: for (i

Example Add all the elements of array a to an accumulating sum: for (i = 0; i < n; i++) sum = sum + a[i]; The loop could be “unfolded” to yield sum = sum + a[0]; sum = sum + a[1]; sum = sum + a[2]; sum = sum + a[3]; sum = sum + a[4]; . . . Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 3

Pipeline for an unfolded loop Slides for Parallel Programming Techniques & Applications Using Networked

Pipeline for an unfolded loop Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 4

Another Example Frequency filter - Objective to remove specific frequencies (f 0, f 1,

Another Example Frequency filter - Objective to remove specific frequencies (f 0, f 1, f 2, f 3, etc. ) from a digitized signal, f(t). Signal enters pipeline from left: Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 5

Where pipelining can be used to good effect Assuming problem can be divided into

Where pipelining can be used to good effect Assuming problem can be divided into a series of sequential tasks, pipelined approach can provide increased execution speed under the following three types of computations: 1. If more than one instance of the complete problem is to be Executed 2. If a series of data items must be processed, each requiring multiple operations 3. If information to start next process can be passed forward before process has completed all its internal operations Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 6

“Type 1” Pipeline Space-Time Diagram P processes in the pipeline and m instances, the

“Type 1” Pipeline Space-Time Diagram P processes in the pipeline and m instances, the total number of cycles is m+p-1. The average number of cycles is (m+p-1)/m, which tends to one cycle per problem. Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 7

“Type 2” Pipeline Space-Time Diagram P processes in the pipeline and n data items,

“Type 2” Pipeline Space-Time Diagram P processes in the pipeline and n data items, the total number of cycles is n+p-1. Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 9

“Type 3” Pipeline Space-Time Diagram Pipeline processing where information passes to next stage before

“Type 3” Pipeline Space-Time Diagram Pipeline processing where information passes to next stage before previous state completed. Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 10

If the number of stages is larger than the number of processors in any

If the number of stages is larger than the number of processors in any pipeline, a group of stages can be assigned to each processor: Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 11

Computing Platform for Pipelined Applications Multiprocessor system with a line configuration Pipeline processing requires

Computing Platform for Pipelined Applications Multiprocessor system with a line configuration Pipeline processing requires the ability to pass messages between adjacent processes. This suggests direct communication links between processors onto which adjacent processes are mapped. Ideally, a ring or line structure. Simultaneous transfer is provided by most switched interconnection. A single shared Ethernet connection can not do(use non-blocking send). Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 12

Example Pipelined Solutions (Examples of each type of computation) Slides for Parallel Programming Techniques

Example Pipelined Solutions (Examples of each type of computation) Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 13

Pipeline Program Examples Adding Numbers Type 1 pipeline computation Slides for Parallel Programming Techniques

Pipeline Program Examples Adding Numbers Type 1 pipeline computation Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 14

Basic code for process Pi : recv(&accumulation, Pi-1); accumulation = accumulation + number; send(&accumulation,

Basic code for process Pi : recv(&accumulation, Pi-1); accumulation = accumulation + number; send(&accumulation, Pi+1); except for the first process, P 0, which is send(&number, P 1); and the last process, Pn-1, which is recv(&number, Pn-2); accumulation = accumulation + number; Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 15

SPMD program if (process > 0) { recv(&accumulation, Pi-1); accumulation = accumulation + number;

SPMD program if (process > 0) { recv(&accumulation, Pi-1); accumulation = accumulation + number; } if (process < n-1) send(&accumulation, P i+1); The final result is in the last process. Instead of addition, other arithmetic operations could be done. Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 16

Pipelined addition numbers Master process and ring configuration Slides for Parallel Programming Techniques &

Pipelined addition numbers Master process and ring configuration Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 17

If we only have one set of n numbers, the complexity is O(n). If

If we only have one set of n numbers, the complexity is O(n). If we have m groups of n numbers to add, each group resulting in a separate answer. The average execution time is about one cycle for each group when m is large. Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved.

Sorting Numbers A parallel version of insertion sort. Slides for Parallel Programming Techniques &

Sorting Numbers A parallel version of insertion sort. Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 18

Pipeline for sorting using insertion sort Type 2 pipeline computation Slides for Parallel Programming

Pipeline for sorting using insertion sort Type 2 pipeline computation Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 19

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 20

Insertion sort with results returned to master process using bidirectional line configuration Slides for

Insertion sort with results returned to master process using bidirectional line configuration Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 21

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved.

Insertion sort with results returned Slides for Parallel Programming Techniques & Applications Using Networked

Insertion sort with results returned Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 22

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved.

Prime Number Generation Sieve of Eratosthenes • Series of all integers generated from 2.

Prime Number Generation Sieve of Eratosthenes • Series of all integers generated from 2. • First number, 2, is prime and kept. • All multiples of this number deleted as they cannot be prime. • Process repeated with each remaining number. • The algorithm removes non-primes, leaving only primes. Type 2 pipeline computation Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 23

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved.

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved.

Type 2 pipeline computation Slides for Parallel Programming Techniques & Applications Using Networked Workstations

Type 2 pipeline computation Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved.

Analysis is similar to parallel sorting except that each process will complete fewer steps

Analysis is similar to parallel sorting except that each process will complete fewer steps than preceding process because it will not receive all numbers that preceding process receives. Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved.

Solving a System of Linear Equations Upper-triangular form where a’s and b’s are constants

Solving a System of Linear Equations Upper-triangular form where a’s and b’s are constants and x’s are unknowns to be found. Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 25

Back Substitution First, unknown x 0 is found from last equation; i. e. ,

Back Substitution First, unknown x 0 is found from last equation; i. e. , Value obtained for x 0 substituted into next equation to obtain x 1; i. e. , Values obtained for x 1 and x 0 substituted into next equation to obtain x 2: and so on until all the unknowns are found. Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 26

Pipeline Solution First pipeline stage computes x 0 and passes x 0 onto the

Pipeline Solution First pipeline stage computes x 0 and passes x 0 onto the second stage, which computes x 1 from x 0 and passes both x 0 and x 1 onto the next stage, which computes x 2 from x 0 and x 1, and so on. Type 3 pipeline computation Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 27

The ith process (0 < i < n) receives the values x 0, x

The ith process (0 < i < n) receives the values x 0, x 1, x 2, …, xi-1 and computes xi from the equation: Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 28

Sequential Code Given constants ai, j and bk stored in arrays a[ ][ ]

Sequential Code Given constants ai, j and bk stored in arrays a[ ][ ] and b[ ], respectively, and values for unknowns to be stored in array, x[ ], sequential code could be Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 29

Parallel Code Pseudocode of process Pi (1 < i < n) of could be

Parallel Code Pseudocode of process Pi (1 < i < n) of could be Now have additional computations to do after receiving and resending values. Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 30

Pipeline processing using back substitution Slides for Parallel Programming Techniques & Applications Using Networked

Pipeline processing using back substitution Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved. 5. 31

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved.

Parallel execution time will be given by the final process plus the p-1 send()

Parallel execution time will be given by the final process plus the p-1 send() plus one divide ( to compute x 0), so complexity is still O(n). Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2 nd ed. , by B. Wilkinson & M. Allen, @ 2004 Pearson Education Inc. All rights reserved.