COSC 513 Operating System Research Paper
Fundamental Properties of Programming for Parallelism
Student: Feng Chen (134192)

Conditions of Parallelism
- Needs in three key areas:
  - Computation models
  - Inter-processor communication
  - System integration
- Tradeoffs exist among time, space, performance, and cost factors

Data and resource dependences
- Flow dependence: S2 is flow-dependent on S1 if an execution path exists from S1 to S2 and at least one output of S1 feeds in as an input to S2
- Antidependence: S2 is antidependent on S1 if S2 follows S1 and the output of S2 overlaps the input to S1
- Output dependence: S1 and S2 are output-dependent if they produce the same output variable
- I/O dependence: the same file is referenced by more than one I/O statement
- Unknown dependence: the subscript is itself indexed (indirect addressing), there is no loop variable in the subscript, the loop index is nonlinear, etc.

Example of data dependence
- S1: Load R1, A      / move mem(A) to R1
- S2: Add R2, R1      / R2 = (R1) + (R2)
- S3: Move R1, R3     / move (R3) to R1
- S4: Store B, R1     / move (R1) to mem(B)
- S2 is flow-dependent on S1
- S3 is antidependent on S2
- S3 is output-dependent on S1
- S2 and S4 are totally independent
- S4 is flow-dependent on S1 and S3 (a C-level sketch of the same dependences follows)

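The same dependences can be observed at the source level. The following minimal C sketch mirrors the four register statements; the initial values are illustrative assumptions, not from the slides:

    #include <stdio.h>

    int main(void) {
        int A = 5, R2 = 1, R3 = 7;   /* illustrative initial values */
        int R1, B;

        R1 = A;         /* S1: load  -- writes R1                            */
        R2 = R1 + R2;   /* S2: add   -- reads R1: flow-dependent on S1       */
        R1 = R3;        /* S3: move  -- rewrites R1: antidependent on S2,
                                        output-dependent on S1               */
        B = R1;         /* S4: store -- reads the R1 written by S3:
                                        flow-dependent on S1 and S3          */
        printf("B = %d\n", B);
        return 0;
    }
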
Example of I/O dependence
- S1: Read(4), A(i)    / read array A from tape unit 4
- S2: Rewind(4)        / rewind tape unit 4
- S3: Write(4), B(i)   / write array B onto tape unit 4
- S4: Rewind(4)        / rewind tape unit 4
- S1 and S3 are I/O-dependent on each other because both reference the same file (tape unit 4); see the sketch below
- This relation should not be violated during execution; otherwise, errors occur

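For illustration (not from the slides), here is a small C sketch of the same ordering constraint, using one shared FILE* as a stand-in for tape unit 4; the file name unit4.dat and the array contents are assumptions:

    #include <stdio.h>

    int main(void) {
        int A[4], B[4] = {1, 2, 3, 4};

        FILE *unit4 = fopen("unit4.dat", "w+b");
        if (!unit4) return 1;
        fwrite(B, sizeof B, 1, unit4);   /* pre-load the "tape" so S1 has data */
        rewind(unit4);

        fread(A, sizeof A, 1, unit4);    /* S1: read array A from unit 4  */
        rewind(unit4);                   /* S2: rewind unit 4             */
        fwrite(B, sizeof B, 1, unit4);   /* S3: write array B onto unit 4 */
        rewind(unit4);                   /* S4: rewind unit 4             */

        /* S1 and S3 reference the same file, so swapping them changes
         * what S1 reads: the I/O order must be preserved. */
        fclose(unit4);
        return 0;
    }
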
Control dependence
- The situation where the order of execution of statements cannot be determined before run time
- Different paths taken after a conditional branch may change data dependences
- May exist between operations performed in successive iterations of a loop
- Control dependence often prohibits parallelism from being exploited

Example of control dependence
- Successive iterations of this loop are control-independent:

    for (i = 0; i < N; i++) {
        a[i] = c[i];
        if (a[i] < 0)
            a[i] = 1;
    }

Example of control dependence
- The following loop has control-dependent iterations:

    for (i = 1; i < N; i++) {
        if (a[i-1] == 0)
            a[i] = 0;
    }

Resource dependence
- Concerned with conflicts in using shared resources, such as integer units, floating-point units, registers, and memory areas
- ALU dependence: the ALU is the conflicting resource
- Storage dependence: each task must work on independent storage locations or use protected access to a shared writable memory area
- Detection of parallelism requires a check of the various dependence relations

Bernstein’s conditions for parallelism
- Define:
  - Ii as the input set of a process Pi
  - Oi as the output set of a process Pi
- P1 and P2 can execute in parallel (denoted P1 || P2) under the conditions:
  - I1 ∩ O2 = ∅
  - I2 ∩ O1 = ∅
  - O1 ∩ O2 = ∅
- Note that I1 ∩ I2 ≠ ∅ does not prevent parallelism

Bernstein’s conditions for parallelism
- Input set: also called the read set or domain of a process
- Output set: also called the write set or range of a process
- A set of processes can execute in parallel if Bernstein’s conditions are satisfied on a pairwise basis; that is, P1 || P2 || … || Pk if and only if Pi || Pj for all i ≠ j

Bernstein’s conditions for parallelism
- The parallelism relation is commutative: Pi || Pj implies Pj || Pi
- The relation is not transitive: Pi || Pj and Pj || Pk do not necessarily imply Pi || Pk (for example, if P1 writes A, P2 writes B, and P3 writes A, then P1 || P2 and P2 || P3, yet P1 and P3 are output-dependent)
- Associativity: Pi || Pj || Pk implies (Pi || Pj) || Pk = Pi || (Pj || Pk)

Bernstein’s conditions for parallelism
- For n processes, there are 3n(n-1)/2 conditions to check (three per pair); violation of any of them prohibits parallelism collectively or partially
- Statements or processes that depend on run-time conditions (IF statements or conditional branches) are not transformed to parallelism
- The analysis of dependences can be conducted at the code, subroutine, process, task, and program levels; higher-level dependence can be inferred from that of subordinate levels

Example of parallelism using Bernstein’s conditions
- P1: C = D * E
- P2: M = G + C
- P3: A = B + C
- P4: C = L + M
- P5: F = G / E
- Assuming no pipelining is used, five steps are needed for sequential execution

Example of parallelism using Bernstein’s conditions
[Figure: data dependence graph for P1-P5 and the two execution schedules: sequential execution takes five steps; with two adders, the parallel schedule runs P1 and P5 in step 1, P2 and P3 in step 2, and P4 in step 3]

Example of parallelism using Bernstein’s conditions
- There are 10 pairs of statements to check against Bernstein’s conditions
- Five pairs pass: P1 || P5, P2 || P3, P2 || P5, P3 || P5, and P4 || P5
- Collectively, only P2 || P3 || P5 is possible, because P2 || P3, P3 || P5, and P2 || P5 are all possible; no other pair extends to a group of three
- If two adders are available simultaneously, the parallel execution requires only three steps
- A sketch that mechanizes this pairwise check follows below

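As a concrete companion to this example (my addition, not part of the slides), the following C sketch encodes the read and write sets of P1-P5 as bitmasks over the variables {A, B, C, D, E, F, G, L, M} and applies Bernstein’s three conditions to each of the 10 pairs; the bitmask encoding is an assumption of this sketch:

    #include <stdio.h>

    /* One bit per variable. */
    enum { A = 1 << 0, B = 1 << 1, C = 1 << 2, D = 1 << 3, E = 1 << 4,
           F = 1 << 5, G = 1 << 6, L = 1 << 7, M = 1 << 8 };

    typedef struct { unsigned in, out; } Proc;   /* read set, write set */

    /* Bernstein: I1 ∩ O2 = ∅, I2 ∩ O1 = ∅, O1 ∩ O2 = ∅ */
    static int can_parallel(Proc p, Proc q) {
        return !(p.in & q.out) && !(q.in & p.out) && !(p.out & q.out);
    }

    int main(void) {
        Proc P[5] = {
            { D | E, C },   /* P1: C = D * E */
            { G | C, M },   /* P2: M = G + C */
            { B | C, A },   /* P3: A = B + C */
            { L | M, C },   /* P4: C = L + M */
            { G | E, F },   /* P5: F = G / E */
        };
        for (int i = 0; i < 5; i++)
            for (int j = i + 1; j < 5; j++)
                if (can_parallel(P[i], P[j]))
                    printf("P%d || P%d\n", i + 1, j + 1);
        return 0;
    }

Running it prints the five parallel pairs P1 || P5, P2 || P3, P2 || P5, P3 || P5, and P4 || P5; the only mutually parallel triple among them is P2 || P3 || P5.
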
Implementation of parallelism
- Special hardware and software support is needed to implement parallelism
- There is a distinction between hardware parallelism and software parallelism
- Parallelism cannot be achieved for free

Hardware parallelism
- Often a function of cost and performance tradeoffs
- If a processor issues k instructions per machine cycle, it is called a k-issue processor
- A conventional processor takes one or more machine cycles to issue a single instruction: a one-issue processor
- A multiprocessor system built with n k-issue processors should be able to handle a maximum of nk threads of instructions simultaneously (for example, four 2-issue processors can sustain at most eight instruction threads)

Software parallelism
- Defined by the control and data dependences of programs
- A function of the algorithm, programming style, and compiler optimization
- The two most cited types of parallel programming:
  - Control parallelism: in the form of pipelining and multiple functional units
  - Data parallelism: similar operations performed over many data elements by multiple processors; practiced in SIMD and MIMD systems (see the sketch below)

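To make data parallelism concrete, here is a minimal sketch in C; OpenMP is my choice of vehicle here, not something the slides prescribe (the pragma is simply ignored if the compiler is not invoked with OpenMP support):

    #include <stdio.h>

    #define N 8

    int main(void) {
        double a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        /* Data parallelism: the same operation on many independent
         * elements. Iterations write disjoint locations, so they satisfy
         * Bernstein's conditions pairwise and may run on multiple
         * processors. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        for (int i = 0; i < N; i++)
            printf("%g ", c[i]);
        printf("\n");
        return 0;
    }
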
Hardware vs. software parallelism
- Eight instructions in total: 4 loads (L), 2 multiplications (X), 1 addition (+), and 1 subtraction (-), producing results A and B
- Theoretically, the computation can be accomplished in 3 cycles (steps):
  - Step 1: L, L, L, L
  - Step 2: X, X
  - Step 3: +, - (yielding A and B)

Hardware vs. software parallelism
- Hardware parallelism (example 1): a 2-issue processor that can execute one memory access and one arithmetic operation simultaneously
- The computation needs 7 cycles (steps):
  - Step 1: L
  - Step 2: L
  - Step 3: L, X
  - Step 4: L
  - Step 5: X
  - Step 6: + (A)
  - Step 7: - (B)
- This illustrates the mismatch between hardware and software parallelism

Hardware vs. software parallelism
- Hardware parallelism (example 2): a dual-processor system in which each processor is single-issue
- 6 cycles are needed to execute the resulting 12 instructions; 2 store (S) and 2 load operations are inserted for interprocessor communication through the shared memory
- Per-processor schedule (S marks the added instructions for interprocessor communication):
  - Step 1: L | L
  - Step 2: L | L
  - Step 3: X | X
  - Step 4: S | S
  - Step 5: L | L
  - Step 6: + (A) | - (B)

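Below is a hedged pthreads sketch of this dual-processor schedule; the operand values and the final expressions (A as the sum of the two products, B as their difference) are my reconstruction rather than something the slide states, and a POSIX barrier stands in for the store/load handshake through shared memory:

    #include <pthread.h>
    #include <stdio.h>

    static double in[4] = {1, 2, 3, 4};   /* four operands to load (assumed values) */
    static double shared[2];              /* products exchanged through shared memory */
    static double A, B;
    static pthread_barrier_t bar;

    static void *worker(void *arg) {
        int id = (int)(long)arg;
        double x = in[2 * id];            /* Step 1: L */
        double y = in[2 * id + 1];        /* Step 2: L */
        shared[id] = x * y;               /* Step 3: X; Step 4: S (store to shared memory) */
        pthread_barrier_wait(&bar);       /* make both stores visible to both processors */
        double other = shared[1 - id];    /* Step 5: L (load the other product) */
        if (id == 0)
            A = shared[0] + other;        /* Step 6: + */
        else
            B = shared[1] - other;        /* Step 6: - */
        return NULL;
    }

    int main(void) {
        pthread_t t[2];
        pthread_barrier_init(&bar, NULL, 2);
        for (long i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);
        pthread_barrier_destroy(&bar);
        printf("A = %g, B = %g\n", A, B);   /* A = 2 + 12 = 14, B = 12 - 2 = 10 */
        return 0;
    }

Compile with cc -pthread. The barrier plays the role of the step-4/step-5 store-load pair: neither processor may read the other's product before both products have reached shared memory.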