Partitioning, Divide and Conquer
• Partitioning – Dividing the problem into parts
  – Most strategies require coordination between the parts
  – Embarrassingly parallel is an exception
• Partitioning can typically be done in two ways
  – Dividing the data: data partitioning or domain decomposition
  – Dividing the program: functional decomposition
• Divide and Conquer – Dividing a problem into sub-problems that are of the same form as the original problem
  – Mandelbrot program
  – Integration
9/15/2020 Divide and Conquer Strategies 1

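The integration case can be sketched in plain C. This is only an illustration: the names integrate, trapezoid, and square are assumptions made for this example and do not come from the slides. Each call splits its interval in half, which produces two sub-problems of the same form as the original.

```c
#include <assert.h>

typedef double (*func_t)(double);

/* Trapezoid estimate of the integral of f over one interval [a, b]. */
static double trapezoid(func_t f, double a, double b) {
    return 0.5 * (b - a) * (f(a) + f(b));
}

/* Divide: split [a, b] at its midpoint.
 * Conquer: integrate each half the same way until depth reaches 0.
 * Combine: add the two partial results. */
double integrate(func_t f, double a, double b, int depth) {
    assert(depth >= 0);
    if (depth == 0)
        return trapezoid(f, a, b);
    double mid = 0.5 * (a + b);
    return integrate(f, a, mid, depth - 1) + integrate(f, mid, b, depth - 1);
}

/* Sample integrand for trying the function out. */
double square(double x) { return x * x; }
```

The two recursive calls share no data, so in a parallel setting each half could be handed to a different process and only the final addition needs coordination.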
Parallel Programming Paradigms
• Result Parallelism – Focuses on the result
  – Break the results into components and assign processes to work on each part of the result
• Specialist Parallelism – Focuses on the ability of the "work crew"
• Agenda Parallelism – Focuses on the list of tasks to be performed
(http://www.mcs.drexel.edu/~jjohnson/fa02/cs730/lectures/lec1.ppt)

Programming Methods
• Live Data Structures
  – Build program in the shape of the data structure that will ultimately give the result. Each element of the data structure is a separate process
  – No messages exchanged, processes refer to each other
• Message Passing
  – Enclose every data structure within a process
• Distributed Data Structures
  – Many processes share direct access to many other data objects
  – Processes coordinate by leaving data in shared space
(http://www.mcs.drexel.edu/~jjohnson/fa02/cs730/lectures/lec1.ppt)

Good Parallel Programming Environments
• Augment sequential programming language most appropriate for task
• Support
  – Process creation
  – Interprocess communication
  – As natural extensions to base language
• Portable
• Easy to use (conceptually and in practice)

Issues for Portability
• Broad spectrum of machines
• Computation/communication ratio differs dramatically among architectural classes
• A portable program may run poorly on another architecture, but it can be tuned later
• Same class of machine does not mean same programming environment
• Portability works best for programs that are
  – Relatively coarse grained
  – Not communication intensive

Most Models of Parallelism Assume Programs Parallelized By
• Process parallelism – partitioning into a large number of simultaneous activities
• Data parallelism – partitioning data into a large number of identical sets and then synchronously applying the same program operation to each set

Pipeline
• Processors are arranged in a pipeline (virtually)
• Work is sent down the pipeline for processing
• Full utilization of the processors does not occur until the pipe is full
[Figure: tasks T4, T3, T2, T1, T0 passing through processors P0, P1, P2, P3]

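A small C sketch of the fill and drain behaviour described on this slide. It assumes every stage takes exactly one step per task, which the slide does not state; the function names pipeline_steps and busy_stages are made up for the example.

```c
#include <assert.h>

/* Total number of steps needed to push `tasks` items through `stages`
 * pipeline stages, assuming one step per stage per task. */
int pipeline_steps(int tasks, int stages) {
    assert(tasks > 0 && stages > 0);
    return tasks + stages - 1;
}

/* Number of stages doing useful work at a given step (steps count from 0).
 * Task number (step - s) is the one sitting in stage s at that step. */
int busy_stages(int step, int tasks, int stages) {
    int busy = 0;
    for (int s = 0; s < stages; ++s) {
        int task = step - s;
        if (task >= 0 && task < tasks)
            ++busy;
    }
    return busy;
}
```

With 5 tasks and 4 stages, only one stage is busy at step 0, all four are busy at step 3, and the pipe drains again toward the last step.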
Matrix Multiplication
• To make this discussion easier we will assume square matrices
  – The product C of two n by n matrices A and B is given by $c_{ij} = \sum_{k=0}^{n-1} a_{ik} b_{kj}$
  – Note that all valid products are of the form $a_{ik} \cdot b_{kj}$ (the inner indices match)

Dissection Time
  a00 a01 a02       b00 b01 b02
  a10 a11 a12   x   b10 b11 b12   =
  a20 a21 a22       b20 b21 b22

  a00*b00+a01*b10+a02*b20   a00*b01+a01*b11+a02*b21   a00*b02+a01*b12+a02*b22
  a10*b00+a11*b10+a12*b20   a10*b01+a11*b11+a12*b21   a10*b02+a11*b12+a12*b22
  a20*b00+a21*b10+a22*b20   a20*b01+a21*b11+a22*b21   a20*b02+a21*b12+a22*b22

Parallelize
• Organize the PE grid as an N x N x N cube
• Place the data in the processors so that each computes one product term for one of the Cij's, so the multiplication can be done in one step
• All that is left is to sum the products

Parallelize
[Figure: 3 x 3 x 3 cube of products, labeled "Sum Reduction" across the three planes]
Plane k = 0:
  a00*b00  a00*b01  a00*b02
  a10*b00  a10*b01  a10*b02
  a20*b00  a20*b01  a20*b02
Plane k = 1:
  a01*b10  a01*b11  a01*b12
  a11*b10  a11*b11  a11*b12
  a21*b10  a21*b11  a21*b12
Plane k = 2:
  a02*b20  a02*b21  a02*b22
  a12*b20  a12*b21  a12*b22
  a22*b20  a22*b21  a22*b22

The Algorithm
The algorithm for parallel matrix multiplication
1. Load the arrays into the processors
2. Everyone multiplies
3. Do a REDUCE.SUM from back to front
4. Result is in the front 3 x 3 plane of the cube

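The four steps can be simulated sequentially in C. This is a sketch under assumptions: the loops stand in for processors that would all act at once, plane k = 0 is taken to be the front of the cube, and the names cube_multiply and MAXN are invented for the example.

```c
#include <assert.h>

#define MAXN 8

/* Sequential simulation of the cube algorithm.
 * PE (i, j, k) is assumed to hold a[i][k] and b[k][j]. */
void cube_multiply(int n, int a[MAXN][MAXN], int b[MAXN][MAXN],
                   int c[MAXN][MAXN]) {
    int cube[MAXN][MAXN][MAXN];
    assert(n > 0 && n <= MAXN);

    /* Steps 1 and 2: load the operands, then every PE multiplies.
     * On a real machine this is a single parallel step. */
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                cube[i][j][k] = a[i][k] * b[k][j];

    /* Step 3: sum reduction from the back plane toward the front plane. */
    for (int k = n - 1; k > 0; --k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                cube[i][j][k - 1] += cube[i][j][k];

    /* Step 4: the result sits in the front plane (k = 0). */
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            c[i][j] = cube[i][j][0];
}
```

The multiplication costs one parallel step, but it needs n*n*n processors, and the reduction as written takes n - 1 further steps.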
Using Fewer Processors
[Figure sequence: a 3 x 3 grid of multiply cells (*); the elements of A and B are fed into the grid one step at a time]
Products formed at each step:
  Step 1: a00*b00
  Step 2: a01*b10, a00*b01, a10*b00
  Step 3: a02*b20, a01*b11, a00*b02, a11*b10, a10*b01, a20*b00
  Step 4: a02*b21, a01*b12, a12*b20, a11*b11, a10*b02, a21*b10, a20*b01

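The staggered feeding of A and B into an n x n grid can be modelled with simple index arithmetic. The sketch below assumes that cell (i, j) sees the operand pair with inner index k = t - i - j at step t (counting from 0); this rule is inferred from the figure sequence and is not stated on the slides. The name systolic_multiply is an assumption for the example.

```c
#include <assert.h>

#define MAXN 8

/* Sequential simulation of an n x n grid of multiply-accumulate cells.
 * Returns the number of time steps the grid needs. */
int systolic_multiply(int n, int a[MAXN][MAXN], int b[MAXN][MAXN],
                      int c[MAXN][MAXN]) {
    assert(n > 0 && n <= MAXN);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            c[i][j] = 0;

    /* The last product, in cell (n-1, n-1) with k = n-1, is formed at
     * step 3n - 3, so 3n - 2 steps are needed in total. */
    int steps = 3 * n - 2;
    for (int t = 0; t < steps; ++t)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                int k = t - i - j;      /* operand pair reaching cell (i, j) */
                if (k >= 0 && k < n)    /* otherwise the cell is idle */
                    c[i][j] += a[i][k] * b[k][j];
            }
    return steps;
}
```

Compared with the cube, this uses n*n processors instead of n*n*n, at the price of 3n - 2 steps and cells that sit idle while the data flows in and out.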
Improving Efficiency
Result to compute:
  a00*b00+a01*b10+a02*b20   a00*b01+a01*b11+a02*b21   a00*b02+a01*b12+a02*b22
  a10*b00+a11*b10+a12*b20   a10*b01+a11*b11+a12*b21   a10*b02+a11*b12+a12*b22
  a20*b00+a21*b10+a22*b20   a20*b01+a21*b11+a22*b21   a20*b02+a21*b12+a22*b22
One product per processor:
  a00*b00  a01*b11  a02*b22
  a11*b10  a12*b21  a10*b02
  a22*b20  a20*b01  a21*b12

Improving Efficiency
First set of products:
  a00*b00  a01*b11  a02*b22
  a11*b10  a12*b21  a10*b02
  a22*b20  a20*b01  a21*b12
Second set of products:
  a02*b20  a00*b01  a01*b12
  a10*b00  a11*b11  a12*b22
  a21*b10  a22*b21  a20*b02
Third set of products:
  a01*b10  a02*b21  a00*b02
  a12*b20  a10*b01  a11*b12
  a20*b00  a21*b11  a22*b22

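The three grids above follow a regular pattern: at step s, processor (i, j) multiplies the pair with inner index k = (i + j - s) mod n. A skewed layout of this kind is commonly associated with Cannon's algorithm, although the slides do not name it. The C sketch below uses index arithmetic in place of the actual data shifts between neighbours; the name skewed_multiply is an assumption for the example.

```c
#include <assert.h>

#define MAXN 8

/* Sequential simulation of the skewed n x n scheme.
 * Every processor is busy at every step; returns the number of steps. */
int skewed_multiply(int n, int a[MAXN][MAXN], int b[MAXN][MAXN],
                    int c[MAXN][MAXN]) {
    assert(n > 0 && n <= MAXN);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            c[i][j] = 0;

    for (int s = 0; s < n; ++s)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                /* Adding n keeps the value non-negative before the modulo. */
                int k = (i + j - s + n) % n;
                c[i][j] += a[i][k] * b[k][j];
            }
    return n;
}
```

For n = 3 this finishes in 3 steps with all 9 processors busy throughout, where the staggered grid needs 7 steps.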
Farmer/Worker
• One way to do data partitioning
• Farmer prepares tasks for workers
• Workers receive task and do the work
• Work is sent back to farmer
• Farmer consolidates results
[Figure: processors P0 through P5 connected to the Farmer]

Linda
• Linda is a memory model
  – A model represents a particular way of thinking about problems
• Every process has access to a shared pool of memory referred to as tuple space
  – Data tuples
  – Process tuples
• Processes coordinate by generating, reading, and consuming tuples

David Gelernter
• Linda was developed by David Gelernter, a CS professor at Yale

When it came time to name the language, Mr Gelernter said he noted that Ada was named after Ada Augusta Lovelace, the daughter of Lord Byron, the English poet. Miss Lovelace is regarded as the first computer programmer because she worked for the computer pioneer Charles Babbage. Another woman named Lovelace was in the news when Mr Gelernter was casting about for a name -- Linda Lovelace, a star of pornographic films. So he named the language Linda, and it stuck. Asked about it now, Mr Gelernter grins and shrugs, "I was a graduate student at the time," he said.

David Gelernter
• David Hillel Gelernter is a professor of computer science at Yale University. In the 1980s, he made seminal contributions to the field of parallel computation, specifically the tuple space model of coordination and the Linda Programming System. He received his Bachelor of Arts degree from Yale University in 1976, and his Ph.D. from the State University of New York, Stony Brook in 1982. In 1993, he was critically injured opening a mail bomb sent by Theodore Kaczynski, who at that time was an unidentified but violent opponent of technological progress, dubbed "The Unabomber" by the press. He recovered from his injuries but sustained permanent damage to his right hand and eye; he chronicled the ordeal in his 1997 book Drawing Life: Surviving the Unabomber. He was nominated to and subsequently became a member of The National Council on the Arts. His biographical summary can be found at the National Endowment for the Arts web site (http://www.nea.gov/about/NCA/Gelernter.html)

Linda Goals
• High level language for explicit parallel programming
• Portability
• No temporal or spatial relationships between parallel processes
• Dynamic distribution of tasks at runtime supporting
  – Dynamic process creation
  – Static allocation

Linda – A Memory Model
• Tuple Space
  – Logically shared associative memory
  – Collection of logically ordered sets of data (tuples)
  – Accomplish work by generating, using, consuming data tuples
• Process tuples
  – Under active evaluation
  – When done, become data tuple
• Data Tuples
  – Passive

Tuple Space
[Figure: Sender, Tuple Space, Receiver]

Linda – A Programming Model
• Linda:
  – Smart optimizing pre-compiler
  – Run-time kernel
• Shown to work well with shared memory
• Suggested will work on distributed memory
  – Brenda (Trollius/Cornell)
  – University of MN (Transputer)
  – Cogent Research (OS Model)
  – Laden (RIT)

Characteristics of the Linda Model
• Processes are decoupled
• Processes create, look at, and destroy data objects
• A process that tries to read a non-existent object will wait (deadlock possible!)
• Objects are stored in a shared space accessible to all processes
• Objects are identified by content rather than location

Linda Programming Paradigm
• Distributed data structures accessible to many processes simultaneously
• Processes accessing data structures simultaneously
• Any data structure in tuple space is accessible to any process in that same tuple space
• Linda processes aspire to know as little about each other as possible

Linda Operations
• in/inp – input from tuple space (wait/no wait) (tuple removed from tuple space)
• rd/rdp – read from tuple space (wait/no wait) (tuple remains in tuple space)
• out – evaluate and then output to tuple space
• eval – output to tuple space and then evaluate as a series of processes

Linda Operations: out
• out( t ) – new tuple t to be evaluated and then put into tuple space
• t – sequence of typed values
• Examples:
    ("a string", 12.96, 16, y)
    ( 0, 1 )

Linda Operations: in
• in( s ) – causes some tuple t to be withdrawn from tuple space
• t – chosen arbitrarily from those that match s
• s – anti-tuple – sequence of typed fields that may be actual values or formal place holders
• t matches s if
  – Same number of fields
  – Types of fields match pairwise
  – Actual values in s match the values of the corresponding fields in t

Linda Operations: in
• If s matches t then
  – Actual value in t is assigned to the formal place holder in s
  – Invoking process then continues to execute
  – If no match, invoking process waits until there is one
• Field types
  – [unsigned] int, long, short, char
  – float, double
  – struct
  – union
  – Arrays ([]) of arbitrary dimensions of the above

Tuple Matching
• in("a string", ? f, ? i, y ) – execution searches for a passive data tuple having
  – First element that is "a string"
  – Second element that has the same type as variable f
  – Third element that has the same type as variable i
  – Fourth element that has the same value as variable y
• Result: Get values for f and i

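The matching rules can be written down as a plain C predicate. This is not how any real Linda kernel represents tuples; the field struct, the helper constructors, and tuple_matches are all hypothetical names invented for this sketch. It only tests for a match and does not copy values back into the formal place holders.

```c
#include <assert.h>
#include <string.h>

typedef enum { T_INT, T_DOUBLE, T_STRING } field_type;

typedef struct {
    field_type type;
    int is_formal;   /* 1 for a "? x" place holder, 0 for an actual value */
    union { int i; double d; const char *s; } v;
} field;

/* Hypothetical helpers for building fields. */
field actual_int(int x)         { field f; f.type = T_INT;    f.is_formal = 0; f.v.i = x; return f; }
field actual_double(double x)   { field f; f.type = T_DOUBLE; f.is_formal = 0; f.v.d = x; return f; }
field actual_str(const char *x) { field f; f.type = T_STRING; f.is_formal = 0; f.v.s = x; return f; }
field formal(field_type t)      { field f; f.type = t;        f.is_formal = 1; f.v.i = 0; return f; }

/* Returns 1 if data tuple t matches anti-tuple s, otherwise 0. */
int tuple_matches(const field *s, int s_len, const field *t, int t_len) {
    if (s_len != t_len)
        return 0;                               /* same number of fields */
    for (int k = 0; k < s_len; ++k) {
        assert(!t[k].is_formal);                /* data tuples hold actuals only */
        if (s[k].type != t[k].type)
            return 0;                           /* types match pairwise */
        if (s[k].is_formal)
            continue;                           /* a formal accepts any value */
        switch (s[k].type) {                    /* actuals must be equal */
        case T_INT:    if (s[k].v.i != t[k].v.i) return 0; break;
        case T_DOUBLE: if (s[k].v.d != t[k].v.d) return 0; break;
        case T_STRING: if (strcmp(s[k].v.s, t[k].v.s) != 0) return 0; break;
        }
    }
    return 1;
}
```

Because tuples are selected by content, an implementation has to run a test like this against candidate tuples, which is one reason finding tuples efficiently is the hard part of building Linda.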
Linda Operations: inp
• inp( s ) – same as in, except
  – No wait
  – Returns 1, if succeeds
  – Returns 0, if fails
  – May be inefficient depending on implementation

Linda Operations: rd/rdp
• rd( s )/rdp( s ) – same as in/inp, except
  – Tuple is read only, not removed from tuple space

Linda Operations: eval
• eval( t ) – Similar to out, except
  – Tuple is evaluated AFTER being placed in tuple space
  – New process is created to evaluate each field of t
  – When all fields are completely evaluated, t becomes a passive data tuple

Linda Operations: eval
• Example: eval("e", 7, exp( 7 ) )
• Creates 3 element live tuple and returns immediately
  – Generates 3 processes:
    • First computes "e"
    • Second computes 7
    • Third computes exp(7)
  – When all done, live tuple replaced by data tuple containing: ("e", 7, 1096.63… )
• Can be read with: rd("e", 7, ? value )

Linda Operations: eval
• Comparison with out
    /* out: sqrt( i ) is evaluated first, then the tuple is put into tuple space */
    for (i = 0; i < 100; i++)
        out("square roots", i, sqrt( i ) );

    /* eval: the tuple is placed in tuple space first, then evaluated by new processes */
    for (i = 0; i < 100; i++)
        eval("square roots", i, sqrt( i ) );
• Values are inherited only for explicitly referenced names, e.g.,
    eval("Q", f( x, y ) );
  Any static local or global variables in f are NOT initialized!

To Build A Linda Program
• Drop 1 process into tuple space
• It creates other process tuples
• Process tuples execute in parallel, exchange data by
  – Generating data tuples
  – Reading data tuples
  – Consuming data tuples
• When finished, become data tuple

Programming Example – Parallel Hello World – chello.cl

#include <stdio.h>
#include <unistd.h>

#define NPROC 8

int real_main()
{
    int i, hello();

    out("count", 0);                    /* shared counter starts at 0 */
    for (i = 0; i < NPROC; ++i)
        eval("hello_world", hello(i));  /* one process per hello() call */
    in("count", NPROC);                 /* wait until every process has checked in */
    for (i = 0; i < NPROC; ++i)
        in("hello_world", ? int);       /* consume the finished process tuples */
    printf("All processors done\n");
    return 0;
}

int hello(int id)
{
    int j;
    char h[100];

    if (gethostname(h, sizeof(h)) != 0) {
        fprintf(stderr, "Problem in gethostname()\n");
        lexit(1);
    }
    printf("Hello World from node %s, virtual proc no: %d\n", h, id);
    in("count", ? j);                   /* take the counter ... */
    out("count", j+1);                  /* ... and put it back incremented */
    return 0;
}

Matrix Multiplication
• Master – Initializes/cleans up (real_main)
  – Dumps rows of A and columns of B into tuple space
  – Specifies first element to be computed
  – Handles assembly of data
  – Handles termination
• Workers
  – Find out what to compute
  – Specify what is to be computed next
  – Get appropriate row and column data
  – Compute element
  – Output computed element

Matrix Multiplication
• Questions:
  – How many workers?
  – What tuples do we need?
  – How should we indicate termination?
  – Are there any performance issues?

real_main
int real_main( argc, argv )
int argc;
char **argv;
{
    int dim,      /* Actual dimension of matrix */
        workers;  /* the number of workers */

    if ( argc != 3 ) {
        printf( "Usage: %s <workers> <dim>\n", *argv );
        lexit( 1 );
    }
    workers = atol( *++argv );
    dim = atol( *++argv );
    printf( "matrix -- workers: %d, dim: %d\n", workers, dim );
    master( workers, dim );
    return 0;
}

master.1
void master( workers, dim )
int dim, workers;
{
    int A[MAXARRAYSIZE][MAXARRAYSIZE],
        B[MAXARRAYSIZE][MAXARRAYSIZE],
        col_index,
        index,      /* loop counter used by the loops below */
        result[MAXARRAYSIZE][MAXARRAYSIZE],
        retriever,  /* variable to temporarily hold the value read from tuple space */
        row_index,
        *row,       /* pointer to a single row in A */
        *col;       /* pointer to a single col in B */

    /*
     * Initialize the two matrices - A by row, B by col
     * and print them
     */
    . . .

master.2
    /* Start the C-linda timer utility */
    start_timer();

    /* Put the matrices in the tuple space */
    for ( index = 0; index < dim; ++index ) {
        row = &A[ index ][ 0 ];
        col = &B[ index ][ 0 ];
        out( "A-row", index, row:dim );
        out( "B-col", index, col:dim );
    }

    /* Make a timer split */
    timer_split( "done setting up" );

master.3
    /* Start workers */
    for ( index = 0; index < workers; ++index ) {
        eval( "worker", worker( index, dim ) );
    }

    /* Indicate element to work on */
    out( "NEXT", 0 );

    /* Retrieve each element of the product matrix */
    for ( index = 0; index < dim*dim; ++index ) {
        in( "Result", ? row_index, ? col_index, ? retriever );
        result[ row_index ][ col_index ] = retriever;
    }

master.4
    /*
     * Write out results
     */
    . . .

    /* Complete and print timing */
    timer_split( "all done" );
    print_times( );
}

worker.1
int worker( i, dim )
int i, dim;
{
    int col[ MAXARRAYSIZE ],
        col_index,
        index,      /* element number taken from the "NEXT" tuple */
        next_index,
        row[ MAXARRAYSIZE ],
        row_index,
        result,
        *cp,        /* element in the column matrix */
        *rp;        /* element in the row matrix */

    while( TRUE ) {

worker.2 – Managing the work-to-do tuple
        /* Get index of the element of the product matrix to compute */
        in( "NEXT", ? index );

        /* If no more work, indicate termination and stop */
        if ( index < 0 ) {
            out( "NEXT", -1 );
            return( 0 );
        }
        else if ( index < dim * dim ) {
            /* Indicate the next node in the list */
            next_index = index + 1;
            out( "NEXT", next_index );
        }
        else {
            /* Put out a termination tuple */
            out( "NEXT", -1 );
            return( 0 );
        }

worker.3
        /* Which row and column indices are we doing? */
        row_index = index / dim;
        col_index = index % dim;

        /* Read row and column we are interested in */
        rd( "A-row", row_index, ? row:dim );
        rd( "B-col", col_index, ? col:dim );

        /* Compute the appropriate element */
        /* Initialize the variables for the dot product */
        result = 0;
        rp = row;
        cp = col;

        /* Compute the dot product */
        for ( index = 0; index < dim; ++index, ++rp, ++cp )
            result += *rp * *cp;

worker.4
        /* Store the result element in the tuple space. */
        out( "Result", row_index, col_index, result );
    } /* End while (true) */
} /* End worker */

Implementation
• The definition of Linda is pretty simple; the tricky part is the implementation
• Some Issues
  – How to find tuples?
  – Where to keep tuples?
  – Naming
• Interesting project

Wator Simulation
• "Wator" is a simple predator-prey simulation.
  – A. K. Dewdney, "Computer Recreations," Scientific American, December 1984.
• There are sharks, fish, and water.
  – Sharks move, eat fish, and reproduce; they might starve to death.
  – Fish move and reproduce; they never starve, but might get eaten.
  – Neither fish nor sharks die of old age.
• http://www.cheesygames.com/wator.php

Wator Parameters
• This simulation requires the following parameters:
  – Size of the ocean.
  – Initial number of fish.
  – Fish gestation period.
  – Initial number of sharks.
  – Shark gestation period.
  – Shark starvation period.

Wator World
• The ocean is an N x N array (the size N is an input parameter). It "wraps" to form a torus:
  – a cell on the right edge is adjacent to cells on the left edge, and a cell on the bottom edge is adjacent to cells on the top edge.
• A location in the array can be empty, or it can hold one fish or one shark (but not both).
• At the beginning of the simulation, the fish and sharks are placed in random locations in the ocean.

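The wrap-around rule reduces to modular arithmetic on the row and column indices. A minimal C sketch follows; the names wrap, cell, and neighbor, and the numbering of the four directions, are assumptions made for the example.

```c
#include <assert.h>

/* Wrap a coordinate onto the range [0, n): -1 maps to n - 1, n maps to 0. */
int wrap(int x, int n) {
    assert(n > 0);
    return ((x % n) + n) % n;
}

typedef struct { int row, col; } cell;

/* Neighbor of c on an n x n torus.
 * direction: 0 = up, 1 = right, 2 = down, 3 = left (an arbitrary choice). */
cell neighbor(cell c, int direction, int n) {
    static const int drow[4] = { -1, 0, 1, 0 };
    static const int dcol[4] = { 0, 1, 0, -1 };
    assert(direction >= 0 && direction < 4);
    cell next;
    next.row = wrap(c.row + drow[direction], n);
    next.col = wrap(c.col + dcol[direction], n);
    return next;
}
```

The extra "+ n" in wrap matters because the C remainder operator can return a negative value when its left operand is negative.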
What a Fish Does
• At each step, each fish will
  – Pick a random direction (one of four directions), and try to move in that direction. The fish can move only if the new square is empty.
  – If it is time for the fish to reproduce, and the fish was able to move, create a new fish in the just vacated square. Both the old and the new fish begin a new gestation period.
  – If it is time for the fish to reproduce, but the fish could not move, the fish does not move but remains ready to reproduce at the earliest opportunity.

What a Shark Does
• At each step, each shark will
  – Check whether it is adjacent to a fish, and if so, move in that direction (and eat the fish).
  – Otherwise, it picks a random direction, and tries to move in that direction.
  – Reproduce according to the same rules as a fish (if it is time and the shark can move).
  – If the shark has not eaten for the time specified, it starves to death (and disappears).

Sequential Version
• Ocean represented as an array
  – Create structure to hold fish information
• Set of loops that runs over the array

Parallel Version
• Ideas?

Parallel Version
• Distribute ocean array across processors
• Issues?

Boundaries
• One issue: how does a processor know whether a location on an adjacent processor is empty?
  – At the beginning of each update, each processor could send its boundaries to its neighbors
• Does this solve all problems?

Collisions
• The distributed approach can lead to collisions
  – Two processors try to move a fish/shark into the same spot
  – Happens because boundaries are only exchanged at the beginning of the update

Handling Collisions
• Rollback
  – Send the fish back to where it came from
  – What happens when it is returned?
    • Same spot?
    • Different spot?
• Kill one of the fish
  – Clearly not correct
  – Might be the easiest thing to do

Any Other Issues?
• Any other issues with the parallel version?

Load Balancing
• Most of the processors will be iterating over empty ocean
• Perhaps instead of distributing the ocean, we should distribute the fish
• There may be a need to rebalance the fish as the simulation proceeds

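The difference between the two distributions can be made concrete by counting how many creatures the busiest processor owns under each scheme. The C sketch below makes several assumptions: the ocean is split into contiguous blocks of rows, the number of processors divides N evenly, and any non-zero cell holds a creature. The function names are invented for the example.

```c
#include <assert.h>

#define MAXN 16

/* Largest number of creatures owned by any one processor when the
 * ocean's rows are split into contiguous, equally sized blocks. */
int max_load_by_rows(int n, int nprocs, int ocean[MAXN][MAXN]) {
    assert(n > 0 && n <= MAXN && nprocs > 0 && n % nprocs == 0);
    int rows_per_proc = n / nprocs;
    int worst = 0;
    for (int p = 0; p < nprocs; ++p) {
        int load = 0;
        for (int r = p * rows_per_proc; r < (p + 1) * rows_per_proc; ++r)
            for (int c = 0; c < n; ++c)
                if (ocean[r][c] != 0)   /* non-zero means fish or shark */
                    ++load;
        if (load > worst)
            worst = load;
    }
    return worst;
}

/* Largest load when the creatures themselves are dealt out evenly. */
int max_load_by_creature(int total, int nprocs) {
    assert(total >= 0 && nprocs > 0);
    return (total + nprocs - 1) / nprocs;   /* ceiling of total / nprocs */
}
```

If all six fish in a 4 x 4 ocean sit in the top two rows, two processors splitting the ocean by rows get loads of 6 and 0, while dealing out the fish gives each processor 3. Distributing the fish trades this imbalance for more work in finding out what occupies a neighboring cell.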