
Partitioning, Divide and Conquer
• Partitioning
  – Dividing the problem into parts
  – Most strategies require coordination between the parts
    • Embarrassingly parallel computations are an exception
  – Partitioning can typically be done in two ways
    • Dividing the data: data partitioning or domain decomposition
    • Dividing the program: functional decomposition
• Divide and Conquer
  – Dividing a problem into sub-problems that are of the same form as the original problem
• Examples: Mandelbrot program, integration
9/15/2020 Divide and Conquer Strategies

Parallel Programming Paradigms
• Result Parallelism
  – Focuses on the result
  – Break the result into components and assign processes to work on each part of the result
• Specialist Parallelism
  – Focuses on the abilities of the "work crew"
• Agenda Parallelism
  – Focuses on the list of tasks to be performed
(http://www.mcs.drexel.edu/~jjohnson/fa02/cs730/lectures/lec1.ppt)

Programming Methods
• Live Data Structures
  – Build the program in the shape of the data structure that will ultimately yield the result; each element of the data structure is a separate process
  – No messages are exchanged; processes refer to each other
• Message Passing
  – Enclose every data structure within a process
• Distributed Data Structures
  – Many processes share direct access to many data objects
  – Processes coordinate by leaving data in a shared space
(http://www.mcs.drexel.edu/~jjohnson/fa02/cs730/lectures/lec1.ppt)

Good Parallel Programming Environments
• Augment the sequential programming language most appropriate for the task
• Support, as natural extensions to the base language,
  – Process creation
  – Interprocess communication
• Portable
• Easy to use (conceptually and in practice)

Issues for Portability
• Broad spectrum of machines
• The computation/communication ratio differs dramatically among architectural classes
• A portable program may run poorly on another architecture, but it can be tuned later
• Machines of the same class do not necessarily share the same programming environment
• Portability works best for programs that are
  – Relatively coarse grained
  – Not communication intensive

Most Models of Parallelism Assume Programs Are Parallelized By
• Process parallelism
  – Partitioning the program into a large number of simultaneous activities
• Data parallelism
  – Partitioning the data into a large number of identical sets and then synchronously applying the same program operation to each set

Pipeline
• Processors are arranged (virtually) in a pipeline
• Work is sent down the pipeline for processing
• Full utilization of the processors does not occur until the pipe is full
[Figure: tasks T0 through T4 entering a pipeline of processors P0, P1, P2, P3]
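To make the fill-up cost concrete, here is a small C sketch (not from the original slides) that simulates a linear pipeline: task t enters P0 at step t and is at stage s during step t + s. The function name simulate_pipeline and its verbose flag are hypothetical. With the 5 tasks and 4 processors of the figure it takes 5 + 4 - 1 = 8 steps, and only 20 of the 32 processor-steps do useful work.

```c
#include <assert.h>
#include <stdio.h>

/* Simulate a linear pipeline of `stages` processors P0..P(stages-1).
 * Task t enters P0 at step t and is at stage s during step t + s.
 * Returns the number of steps needed to push `tasks` tasks through.
 * If verbose is nonzero, prints the task held by each processor at
 * each step (-1 means the processor is idle). */
int simulate_pipeline(int tasks, int stages, int verbose)
{
    int total_steps, busy_slots = 0;

    assert(tasks > 0 && stages > 0);
    total_steps = tasks + stages - 1;   /* fill + steady state + drain */

    for (int step = 0; step < total_steps; ++step) {
        for (int s = 0; s < stages; ++s) {
            int t = step - s;           /* task at stage s in this step */
            int active = (t >= 0 && t < tasks);
            busy_slots += active;
            if (verbose)
                printf("%3d ", active ? t : -1);
        }
        if (verbose)
            printf("\n");
    }

    /* Every task visits every stage exactly once. */
    assert(busy_slots == tasks * stages);
    return total_steps;
}
```

Calling simulate_pipeline(5, 4, 1) prints one row per step and one column per processor; all of the idle (-1) slots fall in the fill and drain phases.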

Matrix Multiplication
• To make this discussion easier we will assume square matrices
  – The product C = AB of two n x n matrices A and B is given by $c_{ij} = \sum_{k=0}^{n-1} a_{ik} b_{kj}$ for $0 \le i, j < n$
  – Note that all valid products are of the form $a_{ik} b_{kj}$: the column index of the a term equals the row index of the b term

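Dissection Time
$$
\begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & a_{22} \end{bmatrix}
\times
\begin{bmatrix} b_{00} & b_{01} & b_{02} \\ b_{10} & b_{11} & b_{12} \\ b_{20} & b_{21} & b_{22} \end{bmatrix}
=
\begin{bmatrix}
a_{00}b_{00}+a_{01}b_{10}+a_{02}b_{20} & a_{00}b_{01}+a_{01}b_{11}+a_{02}b_{21} & a_{00}b_{02}+a_{01}b_{12}+a_{02}b_{22} \\
a_{10}b_{00}+a_{11}b_{10}+a_{12}b_{20} & a_{10}b_{01}+a_{11}b_{11}+a_{12}b_{21} & a_{10}b_{02}+a_{11}b_{12}+a_{12}b_{22} \\
a_{20}b_{00}+a_{21}b_{10}+a_{22}b_{20} & a_{20}b_{01}+a_{21}b_{11}+a_{22}b_{21} & a_{20}b_{02}+a_{21}b_{12}+a_{22}b_{22}
\end{bmatrix}
$$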
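The dissection corresponds directly to the usual triple loop. A minimal sequential C sketch, assuming small fixed-size int matrices (N = 3 is an illustrative bound and matmul is a hypothetical name); the parallel schemes in the following slides distribute exactly these n^3 multiply-adds:

```c
#include <assert.h>

#define N 3  /* illustrative maximum matrix dimension */

/* c[i][j] = sum over k of a[i][k] * b[k][j]: each entry is a sum of n
 * products a_ik * b_kj, as in the dissection above. */
void matmul(int n, int a[][N], int b[][N], int c[][N])
{
    assert(n <= N);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            int sum = 0;
            for (int k = 0; k < n; ++k)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}
```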

Parallelize
• Organize the PE grid as an n x n x n cube
• Place the data in the processors so that processor (i, j, k) holds a_ik and b_kj; each processor then computes one product a_ik*b_kj of the sum for c_ij, so all of the multiplications can be done in one step
• All that is left is to sum the products

Parallelize
[Figure: the 27 products a_ik*b_kj for n = 3 arranged in a 3 x 3 x 3 cube of processors, one product per processor, with an arrow labeled "Sum Reduction" along the k direction; for example, the line of processors for c_00 holds a00*b00, a01*b10, a02*b20]

The Algorithm
The algorithm for parallel matrix multiplication:
1. Load the arrays into the processors
2. Everyone multiplies
3. Do a REDUCE.SUM from back to front
4. The result is in the front n x n (here 3 x 3) plane of the cube
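A sequential C simulation of the four steps, assuming one processor per (i, j, k) triple, modeled here as prod[k][i][j] (a hypothetical layout; cube_matmul is likewise a hypothetical name). The back-to-front pass takes n - 1 addition steps; a tree-shaped reduction could bring that down to about log2 n steps.

```c
#include <assert.h>

#define N 3  /* illustrative maximum matrix dimension */

/* Simulates the n x n x n processor cube: prod[k][i][j] stands for the
 * processor at position (i, j, k), i.e. plane k of the cube. */
void cube_matmul(int n, int a[][N], int b[][N], int c[][N])
{
    int prod[N][N][N];
    assert(n <= N);

    /* Steps 1-2: processor (i,j,k) is loaded with a_ik and b_kj and
     * multiplies them; on the real cube all n^3 products happen at once. */
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                prod[k][i][j] = a[i][k] * b[k][j];

    /* Step 3: REDUCE.SUM from back (plane n-1) to front (plane 0);
     * each pass adds one plane into the plane in front of it. */
    for (int k = n - 1; k > 0; --k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                prod[k - 1][i][j] += prod[k][i][j];

    /* Step 4: the result sits in the front n x n plane. */
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            c[i][j] = prod[0][i][j];
}
```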

Using Fewer Processors
[Figure: a 3 x 3 grid of processors. Row i of A (a_i0 first) streams in from the left, delayed by i steps; column j of B (b_0j first) streams in from the top, delayed by j steps. At each step a processor multiplies the pair it holds, adds the product to its running sum, and passes the a value to the right and the b value downward.]
• Step 1: P(0,0) computes a00*b00
• Step 2: P(0,0) a01*b10; P(0,1) a00*b01; P(1,0) a10*b00
• Step 3: P(0,0) a02*b20; P(0,1) a01*b11; P(0,2) a00*b02; P(1,0) a11*b10; P(1,1) a10*b01; P(2,0) a20*b00
• Step 4: P(0,1) a02*b21; P(0,2) a01*b12; P(1,0) a12*b20; P(1,1) a11*b11; P(1,2) a10*b02; P(2,0) a21*b10; P(2,1) a20*b01
• ... and so on, until P(2,2) computes a22*b22 at step 7
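The frames above describe a systolic array. The C sketch below simulates it on an n x n grid, assuming that empty slots in the input streams carry 0 (so they add nothing to the sums) and that every processor multiplies and accumulates once per step; systolic_matmul is a hypothetical name. Counting steps from 0, processor (i, j) sees a_ik and b_kj together at step i + j + k, so the whole computation takes 3n - 2 steps (7 for n = 3).

```c
#include <assert.h>

#define N 3  /* illustrative maximum matrix dimension */

/* a_reg[i][j] and b_reg[i][j] hold the operands currently at PE (i,j).
 * Returns the number of time steps used. */
int systolic_matmul(int n, int a[][N], int b[][N], int c[][N])
{
    int a_reg[N][N] = {{0}}, b_reg[N][N] = {{0}};
    int steps = 3 * n - 2;  /* last product reaches PE (n-1,n-1) at step 3n-3 */

    assert(n <= N);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            c[i][j] = 0;

    for (int t = 0; t < steps; ++t) {
        /* A values move one PE to the right; row i is skewed by i steps. */
        for (int i = 0; i < n; ++i) {
            for (int j = n - 1; j > 0; --j)
                a_reg[i][j] = a_reg[i][j - 1];
            int k = t - i;
            a_reg[i][0] = (k >= 0 && k < n) ? a[i][k] : 0;
        }
        /* B values move one PE down; column j is skewed by j steps. */
        for (int j = 0; j < n; ++j) {
            for (int i = n - 1; i > 0; --i)
                b_reg[i][j] = b_reg[i - 1][j];
            int k = t - j;
            b_reg[0][j] = (k >= 0 && k < n) ? b[k][j] : 0;
        }
        /* Every PE multiplies the pair it now holds and accumulates. */
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                c[i][j] += a_reg[i][j] * b_reg[i][j];
    }
    return steps;
}
```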

Improving Efficiency
[Figure: the product matrix C, with each c_ij written out as a sum of three products, beside the initial arrangement of operands on a 3 x 3 grid. Processor P(i,j) starts with the pair a_ik, b_kj where k = (i + j) mod 3:]
  a00*b00   a01*b11   a02*b22
  a11*b10   a12*b21   a10*b02
  a22*b20   a20*b01   a21*b12

Improving Efficiency
Step 1:
  a00*b00   a01*b11   a02*b22
  a11*b10   a12*b21   a10*b02
  a22*b20   a20*b01   a21*b12
Step 2:
  a02*b20   a00*b01   a01*b12
  a10*b00   a11*b11   a12*b22
  a21*b10   a22*b21   a20*b02
Step 3:
  a01*b10   a02*b21   a00*b02
  a12*b20   a10*b01   a11*b12
  a20*b00   a21*b11   a22*b22
• Between steps, every a value moves one processor to the right and every b value one processor down, wrapping around at the edges
• After 3 steps, each processor P(i,j) has formed all three products of c_ij, and all 9 processors do useful work at every step
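This skew-then-rotate scheme is essentially Cannon's algorithm (the textbook version usually shifts A left and B up; the slides shift A right and B down, which covers the same products in a different order). A sequential C sketch, assuming wraparound links between opposite grid edges; shift_matmul is a hypothetical name:

```c
#include <assert.h>

#define N 3  /* illustrative maximum grid/matrix dimension */

/* ar[i][j] and br[i][j] are the operands held by PE (i,j). At step t,
 * PE (i,j) holds a[i][k] and b[k][j] with k = (i + j - t) mod n. */
void shift_matmul(int n, int a[][N], int b[][N], int c[][N])
{
    int ar[N][N], br[N][N], tmp[N][N];
    assert(n <= N);

    /* Initial alignment (skew): k = (i + j) mod n. */
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            int k = (i + j) % n;
            ar[i][j] = a[i][k];
            br[i][j] = b[k][j];
            c[i][j] = 0;
        }

    for (int step = 0; step < n; ++step) {
        /* Every PE multiplies its local pair: n^2 useful products per step. */
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                c[i][j] += ar[i][j] * br[i][j];

        /* Shift A one column to the right, with wraparound. */
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                tmp[i][(j + 1) % n] = ar[i][j];
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                ar[i][j] = tmp[i][j];

        /* Shift B one row down, with wraparound. */
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                tmp[(i + 1) % n][j] = br[i][j];
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                br[i][j] = tmp[i][j];
    }
}
```

Compared with the systolic version, each processor is busy at every step, so the multiply phase takes n steps instead of 3n - 2, at the cost of the initial skew and the wraparound connections.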

Farmer/Worker
• One way to do data partitioning
• The farmer prepares tasks for the workers
• Workers receive a task and do the work
• Results are sent back to the farmer
• The farmer consolidates the results
[Figure: a farmer process connected to worker processes P0 through P5]
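A sequential C simulation of the farmer/worker protocol. It assumes a hypothetical task (squaring an integer, in do_task) and hands tasks out round-robin; a real farmer would use messages or a shared queue and give the next task to whichever worker becomes idle first, which is what lets the scheme balance uneven work.

```c
#include <assert.h>

#define MAX_TASKS 64

/* Hypothetical unit of work. */
static int do_task(int x) { return x * x; }

/* The farmer puts ntasks tasks in a bag; workers take tasks in turn,
 * compute them, and report (task id, result) back; the farmer
 * consolidates by summing. tasks_done[w] counts worker w's tasks. */
long farm(const int *inputs, int ntasks, int nworkers, int *tasks_done)
{
    int results[MAX_TASKS];
    int next = 0;       /* farmer: index of the next undistributed task */
    long total = 0;

    assert(ntasks <= MAX_TASKS && nworkers > 0);
    for (int w = 0; w < nworkers; ++w)
        tasks_done[w] = 0;

    /* Each round, every worker asks for one task until the bag is empty. */
    while (next < ntasks) {
        for (int w = 0; w < nworkers && next < ntasks; ++w) {
            int id = next++;                    /* farmer hands out task id */
            results[id] = do_task(inputs[id]);  /* worker w does the work */
            tasks_done[w]++;                    /* ...and sends it back */
        }
    }

    /* Farmer consolidates the results. */
    for (int id = 0; id < ntasks; ++id)
        total += results[id];
    return total;
}
```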

Linda
• Linda is a memory model
  – A model represents a particular way of thinking about problems
• Every process has access to a shared pool of memory, referred to as tuple space, which holds
  – Data tuples
  – Process tuples
• Processes coordinate by generating, reading, and consuming tuples

David Gelernter
• Linda was developed by David Gelernter, a CS professor at Yale
When it came time to name the language, Mr Gelernter said, he noted that Ada was named after Ada Augusta Lovelace, the daughter of Lord Byron, the English poet. Miss Lovelace is regarded as the first computer programmer because she worked for the computer pioneer Charles Babbage. Another woman named Lovelace was in the news when Mr Gelernter was casting about for a name: Linda Lovelace, a star of pornographic films. So he named the language Linda, and it stuck. Asked about it now, Mr Gelernter grins and shrugs. "I was a graduate student at the time," he said.

David Gelernter
• David Hillel Gelernter is a professor of computer science at Yale University. In the 1980s, he made seminal contributions to the field of parallel computation, specifically the tuple space model of coordination and the Linda programming system. He received his Bachelor of Arts degree from Yale University in 1976 and his Ph.D. from the State University of New York at Stony Brook in 1982. In 1993, he was critically injured opening a mailbomb sent by Theodore Kaczynski, who at that time was an unidentified but violent opponent of technological progress, dubbed by the press "The Unabomber". He recovered from his injuries but sustained permanent damage to his right hand and eye; he chronicled the ordeal in his 1997 book Drawing Life: Surviving the Unabomber. He was nominated to and subsequently became a member of the National Council on the Arts. His biographical summary can be found at the National Endowment for the Arts web site (http://www.nea.gov/about/NCA/Gelernter.html)

Linda Goals
• High-level language for explicit parallel programming
• Portability
• No temporal or spatial relationships between parallel processes
• Dynamic distribution of tasks at runtime, supporting
  – Dynamic process creation
  – Static allocation

Linda – A Memory Model
• Tuple space
  – Logically shared associative memory
  – Collection of logically ordered sets of data (tuples)
  – Work is accomplished by generating, using, and consuming data tuples
• Process tuples
  – Under active evaluation
  – Become data tuples when done
• Data tuples
  – Passive

Tuple Space
[Figure: a sender process places a tuple in tuple space; a receiver process retrieves it]

Linda – A Programming Model
• Linda consists of
  – A smart optimizing pre-compiler
  – A run-time kernel
• Shown to work well with shared memory
• Suggested to work on distributed memory as well:
  – Brenda (Trollius/Cornell)
  – University of MN (Transputer)
  – Cogent Research (OS model)
  – Laden (RIT)

Characteristics of the Linda Model
• Processes are decoupled
• Processes create, look at, and destroy data objects
• A process will wait if it tries to read a nonexistent object (deadlock is possible!)
• Objects are stored in a shared space accessible to all processes
• Objects are identified by content rather than by location

Linda Programming Paradigm
• Distributed data structures accessible to many processes simultaneously
• Processes access data structures simultaneously
• Any data structure in tuple space is accessible to any process in that same tuple space
• Linda processes aspire to know as little about each other as possible

Linda Operations
• in/inp – input from tuple space (wait/no wait); the tuple is removed from tuple space
• rd/rdp – read from tuple space (wait/no wait); the tuple remains in tuple space
• out – evaluate the tuple, then output it to tuple space
• eval – output the tuple to tuple space, then evaluate its fields in a series of new processes

Linda Operations: out
• out( t ) – the new tuple t is evaluated and then put into tuple space
• t – a sequence of typed values
• Examples:
  ("a string", 12.96, 16, y)
  ( 0, 1 )

Linda Operations: in
• in( s ) – causes some tuple t to be withdrawn from tuple space
  – t is chosen arbitrarily from those that match s
• s – an anti-tuple: a sequence of typed fields that may be actual values or formal placeholders
• t matches s if
  – They have the same number of fields
  – The types of the fields match pairwise
  – The actual values in s match the values of the corresponding fields in t

Linda Operations: in
• If s matches t then
  – The actual values in t are assigned to the formal placeholders in s
  – The invoking process then continues to execute
  – If there is no match, the invoking process waits until there is one
• Field types
  – [unsigned] int, long, short, char
  – float, double
  – struct
  – union
  – arrays ([]) of arbitrary dimensions of the above

Tuple Matching
• in("a string", ? f, ? i, y) – execution searches for a passive data tuple having
  – A first element that is "a string"
  – A second element of the same type as variable f
  – A third element of the same type as variable i
  – A fourth element with the same value as variable y
• Result: values are assigned to f and i
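To make the matching rules concrete, here is a C sketch of matching an anti-tuple against a tuple. The Field/Tuple representation, the three supported types, and the name tuple_match are assumptions for illustration, not C-Linda's internals (C-Linda resolves much of this with its pre-compiler). Formals receive values only after every field has matched.

```c
#include <assert.h>
#include <string.h>

#define MAX_FIELDS 8

typedef enum { T_INT, T_DOUBLE, T_STR } FieldType;

typedef struct {
    FieldType type;
    int is_formal;   /* anti-tuple only: 1 for "? var", 0 for an actual */
    union { int i; double d; const char *s; } v;
    void *bind;      /* anti-tuple formal: where to store the matched value */
} Field;

typedef struct { int n; Field f[MAX_FIELDS]; } Tuple;

/* Returns 1 if tuple t matches anti-tuple s, assigning the matched
 * values to s's formals; returns 0 (and assigns nothing) otherwise. */
int tuple_match(Tuple *s, const Tuple *t)
{
    if (s->n != t->n)
        return 0;                                /* same number of fields */
    for (int k = 0; k < s->n; ++k) {
        const Field *a = &s->f[k], *b = &t->f[k];
        if (a->type != b->type)
            return 0;                            /* types match pairwise */
        if (a->is_formal)
            continue;                            /* formals match any value */
        switch (a->type) {                       /* actuals must be equal */
        case T_INT:    if (a->v.i != b->v.i) return 0; break;
        case T_DOUBLE: if (a->v.d != b->v.d) return 0; break;
        case T_STR:    if (strcmp(a->v.s, b->v.s) != 0) return 0; break;
        }
    }
    /* Full match: copy values from t into the formal placeholders. */
    for (int k = 0; k < s->n; ++k) {
        Field *a = &s->f[k];
        if (!a->is_formal || !a->bind)
            continue;
        switch (a->type) {
        case T_INT:    *(int *)a->bind = t->f[k].v.i; break;
        case T_DOUBLE: *(double *)a->bind = t->f[k].v.d; break;
        case T_STR:    *(const char **)a->bind = t->f[k].v.s; break;
        }
    }
    return 1;
}
```

With t = ("a string", 12.96, 16, 7) and the anti-tuple ("a string", ? f, ? i, y) where y is 7 (a hypothetical value), the match succeeds and sets f = 12.96 and i = 16; changing y to 8 makes it fail.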

Linda Operations: inp
• inp( s ) – same as in, except
  – It does not wait
  – Returns 1 if it succeeds
  – Returns 0 if it fails
  – May be inefficient, depending on the implementation

Linda Operations: rd/rdp
• rd( s )/rdp( s ) – same as in/inp, except
  – The tuple is only read, not removed from tuple space
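The difference between in and rd is whether the matched tuple stays in tuple space. A toy C sketch of the non-blocking forms, restricted (by assumption) to two-field tuples made of a string tag and an int; out2, inp2 and rdp2 are hypothetical names. The blocking in and rd would additionally need a way to sleep until a matching out arrives (for example, a condition variable), and unlike real Linda, which may choose any matching tuple, this sketch returns the first one it finds.

```c
#include <assert.h>
#include <string.h>

#define SPACE_SIZE 32

/* A toy tuple space holding only (const char *tag, int value) tuples. */
typedef struct { const char *tag; int value; int used; } Slot;
static Slot space[SPACE_SIZE];

/* out: add a tuple; returns 0 if the space is full. */
int out2(const char *tag, int value)
{
    for (int k = 0; k < SPACE_SIZE; ++k)
        if (!space[k].used) {
            space[k] = (Slot){ tag, value, 1 };
            return 1;
        }
    return 0;
}

/* Shared search: find a tuple with a matching tag and copy its value
 * into the formal; remove it only if `remove` is set. */
static int find2(const char *tag, int *value, int remove)
{
    for (int k = 0; k < SPACE_SIZE; ++k)
        if (space[k].used && strcmp(space[k].tag, tag) == 0) {
            *value = space[k].value;
            if (remove)
                space[k].used = 0;   /* inp withdraws the tuple */
            return 1;
        }
    return 0;                        /* no match: return 0 instead of waiting */
}

int inp2(const char *tag, int *value) { return find2(tag, value, 1); }
int rdp2(const char *tag, int *value) { return find2(tag, value, 0); }
```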

Linda Operations: eval
• eval( t ) – similar to out, except
  – The tuple is evaluated AFTER being placed in tuple space
  – A new process is created to evaluate each field of t
  – When all fields are completely evaluated, t becomes a passive data tuple

Linda Operations: eval
• Example: eval("e", 7, exp( 7 ))
  – Creates a three-element live tuple and returns immediately
  – Generates 3 processes:
    • The first computes "e"
    • The second computes 7
    • The third computes exp(7)
  – When all are done, the live tuple is replaced by a data tuple containing ("e", 7, 1096.63…)
• The result can be read with: rd("e", 7, ? value)

Linda Operations: eval
• Comparison with out:
```c
/* out: the caller evaluates each sqrt itself before the tuple is output */
for (i = 0; i < 100; i++)
    out("square roots", i, sqrt( i ));

/* eval: each tuple is placed in tuple space and its fields are evaluated by new processes */
for (i = 0; i < 100; i++)
    eval("square roots", i, sqrt( i ));
```
• Values are inherited only for explicitly referenced names, e.g., eval("Q", f( x, y )); any static local or global variables in f are NOT initialized!

To Build A Linda Program
• Drop 1 process into tuple space
• It creates other process tuples
• Process tuples execute in parallel and exchange data by
  – Generating data tuples
  – Reading data tuples
  – Consuming data tuples
• When finished, they become data tuples

Programming Example – Parallel Hello World – chello.cl
```c
#include <stdio.h>
#include <unistd.h>

#define NPROC 8

int real_main()
{
    int i, hello();

    out("count", 0);                    /* shared counter of finished workers */
    for (i = 0; i < NPROC; ++i)
        eval("hello_world", hello(i));  /* one live tuple (process) per worker */
    in("count", NPROC);                 /* wait until every worker has checked in */
    for (i = 0; i < NPROC; ++i)
        in("hello_world", ? int);       /* collect the finished data tuples */
    printf("All processors done\n");
    return 0;
}

int hello(int id)
{
    int j;
    char h[100];

    if (gethostname(h, sizeof(h)) != 0) {
        fprintf(stderr, "Problem in gethostname()\n");
        lexit(1);
    }
    printf("Hello World from node %s, virtual proc no: %d\n", h, id);
    in("count", ? j);                   /* atomically increment the counter */
    out("count", j+1);
    return 0;
}
```

Matrix Multiplication
• Master
  – Initializes/cleans up (real_main)
  – Dumps rows of A and columns of B into tuple space
  – Specifies the first element to be computed
  – Handles assembly of data
  – Handles termination
• Workers
  – Find out what to compute
  – Specify what is to be computed next
  – Get the appropriate row and column data
  – Compute the element
  – Output the computed element

Matrix Multiplication
• Questions:
  – How many workers?
  – What tuples do we need?
  – How should we indicate termination?
  – Are there any performance issues?


real_main
```c
int real_main( argc, argv )
int argc;
char **argv;
{
    int dim,      /* Actual dimension of matrix */
        workers;  /* the number of workers */

    if ( argc != 3 ) {
        printf( "Usage: %s <workers> <dim>\n", *argv );
        lexit( 1 );
    }
    workers = atol( *++argv );
    dim = atol( *++argv );
    printf( "matrix -- workers: %d, dim: %d\n", workers, dim );
    master( workers, dim );
    return 0;
}
```

master.1
```c
void master( workers, dim )
int dim, workers;
{
    int A[ MAXARRAYSIZE ][ MAXARRAYSIZE ],      /* stored by row */
        B[ MAXARRAYSIZE ][ MAXARRAYSIZE ],      /* stored by column: B[ j ] is column j */
        result[ MAXARRAYSIZE ][ MAXARRAYSIZE ],
        index,
        col_index,
        row_index,
        retriever,  /* temporarily holds the value read from tuple space */
        *row,       /* pointer to a single row in A */
        *col;       /* pointer to a single column in B */

    /*
     * Initialize the two matrices - A by row, B by column -
     * and print them
     */
    . . .
```

master.2
```c
    /* Start the C-Linda timer utility */
    start_timer();

    /* Put the matrices in the tuple space */
    for ( index = 0; index < dim; ++index ) {
        row = &A[ index ][ 0 ];
        col = &B[ index ][ 0 ];
        out( "A-row", index, row: dim );
        out( "B-col", index, col: dim );
    }

    /* Make a timer split */
    timer_split( "done setting up" );
```

master.3
```c
    /* Start workers */
    for ( index = 0; index < workers; ++index ) {
        eval( "worker", worker( index, dim ) );
    }

    /* Indicate element to work on */
    out( "NEXT", 0 );

    /* Retrieve each element of the product matrix */
    for ( index = 0; index < dim*dim; ++index ) {
        in( "Result", ? row_index, ? col_index, ? retriever );
        result[ row_index ][ col_index ] = retriever;
    }
```

master.4
```c
    /*
     * Write out results
     */
    . . .

    /* Complete and print timing */
    timer_split( "all done" );
    print_times( );
}
```


worker.1
```c
int worker( i, dim )
int i, dim;
{
    int col[ MAXARRAYSIZE ],
        col_index,
        index,
        next_index,
        row[ MAXARRAYSIZE ],
        row_index,
        result,
        *cp,   /* element in the column matrix */
        *rp;   /* element in the row matrix */

    while ( TRUE ) {
```

worker.2 (managing the work-to-do tuple)
```c
        /* Get the index of the next product-matrix element to compute */
        in( "NEXT", ? index );

        /* If no more work, indicate termination and stop */
        if ( index < 0 ) {
            out( "NEXT", -1 );
            return( 0 );
        } else if ( index < dim * dim ) {
            /* Indicate the next element in the list */
            next_index = index + 1;
            out( "NEXT", next_index );
        } else {
            /* Put out a termination tuple */
            out( "NEXT", -1 );
            return( 0 );
        }
```

worker.3
```c
        /* Which row and column indices are we doing? */
        row_index = index / dim;
        col_index = index % dim;

        /* Read row and column we are interested in */
        rd( "A-row", row_index, ? row: dim );
        rd( "B-col", col_index, ? col: dim );

        /* Compute the appropriate element */
        /* Initialize the variables for the dot product */
        result = 0;
        rp = row;
        cp = col;

        /* Compute the dot product */
        for ( index = 0; index < dim; ++index, ++rp, ++cp )
            result += *rp * *cp;
```

worker.4
```c
        /* Store the result element in the tuple space. */
        out( "Result", row_index, col_index, result );
    } /* End while (TRUE) */
} /* End worker */
```

Implementation
• The definition of Linda is pretty simple; the tricky part is the implementation
• Some issues
  – How to find tuples?
  – Where to keep tuples?
  – Naming
• Interesting project

Wator Simulation
• "Wator" is a simple predator-prey simulation
  – A. K. Dewdney, "Computer Recreations," Scientific American, December 1984
• There are sharks, fish, and water
  – Sharks move, eat fish, and reproduce; they might starve to death
  – Fish move and reproduce; they never starve, but might get eaten
  – Neither fish nor sharks die of old age
• http://www.cheesygames.com/wator.php

Wator Parameters
• This simulation requires the following parameters:
  – Size of the ocean
  – Initial number of fish
  – Fish gestation period
  – Initial number of sharks
  – Shark gestation period
  – Shark starvation period

Wator World
• The ocean is an N x N array (the size N is an input parameter). It "wraps" to form a torus:
  – A cell on the right edge is adjacent to cells on the left edge, and a cell on the bottom edge is adjacent to cells on the top edge
• A location in the array can be empty, or it can hold one fish or one shark (but not both)
• At the beginning of the simulation, the fish and sharks are placed in random locations in the ocean
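The wraparound can be handled with modular arithmetic. A C sketch, assuming a hypothetical direction encoding (0 = north, 1 = east, 2 = south, 3 = west) and hypothetical names wrap and neighbor; the extra + n matters because C's % operator can return a negative result when the left operand is negative:

```c
#include <assert.h>

/* Map any coordinate onto 0..n-1 so the ocean forms a torus. */
int wrap(int x, int n) { return ((x % n) + n) % n; }

/* Neighbor of cell (row, col) in direction dir on an n x n torus. */
void neighbor(int row, int col, int dir, int n, int *nrow, int *ncol)
{
    static const int dr[4] = { -1, 0, 1, 0 };   /* N, E, S, W */
    static const int dc[4] = { 0, 1, 0, -1 };

    assert(dir >= 0 && dir < 4);
    *nrow = wrap(row + dr[dir], n);
    *ncol = wrap(col + dc[dir], n);
}
```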

What a Fish Does
• At each step, each fish will
  – Pick a random direction (one of four) and try to move in that direction; the fish can move only if the new square is empty
  – If it is time for the fish to reproduce and the fish was able to move, create a new fish in the just-vacated square; both the old and the new fish begin a new gestation period
  – If it is time for the fish to reproduce but the fish could not move, the fish does not move but remains ready to reproduce at the earliest opportunity

What a Shark Does
• At each step, each shark will
  – Check whether it is adjacent to a fish and, if so, move in that direction (and eat the fish)
  – Otherwise, pick a random direction and try to move in that direction
  – Reproduce according to the same rules as a fish (if it is time and the shark can move)
  – If the shark has not eaten for the specified time, it starve to death (and disappear)
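A minimal C sketch of the fish and shark rules for a single creature, assuming a per-creature breed clock and hunger clock (the Cell layout, parameter values, and function names are hypothetical). The random direction is passed in by the caller so the sketch stays deterministic, and a shark that has several adjacent fish simply takes the first one found in N, E, S, W order. A full time step would also need to stop a creature that has just moved from being updated again in the same sweep (for example with a per-cell "moved" flag), which is omitted here.

```c
#include <assert.h>

#define N 4  /* ocean is N x N (illustrative size) */

typedef enum { EMPTY, FISH, SHARK } Kind;
typedef struct { Kind kind; int breed; int hunger; } Cell;

/* Hypothetical gestation and starvation periods. */
enum { FISH_BREED = 3, SHARK_BREED = 5, SHARK_STARVE = 4 };

static const int dr[4] = { -1, 0, 1, 0 };   /* N, E, S, W */
static const int dc[4] = { 0, 1, 0, -1 };
static int wrap(int x) { return ((x % N) + N) % N; }

/* Move the creature at (r,c) to (nr,nc). If it is time to breed, leave
 * a newborn in the vacated square and restart both gestation clocks. */
static void move_and_breed(Cell sea[N][N], int r, int c, int nr, int nc, int period)
{
    Cell me = sea[r][c];
    int ready = me.breed >= period;
    if (ready)
        me.breed = 0;
    sea[nr][nc] = me;
    sea[r][c] = ready ? (Cell){ me.kind, 0, 0 } : (Cell){ EMPTY, 0, 0 };
}

/* One step for the fish at (r,c); dir would normally be rand() % 4. */
void fish_step(Cell sea[N][N], int r, int c, int dir)
{
    int nr = wrap(r + dr[dir]), nc = wrap(c + dc[dir]);
    sea[r][c].breed++;
    if (sea[nr][nc].kind == EMPTY)      /* move only into an empty square */
        move_and_breed(sea, r, c, nr, nc, FISH_BREED);
    /* otherwise stay put; the breed clock keeps it ready to reproduce */
}

/* One step for the shark at (r,c). */
void shark_step(Cell sea[N][N], int r, int c, int dir)
{
    int nr = -1, nc = -1;
    sea[r][c].breed++;
    sea[r][c].hunger++;

    /* Prefer an adjacent fish: move there and eat it. */
    for (int d = 0; d < 4 && nr < 0; ++d) {
        int tr = wrap(r + dr[d]), tc = wrap(c + dc[d]);
        if (sea[tr][tc].kind == FISH) {
            nr = tr;
            nc = tc;
            sea[r][c].hunger = 0;
        }
    }
    /* Otherwise try the random direction. */
    if (nr < 0 && sea[wrap(r + dr[dir])][wrap(c + dc[dir])].kind == EMPTY) {
        nr = wrap(r + dr[dir]);
        nc = wrap(c + dc[dir]);
    }
    if (nr >= 0) {
        move_and_breed(sea, r, c, nr, nc, SHARK_BREED);
        r = nr;
        c = nc;
    }
    /* Starve if it has gone SHARK_STARVE steps without eating. */
    if (sea[r][c].hunger >= SHARK_STARVE)
        sea[r][c] = (Cell){ EMPTY, 0, 0 };
}
```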

Sequential Version
• Ocean represented as an array
  – Create a structure to hold fish information
• A set of loops that runs over the array

Parallel Version
• Ideas?

Parallel Version
• Distribute the ocean array across processors
• Issues?

Boundaries
• One issue is how a processor knows whether a location on an adjacent processor is empty
  – At the beginning of each update, each processor could send its boundaries to its neighbors
• Does this solve all problems?
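A sketch of that boundary exchange for a row-band decomposition, simulated sequentially in C. It assumes N is divisible by P and gives each processor two ghost rows holding copies of its neighbors' edge rows (wrapping, since the ocean is a torus); Block, scatter and exchange_boundaries are hypothetical names, and in a real message-passing program the copy would be a send/receive with each neighbor. Because the ghost rows are a snapshot from the start of the update, two processors can still both decide to move a creature into the same boundary square.

```c
#include <assert.h>

#define N 8          /* ocean is N x N */
#define P 4          /* simulated processors; assumes N % P == 0 */
#define ROWS (N / P) /* rows owned by each processor */

/* local[1..ROWS] are the owned rows; local[0] mirrors the last row of the
 * processor above and local[ROWS + 1] the first row of the one below. */
typedef struct { int local[ROWS + 2][N]; } Block;

/* Hand each processor its band of rows. */
void scatter(int ocean[N][N], Block blk[P])
{
    for (int p = 0; p < P; ++p)
        for (int r = 0; r < ROWS; ++r)
            for (int c = 0; c < N; ++c)
                blk[p].local[r + 1][c] = ocean[p * ROWS + r][c];
}

/* The exchange at the start of each update. Only owned rows are read
 * and only ghost rows are written, so updating in place is safe. */
void exchange_boundaries(Block blk[P])
{
    for (int p = 0; p < P; ++p) {
        int up = (p + P - 1) % P, down = (p + 1) % P;
        for (int c = 0; c < N; ++c) {
            blk[p].local[0][c] = blk[up].local[ROWS][c];
            blk[p].local[ROWS + 1][c] = blk[down].local[1][c];
        }
    }
}
```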

Collisions
• The distributed approach can lead to collisions
  – Two processors try to move a fish/shark into the same spot
  – This happens because boundaries are exchanged only at the beginning of the update

Handling Collisions
• Rollback
  – Send the fish back to where it came from
  – What happens when it is returned?
    • Same spot?
    • Different spot?
• Kill one of the fish
  – Easy, but clearly not correct
  – Might be the easiest thing to do

Any Other Issues?
• Any other issues with the parallel version?

Load Balancing
• Most of the processors will be iterating over empty ocean
• Perhaps instead of distributing the ocean, we should distribute the fish
• The fish may need to be rebalanced as the simulation proceeds
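If the ocean is still split into row bands, one simple way to balance by creatures rather than by area is to choose the band boundaries from a running total of per-row creature counts and to recompute them every few steps. A C sketch (balance_rows is a hypothetical name; a single row is never split, so very crowded rows limit how even the result can be):

```c
#include <assert.h>

/* Choose contiguous row bands so each of nprocs processors gets roughly
 * the same number of creatures. count[r] = fish + sharks in row r.
 * On return, processor p owns rows start[p] .. start[p+1]-1, and
 * start[nprocs] == nrows; start needs room for nprocs + 1 entries. */
void balance_rows(const int *count, int nrows, int nprocs, int *start)
{
    long total = 0, seen = 0;
    int p = 1;

    assert(nrows > 0 && nprocs > 0);
    for (int r = 0; r < nrows; ++r)
        total += count[r];

    start[0] = 0;
    for (int r = 0; r < nrows && p < nprocs; ++r) {
        seen += count[r];
        /* Close band p-1 once rows 0..r hold p/nprocs of all creatures. */
        if (seen * nprocs >= total * p)
            start[p++] = r + 1;
    }
    while (p <= nprocs)      /* bands not started yet are left empty */
        start[p++] = nrows;
}
```

For per-row counts {1, 1, 1, 1, 0, 0, 0, 0} and 2 processors it produces boundaries {0, 2, 8}: rows 0-1 and rows 2-7 each hold 2 creatures, whereas an even split by area would give one processor all 4.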