CS 584: Algorithm Analysis

CS 584

Algorithm Analysis Assumptions
• Consider ring, mesh, and hypercube.
• Each process can either send or receive a single message at a time.
• No special communication hardware.
• When discussing a mesh architecture we will consider a square toroidal mesh.
• Startup latency is t_s and per-word transfer time (inverse bandwidth) is t_w, so sending a message of m words costs t_s + t_w m.

Basic Algorithms
• Broadcast algorithms
  - one to all (scatter)
  - all to one (gather)
  - all to all
• Reduction
  - all to one
  - all to all

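Broadcast (ring)
• Distribute a message of size m to all nodes.
• Start the message both ways.
[Figure: ring with the source node marked; links are numbered by the step in which they carry the message (1, 2, 3, 4 in one direction and 2, 3, 4 in the other).]
T = (t_s + t_w m)(p/2)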
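As a sanity check on the p/2 step count, the ring broadcast can be simulated step by step. The sketch below is illustrative and not taken from the slides: the function names are hypothetical, and it assumes the source sends clockwise in step 1 and counterclockwise in step 2 so that no node sends more than one message at a time.

```python
def ring_broadcast_arrival(p, source=0):
    """Return {node: step in which the node first holds the message}.

    Assumed schedule: the clockwise wave advances one hop per step from
    step 1; the counterclockwise wave starts in step 2 because the source
    can send only one message at a time.
    """
    arrival = {source: 0}
    cw = ccw = source          # last informed node in each direction
    step = 0
    while len(arrival) < p:
        step += 1
        nxt = (cw + 1) % p     # clockwise wave moves one hop
        if nxt not in arrival:
            arrival[nxt] = step
            cw = nxt
        if step >= 2 and len(arrival) < p:
            nxt = (ccw - 1) % p    # counterclockwise wave moves one hop
            if nxt not in arrival:
                arrival[nxt] = step
                ccw = nxt
    return arrival


def ring_broadcast_time(p, m, ts, tw):
    """Total time: number of steps times the per-message cost ts + tw*m."""
    steps = max(ring_broadcast_arrival(p).values())
    return steps * (ts + tw * m)
```

For p = 8 the last node is reached in step 4 = p/2, so `ring_broadcast_time(8, m=100, ts=10, tw=1)` evaluates to 4 × 110 = 440, in agreement with the formula.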

Broadcast (mesh)
• Broadcast to the source row using the ring algorithm.
• Broadcast to the rest using the ring algorithm from the source row.
T = 2(t_s + t_w m)(p^{1/2}/2)
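The two-phase structure can be made concrete with a small schedule calculation. This is a sketch under stated assumptions, not the slides' own code: `q` stands for the side length p^{1/2} of the square torus, the function names are hypothetical, each ring phase uses the two-way single-port schedule (clockwise first), and the column phase is assumed to start only after the row phase has finished, which is what the factor of 2 in the formula suggests.

```python
def ring_arrival(q, source):
    """Step at which each node of a q-node ring receives the message.

    Clockwise neighbours at distance k get it in step k; counterclockwise
    neighbours get it one step later because the source sends that way second.
    """
    steps = []
    for node in range(q):
        cw = (node - source) % q       # hops when travelling clockwise
        ccw = (source - node) % q      # hops when travelling counterclockwise
        steps.append(0 if node == source else min(cw, ccw + 1))
    return steps


def mesh_broadcast_arrival(q, src_row=0, src_col=0):
    """Arrival step for every node of a q x q torus (row phase, then columns)."""
    row = ring_arrival(q, src_col)     # phase 1: along the source row
    row_phase = max(row)
    col = ring_arrival(q, src_row)     # phase 2: down every column
    return [
        [row[c] if r == src_row else row_phase + col[r] for c in range(q)]
        for r in range(q)
    ]


def mesh_broadcast_time(q, m, ts, tw):
    """Total time: last arrival step times the per-message cost ts + tw*m."""
    steps = max(max(r) for r in mesh_broadcast_arrival(q))
    return steps * (ts + tw * m)
```

With q = 4 (p = 16) the last node is reached in step 4, that is 2 × (p^{1/2}/2), compared with 8 steps for a 16-node ring.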

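Broadcast (hypercube)
• A message is sent along each dimension of the hypercube.
• Parallelism grows as a binary tree.
[Figure: hypercube whose links are numbered by the step in which they carry the message (one link in step 1, two in step 2, four in step 3).]
T = (t_s + t_w m) log_2 p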
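To see how the three broadcast costs compare, the formulas T = (t_s + t_w m)(p/2) for the ring, T = 2(t_s + t_w m)(p^{1/2}/2) for the mesh, and T = (t_s + t_w m) log_2 p for the hypercube can be evaluated side by side. The helper below is a hypothetical convenience function, and the parameter values in the note after it are made-up illustrations rather than measurements.

```python
import math


def broadcast_times(p, m, ts, tw):
    """Evaluate the three one-to-all broadcast cost formulas for p nodes."""
    per_message = ts + tw * m          # cost of one message of m words
    return {
        "ring": per_message * (p / 2),
        "mesh": 2 * per_message * (math.sqrt(p) / 2),
        "hypercube": per_message * math.log2(p),
    }
```

For example, with p = 64, m = 100, ts = 10 and tw = 1, the per-message cost is 110, giving 3520 for the ring, 880 for the mesh and 660 for the hypercube.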

Broadcast
• The mesh algorithm was based on embedding rings in the mesh.
• Can we do better on the mesh?
• Can we embed a tree in a mesh?
  - Exercise for the reader. (-: hint, hint ;-)

Other Broadcasts
• Many algorithms for all-to-one and all-to-all communication are simply reversals and duals of the one-to-all broadcast.
• Examples
  - All-to-one
      - Reverse the algorithm and concatenate
  - All-to-all
      - Butterfly and concatenate

Reduction Algorithms
• Reduce or combine a set of values on each processor to a single set.
  - Summation
  - Max/Min
• Many reduction algorithms simply use the all-to-one broadcast algorithm.
  - Operation is performed at each node.

Reduction
• If the goal is to have only one processor with the answer, use broadcast algorithms.
• If all must know, use butterfly.
  - Reduces the algorithm from 2 log p steps (reduce to one node, then broadcast the result) to log p steps.
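The butterfly pattern can be sketched as a sequential simulation in which, at step i, every node combines its value with that of the partner whose id differs in bit i. This is an illustrative sketch only: the function name is hypothetical, one scalar per node is assumed, p is assumed to be a power of two, the combining operation is assumed to be commutative (as sum, max and min are), and each pairwise exchange is counted as a single step.

```python
def butterfly_allreduce(values, op=lambda a, b: a + b):
    """Simulate a butterfly all-reduce over p = len(values) nodes.

    Returns (per-node results, number of exchange steps). After the last
    step every node holds the combination of all p inputs.
    """
    p = len(values)
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")
    vals = list(values)
    steps = 0
    bit = 1
    while bit < p:
        # Every node i exchanges with partner i XOR bit and combines.
        vals = [op(vals[i], vals[i ^ bit]) for i in range(p)]
        bit <<= 1
        steps += 1
    return vals, steps
```

With eight nodes holding 1 through 8, all nodes end with 36 after 3 = log_2 8 steps; passing `op=max` instead leaves 8 on every node.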

How'd they do that?
• Broadcast and reduction algorithms are based on Gray code numbering of nodes.
• Consider a hypercube.
[Figure: 3-dimensional hypercube with nodes labeled 0 (000), 1 (001), 2 (010), 3 (011), 4 (100), 5 (101), 6 (110), 7 (111).]
Neighboring nodes differ by only one bit location.
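The one-bit-difference property is easy to check in code. The helpers below are a hypothetical sketch: `hypercube_neighbors` lists the nodes reachable by flipping one bit of a label, and `gray_code` is the standard binary-reflected Gray code, shown here because consecutive codes differ in exactly one bit and therefore trace a ring through neighboring hypercube nodes.

```python
def hypercube_neighbors(node, d):
    """Labels of the d neighbours of `node` in a d-dimensional hypercube."""
    return [node ^ (1 << i) for i in range(d)]


def differs_by_one_bit(a, b):
    """True when labels a and b differ in exactly one bit position."""
    x = a ^ b
    return x != 0 and (x & (x - 1)) == 0


def gray_code(k):
    """k-th binary-reflected Gray code."""
    return k ^ (k >> 1)
```

For d = 3, `hypercube_neighbors(5, 3)` gives `[4, 7, 1]` (101 becomes 100, 111, 001), and the Gray sequence for k = 0..7 is 0, 1, 3, 2, 6, 7, 5, 4, where each entry and its successor (including the wrap from 4 back to 0) are hypercube neighbors.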

How'd they do that?
• Start with the most significant bit.
• Flip the bit and send to that processor.
• Proceed with the next most significant bit.
• Continue until all bits have been used.
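The steps above can be sketched as a function that builds the send schedule for a one-to-all broadcast on a d-dimensional hypercube. It is a sequential illustration with a hypothetical name, not a message-passing implementation: in each step every node that already holds the message sends it to the partner obtained by flipping the current bit.

```python
def hypercube_broadcast_schedule(d, source=0):
    """Return one list of (sender, receiver) pairs per step.

    Bits are processed from most significant (d-1) down to 0, so the
    number of informed nodes doubles in every step.
    """
    have = {source}                    # nodes that already hold the message
    schedule = []
    for i in reversed(range(d)):       # most significant bit first
        sends = [(node, node ^ (1 << i)) for node in sorted(have)]
        have.update(dst for _, dst in sends)
        schedule.append(sends)
    return schedule
```

For d = 3 and source 0 the schedule is `[(0, 4)]`, then `[(0, 2), (4, 6)]`, then `[(0, 1), (2, 3), (4, 5), (6, 7)]`: three steps for eight nodes, which is the log_2 p factor in the hypercube broadcast time.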

Procedure SingleNodeAccum(d, my_id, m, X, sum)
    for j = 0 to m-1
        sum[j] = X[j]
    mask = 0
    for i = 0 to d-1
        // only nodes whose lower i bits are all 0 take part in step i
        if ((my_id AND mask) == 0)
            if ((my_id AND 2^i) <> 0)
                msg_dest = my_id XOR 2^i
                send(sum, msg_dest)
            else
                msg_src = my_id XOR 2^i
                recv(X, msg_src)    // receive the partner's partial sum into X so that sum is not overwritten
                for j = 0 to m-1
                    sum[j] += X[j]
            endif
        endif
        mask = mask XOR 2^i
    endfor
end
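The procedure can be exercised without a message-passing library by simulating all 2^d nodes in one process. The sketch below is an assumption-laden illustration: the function name is hypothetical, `X[node]` holds that node's m input values, and the send/recv pair is replaced by reading the partner's partial sum directly from a list.

```python
def single_node_accum_sim(d, X):
    """Simulate the accumulation on 2**d nodes; return node 0's final sum.

    X is a list of 2**d vectors, one per node, all of the same length m.
    """
    p = 1 << d
    sums = [list(x) for x in X]        # sum[j] = X[j] on every node
    mask = 0
    for i in range(d):
        bit = 1 << i
        for node in range(p):
            # Nodes whose lower i bits are 0 take part; the one with bit i
            # clear receives from the partner that has bit i set.
            if (node & mask) == 0 and (node & bit) == 0:
                src = node ^ bit
                sums[node] = [a + b for a, b in zip(sums[node], sums[src])]
        mask ^= bit
    return sums[0]
```

With d = 3 and `X = [[node, 1] for node in range(8)]`, node 0 ends with `[28, 8]`: the sum of the ids 0 through 7 and a count of the eight contributing nodes.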