Chapter 5 Message Passing Architecture
Advanced Computer Architecture and Parallel Processing
Hesham El-Rewini & Mostafa Abd-El-Barr

Message Passing Architecture
• A message passing system typically combines local memory and a processor at each node of the interconnection network.
• There is no global memory, so data must be moved from one local memory to another by means of message passing.
• This is typically done with send/receive pairs of commands, which the programmer must write into the application software.
• Each processor has access to its own local memory and can communicate with other processors using the interconnection network.

5.1 Introduction To Message Passing
• A message passing architecture is used to communicate data among a set of processors without the need for a global memory.
• Each processor has its own local memory and communicates with other processors using messages.
[Figure: processors P1, P2, ..., Pn with local memories M1, M2, ..., Mn, each attached by its own link (Link 1, Link 2, ..., Link n) to the interconnection network]

• Nodes communicate with each other over links (called external channels) and via an interconnection network, normally a static network.
• In particular, the hypercube and the nearest-neighbor two-dimensional and three-dimensional mesh interconnection networks have received considerable attention over the years.

• In executing a given application program, the program is divided into concurrent processes, each executed on a separate processor.
• If the number of processes is larger than the number of processors, more than one process must be executed on a processor in a time-shared fashion.
• Processes running on the same processor use what are called internal channels to exchange messages among themselves.
• Processes running on different processors use the external channels to exchange messages.

• Data exchanged among processors cannot be shared; it is instead copied (using send/receive messages).
• An important advantage of this form of data exchange is that it eliminates the need for synchronization constructs, such as semaphores, which results in performance improvement.
• In addition, a message passing scheme offers flexibility in accommodating a large number of processors and is readily scalable.
• Note that a given node can execute more than one process, one at a time.

- Process granularity is the parameter describing the size of a process in a message passing system.
- Process granularity = computation time / communication time
- Types of granularity:
  • Coarse: each process holds a large number of sequential instructions and takes a substantial amount of time to execute.
  • Medium: a middle ground where communication overhead is reduced.
  • Fine: each process contains a few sequential instructions.
- Message passing multiprocessors mostly use medium or coarse granularity.
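The granularity ratio can be illustrated with a small calculation. This is only a sketch: the function name and the timing figures are hypothetical, chosen to show how splitting the same work into smaller processes lowers the ratio.

```python
def process_granularity(computation_time: float, communication_time: float) -> float:
    """Granularity as defined above: computation time / communication time."""
    if communication_time <= 0:
        raise ValueError("communication_time must be positive")
    return computation_time / communication_time


# Hypothetical timings (milliseconds) for two decompositions of one program.
coarse = process_granularity(computation_time=500.0, communication_time=5.0)
fine = process_granularity(computation_time=2.0, communication_time=5.0)

print(f"coarse-grained process: {coarse}")  # computation dominates
print(f"fine-grained process:   {fine}")    # communication dominates
```

A ratio well above 1 means each message is amortized over a lot of computation, which is consistent with the preference for medium or coarse granularity stated above.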

5.2 Routing in Message Passing Networks
• Routing is defined as the techniques used for a message to select a path over the network channels.
• Routing involves identifying a set of permissible paths that a message may use to reach its destination, and a function that selects one path from that set.
• A routing technique is said to be adaptive if, for a given source and destination pair, the path taken by the message depends on network conditions, such as network congestion.
• In contrast, a deterministic routing technique, also called oblivious, determines the path using only the source and destination, regardless of the network conditions.

• Routing techniques can also be classified, based on the method used to make the routing decision, as centralized (self) or distributed routing.
– In centralized routing, the routing decisions regarding the entire path are made before sending the message.
– In distributed routing, each node decides by itself which channel should be used to forward the incoming message.

• Centralized routing requires complete knowledge of the status of the rest of the nodes in the network.
• Distributed routing requires knowledge of only the status of the neighboring nodes.
• Examples of deterministic routing algorithms include e-cube (dimension-order) routing used in the mesh and XOR routing in the hypercube.
• The following example illustrates the use of a deterministic routing technique in a hypercube network.

• Consider the case where S = 10 (001010) and D = 39 (100111) in a six-dimensional hypercube message passing system.
• The XOR of S and D gives R = (101101), so the message has to be sent along dimensions 0, 2, 3, and 5 in order to reach the destination.
• The order in which these dimensions are traversed is not important.
• Assume that the message traverses the dimensions in the order 5, 3, 2, and 0. The route is then totally determined as: 001010 → 101010 → 100010 → 100110 → 100111.
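The XOR routing example above can be sketched in a few lines. The function name `hypercube_route` and its parameters are illustrative, not taken from the slides; the default traversal order (highest dimension first) is simply the order assumed in the example.

```python
def hypercube_route(src: int, dst: int, dims: int, order=None) -> list[int]:
    """Return the node addresses visited when routing src -> dst.

    The set bits of src XOR dst are the dimensions that must be crossed.
    Any order works; by default the highest dimension is crossed first.
    """
    diff = src ^ dst
    needed = [d for d in range(dims) if (diff >> d) & 1]
    if order is None:
        order = sorted(needed, reverse=True)
    elif sorted(order) != needed:
        raise ValueError("order must list exactly the differing dimensions")
    route = [src]
    node = src
    for d in order:
        node ^= 1 << d  # crossing dimension d flips bit d of the address
        route.append(node)
    return route


route = hypercube_route(src=10, dst=39, dims=6)
print(" -> ".join(format(n, "06b") for n in route))
# prints: 001010 -> 101010 -> 100010 -> 100110 -> 100111
```

Passing a different `order`, for example `[0, 2, 3, 5]`, yields a different sequence of intermediate nodes but the same number of hops.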

• Routing for Broadcasting and Multicasting
- Two types of communication operations:
  • One-to-one (unicast): the message is communicated to a single destination node.
  • Collective: a number of routing operations fall under this category.
    - Broadcast: one-to-all
    - Multicast: one-to-many

• A number of possible problems can result from the use of certain routing mechanisms in message passing systems. These include
– deadlock,
– livelock,
– starvation.

• Routing Potential Problems
- Deadlock:
  • When two messages each hold the resources required by the other in order to move, both messages are blocked (a cyclic dependency for resources).
  • A straightforward (but inefficient) solution is to reroute or discard the messages involved in the deadlock.
  • Rerouting of messages gives rise to nonminimal routing, while discarding messages requires that they be recovered at the source and retransmitted.
  • This preemptive technique leads to long latency and is therefore not used by most message passing networks.

• A more common technique is to avoid the occurrence of deadlock.
• This can be achieved by ordering network resources and requiring that messages request the use of these resources in a strict monotonic order.
• This restricted way of using network resources prevents the occurrence of circular wait, and hence prevents the occurrence of deadlock.
• The channel dependency graph (CDG) is a technique used to develop a deadlock-free routing algorithm.
• A CDG is a directed graph D = G(C, E), where the vertex set C consists of all the unidirectional channels in the network and the set of edges E includes all the pairs of connected channels, as defined by the routing algorithm.

• A routing algorithm is deadlock-free if there are no cycles in its CDG.
• Consider, for example, the 4-node network shown in Figure 5.4a. The CDG of the network is shown in Figure 5.4b. There are two cycles in the CDG, and therefore this network is subject to deadlock.
• Figure 5.4c shows one possible way to avoid the occurrence of deadlock, that is, disallowing messages to be forwarded from channel c1 to c2 and from c7 to c8.

[Figure 5.4: A 4-node network and its CDGs. (a) A 4-node network with nodes 0 to 3 and channels c1 to c8; (b) channel dependency graph (CDG); (c) CDG for a deadlock-free version of the network]
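The deadlock-freedom test on a CDG amounts to cycle detection in a directed graph, which can be sketched as follows. The edge set below is an assumption: the figure's exact dependencies are not recoverable from this transcript, so the sketch assumes two unidirectional rings (c1→c2→c3→c4→c1 and c5→c6→c7→c8→c5). The helper names are hypothetical.

```python
def has_cycle(cdg: dict[str, list[str]]) -> bool:
    """Depth-first search for a cycle in a channel dependency graph."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {c: WHITE for c in cdg}

    def visit(c: str) -> bool:
        color[c] = GRAY
        for nxt in cdg.get(c, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: a dependency cycle exists
                return True
            if state == WHITE and visit(nxt):
                return True
        color[c] = BLACK
        return False

    return any(color[c] == WHITE and visit(c) for c in cdg)


def without_edges(cdg, forbidden):
    """Copy of the CDG with the given (from, to) dependencies disallowed."""
    return {c: [n for n in nxts if (c, n) not in forbidden]
            for c, nxts in cdg.items()}


# Assumed dependencies: two rings of four channels each.
cdg = {"c1": ["c2"], "c2": ["c3"], "c3": ["c4"], "c4": ["c1"],
       "c5": ["c6"], "c6": ["c7"], "c7": ["c8"], "c8": ["c5"]}

print(has_cycle(cdg))  # True: the network is subject to deadlock
restricted = without_edges(cdg, {("c1", "c2"), ("c7", "c8")})
print(has_cycle(restricted))  # False: no circular wait remains
```

Under this assumed edge set, removing the c1→c2 and c7→c8 dependencies breaks both cycles, which matches the restriction described for Figure 5.4c.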

• Routing Potential Problems
- Livelock:
  • A message goes around the network and never reaches its destination.
  • It results from using adaptive routing algorithms with dynamic injection, where nodes inject their messages into the network at arbitrary times.
  • Policies to avoid livelock are based on assigning a priority to each message injected into the network:
    - Messages are routed according to their priorities.
    - Once a message is injected, only a finite number of messages will be injected with higher or equal priority.

• Routing Potential Problems
- Starvation:
  • A node suffers from starvation if it has a message to inject into the network but is never allowed to do so.
  • The simplest policy to avoid starvation is to allow each node to have an injection queue that competes with the queues of the incoming links to the same node.
    - The main disadvantage is that a node with a high message injection rate can slow down all the other nodes in the network.

5.3 Switching Mechanisms in Message Passing
• Switching mechanisms refer to the mechanisms used to remove data from an input channel and place it on an output channel.
• Network latency is highly dependent on the switching mechanism used.
• A number of switching mechanisms have been in use. These are
– store-and-forward,
– circuit-switching,
– virtual cut-through,
– wormhole,
– pipelined circuit-switching.
• In this section, we study some of these techniques.

• In circuit-switching networks, the path between the source and destination is first determined, all links along that path are reserved, and no buffers are needed in each node.
• After data transfer, the reserved links are released for use by other messages.
• An important characteristic of the circuit-switching technique is that the source and destination are guaranteed a certain bandwidth and maximum latency when communication is established between them.

• This static bandwidth allocation, regardless of the actual use, is the main drawback of the circuit-switching approach.
• On the other hand, circuit-switching networks are characterized by having the smallest amount of delay.
• This is because message routing overhead is only needed when the circuit is set up; subsequent messages suffer no, or minimal, additional delay.
• Therefore, circuit-switching networks can be advantageously used in the case of a large number of message transfers.

• The store-and-forward switching mechanism provides an alternative data transfer scheme.
• The main idea is to offer dynamic bandwidth allocation to messages as they flow through the network, thus avoiding the main drawback of the circuit-switching mechanism.
• Two main types of store-and-forward networks are common. These are
– packet-switched networks,
– virtual cut-through networks.

• In packet-switched networks, each message is divided into smaller fixed-size parts, called packets, before being transmitted.
• Each node must contain enough buffers to hold received packets before transmitting them.
• A complete path from source to destination may not be available at the start of transmission.
• As links become available, packets are moved from node to node until they reach the destination node.
• Since packets are routed separately through the network, they may follow different paths to the destination node.
• This may lead to packets arriving out of order at the destination.
• Therefore, an end-to-end message assembly scheme is needed, incurring additional overhead.
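The end-to-end assembly step can be sketched as follows. This is a minimal illustration, assuming each packet carries a sequence number (a detail the slides do not specify); the function names are hypothetical, and the last packet is allowed to be shorter than the others for simplicity.

```python
import random


def packetize(message: bytes, packet_size: int) -> list[tuple[int, bytes]]:
    """Split a message into (sequence number, payload) packets."""
    return [(seq, message[start:start + packet_size])
            for seq, start in enumerate(range(0, len(message), packet_size))]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Restore the original order using the sequence numbers."""
    return b"".join(payload for _, payload in sorted(packets))


message = b"message passing architecture"
packets = packetize(message, packet_size=8)

# Model packets taking different paths and arriving out of order.
rng = random.Random(0)
arrived = packets[:]
rng.shuffle(arrived)

assert reassemble(arrived) == message
print(len(packets), "packets reassembled correctly")
```

The sort at the destination is the extra overhead mentioned above; it is needed only because packets are routed independently.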

• In virtual cut-through, a packet is stored at an intermediate node only if the next required channel is busy.
• Virtual cut-through is similar to the packet-switching technique, with the following difference.
– In contrast to packet switching, when a packet arrives at an intermediate node and its selected outgoing channel is free, the packet is sent out to the adjacent node towards its destination before it is completely received.
• Therefore, the delay due to unnecessary buffering in front of an idle channel is avoided.

• In these figures,
– L represents the packet length in bits,
– W represents the channel bandwidth in bits/cycle,
– D is the number of channels,
– T is the cycle time.
• As can be seen from the figures, the latency of store-and-forward (SF) and that of wormhole (WH) are given respectively by [equations not preserved in this transcript]

• In order to reduce the size of the required buffers and decrease the incurred network latency, a technique called wormhole routing has been introduced.
• Here, a packet is divided into smaller units called flits (flow control digits). These flits move in a pipelined fashion, with a header flit leading the way to the destination node.
• When the header flit is blocked due to network congestion, the remaining flits are also blocked. Only a buffer that can store a flit is required for successful operation of the wormhole routing technique.
• The technique is known to produce a latency that is nearly independent of the path length, and it requires less storage at all nodes compared to the store-and-forward packet-switching technique.
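The effect of path length on latency can be illustrated with a commonly used first-order model. This is an assumption rather than the formulas from the original slides: store-and-forward is taken to retransmit the whole packet (L/W cycles) on each of the D channels, while wormhole is taken to pay D cycles for the header plus L/W cycles for the pipelined body. The symbols L, W, D, and T follow the definitions used in this section (packet length in bits, channel bandwidth in bits/cycle, number of channels, cycle time).

```python
def sf_latency(L: float, W: float, D: int, T: float) -> float:
    """Assumed store-and-forward model: full packet resent on every channel."""
    return T * (L / W) * D


def wh_latency(L: float, W: float, D: int, T: float) -> float:
    """Assumed wormhole model: header crosses D channels, body is pipelined."""
    return T * (L / W + D)


# Hypothetical numbers: 1024-bit packet, 16 bits/cycle, cycle time of 1 unit.
for D in (4, 8):
    print(D, sf_latency(1024, 16, D, 1.0), wh_latency(1024, 16, D, 1.0))
# prints:
# 4 256.0 68.0
# 8 512.0 72.0
```

Under this model, doubling the path length doubles the store-and-forward latency but adds only a few cycles to the wormhole latency, which is the sense in which wormhole latency is nearly independent of path length for long packets.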

Comparison of switching mechanisms

Circuit switching
- Advantages: 1. Suitable for long messages 2. Deadlock-free
- Disadvantages: 1. Wasting of bandwidth

Store-and-forward
- Advantages: 1. Simple 2. Suitable for interactive traffic 3. Bandwidth on demand
- Disadvantages: 1. Buffer for every packet 2. Potential long latency 3. Potential deadlock

Virtual cut-through
- Advantages: 1. Good for long messages 2. Possible deadlock avoidance 3. Elimination of data-link protocol
- Disadvantages: 1. Need for multiple message buffers 2. Wasting of bandwidth 3. Mainly used with profitable routing

Wormhole
- Advantages: 1. Good for long messages 2. Reduced need for buffering 3. Reduced effect of path length
- Disadvantages: 1. Possibility for deadlock 2. Inability to support backtracking

• Wormhole Routing in Mesh Networks
- Dimension-ordered (X-Y) routing
  • Each packet is routed in one dimension at a time, arriving at the proper coordinate in each dimension before proceeding to the next dimension.
  • By enforcing a strict monotonic order on the dimensions traversed, deadlock-free routing is guaranteed.

- Dimension-ordered (X-Y) routing
  • Let (sx, sy) and (dx, dy) be the coordinates of the source and the destination, and let (gx, gy) = (dx - sx, dy - sy).
  • X-Y routing can be implemented by placing gx and gy in the first two flits of the message.
  - When the first flit arrives at a node, it is decremented or incremented, depending on the direction of travel, so that it moves toward 0:
    » If the result is not 0, the message is forwarded in the same direction in which it arrived.
    » If the result is 0 and the message arrived on the Y-dimension, the message is delivered to the local node.
    » If the result is 0 and the message arrived on the X-dimension, the flit is discarded and the next flit is examined:
      » If that flit is 0, the packet is delivered to the local node.
      » Otherwise, the packet is forwarded in the Y-dimension.
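The X-Y procedure above can be sketched as a function that lists the nodes a packet visits. This is a simplified model, not the flit-level hardware logic: the offsets gx and gy are tracked as plain integers and stepped toward 0, and the function name `xy_route` is hypothetical.

```python
def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Nodes visited under dimension-ordered routing: all X hops, then all Y hops."""
    (sx, sy), (dx, dy) = src, dst
    gx, gy = dx - sx, dy - sy  # offsets carried in the first two flits
    x, y = sx, sy
    path = [(x, y)]
    while gx != 0:  # route in the X-dimension until the X offset reaches 0
        step = 1 if gx > 0 else -1
        x += step
        gx -= step
        path.append((x, y))
    while gy != 0:  # then route in the Y-dimension
        step = 1 if gy > 0 else -1
        y += step
        gy -= step
        path.append((x, y))
    return path


print(xy_route((1, 1), (3, 0)))
# prints: [(1, 1), (2, 1), (3, 1), (3, 0)]
```

Because every packet finishes its X hops before taking any Y hop, no packet ever turns from the Y-dimension back into the X-dimension, which is the strict ordering that rules out cyclic channel dependencies.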

[Figure: Dimension-ordered (X-Y) routing in an 8×8 mesh network, showing the source node and the destination node]

• Virtual Channels
- A principle introduced to allow the design of deadlock-free routing algorithms.
- An inexpensive method to increase the number of logical channels without adding more wires.
- A number of adaptive routing algorithms are based on the use of virtual channels.
- Adding virtual channels to an interconnection network is analogous to adding lanes to a street network (blocked messages are allowed to pass).
- Virtual channels provide an additional degree of freedom in allocating resources to messages in a network.

- The paths X-A-B-Z and Y-A-B-W share the common link AB.
- Therefore, AB is multiplexed between the two paths.
[Figure: nodes X and Y connect to node A, link AB joins A and B, and node B connects to nodes Z and W]

• A provision is also needed such that data sent over the first path (lane) is sent from X to Z and not to W, and similarly data sent over the second path (lane) is sent from Y to W and not to Z.
• This can be achieved if we assume that each physical link is actually divided into a number of unidirectional virtual channels.
• Each channel can carry data for one virtual circuit (one path). A circuit (path) from one node to another consists of a sequence of channels on the links along the path between the two nodes.

• When data is sent from node A to node B, node B has to determine the circuit associated with the data so that it can decide whether it should route the data to node Z or to node W.

• One way to provide such information is to divide the AB link into a fixed number of time slots and statically assign each time slot to a channel.
• This way, the time slot on which the data arrives identifies the sending channel and can therefore be used to direct the data to the appropriate destination.
• One of the advantages of the virtual channel concept is deadlock avoidance.
• This can be done by assigning a few flits per node of buffering. When a packet arrives at a virtual channel, it is put in the buffer and sent along the appropriate time slot.
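The static time-slot assignment on link AB can be sketched as follows. This is a toy model under stated assumptions: slot k of every frame belongs to virtual channel k, an idle slot carries `None`, and the names `multiplex`, `demultiplex`, and `next_hop` are hypothetical.

```python
from typing import Optional


def multiplex(streams: dict[int, list[str]], num_slots: int) -> list[Optional[str]]:
    """Node A: place flits on the link, slot k of each frame for channel k."""
    frames = max((len(s) for s in streams.values()), default=0)
    link: list[Optional[str]] = []
    for frame in range(frames):
        for slot in range(num_slots):
            data = streams.get(slot, [])
            link.append(data[frame] if frame < len(data) else None)
    return link


def demultiplex(link: list[Optional[str]], num_slots: int,
                next_hop: dict[int, str]) -> dict[str, list[str]]:
    """Node B: the arrival slot alone identifies the channel and the next hop."""
    out: dict[str, list[str]] = {hop: [] for hop in next_hop.values()}
    for t, flit in enumerate(link):
        if flit is not None:
            out[next_hop[t % num_slots]].append(flit)
    return out


# Channel 0 carries the X-to-Z path, channel 1 carries the Y-to-W path.
link = multiplex({0: ["x0", "x1"], 1: ["y0"]}, num_slots=2)
print(link)  # ['x0', 'y0', 'x1', None]
print(demultiplex(link, num_slots=2, next_hop={0: "Z", 1: "W"}))
# {'Z': ['x0', 'x1'], 'W': ['y0']}
```

Note that the flits themselves carry no channel identifier here; node B recovers it purely from the slot position, which is the point of the static assignment.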

5.4 Message Passing Programming Models
• A message passing architecture uses a set of primitives allowing processes to communicate with each other.
• These include the send, receive, broadcast, and barrier primitives.

• The send primitive takes a memory buffer and sends it to a destination node.
• The receive primitive accepts a message from a source node and stores it in a specified memory buffer.
• The basic programming model used in message passing architectures is based on the idea of matching a send request on one processor with a receive request on another.
• In such a scheme, send and receive are blocking; that is, send blocks until the corresponding receive is executed, and only then can data be transferred.

• Implementation of send/receive among processes requires a three-way protocol, as shown in Figure 5.9.
• In this case, the sending process issues a request-to-send message to the receiver process.
• The latter stores the request and sends a reply message back when the corresponding receive is executed. The sender process receives the reply and finally transfers the data.
• The blocking send/receive is simple; it requires no buffering at the source or the destination.

• However, the three-way handshaking used in blocking send/receive requires that both the sender and the receiver be blocked for at least a full round-trip time.
• During this time the processors are idle, thus leading to an increase in the network communication latency.
• In addition, with blocking send/receive, it is impossible to overlap communication with computation, and thus the network bandwidth cannot be fully utilized.

[Figure 5.9: Three-way protocol between P1 (sender) and P2 (receiver): m1 (request to send), m2 (ready to receive), m3 (data transfer). No buffering is required at the source or the destination.]
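The three-way protocol can be simulated with two threads. This is only a sketch: the two `queue.Queue` objects stand in for the network links between the processes, and the function names and message tags are hypothetical rather than part of any real message passing library.

```python
import queue
import threading


def sender(to_receiver: queue.Queue, to_sender: queue.Queue, payload: bytes) -> None:
    to_receiver.put(("request_to_send", None))  # m1
    kind, _ = to_sender.get()                   # send blocks until m2 arrives
    assert kind == "ready_to_receive"
    to_receiver.put(("data", payload))          # m3: the actual data transfer


def receiver(to_receiver: queue.Queue, to_sender: queue.Queue, result: list) -> None:
    kind, _ = to_receiver.get()                 # m1 waits here for the receive
    assert kind == "request_to_send"
    to_sender.put(("ready_to_receive", None))   # m2: the receive has been executed
    kind, data = to_receiver.get()              # m3
    assert kind == "data"
    result.append(data)


def blocking_transfer(payload: bytes) -> bytes:
    """Run one blocking send/receive pair and return what the receiver got."""
    to_receiver: queue.Queue = queue.Queue()
    to_sender: queue.Queue = queue.Queue()
    result: list = []
    threads = [
        threading.Thread(target=sender, args=(to_receiver, to_sender, payload)),
        threading.Thread(target=receiver, args=(to_receiver, to_sender, result)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


print(blocking_transfer(b"hello"))  # b'hello'
```

Both threads spend most of their time waiting in `get()`, which mirrors the idle round-trip time that blocking send/receive imposes on the sender and the receiver.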

5.5 Processor Support For Message Passing
• The following features are needed to support message passing:
- A port (communication channel) on which two operations can be performed: send and receive.
- Messages are used for communication among objects (fixed header, variable-size message body).
- A task can hold multiple access rights on ports. With a port set, a task has either all or none of the access rights to a group of ports.

5.7 Message Passing vs. Shared Memory Architectures
• Shared Memory
- Communication uses implicit loads and stores to a global address space.
- Communication and synchronization are distinct.
- The programmer is not concerned with the details of interprocessor communication.
- This model is a polling interface (a drawback as far as synchronization is concerned).
- One-way communication of data is not possible.

• Message Passing
- Explicit communication model.
- Messages include both data and synchronization in a single unit.
- This model lends itself to applications having large synchronization components.
- This model suffers from the cost of marshaling.

5.8 Summary
• Shared memory systems may be easier to program, but they are difficult to scale up to a large number of processors.
• For scalability to larger systems to continue, systems have had to use message passing techniques.
• It is apparent that message passing systems are the only way to efficiently increase the number of processors managed by a multiprocessor system.
• We discussed the architecture and network models of message passing systems.
• We shed some light on routing and network switching techniques.