
Message Passing By Prabhat Ranjan Asst. Professor, Central University of South Bihar

General organization of a communication system
- Interprocess communication is at the heart of distributed computing.
- User processes run on host machines that are connected to one another through a network, and the network carries data in the form of signals that propagate from one process to another.

Interprocess communication: Network Stack - OSI Reference Model

Layered Protocol

OSI Reference Model

Interprocess communication
- What do we mean when we say that two computers of a distributed system communicate with each other?
  - We mean that two processes, one running on each computer, are in communication with each other.
  - A process is the execution of a program.
- In a distributed system, processes executing on different computers often need to communicate with each other to achieve some common goal.
- A distributed system or distributed operating system needs to provide interprocess communication (IPC) mechanisms to facilitate such communication among processes.
- IPC basically requires information sharing among two or more processes.

Information sharing methods
- Shared-data approach
  - The information to be shared is placed in a common memory area that is accessible to all the processes involved in an IPC.
- Message-passing approach
  - The information to be shared is physically copied from the sender process's address space to the address spaces of all the receiver processes.
  - This is done by transmitting the data to be copied in the form of messages (a message is a block of information).

Information sharing methods (contd.)
Note:
- Computers in a network do not share memory.
- Processes in a distributed system normally communicate by exchanging messages rather than through shared data.
- The message-passing approach is the basic IPC mechanism in distributed systems.

Message-passing system
- A message-passing system is a subsystem of a distributed system or operating system that provides a set of message-based IPC protocols.
- It shields programmers from the details of complex network protocols and multiple heterogeneous platforms.
- It enables processes to communicate by exchanging messages and allows programs to be written using simple communication primitives, such as send() and receive().
- It serves as a suitable infrastructure for building higher-level IPC systems, such as remote procedure call and distributed shared memory.

Desirable features of a good message-passing system: Simplicity
- A message-passing system should be simple and easy to use.
- It must be straightforward to construct new applications and to communicate with existing ones by using the primitives (send() and receive()) provided by the message-passing system.
- It should also be possible for a programmer to designate the different modules of a distributed application and to send and receive messages between them as simply as possible, without having to worry about system or network aspects that are not relevant at the application level.

Desirable features of a good message-passing system: Uniform semantics
- In a distributed system, a message-passing system may be used for the following two types of interprocess communication:
  - Local communication: the communicating processes are on the same node.
  - Remote communication: the communicating processes are on different nodes.
- The semantics of remote communication should be as close as possible to those of local communication.

Desirable features of a good message-passing system: Efficiency
- If the message-passing system is not efficient, interprocess communication may become so expensive that application designers will strenuously try to avoid using it in their applications.
- An IPC protocol of a message-passing system can be made efficient by reducing the number of message exchanges.
- Some optimizations normally adopted for efficiency include the following:
  - Avoiding the costs of establishing and terminating connections between the same pair of processes for each and every message exchange between them.
  - Minimizing the costs of maintaining the connections.
  - Piggybacking the acknowledgment of previous messages on the next message.

Desirable features of a good message-passing system: Reliability
- A reliable IPC protocol can cope with failure problems and guarantees the delivery of a message.
- Handling of lost messages usually involves acknowledgments and retransmissions on the basis of timeouts.
- A reliable IPC protocol is also capable of detecting and handling duplicates.
- Duplicate handling usually involves generating and assigning appropriate sequence numbers to messages.
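The duplicate handling mentioned above can be sketched as follows. This is a minimal illustration, not a full protocol: the class name DuplicateFilter and the example payloads are assumptions made for this sketch, and a real receiver would also bound the set of remembered sequence numbers.

```python
class DuplicateFilter:
    """Receiver-side filter that delivers each sequence number at most once."""

    def __init__(self):
        self.seen = set()  # sequence numbers already delivered

    def accept(self, seq_no, payload):
        """Return the payload on first arrival, None for a duplicate."""
        if seq_no in self.seen:
            return None
        self.seen.add(seq_no)
        return payload


receiver = DuplicateFilter()
# Message 1 arrives twice, e.g. because the sender retransmitted after a timeout.
arrivals = [(1, "open"), (2, "write"), (1, "open"), (3, "close")]
delivered = []
for seq_no, payload in arrivals:
    result = receiver.accept(seq_no, payload)
    if result is not None:
        delivered.append(result)
print(delivered)
```

The second copy of message 1 is recognized by its sequence number and dropped, so the application sees each message once.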

Desirable features of a good message-passing system: Correctness
- Correctness is a feature related to IPC protocols for group communication.
- Issues related to correctness are as follows:
  - Atomicity: ensures that every message sent to a group of receivers will be delivered to either all of them or none of them.
  - Ordered delivery: ensures that messages arrive at all receivers in an order acceptable to the application.
  - Survivability: guarantees that messages will be delivered correctly despite partial failures of processes, machines, or communication links.

Desirable features of a good message-passing system: Flexibility
- Not all applications require the same degree of reliability and correctness from the IPC protocols.
- Many applications do not require atomicity or ordered delivery of messages.
  - For example, a client may multicast a request message to a group of servers and offer the job to the first server that replies.
  - Atomicity of message delivery is not required in this case.
- The IPC protocols of a message-passing system must be flexible enough to cater to the various needs of different applications.
- The IPC primitives should give users the flexibility to choose and specify the types and levels of reliability and correctness that their applications require.

Desirable features of a good message-passing system: Security
- A good message-passing system must be capable of providing secure end-to-end communication.
- A message in transit on the network should not be accessible to any user other than the sender and those to whom it is addressed.
- Steps necessary for secure communication are as follows:
  - Authentication of the receiver(s) of a message by the sender.
  - Authentication of the sender of a message by its receiver(s).
  - Encryption of a message before sending it over the network.

Desirable features of a good message-passing system: Portability
- There are two different aspects of portability in a message-passing system:
  - The message-passing system should itself be portable: it should be possible to easily construct a new IPC facility on another system by reusing the basic design of the existing message-passing system.
  - The applications written using the primitives of the IPC protocols of the message-passing system should be portable.
- This requires that heterogeneity (computers of different architectures) be considered while designing a message-passing system.

A typical message structure
- A message is a block of information formatted by a sending process in such a manner that it is meaningful to the receiving process.
- It consists of a fixed-length header and a variable-size collection of typed data objects.
- Address: contains characters that uniquely identify the sending and receiving processes in the network.
  - This element has two parts: the sending process address and the receiving process address.

A typical message structure
- Sequence number: the message identifier (ID), which is very useful for identifying lost messages and duplicate messages in case of system failures.
- Structural information: this element also has two parts:
  - The type part specifies whether the data to be passed to the receiver is included within the message or whether the message only contains a pointer to the data, which is stored somewhere outside the contiguous portion of the message.
  - The second part specifies the length of the variable-size message data.
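The header elements described above (addresses, sequence number, structural information) could be laid out as in the following sketch. The field widths, the use of integers for addresses, and the names HEADER and Message are assumptions chosen for illustration; the slides do not prescribe a concrete layout.

```python
import struct
from dataclasses import dataclass

# Assumed fixed-length header in network byte order:
# sender address, receiver address, sequence number,
# type flag (0 = data inline, 1 = pointer to data), data length.
HEADER = struct.Struct("!IIIBI")


@dataclass
class Message:
    sender: int
    receiver: int
    seq_no: int
    inline: bool
    data: bytes

    def pack(self) -> bytes:
        """Fixed-length header followed by the variable-size data."""
        flag = 0 if self.inline else 1
        header = HEADER.pack(self.sender, self.receiver, self.seq_no,
                             flag, len(self.data))
        return header + self.data

    @classmethod
    def unpack(cls, raw: bytes) -> "Message":
        """Read the header first, then use its length field to cut out the data."""
        sender, receiver, seq_no, flag, length = HEADER.unpack_from(raw)
        body = raw[HEADER.size:HEADER.size + length]
        return cls(sender, receiver, seq_no, flag == 0, body)


msg = Message(sender=10, receiver=20, seq_no=7, inline=True, data=b"hello")
wire = msg.pack()
print(HEADER.size, len(wire), Message.unpack(wire))
```

Because the header has a fixed length, the receiver can always parse it before knowing anything about the data that follows.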

Issues to be considered in the design of an IPC protocol
- Who is the sender?
- Who is the receiver?
- Is there one receiver or many receivers?
- Is the message guaranteed to have been accepted by its receiver(s)?
- Does the sender need to wait for a reply?
- What should be done if a catastrophic event such as a node crash or a communication link failure occurs during the course of communication?
- What should be done if the receiver is not ready to accept the message?
  - Will the message be discarded or stored in a buffer?
  - In the case of buffering, what should be done if the buffer is full?

Synchronization
- Communication between processes takes place through calls to the send() and receive() primitives.
- A central issue in the communication structure is the synchronization imposed on the communicating processes by the communication primitives.
- The semantics used for synchronization may be either:
  - Blocking (synchronous)
  - Nonblocking (asynchronous)

Synchronization
- Nonblocking: a primitive (send() or receive()) is said to have nonblocking semantics if its invocation does not block the execution of its invoker (control returns almost immediately to the invoker).

Synchronization
- Blocking: a primitive (send() or receive()) is said to have blocking semantics if its invocation blocks the execution of its invoker.

Synchronization: Blocking and nonblocking semantics of each primitive
- Blocking send() primitive: after execution of the send statement, the sending process is blocked until it receives an acknowledgment from the receiver that the message has been received.
- Nonblocking send() primitive: after execution of the send statement, the sending process is allowed to proceed with its execution as soon as the message has been copied to a buffer.
- Blocking receive() primitive: after execution of the receive statement, the receiving process is blocked until it receives a message.
- Nonblocking receive() primitive: after execution of the receive statement, the receiving process proceeds with its execution; control returns almost immediately, just after the kernel has been told where the message buffer is.
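The difference between blocking and nonblocking receive can be tried out with Python's standard queue module, used here as a stand-in for the kernel's message buffer. This is only a sketch within a single program: the names mailbox and sender are illustrative, and the queue's put() does not wait for an acknowledgment, so it corresponds to the nonblocking send described above.

```python
import queue
import threading
import time

mailbox = queue.Queue()  # stands in for the kernel's message buffer


def sender():
    time.sleep(0.05)      # the message is sent a little later
    mailbox.put("hello")  # returns once the message is in the buffer


# Nonblocking receive: returns at once, reporting that nothing has arrived.
try:
    early = mailbox.get_nowait()
except queue.Empty:
    early = None

worker = threading.Thread(target=sender)
worker.start()

# Blocking receive: the caller is suspended until a message is available.
received = mailbox.get()
worker.join()
print(early, received)
```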

Synchronization: Issue with a nonblocking receive primitive
- How does the receiving process know that the message has arrived in the message buffer?
- One of the following two methods is commonly used for this purpose:
  - Polling
  - Interrupt

Synchronization
- Polling
  - In this method, a test primitive is provided to allow the receiver to check the buffer status.
  - The receiver uses this primitive to periodically poll the kernel to check whether the message is already available in the buffer.
- Interrupt
  - In this method, when the message has been placed in the buffer and is ready for use by the receiver, a software interrupt is used to notify the receiving process.
  - Note: this method permits the receiving process to continue with its execution without having to issue unsuccessful test requests.
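The polling method can be sketched as below. The call buffer.empty() plays the role of the test primitive; the names buffer and deliver_later and the polling interval are assumptions for this sketch. It is safe here only because there is a single receiver, so nobody else can empty the buffer between the test and the receive.

```python
import queue
import threading
import time

buffer = queue.Queue()  # message buffer shared with the "kernel" thread


def deliver_later():
    time.sleep(0.05)          # the message arrives after a short delay
    buffer.put("data ready")


kernel = threading.Thread(target=deliver_later)
kernel.start()

polls = 0
while buffer.empty():         # test primitive: check the buffer status
    polls += 1                # each failed check is an unsuccessful test request
    time.sleep(0.01)          # the receiver could do other work here

message = buffer.get_nowait()
kernel.join()
print(message, "after", polls, "unsuccessful polls")
```

The unsuccessful polls counted here are exactly the overhead that the interrupt method avoids.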

Synchronization: Issue with blocking send() and receive() primitives
- Blocking send() primitive
  - The sending process could get blocked forever in situations where the potential receiving process has crashed or the sent message has been lost on the network due to communication failure.
  - To prevent this situation, blocking send primitives often use a timeout value that specifies an interval of time after which the send operation is terminated with an error status.
  - The timeout value may be a default value, or users may be given the flexibility to specify it as a parameter of the send primitive.

Synchronization: Issue with blocking send() and receive() primitives
- Blocking receive() primitive
  - A timeout value may also be associated with a blocking receive primitive to prevent the receiving process from getting blocked indefinitely.
  - Indefinite blocking can occur in situations where the potential sending process has crashed or the expected message has been lost on the network due to communication failure.
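A timeout on a blocking receive might look like the following sketch, which uses the timeout parameter of queue.Queue.get() from the Python standard library. The wrapper name receive and the status strings are assumptions made for illustration.

```python
import queue


def receive(mailbox, timeout):
    """Blocking receive that terminates with an error status after `timeout` seconds."""
    try:
        return ("ok", mailbox.get(timeout=timeout))
    except queue.Empty:
        return ("timeout", None)


mailbox = queue.Queue()

# No sender is running, so without a timeout this call would block forever.
first = receive(mailbox, timeout=0.05)

mailbox.put("reply")
second = receive(mailbox, timeout=0.05)
print(first, second)
```

As the slides suggest for send, the timeout is passed as a parameter so that each caller can choose its own value.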

Buffering
- How are messages transmitted from one process to another?
  - Messages can be transmitted from one process to another by copying the body of the message from the address space of the sending process to the address space of the receiving process.
- Is the receiving process always ready to receive a message?
  - In some cases, the receiving process may not be ready to receive a message transmitted to it, but it wants the operating system to save that message for later reception.
  - In these cases, the operating system relies on the receiver having a buffer in which messages can be stored until the receiving process executes specific code to receive the message.

Buffering
- The synchronous and asynchronous modes of communication correspond respectively to the two extremes of buffering:
  - A null buffer, or no buffering
  - A buffer with unbounded capacity
- The other two commonly used buffering strategies are:
  - Single-message buffer
  - Finite-bound, or multiple-message, buffer

Buffering: Null buffer (no buffering)
- With no buffering, there is no place to temporarily store the message, so one of the following implementation strategies may be used for the sender and receiver to communicate.
- Case 1:
  - The message remains in the sender process's address space and the execution of the send() is delayed until the receiver executes the receive().
  - When the receiver executes receive(), an acknowledgment is sent to the sender's kernel saying that the sender can now send the message.
  - On receiving the acknowledgment, the sender is unblocked, causing the send() to be executed once again.
  - This time, the message is successfully transferred from the sender's address space to the receiver's address space, since the receiver is waiting to receive it.

Buffering: Null buffer (no buffering)
- Case 2:
  - After executing send(), the sender process waits for an acknowledgment from the receiver process.
  - If no acknowledgment is received within the timeout period, it assumes that its message was discarded and tries again, hoping that this time the receiver has already executed receive().
  - Note: the sender may have to try several times before succeeding, and it gives up after retrying a predecided number of times.
- In the case of no buffering, the logical path of message transfer is directly from the sender's address space to the receiver's address space.

Buffering: Null buffer (no buffering)
Important points to be noted:
- The null buffer strategy is generally not suitable for synchronous communication between two processes in a distributed system.
- This is because, if the receiver is not ready, a message has to be transferred two or more times, and the receiver of the message has to wait for the entire time taken to transfer the message across the network.
- In a distributed system, message transfer across the network may require significant time in some cases.
- Therefore, instead of using the null buffer strategy, synchronous communication mechanisms in network and distributed systems use a single-message buffer strategy.

Buffering: Single-message buffer
- In this strategy, a buffer with the capacity to store a single message is used on the receiver's node.
- The main idea behind the single-message buffer strategy is to keep the message ready for use at the location of the receiver.
- In this method, the request message is buffered on the receiver's node if the receiver is not ready to receive the message.
- The message buffer may be located either in the kernel's address space or in the receiver process's address space.
- The logical path of message transfer involves two copy operations.

Buffering: Unbounded-capacity buffer
- In the asynchronous mode of communication, since a sender does not wait for the receiver to be ready, there may be several pending messages that have not yet been accepted by the receiver.
- Therefore, an unbounded-capacity message buffer that can store all unreceived messages is needed to support asynchronous communication with the assurance that all the messages sent to the receiver will be delivered.
- However, a buffer of unbounded capacity is practically impossible.

Buffering: Finite-bound (or multiple-message) buffer
- Because a buffer of unbounded capacity is practically impossible, systems using the asynchronous mode of communication use finite-bound buffers in practice.
- A strategy is needed for handling a possible buffer overflow, since the buffer has a finite bound.
- The buffer overflow problem can be dealt with in one of the following two ways:
  - Unsuccessful communication
  - Flow-controlled communication
- Unsuccessful communication:
  - Message transfers fail whenever there is no more buffer space.
  - The send normally returns an error message to the sending process, indicating that the message could not be delivered to the receiver because the buffer is full.
  - Unfortunately, the use of this method makes message passing less reliable.

Buffering: Finite-bound (or multiple-message) buffer
- Flow-controlled communication:
  - The second method is to use flow control, which means that the sender is blocked until the receiver accepts some messages, thus creating space in the buffer for new messages.
  - This method introduces a synchronization between the sender and the receiver and may result in unexpected deadlocks.
- Note:
  - The create buffer system call, when executed by a receiver process, creates a buffer (sometimes called a mailbox or port) of a size specified by the receiver.
  - The receiver's mailbox may be located either in the kernel's address space or in the receiver process's address space.

Buffering: Finite-bound (or multiple-message) buffer
- Note: in the case of asynchronous send with the bounded-buffer strategy, the message is
  - first copied from the sending process's memory into the receiving process's mailbox,
  - then copied from the mailbox to the receiver's memory when the receiver calls for the message.
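The two ways of handling overflow of a finite-bound buffer, unsuccessful communication and flow-controlled communication, can be sketched with a bounded queue.Queue from the Python standard library. The buffer size of two and the variable names are assumptions for this sketch; a timeout is added to the blocking send so that the example cannot hang.

```python
import queue

mailbox = queue.Queue(maxsize=2)  # finite-bound buffer holding two messages
mailbox.put_nowait("m1")
mailbox.put_nowait("m2")

# Unsuccessful communication: the send fails at once when the buffer is full.
try:
    mailbox.put_nowait("m3")
    unsuccessful = "sent"
except queue.Full:
    unsuccessful = "error: buffer full"

# Flow-controlled communication: the send blocks until space appears.
# Nobody is receiving yet, so the blocked send runs into its timeout.
try:
    mailbox.put("m3", timeout=0.05)
    flow_controlled = "sent"
except queue.Full:
    flow_controlled = "still blocked after timeout"

accepted = mailbox.get_nowait()   # the receiver accepts one message
mailbox.put("m3", timeout=0.05)   # now the send can complete
print(unsuccessful, "|", flow_controlled, "|", accepted, mailbox.qsize())
```

The blocked put() shows why flow control couples sender and receiver: if the receiver never runs, the sender never proceeds.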

MULTIDATAGRAM MESSAGES
- Almost all networks have an upper bound on the size of data that can be transmitted at a time. This size is known as the maximum transfer unit (MTU) of a network.
- A message whose size is greater than the MTU has to be fragmented into pieces no larger than the MTU, and each fragment has to be sent separately.
- Each fragment is sent in a packet that carries some control information in addition to the message data. Each packet is known as a datagram.
- Messages smaller than the MTU of the network can be sent in a single packet and are known as single-datagram messages.
- Messages larger than the MTU of the network have to be fragmented and sent in multiple packets; such messages are known as multidatagram messages.
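Fragmentation and reassembly of a multidatagram message can be sketched as follows. The tiny MTU of 8 bytes, the (index, total, chunk) control information, and the function names are assumptions for illustration; real protocols define their own datagram headers.

```python
MTU = 8  # assumed, unrealistically small so that the example fragments


def fragment(message: bytes, mtu: int = MTU):
    """Split a message into datagrams (index, total, chunk) of at most mtu data bytes."""
    chunks = [message[i:i + mtu] for i in range(0, len(message), mtu)] or [b""]
    return [(index, len(chunks), chunk) for index, chunk in enumerate(chunks)]


def reassemble(datagrams):
    """Rebuild the message; datagrams may arrive in any order."""
    ordered = sorted(datagrams, key=lambda d: d[0])
    total = ordered[0][1]
    if len(ordered) != total:
        raise ValueError("missing fragment")
    return b"".join(chunk for _, _, chunk in ordered)


original = b"multidatagram message"       # 21 bytes, larger than the MTU
datagrams = fragment(original)
restored = reassemble(list(reversed(datagrams)))
print(len(datagrams), restored)
```

The index in each datagram is the control information that lets the receiver restore the original order and detect a missing fragment.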

ENCODING AND DECODING OF MESSAGE DATA
- Message data should be meaningful to the receiving process.
  - The structure of program objects should be preserved while they are being transmitted from the address space of the sending process to the address space of the receiving process.
- However, this is not possible in a heterogeneous system in which the sending and receiving processes are on computers of different architectures.
- Further, even in homogeneous systems, it is very difficult to achieve this goal.
- There are two reasons for this.

ENCODING AND DECODING OF MESSAGE DATA
First reason:
- An absolute pointer value loses its meaning when transferred from one process address space to another.
- Therefore, program objects that use absolute pointer values cannot be transferred in their original form, and some other form of representation must be used to transfer them.
- For example, to transmit a tree object, each element of the tree must be copied into a leaf record and properly aligned in some fixed order in a buffer before it can be sent to another process.
- The leaf records themselves have no meaning in the address space of the receiving process, but the tree can be regenerated easily from them by using object-type information.
- Object-type information must therefore be passed between the sender and receiver, indicating not only that a tree object is being passed but also the order in which the leaf records are aligned.
- The same holds for other pointer-based objects, e.g. linked lists.
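The tree example can be made concrete with a small sketch in which pointers are replaced by positions in a list of records. The record format (value, left index, right index), the pre-order layout, and the use of -1 for "no child" are assumptions chosen for this illustration; they stand for the object-type information that sender and receiver must agree on.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right


def encode(root):
    """Flatten a binary tree into pre-order records (value, left_index, right_index)."""
    records = []

    def visit(node):
        if node is None:
            return -1                  # -1 marks a missing child
        index = len(records)
        records.append(None)           # reserve this node's slot
        left = visit(node.left)
        right = visit(node.right)
        records[index] = (node.value, left, right)
        return index

    visit(root)
    return records


def decode(records, index=0):
    """Rebuild the tree in the receiver's address space from the records."""
    if index == -1 or not records:
        return None
    value, left, right = records[index]
    return Node(value, decode(records, left), decode(records, right))


tree = Node("a", Node("b"), Node("c", Node("d")))
records = encode(tree)
rebuilt = decode(records)
print(records)
```

The records contain no addresses, only positions, so they remain meaningful after being copied into another address space.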

ENCODING AND DECODING OF MESSAGE DATA
Second reason:
- Different program objects occupy varying amounts of storage space.
- To be meaningful, a message must normally contain several types of program objects, such as long integers, short integers, variable-length character strings, and so on.
- In this case, to make the message meaningful to the receiver, there must be some way for the receiver to identify which program object is stored where in the message buffer and how much space each program object occupies.

ENCODING AND DECODING OF MESSAGE DATA
Note:
- Program objects are first converted to a stream form that is suitable for transmission and placed into a message buffer.
- This conversion process takes place on the sender side and is known as encoding of the message data.
- The encoded message, when received by the receiver, must be converted back from the stream form to the original program objects before it can be used.
- The process of reconstructing program objects from message data on the receiver side is known as decoding of the message data.

ENCODING AND DECODING OF MESSAGE DATA
- Representations used for the encoding and decoding of message data:
  - Tagged representation
  - Untagged representation
- Tagged representation
  - In tagged representation, the type of each program object is encoded in the message along with its value.
  - In this method, it is a simple matter for the receiving process to check the type of each program object in the message because of the self-describing nature of the coded data format.
- Untagged representation
  - In untagged representation, the message data contains only program objects.
  - No information is included in the message data to specify the type of each program object.
  - The receiving process must have prior knowledge of how to decode the received data because the coded data format is not self-describing.
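The contrast between the two representations can be sketched with Python's struct module. The one-byte tags b"h" (short integer) and b"i" (integer) and the helper names are assumptions made for this example, not a standard wire format.

```python
import struct

# Untagged: both sides must already know the layout "!hi" (short, then int).
untagged = struct.pack("!hi", 7, 100000)
short_value, int_value = struct.unpack("!hi", untagged)

# Tagged: every value is preceded by a one-byte type tag (assumed tag set).
TAGS = {b"h": "!h", b"i": "!i"}


def encode_tagged(items):
    out = b""
    for tag, value in items:
        out += tag + struct.pack(TAGS[tag], value)
    return out


def decode_tagged(raw):
    """Decode without prior knowledge of the layout by reading each tag."""
    values, pos = [], 0
    while pos < len(raw):
        tag = raw[pos:pos + 1]
        fmt = TAGS[tag]
        (value,) = struct.unpack_from(fmt, raw, pos + 1)
        values.append((tag, value))
        pos += 1 + struct.calcsize(fmt)
    return values


tagged = encode_tagged([(b"h", 7), (b"i", 100000)])
print(len(untagged), len(tagged), decode_tagged(tagged))
```

The tagged form is self-describing but larger: each tag costs one extra byte per program object in this sketch.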

PROCESS ADDRESSING
- An important issue in message-based communication is the addressing (or naming) of the parties involved in an interaction.
- A message-passing system usually supports two types of process addressing:
  - Explicit addressing
  - Implicit addressing
- Explicit addressing
  - The process with which communication is desired is explicitly named as a parameter in the communication primitive used.
  - The primitives used in explicit addressing are:
    - send(process_id, message): send a message to the process identified by "process_id".
    - receive(process_id, message): receive a message from the process identified by "process_id".

PROCESS ADDRESSING
- Implicit addressing
  - A process willing to communicate does not explicitly name a process for communication.
  - The primitives used in implicit addressing are:
    - send_any(service_id, message): send a message to any process that provides the service of type "service_id".
      - The sender names a service instead of a process.
      - This type of primitive is useful in client-server communications when the client is not concerned with which particular server, out of a set of servers providing the service, handles its request.
    - receive_any(process_id, message): receive a message from any process and return the process identifier ("process_id") of the process from which the message was received.
      - The receiver is willing to accept a message from any sender.
      - This type of primitive is useful in client-server communications when the server is meant to service requests of all clients that are authorized to use its service.
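How receive_any reports the sender's identity can be sketched as below. This is a single-program illustration under stated assumptions: the shared mailbox, the client names, and the nonblocking behaviour are all hypothetical, and the function returns the pair (process_id, message) instead of filling in output parameters as the primitive on the slide does.

```python
import queue

mailbox = queue.Queue()  # the server's single incoming mailbox (assumed)


def send(sender_id, message):
    """Each message is stored together with the identity of its sender."""
    mailbox.put((sender_id, message))


def receive_any():
    """Accept the next message from any sender and report who sent it."""
    process_id, message = mailbox.get_nowait()
    return process_id, message


send("client_1", "read file")
send("client_2", "write file")
first = receive_any()
second = receive_any()
print(first, second)
```

Knowing the returned process identifier, the server can address its reply explicitly to the client that made the request.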

FAILURE HANDLING
- A distributed system may be prone to partial failures such as a node crash or a communication link failure.
- During interprocess communication, such failures may lead to the following problems:
  - Loss of request message: this may happen either due to the failure of the communication link between the sender and receiver or because the receiver's node is down at the time the request message reaches it.

FAILURE HANDLING
- During interprocess communication, such failures may lead to the following problems:
  - Loss of response message: this may happen either due to the failure of the communication link between the sender and receiver or because the sender's node is down at the time the response message reaches it.

FAILURE HANDLING
- During interprocess communication, such failures may lead to the following problems:
  - Unsuccessful execution of the request: this happens when the receiver's node crashes while the request is being processed.
- To cope with these problems, a reliable IPC protocol of a message-passing system is normally designed around the following idea:
  - Internal retransmission of messages after timeouts, and the return of an acknowledgment message to the sending machine's kernel by the receiving machine's kernel.
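The idea of retransmission after timeouts can be sketched without a real network by scripting which transmissions are acknowledged. The function names, the retry limit, and the channel callable (returning True when an acknowledgment arrives before the timeout) are assumptions for this illustration only.

```python
def send_reliably(message, channel, max_retries=3):
    """Retransmit until an acknowledgment arrives or the retries run out.

    `channel(message)` is a hypothetical callable that returns True when an
    acknowledgment arrives before the timeout and False otherwise.
    """
    for attempt in range(1, max_retries + 1):
        if channel(message):
            return attempt  # number of transmissions that were needed
    raise TimeoutError(f"no acknowledgment after {max_retries} attempts")


def make_lossy_channel(outcomes):
    """Channel that replays a scripted list of ack / no-ack outcomes."""
    script = iter(outcomes)
    return lambda message: next(script, False)


# The first two transmissions are lost, the third is acknowledged.
attempts = send_reliably("request", make_lossy_channel([False, False, True]))

# A receiver that never answers makes the sender give up with an error.
try:
    send_reliably("request", make_lossy_channel([]), max_retries=2)
    gave_up = False
except TimeoutError:
    gave_up = True
print(attempts, gave_up)
```

Because a retransmitted request may reach the receiver more than once, this scheme is normally combined with sequence numbers for duplicate detection.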

GROUP COMMUNICATION
- The most elementary form of message-based interaction is one-to-one communication (also known as point-to-point, or unicast, communication).
  - Here a single sender process sends a message to a single receiver process.
- However, for performance and ease of programming, several highly parallel distributed applications require that a message-passing system also provide a group communication facility.
- Depending on single or multiple senders and receivers, the following three types of group communication are possible:
  - One to many (single sender and multiple receivers)
  - Many to one (multiple senders and single receiver)
  - Many to many (multiple senders and multiple receivers)

GROUP COMMUNICATION
- One-to-many communication (single sender and multiple receivers)
  - In this scheme, there are multiple receivers for a message sent by a single sender.
  - The one-to-many scheme is also known as multicast communication.
  - A special case of multicast communication is broadcast communication, in which the message is sent to all processors connected to a network.
- Many-to-one communication
  - In this scheme, multiple senders send messages to a single receiver.
  - The single receiver may be selective or nonselective.
  - A selective receiver specifies a unique sender; a message exchange takes place only if that sender sends a message.
  - A nonselective receiver specifies a set of senders, and if any one sender in the set sends a message to this receiver, a message exchange takes place.

GROUP COMMUNICATION
- Many-to-many communication
  - In this scheme, multiple senders send messages to multiple receivers.
  - The one-to-many and many-to-one schemes are implicit in this scheme.