More on MPI
• Non-blocking point-to-point routines
• Deadlock
• Collective communication
Non-blocking send/recv
• Most hardware has a communication co-processor: communication can happen at the same time as computation.
[Timeline diagram: with blocking calls, proc 0 completes MPI_Send and proc 1 completes MPI_Recv before either computes, so there is no comm/comp overlap; with MPI_Send_start ... MPI_Send_wait on proc 0 and MPI_Recv_start ... MPI_Recv_wait on proc 1, computation runs between the start and wait calls, so communication and computation overlap.]
Non-blocking send/recv routines
• Non-blocking primitives provide the basic mechanisms for overlapping communication with computation.
• Non-blocking operations return (immediately) "request handles" that can be tested and waited on.
    MPI_Isend(start, count, datatype, dest, tag, comm, request)
    MPI_Irecv(start, count, datatype, source, tag, comm, request)
    MPI_Wait(&request, &status)
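A minimal sketch of this pattern in C (compute_something() is a hypothetical placeholder for useful work, not an MPI routine):

    #include <mpi.h>

    void compute_something(void);   /* hypothetical local computation */

    void send_with_overlap(double *buf, int n, int dest)
    {
        MPI_Request request;
        MPI_Status  status;

        /* start the send; the call returns immediately */
        MPI_Isend(buf, n, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &request);

        compute_something();        /* useful work while the message is in flight */

        /* block until the send has completed and buf is safe to reuse */
        MPI_Wait(&request, &status);
    }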
• One can also test without waiting:
    MPI_Test(&request, &flag, &status)
• MPI allows multiple outstanding non-blocking operations:
    MPI_Waitall(count, array_of_requests, array_of_statuses)
    MPI_Waitany(count, array_of_requests, &index, &status)
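A sketch combining MPI_Test with multiple outstanding requests (do_a_little_work() and the peer/buffer setup are assumptions for illustration):

    #include <mpi.h>

    void do_a_little_work(void);    /* hypothetical local computation */

    void poll_then_wait(double *bufs[4], int n, const int peers[4])
    {
        MPI_Request reqs[4];
        MPI_Status  stats[4];

        /* post four receives; all four are outstanding at once */
        for (int i = 0; i < 4; i++)
            MPI_Irecv(bufs[i], n, MPI_DOUBLE, peers[i], 0, MPI_COMM_WORLD, &reqs[i]);

        /* poll the first request, computing between tests */
        int flag = 0;
        while (!flag) {
            MPI_Test(&reqs[0], &flag, &stats[0]);
            if (!flag) do_a_little_work();
        }

        /* wait for whatever is still outstanding */
        MPI_Waitall(4, reqs, stats);
    }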
Deadlocks
• Send a large message from process 0 to process 1
  – If there is insufficient storage at the destination, the send must wait for memory space
• What happens with this code?
        Process 0          Process 1
        Send(1)            Send(0)
        Recv(1)            Recv(0)
• This is called "unsafe" because it depends on the availability of system buffers
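The same unsafe pattern written out in C, assuming two ranks exchanging a large buffer:

    #include <mpi.h>

    #define N (1 << 24)   /* large enough that MPI cannot buffer the message */

    void unsafe_exchange(double *sendbuf, double *recvbuf, int rank)
    {
        int peer = 1 - rank;   /* rank 0 talks to rank 1 and vice versa */

        /* Both ranks send first: if the message does not fit in system
           buffers, each MPI_Send blocks waiting for the matching receive,
           and neither rank ever reaches MPI_Recv, so the program deadlocks. */
        MPI_Send(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
        MPI_Recv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }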
Some Solutions to the "unsafe" Problem
• Order the operations more carefully:
        Process 0          Process 1
        Send(1)            Recv(0)
        Recv(1)            Send(0)
• Supply receive buffer at same time as send:
        Process 0          Process 1
        Sendrecv(1)        Sendrecv(0)
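For the second fix, a sketch with MPI_Sendrecv, which pairs the send and the receive in one call so neither rank can block the other:

    #include <mpi.h>

    void safe_exchange(double *sendbuf, double *recvbuf, int n, int rank)
    {
        int peer = 1 - rank;

        /* send to peer and receive from peer in a single operation */
        MPI_Sendrecv(sendbuf, n, MPI_DOUBLE, peer, 0,
                     recvbuf, n, MPI_DOUBLE, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }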
More Solutions to the "unsafe" Problem
• Supply own space as buffer for send (buffered-mode send):
        Process 0          Process 1
        Bsend(1)           Bsend(0)
        Recv(1)            Recv(0)
• Use non-blocking operations:
        Process 0          Process 1
        Isend(1)           Isend(0)
        Irecv(1)           Irecv(0)
        Waitall            Waitall
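A sketch of the buffered-mode variant; the buffer sizing follows MPI's rule that attached space must cover the message plus MPI_BSEND_OVERHEAD:

    #include <mpi.h>
    #include <stdlib.h>

    void bsend_exchange(double *sendbuf, double *recvbuf, int n, int rank)
    {
        int peer = 1 - rank;

        /* attach user-supplied buffer space so Bsend never blocks */
        int size = n * (int)sizeof(double) + MPI_BSEND_OVERHEAD;
        char *buffer = malloc(size);
        MPI_Buffer_attach(buffer, size);

        MPI_Bsend(sendbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
        MPI_Recv(recvbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* detach waits until all buffered messages have been delivered */
        MPI_Buffer_detach(&buffer, &size);
        free(buffer);
    }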
MPI Collective Communication
• Send/recv routines are also called point-to-point routines (two parties). Some operations require more than two parties, e.g. broadcast and reduce. Such operations are called collective operations, or collective communication operations.
• Non-blocking collective operations appear in MPI-3 only.
• Three classes of collective operations:
  – Synchronization
  – Data movement
  – Collective computation
Synchronization
• MPI_Barrier(comm)
• Blocks until all processes in the group of the communicator comm call it.
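A minimal usage sketch: no rank executes the second printf until every rank has reached the barrier (though the interleaving of the printed lines themselves is still up to the system):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        printf("rank %d before barrier\n", rank);
        MPI_Barrier(MPI_COMM_WORLD);   /* all ranks synchronize here */
        printf("rank %d after barrier\n", rank);

        MPI_Finalize();
        return 0;
    }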
Collective Data Movement
[Diagram: Broadcast copies one buffer A from P0 to all of P0 to P3; Scatter splits P0's buffer A B C D so that each of P0 to P3 receives one piece; Gather is the inverse, collecting one piece from each process into P0's buffer.]
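A sketch of the three movements from the diagram, assuming at most 64 ranks and one double per rank:

    #include <mpi.h>

    void data_movement(int rank, int nprocs)
    {
        double a = 0.0, piece = 0.0;
        double full[64];                /* assumes nprocs <= 64 for this sketch */
        if (nprocs > 64) return;

        if (rank == 0) a = 3.14;
        /* Broadcast: every rank ends up with the root's value of a */
        MPI_Bcast(&a, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* Scatter: rank i receives full[i] from the root */
        MPI_Scatter(full, 1, MPI_DOUBLE, &piece, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* Gather: the root collects one element from every rank into full[] */
        MPI_Gather(&piece, 1, MPI_DOUBLE, full, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }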
Collective Computation
[Diagram: Reduce combines the values A, B, C, D held by P0 to P3 into a single result ABCD at the root; Scan (prefix reduction) leaves P0 with A, P1 with AB, P2 with ABC, and P3 with ABCD.]
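A sketch of both operations using a sum as the combiner:

    #include <mpi.h>

    void collective_computation(int rank)
    {
        int local = rank + 1;   /* each rank contributes one value */
        int total, prefix;

        /* Reduce: rank 0 receives the sum over all ranks */
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        /* Scan: rank i receives the sum over ranks 0..i */
        MPI_Scan(&local, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    }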
MPI Collective Routines
• Many routines: Allgather, Allgatherv, Allreduce, Alltoallv, Bcast, Gatherv, Reduce_scatter, Scan, Scatterv
• "All" versions deliver results to all participating processes.
• "V" versions allow the chunks to have different sizes.
• Allreduce, Reduce_scatter, and Scan take both built-in and user-defined combiner functions.
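For the last bullet, a sketch of a user-defined combiner (elementwise maximum of absolute values, a hypothetical example) registered with MPI_Op_create and used in MPI_Allreduce:

    #include <mpi.h>
    #include <math.h>

    /* user-defined combiner: elementwise max of absolute values */
    void absmax(void *in, void *inout, int *len, MPI_Datatype *type)
    {
        double *a = (double *)in, *b = (double *)inout;
        for (int i = 0; i < *len; i++)
            b[i] = fmax(fabs(a[i]), fabs(b[i]));
    }

    void allreduce_absmax(double *local, double *result, int n)
    {
        MPI_Op op;
        MPI_Op_create(absmax, 1 /* commutative */, &op);
        /* "All" version: every rank receives the combined result */
        MPI_Allreduce(local, result, n, MPI_DOUBLE, op, MPI_COMM_WORLD);
        MPI_Op_free(&op);
    }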
MPI discussion
• Ease of use
  – Programmer takes care of the 'logical' distribution of the global data structure
  – Programmer takes care of synchronizations and explicit communications
  – None of these are easy.
• MPI is hard to use, as you will see.
MPI discussion
• Exposing architecture features
  – Forces one to consider locality, which often leads to more efficient programs.
  – The MPI standard does have some features that expose the architecture (e.g. process topologies).
  – Performance is a strength of MPI programming.
• It would be nice to have the best of both worlds: OpenMP and MPI.