Approximating the Buffer Allocation Problem Using Epochs
Jan Bækgaard Pedersen (U. of Nevada, Las Vegas)
Alex Brodsky (U. of Winnipeg, Canada)
Parallel Applications
Goal: Overlap communication and computation.
10/25/2021
Parallel Applications (cont.)
• Today, parallel applications typically
– Run on heterogeneous platforms
– Are ported to systems with varying resources
– Run on hardware platforms that differ from those they were designed on
• Goals:
– Ensure that ported applications do not deadlock
– Determine application resource requirements
• Focus:
– Determine an application's message buffer needs
Message Buffering
(Figure: three MPI_Send/MPI_Recv scenarios)
• Unbuffered (synchronous op.)
• Buffered (asynchronous op.)
• Buffer filled (synchronous op.)
The Problem
(Figure: MPI_Send/MPI_Recv with shared buffers)
A send succeeds as long as one process has a free buffer. If all buffers are exhausted, deadlock occurs!
Solution: Ensure the application has a sufficient number of buffers.
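The buffer-exhaustion deadlock above can be illustrated with a small simulator. This is a hypothetical sketch, not code from the paper: process names, the op-list encoding, and the scheduling loop are all illustrative. A send completes immediately if a buffer is free; otherwise it can only complete by rendezvousing with a matching receive.

```python
def runs(procs, buffers):
    """procs: {pid: [("send", dst) or ("recv", src), ...]} (illustrative encoding).
    Returns True if every process finishes, False if the run deadlocks."""
    pc = {p: 0 for p in procs}       # program counter per process
    in_flight = []                   # buffered messages, as (src, dst) pairs
    free = buffers
    while True:
        progressed = False
        for p, ops in procs.items():
            if pc[p] == len(ops):
                continue
            kind, peer = ops[pc[p]]
            if kind == "recv":
                if (peer, p) in in_flight:            # a buffered message arrived
                    in_flight.remove((peer, p))
                    free += 1
                    pc[p] += 1
                    progressed = True
                else:                                  # try a synchronous rendezvous
                    q = procs[peer]
                    if pc[peer] < len(q) and q[pc[peer]] == ("send", p):
                        pc[peer] += 1
                        pc[p] += 1
                        progressed = True
            elif kind == "send" and free > 0:          # buffered (asynchronous) send
                free -= 1
                in_flight.append((p, peer))
                pc[p] += 1
                progressed = True
            # an unbuffered send makes no progress on its own; the matching
            # receive completes the rendezvous in the branch above
        if all(pc[p] == len(procs[p]) for p in procs):
            return True
        if not progressed:
            return False                               # no process can move: deadlock

# Head-to-head exchange: both processes send first, then receive.
exchange = {"P0": [("send", "P1"), ("recv", "P1")],
            "P1": [("send", "P0"), ("recv", "P0")]}
```

With `buffers=0` both sends block forever; a single free buffer lets one send complete asynchronously and breaks the cycle, matching the slide's claim.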
Complications
(Figure: MPI_Send)
Deadlock can result from untimely use of a buffer. Such problems arise even if the communication
• is oblivious (independent of application input)
• is point-to-point (no broadcast or multicast)
• does not use wild-cards (the receiver must specify the sender)
The problem of allocating buffers for an application
• is NP-hard to solve optimally [BPW'05]
• is coNP-complete to verify that an allocation is deadlock-free
• therefore needs to be approximated
The Model (Communication Graphs)
(Figure: two processes; send and receive vertices joined by communication arcs and computation arcs)
Key point: An application is represented by a communication graph.
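The communication graph in the figure can be encoded directly. This is a minimal sketch assuming a simple adjacency representation; the class name, vertex labels, and method names are illustrative, not from the paper.

```python
from collections import defaultdict

class CommGraph:
    """Vertices are send/receive events. Computation arcs order events within
    a process; communication arcs connect each send to its matching receive."""
    def __init__(self):
        self.comp = defaultdict(list)   # computation arcs: event -> later events
        self.comm = {}                  # communication arcs: send -> matching recv

    def add_comp_arc(self, u, v):
        self.comp[u].append(v)

    def add_comm_arc(self, send, recv):
        self.comm[send] = recv

# The slide's example: P0 sends to P1, then P1 sends back to P0.
g = CommGraph()
g.add_comp_arc(("P0", "send1"), ("P0", "recv2"))   # P0: send, then receive
g.add_comp_arc(("P1", "recv1"), ("P1", "send2"))   # P1: receive, then send
g.add_comm_arc(("P0", "send1"), ("P1", "recv1"))
g.add_comm_arc(("P1", "send2"), ("P0", "recv2"))
```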
Basic Approach
• Treat the communication graph as a dependency graph
– reverse the computation arcs
– make the communication arcs bidirectional
• A cycle indicates a need for a buffer
• Adding one buffer is not enough
– buffers may be stolen!
• Add a buffer for every receive
– although this is overkill
– it ensures delay-freedom
If there are no buffers, the receive and send depend on each other.
A Better Approximation
• If send (2) always occurs before send (3)
– the buffer will not be stolen
– deadlock is prevented
– only one buffer (1) is needed
• Implementation issues
– how to order sends?
– how to enforce the order?
– what kind of order?
– how many buffers?
Our Approach
• Partition the communication graph into epochs
– epochs are well ordered
– sends within an epoch may not be ordered
• Determine per-epoch buffer allocations
• Compute the maximum buffer allocation over all per-epoch allocations
– this is the buffer allocation for the application
• Group consecutive epochs into super-epochs
• Use a barrier primitive to enforce execution order on the super-epochs
Epochs
• Are maximal connected subgraphs
• Represent phases of an application
• No communication between epochs
• All epochs are strictly ordered and execute in serial order
• Each epoch is executed when
– the preceding epoch finishes
– all buffers are freed
• Ordering is enforced via barriers
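Since epochs are maximal connected subgraphs, they can be extracted as connected components of the communication graph with arcs treated as undirected. A minimal sketch, assuming vertices and arcs are given as plain lists; the epoch ordering itself is assumed to come from program order and is not computed here.

```python
from collections import deque

def epochs(nodes, arcs):
    """Partition the graph into maximal connected components (the epochs)."""
    nbrs = {n: set() for n in nodes}
    for u, v in arcs:               # arcs treated as undirected
        nbrs[u].add(v)
        nbrs[v].add(u)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        comp, queue = set(), deque([n])
        seen.add(n)
        while queue:                # BFS over one component
            u = queue.popleft()
            comp.add(u)
            for v in nbrs[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps
```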
Number of Buffers Needed
• Compute the number of buffers for each epoch
– epochs with cycles: > 0 buffers (use the approximation of [BPW'05])
– epochs with no cycles: 0 buffers (no cycles means no deadlock)
• At the end of each epoch
– all buffers are freed
– and available for the next epoch
• The total number of buffers needed is the maximum over all epochs
Example (figure): one epoch needs 1 buffer, the other 0 buffers; the total number of buffers needed is max{1, 0} = 1 buffer.
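The allocation rule above reduces to a maximum once the per-epoch counts are known. A trivial sketch, with the per-epoch counts taken as input (in the paper they would come from the [BPW'05] approximation, which is not reproduced here):

```python
def buffers_needed(epoch_counts):
    """All buffers are freed at each epoch boundary, so successive epochs
    reuse the same pool: the application-wide requirement is the maximum
    over epochs, not the sum."""
    return max(epoch_counts, default=0)

# The slide's example: one epoch needs 1 buffer, the other 0.
# buffers_needed([1, 0]) yields max{1, 0} = 1.
```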
Avoiding Unnecessary Barriers
• Barriers are expensive operations
• In many cases barriers are not needed
• In the following example
– each send is in its own epoch
– a barrier occurs after every send
– even though no buffers are needed
• Idea: combine epochs into a super-epoch
– eliminates unneeded barriers
– increases parallelism
– does not increase the number of buffers
• Super-epochs are treated like epochs
– serially ordered
– separated by barriers
– no explicit synchronization within
– use at most the maximum number of buffers over all epochs
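One way to realize this idea is a greedy merge of consecutive epochs. This sketch is an assumption-laden illustration, not the paper's algorithm: it takes per-epoch buffer demands as input and conservatively bounds a merged group's demand by the sum of its members' demands, merging as long as that bound stays within the application-wide maximum, so barriers are dropped without raising the allocation.

```python
def super_epochs(epoch_demands):
    """Greedily group consecutive epochs into super-epochs (illustrative).
    epoch_demands: per-epoch buffer counts, in execution order."""
    cap = max(epoch_demands, default=0)   # application-wide allocation
    groups, cur, cur_sum = [], [], 0
    for d in epoch_demands:
        # assumption: a merged group needs at most the sum of its members'
        # demands, since its epochs may now overlap without a barrier
        if cur and cur_sum + d > cap:
            groups.append(cur)            # a barrier separates super-epochs
            cur, cur_sum = [], 0
        cur.append(d)
        cur_sum += d
    if cur:
        groups.append(cur)
    return groups
```

For the slide's example of many zero-buffer send epochs, all of them collapse into a single super-epoch, eliminating every intermediate barrier; two epochs that each need the full allocation stay separated.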
Proposed Implementation
(Figure: application, MPI layer, and O/S)
• Compute barrier and buffer allocations
– generate the communication graph from the application
– compute epochs and per-epoch buffer allocations
– compute super-epochs and barrier locations
• Upload barrier locations and buffer allocations
• Before each send, MPI checks whether a barrier is needed
Technical Contributions
• Algorithms for:
– partitioning the communication graph into epochs
– computing the buffer allocation for each epoch
– composing super-epochs from epochs
• Descriptions of:
– a possible implementation of the mechanism within existing parallel systems
– possible extensions of the mechanism
• Analysis:
– comparing our approach to existing approaches
– complexity of the proposed algorithms