ChaMPIon/Pro™: A Multidevice MPI 1.2 Implementation for ASCI Blue and White Using LAPI and Shared Memory
Anthony Skjellum, Rossen Dimitrov, Bronis de Supinski, Andrew Watkins
October 11, 2001, ScicomP 4
Background
• MPI Software Technology is improving and expanding its commercial middleware, MPI/Pro, to create ChaMPIon/Pro for ultrascale systems, including IBM-based ASCI systems
• MPI/Pro is deployed across a variety of parallel systems and clusters
• ChaMPIon/Pro is the next-generation commercial, scalable middleware product from MPI Software Technology, and will include MPI-2 features
MPI/Pro Features
• Low CPU overhead (non-polling)
• Thread safety
• Asynchronous completion notification
• Independent message progress (complies fully with the MPI Progress Rule)
• Overlapping of computation and communication
• Efficient multi-device architecture
MPI/Pro Features
• Optimized persistent mode of communication
• Optimized derived data types
• Efficient mechanisms for message de-multiplexing
• Handles long-running complex programs reliably
• Broad conformance and performance testing
• OpenMP friendly
Miscellaneous MPI/Pro Features
• Supports Scyld ‘bproc’ technology
• Supports efficient Ethernet channel bonding
• Supports Etnus TotalView™
• Supports ROMIO now; native MPI I/O nearly complete
ChaMPIon/Pro’s Performance and Scalability Objectives
• Scaling to 10,000 processors or more
• Multi-device support
• Topology awareness
• Thread safety
• Optimized collective operations
• Optimized derived datatypes
• Efficient memory (and NIC resource) usage
ChaMPIon/Pro’s Functionality and Usability Objectives
• Integration with schedulers and resource managers
• Integration with tools (TotalView, Vampir, etc.)
• Functionality controlled by tunable parameters
• Tunable parameters not intended to fix problems
• User feedback
ChaMPIon/Pro 1.2 Library Architecture
[Diagram: library architecture. Components shown:]
• ChaMPIon/Pro API
• Collectives, Point-to-point
• Groups, Attributes, Virtual topologies, Communicators, Datatypes, Error handlers
• Portals, LAPI, SMP, Myrinet, Quadrics
ChaMPIon/Pro vs. IBM MPI
• ChaMPIon/Pro is layered on top of IBM’s system software (LAPI); IBM has access to lower layers (MPL, kernel mechanisms)
• IBM’s MPI sits at the same or a lower level than LAPI
• Peak bandwidth of IBM’s MPI is higher than LAPI’s
• ChaMPIon/Pro, for Blue, has higher throughput for messages in the range of 4 K to 400 K
• ChaMPIon/Pro has lower latency for short messages (up to 1 K) using the SMP device
ChaMPIon/Pro Protocols: LAPI Device
• Short protocol: LAPI_Amsend, LAPI header handler, and LAPI counters; one copy at receiver
• Long protocol: LAPI_Amsend, LAPI header handler, LAPI counters, and LAPI_Get; no copies
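The difference between the two protocols can be sketched in plain C. This is only a single-process model, not LAPI code: `msg_t`, `post_send`, `deliver`, and the 1024-byte `SHORT_LIMIT` are hypothetical names and values chosen for illustration (the slides do not state the cutoff), and `memcpy` stands in for the network transfer. In the short case the payload travels with the header and is copied once into the user buffer; in the long case the header carries only the source address and the receiver pulls the data straight into the user buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff between short and long protocols (not from the slides). */
#define SHORT_LIMIT 1024

/* Hypothetical active-message-style header handed to the transport. */
typedef struct {
    size_t      len;                      /* payload length in bytes            */
    const void *src;                      /* sender address, used by long path  */
    char        inline_data[SHORT_LIMIT]; /* payload carried with the header    */
} msg_t;

/* Sender side: choose the protocol by size. Returns 1 for short, 0 for long. */
int post_send(msg_t *m, const void *buf, size_t len)
{
    m->len = len;
    m->src = buf;
    if (len <= SHORT_LIMIT) {
        memcpy(m->inline_data, buf, len); /* models data riding with the header */
        return 1;
    }
    return 0;                             /* long: only the header is sent      */
}

/* Receiver side: returns the number of staging copies made at the receiver. */
int deliver(const msg_t *m, void *user_buf, size_t user_len)
{
    assert(m->len <= user_len);           /* receive buffer must be big enough  */
    if (m->len <= SHORT_LIMIT) {
        memcpy(user_buf, m->inline_data, m->len); /* the one receiver copy      */
        return 1;
    }
    /* Long: this memcpy models the remote get itself, landing directly in
     * the user buffer, so no staging copy is counted. */
    memcpy(user_buf, m->src, m->len);
    return 0;
}
```

In this model a 6-byte message reports one receiver copy and a 4096-byte message reports none, matching the "one copy at receiver" and "no copies" descriptions above.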
ChaMPIon/Pro Protocols: SMP Device
• Short protocol: System V shared memory, fast locks; one intermediate copy in shared memory (two memcopies)
• Long protocol: uses the LAPI device in loopback mode; the intermediate copy in shared memory limits the bandwidth, and LAPI loopback is faster than two memcopies
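The two-memcopy path of the short protocol can be illustrated with the standard System V calls (`shmget`, `shmat`, `shmdt`, `shmctl`). This is a minimal single-process sketch, not the product's code: `copy_via_shm` is a hypothetical helper, and it creates and destroys the segment on every call only to stay self-contained, whereas a real device would presumably attach a segment once and reuse it between separate processes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: move len bytes from src to dst through a System V
 * shared memory segment. Returns 0 on success, -1 on failure. */
int copy_via_shm(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (len == 0)
        return 0;                          /* nothing to move */

    int id = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    void *seg = shmat(id, NULL, 0);
    if (seg == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);        /* do not leak the segment */
        return -1;
    }

    memcpy(seg, src, len);                 /* copy 1: sender buffer -> segment   */
    memcpy(dst, seg, len);                 /* copy 2: segment -> receiver buffer */

    shmdt(seg);
    shmctl(id, IPC_RMID, NULL);
    return 0;
}
```

Every byte crosses the memory bus twice on this path, which is why the slide routes long messages through LAPI loopback instead.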
SMP Device Features
• ChaMPIon/Pro employs a fast locking mechanism, written in PowerPC 604 assembler, for lower SMP overhead
• ChaMPIon/Pro has lower latency than IBM’s MPI for short messages (< 1 K) in SMP mode on Blue
• Scalable shared memory device: buffer usage does not grow quadratically, or even linearly, with NP per node
Collective Operations: Multi-hierarchy Operations
[Figure: operation classes 1 to 3 mapped onto hierarchy levels 0 to 2; operations labeled in the figure are Bcast, Reduce, Gather, and Scatter]
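One way to see why a hierarchy helps is to count the messages that cross between boxes during a broadcast. The sketch below is an illustrative model, not the algorithm from the slides: it assumes a binomial-tree broadcast from rank 0, block placement of ranks (rank r lives on box r / ppn), and one leader per box; `flat_cross_box` and `hier_cross_box` are hypothetical names.

```c
#include <assert.h>

/* Cross-box messages in a flat binomial-tree broadcast from rank 0.
 * In the round with a given mask, every rank r < mask sends to r + mask. */
int flat_cross_box(int nprocs, int ppn)
{
    assert(nprocs > 0 && ppn > 0);
    int cross = 0;
    for (int mask = 1; mask < nprocs; mask <<= 1)
        for (int r = 0; r < mask && r + mask < nprocs; r++)
            if (r / ppn != (r + mask) / ppn)
                cross++;                   /* sender and receiver on different boxes */
    return cross;
}

/* Cross-box messages in a two-level broadcast: first among one leader per
 * box, then inside each box. Only the leader stage leaves a box, and each
 * non-root leader receives exactly once. */
int hier_cross_box(int nprocs, int ppn)
{
    assert(nprocs > 0 && ppn > 0);
    int boxes = (nprocs + ppn - 1) / ppn;  /* ceiling division */
    return boxes - 1;
}
```

For 16 processes on 4-way boxes, the flat tree sends 12 of its 15 messages across boxes, while the two-level version sends 3, leaving the remaining traffic on the faster on-box path.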
Cross-Box Latency (Snow)
Cross-Box Bandwidth (Snow)
On-Box (SMP) Latency (Blue)
Scatter Performance (Long Messages, Blue)
On-Going Work
• Completion of MPI I/O capability (in final debugging)
• One-sided support
• Performance-revealing enhancements
• Rest of MPI-2
Wish List for Enhancing ChaMPIon/Pro for IBM Systems
• Access to a one-copy protocol between unrelated processes in shared memory
• Access to KLAPI or other optimal-level transports, as IBM’s MPI is able to exploit
• Ability to fine-tune SMP scheduling for collective optimization
Conclusions
• ChaMPIon/Pro is a competitive solution for IBM SP systems, even though it works at a disadvantage today compared to IBM’s MPI
• ChaMPIon/Pro is a performance-portable MPI 1.2 system that will soon provide MPI-2 capabilities
• Many value-added improvements will follow in coming years, including further optimizations
• Work over the next 18 months will support both large-scale and medium-scale SP systems, and other DOE ASCI systems
Questions?
Work performed in part under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under Contract W-7405-Eng-48, UCRL-VG-145222.
Contacts
• Dr. Anthony Skjellum, President, 662-320-4300 x15, tony@mpi-softtech.com
• Dr. Bronis de Supinski, LLNL/CASC, 925-422-1062, bronis@llnl.gov
• Dr. Rossen Dimitrov, Principal Software Engineer, 603-891-4766, rossen@mpi-softtech.com
• Andrew Watkins, Senior Software Engineer, 662-320-4300 x23, andrew@mpi-softtech.com