An Approach to Buffer Management in Java HPC Messaging
Mark Baker, Bryan Carpenter, and Aamir Shafi
Distributed Systems Group
http://dsg.port.ac.uk
Presentation outline
• Introduction,
• The Buffering Layer in MPJ Express,
• Performance Evaluation,
• A Use Case from Computational Cosmology,
• Conclusions.
10/19/2021
Introduction
• “Traditional” language-based approach to high-level parallel programming:
  – HPF, UPC, Co-Array Fortran, etc.
• Recently, practical parallel programming has focused on commodity languages supplemented by libraries like MPI:
  – C, C++, Fortran, …
  – Cost-effective.
• Therefore, a practical approach to “raising the level” of parallel programming:
  – Use an advanced commodity language.
Introduction
• There are several arguments for using Java for scientific computing, including:
  – Portability,
  – Compile-time and runtime safety,
  – Built-in multi-threading,
  – Rapid development,
  – Built-in libraries,
  – Performance via JIT compilers.
• MPI was finalized in June 1994 as a standard message-passing API for technical parallel computing:
  – The “Java Grande Message Passing Workgroup” defined Java bindings in 1998.
Introduction
• MPJ Express is a high-quality implementation of a Java messaging system, based on the Java bindings:
  – Released September 2005,
  – Thread-safe communication devices for TCP and Myrinet,
  – Implements derived datatypes, communicators, and virtual topologies,
  – Runtime system for portable bootstrapping,
  – Web: http://dsg.port.ac.uk/projects/mpj
Introduction: Java NIO
• Java New I/O:
  – Non-blocking communication,
  – Introduces direct and indirect ByteBuffers:
    • Direct byte buffers reside in native OS memory, unlike normal Java objects,
    • Creating a direct byte buffer is costly, but direct buffers provide faster I/O.
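The distinction between direct and indirect buffers can be seen with the standard java.nio API alone. The following is a minimal sketch; the class name ByteBufferKinds and its helper methods are hypothetical and are not part of MPJ Express.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class for illustration only; not MPJ Express code.
final class ByteBufferKinds {
    /** Allocates a direct buffer (storage outside the Java heap) or an indirect, heap-backed one. */
    static ByteBuffer allocate(boolean direct, int capacityBytes) {
        ByteBuffer buf = direct ? ByteBuffer.allocateDirect(capacityBytes)
                                : ByteBuffer.allocate(capacityBytes);
        return buf.order(ByteOrder.nativeOrder());
    }

    /** Writes the ints into the buffer, reads them back, and returns their sum. */
    static long roundTripSum(ByteBuffer buf, int[] values) {
        buf.clear();
        for (int v : values) {
            buf.putInt(v);
        }
        buf.flip(); // switch from writing to reading
        long sum = 0;
        while (buf.remaining() >= Integer.BYTES) {
            sum += buf.getInt();
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer direct = allocate(true, 1024);
        ByteBuffer heap = allocate(false, 1024);
        System.out.println("direct.isDirect() = " + direct.isDirect());
        System.out.println("heap.hasArray()   = " + heap.hasArray());
        System.out.println("sum = " + roundTripSum(direct, new int[] {1, 2, 3}));
    }
}
```

Both kinds expose the same get/put interface, so code that packs data does not need to know which kind it was given; only the allocation call differs.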
Introduction: The Buffering Layer in MPJ Express
• MPJ Express requires a buffering layer:
  – To use Java NIO:
    • Data can be written from byte buffers onto the wire and read from the wire into byte buffers.
  – To use proprietary networks like Myrinet efficiently:
    • Direct byte buffers have memory pointers that can be used for Direct Memory Access (DMA) transfers,
    • Avoids JNI overheads:
      – No data copying between the JVM and native OS memory.
  – The buffering layer does incur an additional copying overhead, though.
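As a rough illustration of the pack-then-write path described above, the sketch below copies an int array into a direct ByteBuffer (the extra copy) and pushes it through an NIO channel. The class and method names are hypothetical, and a java.nio.channels.Pipe stands in for the TCP socket channel so the example runs in one process; none of this is MPJ Express code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

// Hypothetical illustration of packing user data into a byte buffer before I/O.
final class PackAndSend {
    /** Copies the ints into a direct buffer; this is the additional copy the buffering layer incurs. */
    static ByteBuffer pack(int[] data) {
        ByteBuffer buf = ByteBuffer.allocateDirect(data.length * Integer.BYTES);
        buf.asIntBuffer().put(data); // bulk copy through a view; buf's own position stays at 0
        return buf;                  // position 0, limit == capacity: ready to be written
    }

    /** Writes the packed buffer to a channel and reads it back (small payloads only). */
    static int[] sendAndReceive(int[] data) {
        try {
            Pipe pipe = Pipe.open(); // assumption: a Pipe stands in for a socket channel
            try (Pipe.SinkChannel out = pipe.sink(); Pipe.SourceChannel in = pipe.source()) {
                ByteBuffer sendBuf = pack(data);
                while (sendBuf.hasRemaining()) {
                    out.write(sendBuf); // the channel reads straight from the byte buffer
                }
                ByteBuffer recvBuf = ByteBuffer.allocateDirect(data.length * Integer.BYTES);
                while (recvBuf.hasRemaining()) {
                    if (in.read(recvBuf) < 0) {
                        throw new IOException("pipe closed early");
                    }
                }
                recvBuf.flip();
                int[] result = new int[data.length];
                recvBuf.asIntBuffer().get(result); // unpack back into a Java array
                return result;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The write and the read happen on the same thread here, so the payload must fit in the pipe's internal buffer; a real device would use separate processes and non-blocking channels.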
Buffering Layer
• An extensible buffering layer (mpjbuf):
  – Supports both the higher and lower levels of MPJ Express,
  – Various implementations based on the actual storage medium:
    • Direct or indirect ByteBuffers,
    • Arrays,
    • Native C memory.
• An mpjbuf buffer object consists of:
  – A static buffer to store primitive datatypes,
  – A dynamic buffer to store serialized Java objects.
• Creating ByteBuffers on the fly is costly:
  – Memory management is based on Knuth’s buddy algorithm,
  – Two implementations:
    • Buddy 1: stores offsets; smaller memory footprint,
    • Buddy 2: stores objects; bigger memory footprint.
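The buddy scheme mentioned above can be sketched as follows. This is a generic, offset-based buddy allocator over one pre-allocated direct ByteBuffer, loosely in the spirit of the "stores offsets" variant; the class BuddyPool, its methods, and its bookkeeping structures are assumptions made for illustration and do not reproduce the mpjbuf implementation.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical offset-based buddy allocator; not the mpjbuf code.
final class BuddyPool {
    private final ByteBuffer region;                // one large buffer, allocated once
    private final int minBlock;                     // smallest block size in bytes (power of two)
    private final int maxOrder;                     // region size is minBlock << maxOrder
    private final List<TreeSet<Integer>> free = new ArrayList<>();        // free offsets per order
    private final Map<Integer, Integer> allocatedOrder = new HashMap<>(); // offset -> order

    BuddyPool(int minBlock, int maxOrder) {
        if (Integer.bitCount(minBlock) != 1) {
            throw new IllegalArgumentException("minBlock must be a power of two");
        }
        this.minBlock = minBlock;
        this.maxOrder = maxOrder;
        this.region = ByteBuffer.allocateDirect(minBlock << maxOrder);
        for (int k = 0; k <= maxOrder; k++) {
            free.add(new TreeSet<>());
        }
        free.get(maxOrder).add(0); // initially one free block covering the whole region
    }

    /** Returns the offset of a block holding at least the requested bytes, or -1 if none fits. */
    int allocate(int bytes) {
        int order = 0;
        while (order <= maxOrder && (minBlock << order) < bytes) {
            order++;
        }
        int k = order;
        while (k <= maxOrder && free.get(k).isEmpty()) {
            k++; // look for the smallest free block that is large enough
        }
        if (k > maxOrder) {
            return -1;
        }
        int offset = free.get(k).pollFirst();
        while (k > order) {
            k--;
            free.get(k).add(offset + (minBlock << k)); // split: upper half becomes a free buddy
        }
        allocatedOrder.put(offset, order);
        return offset;
    }

    /** Frees a block and merges it with its buddy for as long as the buddy is also free. */
    void release(int offset) {
        Integer order = allocatedOrder.remove(offset);
        if (order == null) {
            throw new IllegalArgumentException("not allocated: " + offset);
        }
        int k = order;
        while (k < maxOrder) {
            int buddy = offset ^ (minBlock << k); // buddy address differs in exactly one bit
            if (!free.get(k).remove(buddy)) {
                break;
            }
            offset = Math.min(offset, buddy);
            k++;
        }
        free.get(k).add(offset);
    }

    /** Returns a view of the requested bytes of the region starting at an allocated offset. */
    ByteBuffer view(int offset, int bytes) {
        ByteBuffer dup = region.duplicate();
        dup.position(offset);
        dup.limit(offset + bytes);
        return dup.slice();
    }
}
```

With new BuddyPool(64, 4), the pool owns 1024 bytes: two 100-byte requests are rounded up to 128-byte blocks carved from the same 256-byte parent, and releasing every block merges the free space back into a single 1024-byte block. The costly allocateDirect call happens once, in the constructor.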
Allocation Time Comparison
Faster I/O with direct ByteBuffers
Performance Evaluation
• MPJ Express compared to MPICH and mpiJava on Fast Ethernet and Myrinet.
Transfer Time on Fast Ethernet
Throughput on Fast Ethernet
Transfer Time on Myrinet
Throughput on Myrinet
Gadget-2
• An experiment to understand Java’s performance.
• Gadget-2 is a massively parallel cosmological structure-formation simulation code written in C:
  – Simulates the evolution of large systems under the influence of gravitational and hydrodynamic forces,
  – A version was used in the “Millennium Simulation”, which evolves 10^10 dark matter particles from the early Universe to the current day:
    • Executed on 512 nodes using a terabyte of distributed memory,
    • Used 350,000 CPU hours over 28 days of elapsed time.
  – Uses the Barnes-Hut tree algorithm for calculating gravitational forces,
  – Domain decomposition based on the Peano-Hilbert space-filling curve.
• We produced a Java version of Gadget-2 using MPJ Express for messaging.
• Benchmarking comparison for “Colliding Galaxies”.
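The idea behind a space-filling-curve domain decomposition can be sketched in a few lines: map each particle's grid cell to its position along the curve, sort by that key, and give each process a contiguous segment. The sketch below uses the well-known two-dimensional Hilbert index for brevity, whereas Gadget-2 works in three dimensions; the class HilbertDecomposition and the equal-count split are simplifying assumptions, not the Gadget-2 code.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical 2D illustration of curve-based domain decomposition.
final class HilbertDecomposition {
    /** Hilbert-curve index of cell (x, y) on an n-by-n grid, where n is a power of two. */
    static long key(int n, int x, int y) {
        long d = 0;
        for (int s = n / 2; s > 0; s /= 2) {
            int rx = (x & s) > 0 ? 1 : 0;
            int ry = (y & s) > 0 ? 1 : 0;
            d += (long) s * s * ((3 * rx) ^ ry);
            if (ry == 0) {          // rotate/reflect the quadrant so the sub-curve lines up
                if (rx == 1) {
                    x = n - 1 - x;
                    y = n - 1 - y;
                }
                int t = x;
                x = y;
                y = t;
            }
        }
        return d;
    }

    /** Sorts particles along the curve and assigns equal-sized contiguous segments to ranks. */
    static int[] assignRanks(int n, int[] xs, int[] ys, int ranks) {
        Integer[] order = new Integer[xs.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingLong((Integer i) -> key(n, xs[i], ys[i])));
        int[] owner = new int[xs.length];
        for (int pos = 0; pos < order.length; pos++) {
            owner[order[pos]] = (int) ((long) pos * ranks / order.length);
        }
        return owner;
    }
}
```

Because nearby points on the curve are also nearby in space, each rank ends up with a spatially compact set of particles. A production code would typically balance segments by estimated work rather than by particle count.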
Execution Time on Fast Ethernet
Execution Time on Myrinet
Tree Walk Time Comparison
Summary
• MPJ Express is becoming a production-quality Java messaging system:
  – Communication devices for TCP and Myrinet.
• MPJ Express relies on an intermediate buffering layer:
  – Avoids JNI overheads,
  – Makes it possible to use direct ByteBuffers,
  – Memory management using Knuth’s buddy algorithm.
• Java version of Gadget-2:
  – A use case from computational cosmology.
Conclusions
• We have shown that Java has the potential to be a good HPC language.
• The additional overhead of copying can be avoided by extending the MPJ API:
  – Support for sending from and receiving into ByteBuffers.
• Future work:
  – Exploit the thread safety of MPJ Express by using OpenMP parallelism,
  – Next release, with the Myrinet device, in the third quarter of 2006.
Questions?