Performance Evaluation of Adaptive MPI
- Slides: 20
Performance Evaluation of Adaptive MPI
Chao Huang (1), Gengbin Zheng (1), Sameer Kumar (2), Laxmikant Kale (1)
(1) University of Illinois at Urbana-Champaign
(2) IBM T. J. Watson Research Center
10/6/2020 PPoPP 06

Motivation
- Challenges
  - Applications with dynamic nature
    - Shifting workload, adaptive refinement, etc.
  - Traditional MPI implementations
    - Limited support for such dynamic applications
- Adaptive MPI
  - Virtual processes (VPs) via migratable objects
  - Powerful run-time system that offers various novel features and performance benefits

Outline
- Motivation
- Design and Implementation
- Features and Benefits
  - Adaptive Overlapping
  - Automatic Load Balancing
  - Communication Optimizations
  - Flexibility and Overhead
- Conclusion

Processor Virtualization
- Basic idea of processor virtualization
  - User specifies interaction between objects (VPs)
  - RTS maps VPs onto physical processors
  - Typically, number of VPs >> P, to allow for various optimizations
Figure labels: User View; System Implementation

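The mapping step can be illustrated with a minimal Python sketch. It assumes a simple contiguous block mapping of VP ranks onto processors; the function names `block_map` and `vps_per_processor` are hypothetical and are not part of AMPI, whose run-time system may choose and later change the mapping differently.

```python
def block_map(num_vps, num_procs):
    """Assign VP ranks 0..num_vps-1 to processors in contiguous blocks.

    Hypothetical helper for illustration only; not an AMPI API.
    """
    if num_procs <= 0 or num_vps < 0:
        raise ValueError("need num_procs > 0 and num_vps >= 0")
    base, extra = divmod(num_vps, num_procs)
    mapping = {}
    vp = 0
    for proc in range(num_procs):
        # The first `extra` processors take one additional VP.
        count = base + (1 if proc < extra else 0)
        for _ in range(count):
            mapping[vp] = proc
            vp += 1
    return mapping


def vps_per_processor(mapping, num_procs):
    """Count how many VPs landed on each processor."""
    counts = [0] * num_procs
    for proc in mapping.values():
        counts[proc] += 1
    return counts


if __name__ == "__main__":
    m = block_map(10, 4)  # 10 VPs on P = 4 processors
    print(m)
    print(vps_per_processor(m, 4))
```

Because the user program only refers to VP ranks, a mapping like this can be replaced or updated at run time without changing application code.
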
AMPI: MPI with Virtualization
- Each AMPI virtual process is implemented by a user-level thread embedded in a migratable object
Figure labels: MPI "processes"; Real Processors

Adaptive Overlap
- Problem: Gap between completion time and CPU overhead
- Solution: Overlap between communication and computation
Figure: Completion time and CPU overhead of 2-way ping-pong program on Turing (Apple G5) Cluster

Adaptive Overlap
Figure: Timeline of 3D stencil calculation with different VP/P (1 VP/P, 2 VP/P, 4 VP/P)

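Why more VPs per processor can hide communication latency can be seen in a toy steady-state model. This is an idealized sketch, not measured data and not the model used in the talk: it assumes each processor has a fixed amount of compute work per iteration, split evenly among its VPs, that each VP blocks for a fixed latency after computing, and that context switching is free. The function name `iteration_time` and all numbers are illustrative assumptions.

```python
def iteration_time(work, latency, vps_per_proc):
    """Idealized per-iteration time on one processor.

    Assumptions (illustrative only):
      - `work` is the total compute time per iteration on the processor,
        split evenly among `vps_per_proc` VPs;
      - after computing, a VP blocks for `latency` waiting on a message;
      - while one VP is blocked, the others may compute at no switching cost.
    In steady state the processor is either fully busy (time = work) or
    waits on the blocked VP (time = work / vps_per_proc + latency).
    """
    if vps_per_proc < 1:
        raise ValueError("need at least one VP per processor")
    return max(work, work / vps_per_proc + latency)


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, "VP/P:", iteration_time(work=8.0, latency=6.0, vps_per_proc=k))
```

In this model the latency is fully exposed with 1 VP/P and is hidden once the other VPs have enough work to fill the wait. Real runs also pay scheduling and cache costs that the sketch ignores.
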
Automatic Load Balancing
- Challenge
  - Dynamically varying applications
  - Load imbalance impacts overall performance
- Solution
  - Measurement-based load balancing
    - Scientific applications are typically iteration-based
    - The principle of persistence
    - RTS collects CPU and network usage of VPs
  - Load balancing by migrating threads (VPs)
    - Threads can be packed and shipped as needed
  - Different variations of load balancing strategies

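One possible measurement-based strategy is a greedy reassignment: take the VP loads measured in recent iterations (relying on the principle of persistence) and place the heaviest VP on the currently least-loaded processor. The Python sketch below is a generic greedy heuristic under those assumptions, not a reproduction of any specific AMPI strategy; `greedy_rebalance` and `processor_loads` are hypothetical names, and it ignores communication cost and migration cost.

```python
import heapq


def greedy_rebalance(vp_loads, num_procs):
    """Return a VP -> processor placement from measured VP loads.

    vp_loads: dict mapping VP id to its measured load (e.g. CPU seconds
    in the last few iterations). Heaviest VPs are placed first, each on
    the processor with the smallest accumulated load so far.
    """
    heap = [(0.0, proc) for proc in range(num_procs)]
    heapq.heapify(heap)
    placement = {}
    # Sort by descending load; break ties by VP id for determinism.
    for vp, load in sorted(vp_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        proc_load, proc = heapq.heappop(heap)
        placement[vp] = proc
        heapq.heappush(heap, (proc_load + load, proc))
    return placement


def processor_loads(vp_loads, placement, num_procs):
    """Sum the measured VP loads per processor for a given placement."""
    totals = [0.0] * num_procs
    for vp, proc in placement.items():
        totals[proc] += vp_loads[vp]
    return totals


if __name__ == "__main__":
    measured = {0: 5, 1: 4, 2: 3, 3: 3, 4: 2, 5: 1}  # made-up loads
    plan = greedy_rebalance(measured, num_procs=2)
    print(plan)
    print(processor_loads(measured, plan, 2))
```

VPs whose new processor differs from their current one would then be migrated; the sketch stops at computing the placement.
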
Automatic Load Balancing
- Application: Fractography 3D
  - Models fracture propagation in material

Automatic Load Balancing
Figure: CPU utilization of Fractography 3D without vs. with load balancing

Communication Optimizations
- The AMPI run-time is capable of
  - Observing communication patterns
  - Applying communication optimizations accordingly
  - Switching between communication algorithms automatically
- Examples
  - Streaming strategy for point-to-point communication
  - Collectives optimizations

Streaming Strategy
- Combining short messages to reduce per-message overhead
Figure: Streaming strategy for point-to-point communication on NCSA IA-64 Cluster

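The idea of combining short messages can be sketched as a per-destination buffer that sends one combined message once enough payloads have accumulated. This is a minimal illustration under assumed names: `StreamingCombiner`, `send_fn`, and `bucket_size` are hypothetical and are not AMPI identifiers. A production strategy would likely also flush on a timer so that messages are not delayed indefinitely.

```python
class StreamingCombiner:
    """Buffer short messages per destination and send them in batches.

    send_fn(dest, payloads) stands in for the underlying transport and is
    called once per combined message (illustrative assumption).
    """

    def __init__(self, send_fn, bucket_size=4):
        if bucket_size < 1:
            raise ValueError("bucket_size must be at least 1")
        self.send_fn = send_fn
        self.bucket_size = bucket_size
        self.pending = {}  # destination -> list of buffered payloads

    def send(self, dest, payload):
        """Queue a payload; flush the bucket once it is full."""
        bucket = self.pending.setdefault(dest, [])
        bucket.append(payload)
        if len(bucket) >= self.bucket_size:
            self.flush(dest)

    def flush(self, dest):
        """Send whatever is buffered for one destination as one message."""
        bucket = self.pending.pop(dest, [])
        if bucket:
            self.send_fn(dest, bucket)

    def flush_all(self):
        """Send all remaining buffered payloads, e.g. at a sync point."""
        for dest in list(self.pending):
            self.flush(dest)


if __name__ == "__main__":
    wire = []
    combiner = StreamingCombiner(lambda d, b: wire.append((d, b)), bucket_size=3)
    for i in range(7):
        combiner.send(1, i)
    combiner.flush_all()
    print(wire)  # 7 payloads travel as 3 combined messages
```

With a fixed per-message cost, sending 7 payloads as 3 combined messages pays that cost 3 times instead of 7, at the price of some added delay for the buffered payloads.
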
Optimizing Collectives
- A number of optimizations have been developed to improve collective communication performance
- Asynchronous collective interface allows higher CPU utilization for collectives
  - Computation is only a small proportion of the elapsed time
Figure: Time breakdown of an all-to-all operation using Mesh library

Virtualization Overhead
- Compared with performance benefits, overhead is very small
  - Usually offset by caching effect alone
- Better performance when features are applied
Figure: Performance for point-to-point communication on NCSA IA-64 Cluster

Flexibility
- Running on arbitrary number of processors
  - Runs with a specific number of MPI processes
  - Big runs on a few processors
Figure: 3D stencil calculation of size 240^3 run on Lemieux

Conclusion
- Adaptive MPI supports the following benefits
  - Adaptive overlap
  - Automatic load balancing
  - Communication optimizations
  - Flexibility
  - Automatic checkpoint/restart mechanism
  - Shrink/expand
- AMPI is being used in real-world parallel applications and frameworks
  - Rocket simulation at CSAR
  - FEM Framework
- Portable to a variety of HPC platforms

Future Work
- Performance Improvement
  - Reducing overhead
  - Intelligent communication strategy substitution
  - Machine-topology specific load balancing
- Performance Analysis
  - More direct support for AMPI programs

Thank You!
- Download of AMPI is available at: http://charm.cs.uiuc.edu/
- Parallel Programming Lab at University of Illinois