A Multi-platform Co-array Fortran Compiler for High-Performance Computing

John Mellor-Crummey, Yuri Dotsenko, Cristian Coarfa

{johnmc, dotsenko, ccristi}@cs.rice.edu

Programming Models for High-Performance Computing

Simple and expressive models for high performance programming based on extensions to widely used languages
• Performance: users control data and computation partitioning
• Portability: same language for SMPs, MPPs, and clusters
• Programmability: global address space for simplicity

MPI
• Portable and widely used
• The programmer has explicit control over communication and data locality
• Using MPI can be difficult and error prone
• Most of the burden for communication optimization falls on application developers; compiler support is underutilized

HPF
• The compiler is responsible for data locality and communication
• Annotated sequential code (semiautomatic parallelization)
• Requires heroic compiler technology
• The model limits the application paradigms: extensions to the standard are required for supporting irregular computation

Co-Array Fortran
• A sensible alternative to these extremes

Co-Array Fortran Language

• SPMD process images
  – number of images fixed during execution
  – images operate asynchronously
• Both private and shared data
  – real a(20, 20) private: a 20 x 20 array in each image
  – real a(20, 20)[*] shared: a 20 x 20 array in each image
• Simple one-sided shared memory communication
  – x(:, j:j+2) = a(r, :)[p:p+2] copy rows from p:p+2 into local columns
• Flexible synchronization
  – sync_team(team [, wait])
    • team = a vector of process ids to synchronize with
    • wait = a vector of processes to wait for (a subset of team)
• Pointers and dynamic allocation
• Parallel I/O

[Figure: co-array A(10, 10)[*] drawn as one A(10, 10) block on each of image 1, image 2, ..., image N, with the code fragment: if (me .eq. 1) then A(1:3, 1:5)[me+1] = A(1:3, 1:5)[me]]

Research Focus

• Synchronization strength-reduction
• Communication vectorization
• Platform-driven communication optimizations
  – Transform one-sided communication into two-sided and collective communication where useful
  – Generate both fine-grain load/store and calls to communication libraries as necessary
  – Multi-model code for hierarchical architectures
  – Convert Gets into Puts
• Compiler-directed parallel I/O with UIUC
• Interoperability with other parallel programming models
• Enhancements to Co-Array Fortran model
  – Point-to-point one-way synchronization
  – Hints for matching synchronization events
  – Collective operations intrinsics
  – Split-phase primitives

PUT Translation Example

Co-Array Fortran source:

    real(8) a(0:N+1, 0:N+1)[*]
    ...
    me = this_image()
    ...
    ! ghost cell update
    a(1:N, N+1)[left(me)] = a(1:N, 0)
    ...

Generated code:

    type CafHandleReal8
      integer :: handle
      real(8), pointer :: ptr(:, :)
    end type
    type(CafHandleReal8) a_caf
    ...
    allocate( cafBuffer_1%ptr(1:N, 0:0) )
    cafBuffer_2%ptr => a_caf%ptr(1:N, N+1:N+1)
    cafBuffer_1%ptr = a_caf%ptr(1:N, 0)
    call CafArmciPutS(a_caf%handle, left(me), cafBuffer_1, cafBuffer_2)
    deallocate( cafBuffer_1%ptr )
    ...

Implementation Status

• Source-to-source code generation for wide portability
• Open source compiler will be available
• Working prototype for core language features
• Current compiler implementation performs no optimization
  – each co-array access is transformed into a get/put operation at the same point in the code
• Code generation uses the widely-portable ARMCI communication library
• Front-end based on production-quality Open64 front end, modified to support source-to-source compilation

Performance Results on IA64+Myrinet 2000

Performance Results on Alpha+Quadrics *
* For NAS BT and CG the base case is synthetic, so that the first measurable point has efficiency 1.0

Performance Results on SGI Altix 3000 **
** Preliminary results on a loaded system in the presence of other users competing for the memory bandwidth
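
The ghost cell update a(1:N, N+1)[left(me)] = a(1:N, 0) from the PUT translation example can be mimicked in a few lines of ordinary Python to make the data movement concrete. This is only a sketch under stated assumptions: every "image" lives in one process as a nested list, the ring-shaped left() neighbour function is assumed (the poster does not define it), and put_section is a hypothetical stand-in for a strided put, not the ARMCI API.

```python
# Hypothetical single-process model of the ghost cell update
#   a(1:N, N+1)[left(me)] = a(1:N, 0)
# Each "image" owns its own (N+2) x (N+2) array indexed 0..N+1, mirroring
# real(8) a(0:N+1, 0:N+1)[*]. Nothing here is the ARMCI API.

N = 4
NUM_IMAGES = 3


def make_image(image_id):
    """Local array of one image; each value encodes (image, row, col)."""
    return [[image_id * 100.0 + row * 10.0 + col for col in range(N + 2)]
            for row in range(N + 2)]


def left(me):
    """Left neighbour on a ring of images 1..NUM_IMAGES (assumed topology)."""
    return NUM_IMAGES if me == 1 else me - 1


def put_section(images, target, rows, col, buffer):
    """Stand-in for a strided put: store buffer into images[target][rows][col]."""
    for row, value in zip(rows, buffer):
        images[target][row][col] = value


def ghost_cell_update(images, me):
    rows = range(1, N + 1)
    # Pack the local section a(1:N, 0) into a contiguous buffer
    # (the role of cafBuffer_1 in the generated code).
    buffer = [images[me][row][0] for row in rows]
    # Store it into a(1:N, N+1) on the left neighbour
    # (the section that cafBuffer_2 describes).
    put_section(images, left(me), rows, N + 1, buffer)


images = {image_id: make_image(image_id) for image_id in range(1, NUM_IMAGES + 1)}
sent = [images[2][row][0] for row in range(1, N + 1)]
ghost_cell_update(images, me=2)
received = [images[1][row][N + 1] for row in range(1, N + 1)]
```

After the call, received on image 1 holds the same values as sent from image 2, while the corner cells of image 1 outside rows 1..N are untouched. Packing into a temporary buffer first follows the shape of the generated code, which copies the source section before issuing a single put.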