Comments on Co-Array Fortran
Robert W. Numrich
Minnesota Supercomputing Institute
University of Minnesota, Minneapolis
Philosophy Behind the CAF Model
• A minimum number of new features that look and feel like Fortran.
  - The rules for co-dimensions are the same as those for normal dimensions, with a few exceptions.
• The CAF model is purely local.
  - The compiler performs normal optimization between synchronization points.
  - The compiler is not required to perform, is not expected to perform, and is deliberately prevented from performing global optimization.
  - The programmer is responsible for explicit data distribution, explicit communication, and explicit synchronization.
  - The programmer is responsible for memory consistency.
The Essential Features of CAF
• Co-array syntax
  1. Co-dimensions (how images are related to each other)
  2. Data communication (how data moves between images)
• Synchronization
  1. Full barrier
  2. Pair-wise handshake
• Dynamic memory management
  1. Allocatable co-arrays
  2. Allocatable/pointer components of co-array derived types
Recommendations
• Delete all of Section 8.5
  1. Substitute SYNC[([myPal])]
• Delete everything on I/O except
  1. Only image 1 reads stdin
  2. Records from different images not mixed to stdout
• Delete all the stuff on collectives
  1. Substitute the functions glbSum(x), glbMin(x), glbMax(x)
  2. Argument is not necessarily a co-array.
Ragged Arrays
1. The most important, the most powerful, feature of the CAF model.
   1. Allows each image to have data structures with different sizes and different locations on each image.
   2. No other model can handle this.
   3. Fortran with CAF extensions handles it in a very natural way.
2. Allocatable/pointer components of co-array derived types

       type x
          real, pointer :: ptr(:)
       end type x
       real, allocatable, target :: z(:)
       type(x) :: y[*]
       allocate(z(someRule(this_image())))
       y%ptr => z
       sync
       y[p]%ptr = 25.0

3. Most difficult feature to implement on crummy hardware and/or crummy operating systems.
   1. Good systems should be rewarded for getting this right.
   2. We should not allow bad systems to drag everything down to the lowest common denominator.
Memory Consistency
1. The programmer is responsible for maintaining memory consistency, not the compiler.
2. The rules must be very simple and very clear.
3. With just SYNC, it is very simple.
4. The addition of NOTIFY/QUERY makes it not so simple.
5. Segment boundary statements
   - Compiler may optimize between segment boundaries but not across segment boundaries
   - Processor must make co-arrays “visible” to all images across segment boundaries
   - The programmer must make sure that one and only one image defines a co-array variable “at the same time”
   - The programmer must make sure that no image tries to reference a co-array variable “at the same time” as another image is trying to define it.
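The segment rules above can be illustrated with a minimal sketch. It assumes the syntax later standardized in Fortran 2008, where the SYNC statement discussed on these slides is spelled SYNC ALL. The program name and the variable a are made up for illustration.

```fortran
program segments_sketch
   implicit none
   integer :: a[*]          ! hypothetical co-array, one copy per image

   a = 0                    ! each image defines only its own copy
   sync all                 ! segment boundary: initialization is complete everywhere

   ! In this segment exactly one image (image 1) defines a on image 2,
   ! and no image references a on image 2.
   if (this_image() == 1 .and. num_images() >= 2) then
      a[2] = 42
   end if

   sync all                 ! segment boundary: the new value must now be visible

   ! Only after the boundary is it safe for image 2 to reference a.
   if (this_image() == 2) print *, 'image 2 sees a =', a
end program segments_sketch
```

Between the two barriers the compiler is free to optimize locally. Removing the second SYNC ALL would let image 2 reference a while image 1 may still be defining it, which is exactly the case the programmer is required to prevent.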
Input/Output
1. What’s the minimum needed?
   – stdin/stdout are special cases
      – Always connected to all images
      – Only image 1 can read stdin
      – System must not mix records to stdout from different images
   – Shared files
      – Allow each image to open to same unit
      – Direct access only
         – Allowing sequential access requires changes to backspace, rewind, etc.
      – open(unit=u, access= , …)
      – open(unit=u, access=connectList, …)
         – Do we need teams?
         – Do we need to sync?
Collectives
• Avoid language bloat
  - Not everything in the MPI Library needs to be reproduced in CAF as intrinsic procedures
  - What do we really need and want?
  - What should the interface be?
• CAF is intended as a low-level language
  - Collectives are easy to write in CAF for any specific procedure
  - Throwing a long list of new intrinsic procedures over the wall may discourage vendors from adopting CAF
• If anything, supply intrinsic functions: glbSum(x), glbMin(x), glbMax(x)
  - argument x need not be a co-array
• Propose a supporting library for CAF.
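To illustrate the claim that collectives are easy to write in CAF, here is one possible sketch of glbSum. It uses Fortran 2008 syntax (SYNC ALL rather than the draft SYNC). The module name, the scratch co-array work, and the simple all-to-all gather are assumptions made for illustration, not the proposed definition; a production version would more likely use a tree reduction.

```fortran
module glb_collectives_sketch
   implicit none
contains
   ! Sum of x over all images. The argument x is an ordinary local scalar,
   ! not a co-array. Every image must call this function collectively.
   function glbSum(x) result(total)
      real, intent(in) :: x
      real :: total
      real, save :: work[*]     ! hypothetical scratch co-array hidden inside the function
      integer :: img

      work = x                  ! publish the local contribution
      sync all                  ! every contribution is now defined

      total = 0.0
      do img = 1, num_images()
         total = total + work[img]   ! gather from each image in turn
      end do

      sync all                  ! nobody overwrites work until all images have read it
   end function glbSum
end module glb_collectives_sketch
```

A caller would write something like total = glbSum(localValue) on every image. The second barrier is what makes back-to-back calls safe; dropping it would let a fast image redefine work while a slower image is still reading it.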
VOLATILE Co-arrays?
• What do we want the VOLATILE attribute to mean for co-arrays?
  - Inhibit optimization?
  - Always read from memory?
  - Always flush to memory?
• Can VOLATILE make spin-loops work without the need for an artificial sync-memory() from the programmer?
  - Statements with co-arrays are segment boundary statements
SYNC/BARRIER
• One new statement SYNC
  - Implies full BARRIER for all images
  - No arguments
• No SYNC_IMAGES
  - Synchronization between subsets of images can be done with NOTIFY/QUERY
NOTIFY/QUERY
• Why do we want NOTIFY/QUERY?
  - Split-phase sync
  - Subset sync
  - Master-slave work distribution
• Should they match in pairs? Should we expose which notify matches which query?
• Maybe we really want EVENTS?
  - EVENT_POST(tag)
  - EVENT_WAIT(tag)
  - EVENT_CLEAR(tag)
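The EVENTS idea raised above can be illustrated with the event facility that Fortran 2018 eventually standardized (EVENT_TYPE, EVENT POST, EVENT WAIT). This is that later syntax, not the tag-based calls listed on the slide, and the names ready and box are hypothetical.

```fortran
program event_sketch
   use, intrinsic :: iso_fortran_env, only: event_type
   implicit none
   type(event_type) :: ready[*]   ! hypothetical event variable, one per image
   real :: box[*]                 ! hypothetical mailbox co-array

   if (num_images() < 2) error stop 'this sketch needs at least 2 images'

   if (this_image() == 1) then
      box[2] = 25.0               ! define the data on image 2 ...
      event post(ready[2])        ! ... then notify image 2 that it is there
   else if (this_image() == 2) then
      event wait(ready)           ! block until image 1 has posted
      print *, 'image 2 received', box
   end if
end program event_sketch
```

Only images 1 and 2 take part, which is the subset synchronization the slide asks for, and no full barrier is involved. The post and the wait pair up through the event variable itself, so the program never has to name which notification matches which query.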