HPF (High Performance Fortran)
What is HPF?
• HPF is a standard for data-parallel programming.
• Extends Fortran-77 or Fortran-90.
• Similar extensions exist for C and C++, but Fortran is really the focus.

Principle of HPF
• Extend a sequential language with data distribution directives.
• Data distribution directives specify on which processor a certain part of an array should reside.
• The compiler then produces:
  – a parallel program,
  – the communication between the processes.

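As a minimal sketch of what such a directive can look like: the program below assumes a hypothetical arrangement of four processors named procs and a 100-element array a (neither name comes from the slides). The BLOCK distribution asks for a(1:25) on the first processor, a(26:50) on the second, and so on. To a plain Fortran compiler the !HPF$ lines are ordinary comments, so the same source also runs sequentially.

```fortran
program distribute_sketch
  implicit none
  integer, parameter :: n = 100
  real :: a(n)
  integer :: i

  ! Assumption: four processors, named procs for illustration only.
  !HPF$ PROCESSORS procs(4)
  ! Ask for one contiguous block of a on each processor.
  !HPF$ DISTRIBUTE a(BLOCK) ONTO procs

  ! Each element depends only on its own index, so whichever processor
  ! owns a(i) can compute it without communication.
  do i = 1, n
    a(i) = real(i)
  end do

  print *, sum(a)
end program distribute_sketch
```
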
What the Standard Says
• Can be used with both Fortran-77 and Fortran-90.
• Distribution directives are just a hint; the compiler can ignore them.
• HPF can be used on both shared memory and distributed memory hardware platforms.

In Commercial Use
• HPF is always used with Fortran-90.
• Distribution directives are a must.
• HPF is used on both shared memory and distributed memory platforms.
• But the truth is that the language was really meant for distributed memory platforms.

Not to Confuse You
• We will discuss commercial use:
  – Fortran-90.
  – Concurrency extensions to Fortran-90 in HPF.
  – HPF data distribution directives.
  – How HPF maps to a distributed memory platform.
• Afterwards, we will discuss what the standard allows in addition.

Fortran-90
• Fortran + a number of array features.
• Scalar operations are extended to arrays.
• Intrinsic functions are extended to arrays.
• Additional array-based intrinsic functions.

Array Assignment
Scalar assignment:
    integer a, b, c
    a = b + c
Array assignment:
    integer A(10, 10), B(10, 10), C(10, 10)
    A = B + C

Requirements for Array Assignment
• Arrays must be conformable:
  – have the same number of dimensions, and
  – have the same size in each dimension.
• One major exception is allowed: a scalar is conformable with any array.
    integer A(10, 10), B(10, 10), c
    A = B + c

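The scalar exception can be tried out with a small sketch such as the one below. The program name and the values 2 and 3 are illustrative assumptions. It fills b from a scalar, adds the scalar c to every element, and then checks the shape and contents of the result.

```fortran
program conformable_sketch
  implicit none
  integer :: a(10, 10), b(10, 10), c

  b = 2          ! scalar on the right: every element of b becomes 2
  c = 3
  a = b + c      ! c is combined with each element of b

  print *, shape(a)        ! the extents that must match for conformability
  print *, all(a == 5)     ! true when every element equals 2 + 3
end program conformable_sketch
```
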
Intrinsic Functions Extended to Arrays
    real A(10, 10), B(10, 10)
    A = SQRT(A)
    B = ABS(A)

Additional Array Intrinsic Functions
• MAXVAL, MINVAL
• MAXLOC, MINLOC – return array of indices
• SUM, PRODUCT
• MATMUL, DOT_PRODUCT, TRANSPOSE

Examples
    real A(100, 100), B(100), s, c
    integer i(1), j(2)
    s = SUM(A)
    i = MAXLOC(B)
    j = MINLOC(A)
    c = DOT_PRODUCT(B, B)

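To see what these intrinsics return on concrete data, here is a small sketch with made-up values (the arrays v and m and their contents are assumptions chosen so the answers are easy to check by hand). Note that MAXLOC and MINLOC return an array with one index per dimension, and that DOT_PRODUCT takes two vectors.

```fortran
program intrinsics_sketch
  implicit none
  real :: v(4), m(2, 3), s, c
  integer :: i(1), j(2)

  v = (/ 3.0, 9.0, 1.0, 4.0 /)
  ! reshape fills column by column: m(1,1)=5, m(2,1)=2, m(1,2)=7, ...
  m = reshape((/ 5.0, 2.0, 7.0, 1.0, 8.0, 6.0 /), (/ 2, 3 /))

  s = sum(v)               ! sum of all elements of v
  i = maxloc(v)            ! one index, because v has one dimension
  j = minloc(m)            ! two indices, because m has two dimensions
  c = dot_product(v, v)    ! both arguments are vectors of equal length

  print *, s, maxval(v), c
  print *, i
  print *, j
end program intrinsics_sketch
```
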
Array Sections
    array( lower_bound : upper_bound : stride )
• Refers to the section of the array between lower_bound and upper_bound, with an optional stride specified.
• Multiple dimensions may be specified, with the obvious meaning.
• Array sections may be used wherever arrays may be used.

Examples
    integer A(10), B(10), C(10)
    integer D(50), E(100), F(100)
    integer max, min
    integer G(100), H(100, 100)
    A(1:8) = B(1:8) + C(2:9)
    D = E(1:100:2) + F(2:100:2)
    max = MAXVAL( G(1:100:10) )
    min = MINVAL( H(1:100, 1:50) )

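Counting the elements of a strided section by hand is error-prone, so a small sketch like the following can help. It uses a hypothetical 10-element array e instead of the 100-element arrays above. It prints the sizes of two sections and then adds the odd-indexed and even-indexed elements pairwise.

```fortran
program sections_sketch
  implicit none
  integer :: e(10), d(5), i

  e = (/ (i, i = 1, 10) /)     ! e = 1, 2, ..., 10

  ! An upper bound of 9 instead of 10 silently drops one element.
  print *, size(e(1:10:2)), size(e(2:10:2)), size(e(2:9:2))

  ! Both sections have 5 elements, so they conform with d.
  d = e(1:10:2) + e(2:10:2)
  print *, d
end program sections_sketch
```
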
Semantics of Array Assignments
• First, the entire right hand side is evaluated.
• Then, assignments are made to the left hand side.

Example
    integer A(4)
    A = (/ 7, 8, 12, 14 /)
    A(2:3) = A(1:2)
=> results in A being (7, 7, 8, 14)
=> not (7, 7, 7, 14)

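The difference between the two outcomes can be reproduced with a short sketch. The second array b and the explicit loop are assumptions added for comparison: the loop updates b in place, one element at a time, which is exactly what the array assignment does not do.

```fortran
program assignment_semantics
  implicit none
  integer :: a(4), b(4), i

  a = (/ 7, 8, 12, 14 /)
  b = a

  ! Array assignment: the right hand side a(1:2) is read in full
  ! before any element of a(2:3) is written.
  a(2:3) = a(1:2)

  ! Element-by-element loop: b(3) sees the already overwritten b(2).
  do i = 2, 3
    b(i) = b(i - 1)
  end do

  print *, a    ! 7 7 8 14
  print *, b    ! 7 7 7 14
end program assignment_semantics
```
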
Sequential/Parallel Fortran-90
• Fortran-90 is a sequential language.
• However, its array assignment semantics make it easy to parallelize (automatically).

Not Perfect, Though (1 of 2)
    do i = 1, 100
      X(i, i) = 0.0
    enddo
• Obviously parallelizable.
• Not expressible as a Fortran-90 array assignment (only regular sections).

Not Perfect, Though (2 of 2)
    integer D(50), E(100), F(100)
    D = E(1:100:2) + F(2:100:2)
is correct, but
    integer D(100), E(100), F(100)
    D = E(1:100:2) + F(2:100:2)
is not, because D has 100 elements while the right hand side has only 50, so the arrays are not conformable.

HPF: Additional Expressions of Parallelism
• FORALL array assignment.
• INDEPENDENT construct.

FORALL Array Assignment
    FORALL( subscript = lower_bound : upper_bound : stride, mask ) array-assignment
• Execute all iterations of the subscript loop in parallel for the given set of indices, where mask is true.
• May have multiple dimensions.
• Same semantics: first compute right hand side, then assign to left hand side.
• Only one assignment to a particular element (not checked by the compiler!).

Examples (1 of 3)
    do i = 1, 100
      X(i, i) = 0.0
    enddo
becomes
    FORALL(i=1:100) X(i, i) = 0.0

Examples (2 of 3)
    integer D(100), E(100), F(100)
    D = E(1:100:2) + F(2:100:2)
becomes (correctly)
    FORALL(i=1:50) D(i) = E(2*i-1) + F(2*i)

Examples (3 of 3)
• A multiple dimension example with use of the mask option.
• Set all the elements of X above the diagonal to the sum of their indices.
    FORALL(i=1:100, j=1:100, i<j) X(i, j) = i+j

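A scaled-down sketch of the masked FORALL makes the effect of the mask visible. The 4x4 size and the initial zero fill are assumptions for readability. Only elements with i < j are assigned, and everything on or below the diagonal keeps its old value.

```fortran
program forall_sketch
  implicit none
  integer, parameter :: n = 4
  integer :: x(n, n), i, j

  x = 0
  ! The mask i < j selects the elements strictly above the diagonal.
  forall (i = 1:n, j = 1:n, i < j) x(i, j) = i + j

  ! Print row by row.
  do i = 1, n
    print *, x(i, :)
  end do
end program forall_sketch
```
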
The INDEPENDENT Clause
    !HPF$ INDEPENDENT
    DO …
    ENDDO
• Specifies that the iterations of the loop can be executed in any order.

Examples (1 of 2)
    !HPF$ INDEPENDENT
    DO i = 1, 100
      DO j = 1, 100
        IF (i .NE. j) A(i, j) = 1.0
        IF (i .EQ. j) A(i, j) = 0.0
      ENDDO
    ENDDO

Examples (2 of 2): Nesting
    !HPF$ INDEPENDENT
    DO i = 1, 100
      !HPF$ INDEPENDENT
      DO j = 1, 100
        IF (i .NE. j) A(i, j) = 1.0
        IF (i .EQ. j) A(i, j) = 0.0
      ENDDO
    ENDDO

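One way to convince yourself that a loop qualifies for INDEPENDENT is to run it in two different orders and compare, as in the sketch below. The 4x4 size and the reference array r are assumptions. The NEW(j) clause is also an addition not shown on the slides: the HPF specification uses it to mark a variable, here the inner loop index, as private to each outer iteration. A plain Fortran compiler ignores the directive either way.

```fortran
program independent_sketch
  implicit none
  integer, parameter :: n = 4
  real :: a(n, n), r(n, n)
  integer :: i, j

  ! Forward order, annotated for an HPF compiler.
  ! Assumption: NEW(j) declares j private to each iteration of i.
  !HPF$ INDEPENDENT, NEW(j)
  do i = 1, n
    do j = 1, n
      if (i /= j) a(i, j) = 1.0
      if (i == j) a(i, j) = 0.0
    end do
  end do

  ! Same body in reverse order, as a plain sequential reference.
  do i = n, 1, -1
    do j = n, 1, -1
      if (i /= j) r(i, j) = 1.0
      if (i == j) r(i, j) = 0.0
    end do
  end do

  ! True when the iteration order made no difference.
  print *, all(a == r)
end program independent_sketch
```
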
HPF/Fortran-90 Matrix Multiply (1 of 4)
    C = MATMUL( A, B )

HPF Matrix Multiply (2 of 4)
    C = 0.0
    DO k = 1, n
      FORALL(i=1:n, j=1:n) C(i, j) = C(i, j) + A(i, k) * B(k, j)
    ENDDO

HPF Matrix Multiply (3 of 4)
    !HPF$ INDEPENDENT
    DO i = 1, n
      DO j = 1, n
        C(i, j) = 0.0
        DO k = 1, n
          C(i, j) = C(i, j) + A(i, k) * B(k, j)
        ENDDO
      ENDDO
    ENDDO

HPF Matrix Multiply (4 of 4)
    !HPF$ INDEPENDENT
    DO i = 1, n
      !HPF$ INDEPENDENT
      DO j = 1, n
        C(i, j) = 0.0
        DO k = 1, n
          C(i, j) = C(i, j) + A(i, k) * B(k, j)
        ENDDO
      ENDDO
    ENDDO

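The matrix multiply variants can be cross-checked against the MATMUL intrinsic with a sketch along these lines. The 3x3 size, the test values, and the names c1 and c2 are assumptions. The FORALL form keeps k in an ordinary sequential loop, since each FORALL statement may assign a given element only once.

```fortran
program matmul_sketch
  implicit none
  integer, parameter :: n = 3
  real :: a(n, n), b(n, n), c1(n, n), c2(n, n)
  integer :: i, j, k

  ! Small integer-valued test data, chosen only for illustration.
  forall (i = 1:n, j = 1:n) a(i, j) = real(i + j)
  forall (i = 1:n, j = 1:n) b(i, j) = real(i - j)

  ! Reference result from the intrinsic.
  c1 = matmul(a, b)

  ! Sequential loop over k; each step updates all (i, j) in parallel.
  c2 = 0.0
  do k = 1, n
    forall (i = 1:n, j = 1:n) c2(i, j) = c2(i, j) + a(i, k) * b(k, j)
  end do

  ! Largest difference between the two results.
  print *, maxval(abs(c1 - c2))
end program matmul_sketch
```
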
HPF/Fortran-90 SOR (1 of 4)
    TEMP(1:n, 1:n) = 0.25 * ( GRID(1:n, 0:n-1) + GRID(1:n, 2:n+1) &
                            + GRID(0:n-1, 1:n) + GRID(2:n+1, 1:n) )
    GRID(1:n, 1:n) = TEMP(1:n, 1:n)

HPF/Fortran-90 SOR (1' of 4)
    GRID(1:n, 1:n) = 0.25 * ( GRID(1:n, 0:n-1) + GRID(1:n, 2:n+1) &
                            + GRID(0:n-1, 1:n) + GRID(2:n+1, 1:n) )
Also works, because of the array assignment rules: the entire right hand side is evaluated before any element of GRID is assigned.

HPF SOR (2 of 4)
    FORALL(i=1:n, j=1:n) &
      TEMP(i, j) = 0.25 * ( GRID(i-1, j) + GRID(i+1, j) &
                          + GRID(i, j-1) + GRID(i, j+1) )
    FORALL(i=1:n, j=1:n) GRID(i, j) = TEMP(i, j)

HPF SOR (3 of 4)
    !HPF$ INDEPENDENT
    DO i = 1, n
      DO j = 1, n
        TEMP(i, j) = 0.25 * ( GRID(i-1, j) + GRID(i+1, j) &
                            + GRID(i, j-1) + GRID(i, j+1) )
      ENDDO
    ENDDO
    !HPF$ INDEPENDENT
    DO i = 1, n
      DO j = 1, n
        GRID(i, j) = TEMP(i, j)
      ENDDO
    ENDDO

HPF SOR (4 of 4)
    !HPF$ INDEPENDENT
    DO i = 1, n
      !HPF$ INDEPENDENT
      DO j = 1, n
        TEMP(i, j) = 0.25 * ( GRID(i-1, j) + GRID(i+1, j) &
                            + GRID(i, j-1) + GRID(i, j+1) )
      ENDDO
    ENDDO
    !HPF$ INDEPENDENT
    DO i = 1, n
      !HPF$ INDEPENDENT
      DO j = 1, n
        GRID(i, j) = TEMP(i, j)
      ENDDO
    ENDDO

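The in-place array assignment and the two-step version through TEMP should give the same grid. The sketch below checks this on a small example. The 4x4 interior, the boundary layout 0:n+1, the initial values, and the copy g2 are all assumptions made for the comparison.

```fortran
program sor_sketch
  implicit none
  integer, parameter :: n = 4
  real :: grid(0:n+1, 0:n+1), g2(0:n+1, 0:n+1), temp(n, n)
  integer :: i, j

  ! Illustrative initial values, including the boundary rows and columns.
  forall (i = 0:n+1, j = 0:n+1) grid(i, j) = real(i * i + j)
  g2 = grid

  ! In-place update: the whole right hand side is evaluated first.
  grid(1:n, 1:n) = 0.25 * ( grid(1:n, 0:n-1) + grid(1:n, 2:n+1) &
                          + grid(0:n-1, 1:n) + grid(2:n+1, 1:n) )

  ! Two-step update through a temporary array.
  forall (i = 1:n, j = 1:n) &
    temp(i, j) = 0.25 * ( g2(i-1, j) + g2(i+1, j) &
                        + g2(i, j-1) + g2(i, j+1) )
  forall (i = 1:n, j = 1:n) g2(i, j) = temp(i, j)

  ! Largest difference between the two variants.
  print *, maxval(abs(grid - g2))
end program sor_sketch
```

Replacing either step with an element-by-element loop that writes directly into the grid would change the result, because later iterations would read values that were already updated.
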