HPF (High Performance Fortran)

What is HPF?
• HPF is a standard for data-parallel programming.
• Extends Fortran-77 or Fortran-90.
• Similar extensions exist for C and C++, but Fortran is really the focus.

Principle of HPF
• Extending a sequential language with data distribution directives.
• Data distribution directives specify on which processor a certain part of an array should reside.
• Compiler then produces:
  – parallel program,
  – communication between the processes.

What the Standard Says
• Can be used with both Fortran-77 and Fortran-90.
• Distribution directives are just a hint; the compiler can ignore them.
• HPF can be used on both shared memory and distributed memory hardware platforms.

In Commercial Use
• HPF is always used with Fortran-90.
• Distribution directives are a must.
• HPF is used on both shared memory and distributed memory platforms.
• But the truth is that the language was really meant for distributed memory platforms.

Not to Confuse You
• We will discuss commercial use:
  – Fortran-90.
  – Concurrency extensions to Fortran-90 in HPF.
  – HPF data distribution directives.
  – How HPF maps to a distributed memory platform.
• Afterwards, we will discuss what the standard allows in addition.

Fortran-90
• Fortran + a number of array features.
• Scalar operations are extended to arrays.
• Intrinsic functions are extended to arrays.
• Additional array-based intrinsic functions.

Array Assignment
Scalar assignment:
    integer a, b, c
    a = b + c
Array assignment:
    integer A(10, 10), B(10, 10), C(10, 10)
    A = B + C

Requirements for Array Assignment
• Arrays must be conformable:
  – have the same number of dimensions, and
  – have the same size in each dimension.
• One major exception: a scalar is conformable with any array.
    integer A(10, 10), B(10, 10), c
    A = B + c
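The conformability rule can be tried out with a small program. This is a minimal sketch, assuming a Fortran 90 compiler; the program name and the rank-1 array V are made up for illustration.

```fortran
program conformable_demo
  implicit none
  integer :: A(10, 10), B(10, 10), V(10), c

  B = 2              ! a scalar on the right is spread over every element of B
  c = 3
  A = B + c          ! scalar c is conformable with the 10x10 array B
  print *, all(A == 5)

  ! SHAPE returns the extents that have to match for two arrays to conform.
  print *, 'shape(A) =', shape(A), '  shape(V) =', shape(V)

  ! A = B + V        ! not conformable: rank 2 versus rank 1, rejected by the compiler
end program conformable_demo
```

Uncommenting the last assignment should produce a compile-time error, because the ranks of B and V are known statically.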

Intrinsic Functions Extended to Arrays
    real A(10, 10), B(10, 10)
    A = SQRT(A)
    B = ABS(A)
• SQRT is not defined for integer arguments, so the arrays are declared real here.

Additional Array Intrinsic Functions
• MAXVAL, MINVAL
• MAXLOC, MINLOC – return array of indices
• SUM, PRODUCT
• MATMUL, DOT_PRODUCT, TRANSPOSE

Examples
    real A(100, 100), B(100), s, c
    integer i(1), j(2)
    s = SUM(A)
    i = MAXLOC(B)
    j = MINLOC(A)
    c = DOT_PRODUCT(B, A(:, 1))
• DOT_PRODUCT takes two rank-1 arguments of equal size, so one column of A is used.
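A small runnable variant of these intrinsic calls makes the "array of indices" result easier to see. This is a sketch under assumptions: the arrays are shrunk to 3 x 3 and 3 elements, and the data values are invented for illustration.

```fortran
program intrinsic_demo
  implicit none
  real    :: A(3, 3), B(3), s
  integer :: i(1), j(2)

  B = (/ 2.0, 9.0, 4.0 /)
  ! RESHAPE fills A in column-major order: column 1 is 5, 8, 1.
  A = reshape((/ 5.0, 8.0, 1.0, 7.0, 3.0, 6.0, 9.0, 2.0, 4.0 /), (/ 3, 3 /))

  s = sum(A)          ! sum of all nine elements: 45.0
  i = maxloc(B)       ! one index for a rank-1 array: 2
  j = minloc(A)       ! (row, column) of the smallest element: 3, 1

  print *, s
  print *, i
  print *, j
  print *, dot_product(B, A(:, 1))   ! 2*5 + 9*8 + 4*1 = 86.0
end program intrinsic_demo
```

MAXLOC and MINLOC return one index per dimension of their argument, which is why i has one element and j has two.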

Array Sections
    array( lower_bound : upper_bound : stride )
• Refers to the section of the array between lower_bound and upper_bound, with an optional stride specified.
• Multiple dimensions may be specified, with the obvious meaning.
• Array sections may be used wherever arrays may be used.

Examples
    integer A(10), B(10), C(10)
    integer D(50), E(100), F(100)
    integer max
    integer G(100), H(100, 100)
    A(1:8) = B(1:8) + C(2:9)
    D = E(1:100:2) + F(2:100:2)
    max = MAXVAL( G(1:100:10) )
    max = MINVAL( H(1:100, 1:50) )
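Counting the elements a strided section selects is a common source of conformability mistakes, so it can help to check sizes explicitly. The sketch below assumes a Fortran 90 compiler and fills the arrays with their own indices, which is an illustrative choice.

```fortran
program section_demo
  implicit none
  integer :: D(50), E(100), F(100), G(100), k

  E = (/ (k, k = 1, 100) /)     ! E(k) = k
  F = E
  G = E

  ! Odd-indexed elements of E plus even-indexed elements of F: 50 elements each.
  D = E(1:100:2) + F(2:100:2)

  ! A section such as F(2:99:2) stops at index 98 and has only 49 elements.
  print *, size(E(1:100:2)), size(F(2:100:2)), size(F(2:99:2))

  print *, D(1), D(50)            ! E(1)+F(2) = 3 and E(99)+F(100) = 199
  print *, maxval(G(1:100:10))    ! largest of G(1), G(11), ..., G(91) = 91
end program section_demo
```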

Semantics of Array Assignments
• First, the entire right hand side is evaluated.
• Then, assignments are made to the left hand side.

Example
    integer :: A(4) = (/ 7, 8, 12, 14 /)
    A(2:3) = A(1:2)
=> results in A being (/ 7, 7, 8, 14 /)
=> not (/ 7, 7, 7, 14 /)
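The difference between the two outcomes can be reproduced by comparing the array assignment with an element-by-element loop. This is a minimal sketch; the second array B and the loop are added for illustration only.

```fortran
program assign_semantics
  implicit none
  integer :: A(4), B(4), i

  A = (/ 7, 8, 12, 14 /)
  A(2:3) = A(1:2)        ! whole right hand side is evaluated before any store
  print *, A             ! 7 7 8 14

  B = (/ 7, 8, 12, 14 /)
  do i = 2, 3            ! sequential loop: B(2) is overwritten before it is read
     B(i) = B(i-1)
  end do
  print *, B             ! 7 7 7 14
end program assign_semantics
```

The loop carries a dependence from one iteration to the next; the array assignment has none, which is what makes it straightforward to run in parallel.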

Sequential/Parallel Fortran-90
• Fortran-90 is a sequential language.
• However, its array assignment semantics make it easy to parallelize (automatically).

Not Perfect, Though (1 of 2)
    do i = 1, 100
      X(i, i) = 0.0
    enddo
• Obviously parallelizable.
• Not expressible as a Fortran-90 array assignment (only regular sections).

Not Perfect, Though (2 of 2)
    integer D(50), E(100), F(100)
    D = E(1:100:2) + F(2:100:2)
is correct, but
    integer D(100), E(100), F(100)
    D = E(1:100:2) + F(2:100:2)
is not, because D (100 elements) is not conformable with the 50-element right hand side.

HPF: Additional Expressions of Parallelism
• FORALL array assignment.
• INDEPENDENT construct.

FORALL Array Assignment
    FORALL( subscript = lower_bound : upper_bound : stride, mask) array-assignment
• Execute all iterations of the subscript loop in parallel for the given set of indices, where mask is true.
• May have multiple dimensions.
• Same semantics: first compute right hand side, then assign to left hand side.
• Only one assignment to a particular element (not checked by the compiler!).

Examples (1 of 3)
    do i = 1, 100
      X(i, i) = 0.0
    enddo
becomes
    FORALL(i=1:100) X(i, i) = 0.0

Examples (2 of 3)
    integer D(100), E(100), F(100)
    D = E(1:100:2) + F(2:100:2)
becomes (correctly)
    FORALL(i=1:50) D(i) = E(2*i-1) + F(2*i)

Examples (3 of 3)
• A multiple dimension example with use of the mask option.
• Set all the elements of X above the diagonal to the sum of their indices.
    FORALL(i=1:100, j=1:100, i<j) X(i, j) = i+j
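The masked FORALL can be run on a small array to see which elements it touches. FORALL was later adopted into standard Fortran 95, so this sketch should compile with an ordinary Fortran compiler; the 4 x 4 size and the zero initialisation are assumptions made to keep the printout short.

```fortran
program forall_demo
  implicit none
  integer, parameter :: n = 4
  integer :: X(n, n), i, j

  X = 0
  ! Only index pairs with i < j (strictly above the diagonal) are assigned.
  forall (i = 1:n, j = 1:n, i < j) X(i, j) = i + j

  do i = 1, n
     print '(4i3)', X(i, :)
  end do
  ! Row 1 holds 0 3 4 5, row 2 holds 0 0 5 6, row 3 holds 0 0 0 7, row 4 is all zero.
end program forall_demo
```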

The INDEPENDENT Clause
    !HPF$ INDEPENDENT
    DO …
    ENDDO
• Specifies that the iterations of the loop can be executed in any order.

Examples (1 of 2)
    !HPF$ INDEPENDENT
    DO i = 1, 100
      DO j = 1, 100
        IF (i .NE. j) A(i, j) = 1.0
        IF (i .EQ. j) A(i, j) = 0.0
      ENDDO
    ENDDO

Examples (2 of 2): Nesting
    !HPF$ INDEPENDENT
    DO i = 1, 100
    !HPF$ INDEPENDENT
      DO j = 1, 100
        IF (i .NE. j) A(i, j) = 1.0
        IF (i .EQ. j) A(i, j) = 0.0
      ENDDO
    ENDDO
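Because the directive starts with "!", a compiler without HPF support treats it as a comment and runs the loop sequentially. The sketch below uses that to check the "any order" claim by running the same loop body forwards and backwards; the second array R and the reversed loops are added for illustration.

```fortran
program independent_demo
  implicit none
  integer, parameter :: n = 100
  real    :: A(n, n), R(n, n)
  integer :: i, j

  ! Forward order, annotated as on the slide.
!HPF$ INDEPENDENT
  do i = 1, n
!HPF$ INDEPENDENT
     do j = 1, n
        if (i /= j) A(i, j) = 1.0
        if (i == j) A(i, j) = 0.0
     end do
  end do

  ! Same body with both loops reversed.
  do i = n, 1, -1
     do j = n, 1, -1
        if (i /= j) R(i, j) = 1.0
        if (i == j) R(i, j) = 0.0
     end do
  end do

  ! Each iteration writes only its own element, so the order cannot matter.
  print *, all(A == R)
end program independent_demo
```

The directive is an assertion by the programmer, not something this test proves in general; it only shows that two particular orders agree for this loop body.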

HPF/Fortran-90 Matrix Multiply (1 of 4)
    C = MATMUL( A, B )

HPF Matrix Multiply (2 of 4)
    C = 0.0
    DO k = 1, n
      FORALL(i=1:n, j=1:n) C(i, j) = C(i, j) + A(i, k) * B(k, j)
    ENDDO

HPF Matrix Multiply (3 of 4)
    !HPF$ INDEPENDENT
    DO i = 1, n
      DO j = 1, n
        C(i, j) = 0.0
        DO k = 1, n
          C(i, j) = C(i, j) + A(i, k) * B(k, j)
        ENDDO
      ENDDO
    ENDDO

HPF Matrix Multiply (4 of 4)
    !HPF$ INDEPENDENT
    DO i = 1, n
    !HPF$ INDEPENDENT
      DO j = 1, n
        C(i, j) = 0.0
        DO k = 1, n
          C(i, j) = C(i, j) + A(i, k) * B(k, j)
        ENDDO
      ENDDO
    ENDDO
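The variants can be checked against each other on a small matrix. This is a sketch, assuming a Fortran 95 compiler; n = 3, the test data, and the names C1, C2 and Cref are illustrative, and the FORALL variant is wrapped in a sequential loop over k, which is an assumption about how k is meant to be supplied.

```fortran
program matmul_demo
  implicit none
  integer, parameter :: n = 3
  real    :: A(n, n), B(n, n), C1(n, n), C2(n, n), Cref(n, n)
  integer :: i, j, k

  A = reshape((/ (real(i), i = 1, n*n) /), (/ n, n /))
  B = transpose(A)

  ! Variant 1: the intrinsic.
  Cref = matmul(A, B)

  ! Variant 2: FORALL over (i, j), sequential accumulation over k.
  C1 = 0.0
  do k = 1, n
     forall (i = 1:n, j = 1:n) C1(i, j) = C1(i, j) + A(i, k) * B(k, j)
  end do

  ! Variants 3 and 4: explicit loops; the directives are comments to a non-HPF compiler.
!HPF$ INDEPENDENT
  do i = 1, n
!HPF$ INDEPENDENT
     do j = 1, n
        C2(i, j) = 0.0
        do k = 1, n
           C2(i, j) = C2(i, j) + A(i, k) * B(k, j)
        end do
     end do
  end do

  ! The data are small whole numbers, so all three results agree exactly.
  print *, all(C1 == Cref), all(C2 == Cref)
end program matmul_demo
```

Only the i and j loops are marked INDEPENDENT: the k loop accumulates into a single element C(i, j), so its iterations depend on one another.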

HPF/Fortran-90 SOR (1 of 4)
    TEMP(1:n, 1:n) = 0.25 * ( GRID(1:n, 0:n-1) + GRID(1:n, 2:n+1) + GRID(0:n-1, 1:n) + GRID(2:n+1, 1:n) )
    GRID(1:n, 1:n) = TEMP(1:n, 1:n)

HPF/Fortran-90 SOR (1’ of 4)
    GRID(1:n, 1:n) = 0.25 * ( GRID(1:n, 0:n-1) + GRID(1:n, 2:n+1) + GRID(0:n-1, 1:n) + GRID(2:n+1, 1:n) )
• Also works, because of the array assignment rules: the right hand side is fully evaluated before GRID is updated.
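That the version without TEMP gives the same answer can be confirmed on a small grid. The sketch below assumes a Fortran 90 compiler and a grid declared with bounds 0:n+1 so that the boundary rows and columns exist; n = 4, the initial values, and the copy G2 are illustrative.

```fortran
program stencil_demo
  implicit none
  integer, parameter :: n = 4
  real    :: GRID(0:n+1, 0:n+1), G2(0:n+1, 0:n+1), TEMP(n, n)
  integer :: k

  ! Fill both grids with the same non-trivial data (squares of 1..36).
  GRID = reshape((/ (real(k*k), k = 1, (n+2)**2) /), (/ n+2, n+2 /))
  G2 = GRID

  ! Version with an explicit temporary.
  TEMP = 0.25 * ( GRID(1:n, 0:n-1) + GRID(1:n, 2:n+1)   &
                + GRID(0:n-1, 1:n) + GRID(2:n+1, 1:n) )
  GRID(1:n, 1:n) = TEMP

  ! In-place version: the right hand side is evaluated in full before any store.
  G2(1:n, 1:n) = 0.25 * ( G2(1:n, 0:n-1) + G2(1:n, 2:n+1)   &
                        + G2(0:n-1, 1:n) + G2(2:n+1, 1:n) )

  print *, all(GRID == G2)
end program stencil_demo
```

A plain nested DO loop updating GRID in place would not be equivalent, because later iterations would read neighbours that earlier iterations had already overwritten.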

HPF SOR (2 of 4)
    FORALL(i=1:n, j=1:n) TEMP(i, j) = 0.25 * ( GRID(i-1, j) + GRID(i+1, j) + GRID(i, j-1) + GRID(i, j+1) )
    FORALL(i=1:n, j=1:n) GRID(i, j) = TEMP(i, j)

HPF SOR (3 of 4)
    !HPF$ INDEPENDENT
    DO i = 1, n
      DO j = 1, n
        TEMP(i, j) = 0.25 * ( GRID(i-1, j) + GRID(i+1, j) + GRID(i, j-1) + GRID(i, j+1) )
      ENDDO
    ENDDO
    !HPF$ INDEPENDENT
    DO i = 1, n
      DO j = 1, n
        GRID(i, j) = TEMP(i, j)
      ENDDO
    ENDDO

HPF SOR (4 of 4)
    !HPF$ INDEPENDENT
    DO i = 1, n
    !HPF$ INDEPENDENT
      DO j = 1, n
        TEMP(i, j) = 0.25 * ( GRID(i-1, j) + GRID(i+1, j) + GRID(i, j-1) + GRID(i, j+1) )
      ENDDO
    ENDDO
    !HPF$ INDEPENDENT
    DO i = 1, n
    !HPF$ INDEPENDENT
      DO j = 1, n
        GRID(i, j) = TEMP(i, j)
      ENDDO
    ENDDO