HPF (High Performance Fortran)

What is HPF?
• HPF is a standard for data-parallel programming.
• Extends Fortran-77 or Fortran-90.
• Similar extensions exist for C and C++, but Fortran is the main focus.

Principle of HPF
• Extend a sequential language with data distribution directives.
• Data distribution directives specify on which processor a certain part of an array should reside.
• The compiler then produces:
  – a parallel program,
  – communication between the processes.

What the Standard Says
• Can be used with both Fortran-77 and Fortran-90.
• Distribution directives are just a hint; the compiler can ignore them.
• HPF can be used on both shared memory and distributed memory hardware platforms.

In Commercial Use
• HPF is always used with Fortran-90.
• Distribution directives are a must.
• HPF is used on both shared memory and distributed memory platforms.
• In practice, though, the language was really meant for distributed memory platforms.

Not to Confuse You
• We will discuss commercial use:
  – Fortran-90.
  – Concurrency extensions to Fortran-90 in HPF.
  – HPF data distribution directives.
  – How HPF maps to a distributed memory platform.
• Afterwards, we will discuss what the standard allows in addition.

Fortran-90
• Fortran + a number of array features.
• Scalar operations are extended to arrays.
• Intrinsic functions are extended to arrays.
• Additional array-based intrinsic functions.

Array Assignment
Scalar assignment:
    integer a, b, c
    a = b + c
Array assignment:
    integer A(10, 10), B(10, 10), C(10, 10)
    A = B + C

Requirements for Array Assignment
• Arrays must be conformable:
  – have the same number of dimensions, and
  – have the same size in each dimension.
• One major exception: a scalar is conformable with any array.
    integer A(10, 10), B(10, 10), c
    A = B + c

Intrinsic Functions Extended to Arrays
    real A(10, 10), B(10, 10)
    A = SQRT(A)
    B = ABS(A)
(The arrays are declared real because SQRT does not accept integer arguments.)

Additional Array Intrinsic Functions
• MAXVAL, MINVAL
• MAXLOC, MINLOC – return an array of indices
• SUM, PRODUCT
• MATMUL, DOT_PRODUCT, TRANSPOSE

Examples
    real A(100, 100), B(100), s, c
    integer i(1), j(2)
    s = SUM(A)
    i = MAXLOC(B)
    j = MINLOC(A)
    c = DOT_PRODUCT(B, B)
(DOT_PRODUCT takes two rank-1 arrays of the same size, so it cannot be applied to B and the two-dimensional A directly.)
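
The intrinsics above can be tried on a small case. The following is a minimal sketch, not taken from the slides: the 3x3 size, the fill values, and the variable name d are assumptions chosen so the results are easy to check by hand.

```fortran
program array_intrinsics
  implicit none
  real :: A(3, 3), B(3), s, d
  integer :: i(1), j(2)

  ! Illustrative data: A is filled column by column with 1.0 .. 9.0.
  A = reshape((/ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0 /), (/ 3, 3 /))
  B = (/ 2.0, 9.0, 4.0 /)

  s = sum(A)              ! reduction over all nine elements
  i = maxloc(B)           ! rank-1 argument: result holds one index
  j = minloc(A)           ! rank-2 argument: result holds (row, column)
  d = dot_product(B, B)   ! both arguments are rank-1 and the same size

  print *, s, i, j, d
end program array_intrinsics
```

With this data the sum is 45, the largest element of B is at index 2, the smallest element of A is at (1, 1), and the dot product is 101.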

Array Sections
    array( lower_bound : upper_bound : stride )
• Refers to the section of the array between lower_bound and upper_bound, with an optional stride.
• Multiple dimensions may be specified, with the obvious meaning.
• Array sections may be used wherever arrays may be used.

Examples
    integer A(10), B(10), C(10)
    integer D(50), E(100), F(100)
    integer max
    integer G(100), H(100, 100)
    A(1:8) = B(1:8) + C(2:9)
    D = E(1:100:2) + F(2:100:2)
    max = MAXVAL( G(1:100:10) )
    max = MINVAL( H(1:100, 1:50) )
(The section F(2:100:2) has 50 elements and conforms with D; F(2:99:2) would have only 49.)
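
Section sizes are easy to get wrong by one, so it can help to print them. The sketch below is an illustration on 10-element arrays rather than the 100-element arrays of the slide; the fill values are assumptions.

```fortran
program array_sections
  implicit none
  integer :: E(10), F(10), D(5), k

  E = (/ (k, k = 1, 10) /)        ! 1, 2, ..., 10
  F = (/ (10 * k, k = 1, 10) /)   ! 10, 20, ..., 100

  ! E(1:10:2) selects elements 1,3,5,7,9; F(2:10:2) selects elements 2,4,6,8,10.
  D = E(1:10:2) + F(2:10:2)

  ! The third size shows how stopping one short of the end drops an element.
  print *, size(E(1:10:2)), size(F(2:10:2)), size(F(2:9:2))
  print *, D
  print *, maxval(E(1:10:3))      ! elements 1, 4, 7, 10
end program array_sections
```

The three sizes are 5, 5 and 4, D holds 21, 43, 65, 87, 109, and the final maximum is 10.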

Semantics of Array Assignments
• First, the entire right hand side is evaluated.
• Then, assignments are made to the left hand side.

Example
    integer :: A(4) = (/ 7, 8, 12, 14 /)
    A(2:3) = A(1:2)
=> results in A being (/ 7, 7, 8, 14 /)
=> not (/ 7, 7, 7, 14 /)
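
To see where the "wrong" answer would come from, the array assignment can be compared with an element-by-element loop. This is a minimal sketch; the second array B and the loop are additions for illustration only.

```fortran
program rhs_first
  implicit none
  integer :: A(4), B(4), i

  A = (/ 7, 8, 12, 14 /)
  B = A

  ! Array assignment: the right-hand side A(1:2) is read in full before any store.
  A(2:3) = A(1:2)

  ! Sequential loop in increasing order: B(2) is overwritten before it is read.
  do i = 2, 3
    B(i) = B(i - 1)
  end do

  print *, A   ! values 7 7 8 14
  print *, B   ! values 7 7 7 14
end program rhs_first
```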

Sequential/Parallel Fortran-90
• Fortran-90 is a sequential language.
• However, its array assignment semantics make it easy to parallelize (automatically).

Not Perfect, Though (1 of 2)
    do i = 1, 100
      X(i, i) = 0.0
    enddo
• Obviously parallelizable.
• Not expressible as a Fortran-90 array assignment (only regular sections are allowed).

Not Perfect, Though (2 of 2)
    integer D(50), E(100), F(100)
    D = E(1:100:2) + F(2:100:2)
is correct, but
    integer D(100), E(100), F(100)
    D = E(1:100:2) + F(2:100:2)
is not, because D (100 elements) is not conformable with the 50-element right hand side.

HPF: Additional Expressions of Parallelism
• FORALL array assignment.
• INDEPENDENT construct.

FORALL Array Assignment
    FORALL( subscript = lower_bound : upper_bound : stride, mask ) array-assignment
• Executes all iterations of the subscript loop in parallel for the given set of indices, where mask is true.
• May have multiple dimensions.
• Same semantics: first compute the right hand side, then assign to the left hand side.
• Only one assignment to any particular element is allowed (not checked by the compiler!).

Examples (1 of 3)
    do i = 1, 100
      X(i, i) = 0.0
    enddo
becomes
    FORALL(i=1:100) X(i, i) = 0.0

Examples (2 of 3)
    integer D(100), E(100), F(100)
    D = E(1:100:2) + F(2:100:2)
becomes (correctly)
    FORALL(i=1:50) D(i) = E(2*i-1) + F(2*i)
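
A quick way to check the FORALL form is to compare it against conformable sections on both sides. The sketch below assumes illustrative fill values for E and F; only the first 50 elements of D are written, which is what makes the 100-element D legal here.

```fortran
program forall_strided
  implicit none
  integer :: D(100), E(100), F(100), i

  E = (/ (i, i = 1, 100) /)          ! 1, 2, ..., 100
  F = (/ (1000 * i, i = 1, 100) /)   ! 1000, 2000, ..., 100000
  D = 0

  ! Odd elements of E plus even elements of F, stored in D(1:50).
  forall (i = 1:50) D(i) = E(2*i - 1) + F(2*i)

  ! Same result written with a 50-element section on the left hand side.
  print *, all(D(1:50) == E(1:100:2) + F(2:100:2))
  print *, D(1), D(50), D(51)
end program forall_strided
```

The comparison is true, D(1) is 2001, D(50) is 100099, and D(51) keeps its initial value 0.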

Examples (3 of 3)
• A multiple-dimension example using the mask option.
• Set all the elements of X above the diagonal to the sum of their indices.
    FORALL(i=1:100, j=1:100, i<j) X(i, j) = i+j
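
The effect of the mask is easiest to see on a small array. This is a sketch on a 4x4 array (the size n = 4 and the zero initialization are assumptions for illustration).

```fortran
program forall_mask
  implicit none
  integer, parameter :: n = 4
  integer :: X(n, n), i, j

  X = 0
  ! Only index pairs with i < j (strictly above the diagonal) are assigned.
  forall (i = 1:n, j = 1:n, i < j) X(i, j) = i + j

  ! Print the matrix row by row.
  do i = 1, n
    print *, X(i, :)
  end do
end program forall_mask
```

The first row holds 0 3 4 5, the second 0 0 5 6, the third 0 0 0 7, and the last row stays all zero.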

The INDEPENDENT Clause
    !HPF$ INDEPENDENT
    DO …
    ENDDO
• Specifies that the iterations of the loop can be executed in any order.

Examples (1 of 2)
    !HPF$ INDEPENDENT
    DO i = 1, 100
      DO j = 1, 100
        IF (i .NE. j) A(i, j) = 1.0
        IF (i .EQ. j) A(i, j) = 0.0
      ENDDO
    ENDDO

Examples (2 of 2): Nesting
    !HPF$ INDEPENDENT
    DO i = 1, 100
    !HPF$ INDEPENDENT
      DO j = 1, 100
        IF (i .NE. j) A(i, j) = 1.0
        IF (i .EQ. j) A(i, j) = 0.0
      ENDDO
    ENDDO

HPF/Fortran-90 Matrix Multiply (1 of 4)
    C = MATMUL( A, B )

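HPF Matrix Multiply (2 of 4)
    C = 0.0
    DO k = 1, n
      FORALL(i=1:n, j=1:n) C(i, j) = C(i, j) + A(i, k) * B(k, j)
    ENDDO
(The summation index k runs in a sequential loop around the FORALL; it cannot be a FORALL index, because each C(i, j) would then be assigned more than once.)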
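
The FORALL formulation can be checked against MATMUL. The sketch below assumes a sequential loop over k around the FORALL, a 3x3 problem size, and small integer-valued test data so that the comparison is exact; none of these come from the slides.

```fortran
program matmul_forall
  implicit none
  integer, parameter :: n = 3
  real :: A(n, n), B(n, n), C(n, n)
  integer :: i, j, k

  ! Illustrative test data.
  forall (i = 1:n, j = 1:n)
    A(i, j) = real(i + j)
    B(i, j) = real(i - j)
  end forall

  C = 0.0
  do k = 1, n   ! sequential loop over the summation index
    forall (i = 1:n, j = 1:n) C(i, j) = C(i, j) + A(i, k) * B(k, j)
  end do

  ! Largest absolute difference from the intrinsic; zero for this data.
  print *, maxval(abs(C - matmul(A, B)))
end program matmul_forall
```

Each FORALL assigns every C(i, j) exactly once, so the one-assignment-per-element rule holds within each step of the k loop.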

HPF Matrix Multiply (3 of 4)
    !HPF$ INDEPENDENT
    DO i = 1, n
      DO j = 1, n
        C(i, j) = 0.0
        DO k = 1, n
          C(i, j) = C(i, j) + A(i, k) * B(k, j)
        ENDDO
      ENDDO
    ENDDO

HPF Matrix Multiply (4 of 4)
    !HPF$ INDEPENDENT
    DO i = 1, n
    !HPF$ INDEPENDENT
      DO j = 1, n
        C(i, j) = 0.0
        DO k = 1, n
          C(i, j) = C(i, j) + A(i, k) * B(k, j)
        ENDDO
      ENDDO
    ENDDO

HPF/Fortran-90 SOR (1 of 4)
    TEMP(1:n, 1:n) = 0.25 * ( GRID(1:n, 0:n-1) + GRID(1:n, 2:n+1) +
                              GRID(0:n-1, 1:n) + GRID(2:n+1, 1:n) )
    GRID(1:n, 1:n) = TEMP(1:n, 1:n)

HPF/Fortran-90 SOR (1’ of 4)
    GRID(1:n, 1:n) = 0.25 * ( GRID(1:n, 0:n-1) + GRID(1:n, 2:n+1) +
                              GRID(0:n-1, 1:n) + GRID(2:n+1, 1:n) )
Also works, because of the array assignment rules: the entire right hand side is evaluated before GRID is updated.
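
That the temporary is optional can be checked directly. The following sketch runs one update step both ways and compares the results; the grid size n = 4, the initial values, and the second array G2 are assumptions made for the illustration.

```fortran
program sor_step
  implicit none
  integer, parameter :: n = 4
  real :: GRID(0:n+1, 0:n+1), G2(0:n+1, 0:n+1), TEMP(n, n)
  integer :: i, j

  ! Illustrative initial values, including the boundary rows and columns.
  forall (i = 0:n+1, j = 0:n+1) GRID(i, j) = real(i * i + j)
  G2 = GRID

  ! Version with an explicit temporary.
  TEMP(1:n, 1:n) = 0.25 * ( GRID(1:n, 0:n-1) + GRID(1:n, 2:n+1) &
                          + GRID(0:n-1, 1:n) + GRID(2:n+1, 1:n) )
  GRID(1:n, 1:n) = TEMP(1:n, 1:n)

  ! In-place version: the right-hand side is evaluated before any element is stored.
  G2(1:n, 1:n) = 0.25 * ( G2(1:n, 0:n-1) + G2(1:n, 2:n+1) &
                        + G2(0:n-1, 1:n) + G2(2:n+1, 1:n) )

  print *, all(GRID == G2)
end program sor_step
```

The two grids agree element for element, so the program prints a true value. A plain nested DO loop updating GRID in place would not give the same result, because it would read neighbours that had already been overwritten.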

HPF SOR (2 of 4)
    FORALL(i=1:n, j=1:n) TEMP(i, j) = 0.25 * ( GRID(i-1, j) + GRID(i+1, j) +
                                               GRID(i, j-1) + GRID(i, j+1) )
    FORALL(i=1:n, j=1:n) GRID(i, j) = TEMP(i, j)

HPF SOR (3 of 4)
    !HPF$ INDEPENDENT
    DO i = 1, n
      DO j = 1, n
        TEMP(i, j) = 0.25 * ( GRID(i-1, j) + GRID(i+1, j) +
                              GRID(i, j-1) + GRID(i, j+1) )
      ENDDO
    ENDDO
    !HPF$ INDEPENDENT
    DO i = 1, n
      DO j = 1, n
        GRID(i, j) = TEMP(i, j)
      ENDDO
    ENDDO

HPF SOR (4 of 4)
    !HPF$ INDEPENDENT
    DO i = 1, n
    !HPF$ INDEPENDENT
      DO j = 1, n
        TEMP(i, j) = 0.25 * ( GRID(i-1, j) + GRID(i+1, j) +
                              GRID(i, j-1) + GRID(i, j+1) )
      ENDDO
    ENDDO
    !HPF$ INDEPENDENT
    DO i = 1, n
    !HPF$ INDEPENDENT
      DO j = 1, n
        GRID(i, j) = TEMP(i, j)
      ENDDO
    ENDDO