
6 Global Communications

This section discusses situations where an HPF compiler like ADAPTOR generates global communications like broadcasts or reductions.


6.1 Broadcast

Every update of a scalar variable is performed by all processors, which guarantees that all of them hold the same value.

In the first example, the update of the scalar variable does not imply any communication.

       S = S + 1      ! done by all processors

An assignment to a scalar variable requires a broadcast of every argument on the right-hand side that is not locally available on all processors.

      real, dimension (N) :: A
!hpf$ distribute A(block)
      ...
      S = A(I)      ! implies a broadcast of A(I)
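
Conceptually, the processor that owns A(I) supplies the value and all others receive it. The following MPI-style fragment is only a sketch of this pattern; MY_RANK, OWNER and IERROR are illustrative names, not code that ADAPTOR actually emits:

      ! sketch: OWNER is assumed to be the rank that owns A(I)
      if (MY_RANK == OWNER) S = A(I)    ! owner picks up its local element
      call MPI_BCAST (S, 1, MPI_REAL, OWNER, MPI_COMM_WORLD, IERROR)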

I/O statements and calls to serial routines are executed by a single processor only. Replicated variables that are changed by such a statement must be broadcast afterwards.

      real, dimension (N) :: A       ! replicated A
      real :: S
      ...
      read *, S     ! implies a broadcast of S from the master processor
      read *, A     ! implies a broadcast of A from the master processor

Broadcasts are also generated if distributed data is assigned to an array that is replicated along a processor dimension.

      real, dimension (M,N) :: B
      real, dimension (N)   :: ROW
      real, dimension (M)   :: COL
!hpf$ distribute B (block, block)
!hpf$ align ROW (J) with B(*,J)
!hpf$ align COL (I) with B(I,*)
      ...
      ROW = B(I,:)     ! broadcast of i-th row along processor array
      COL = B(:,J)     ! broadcast of j-th column along processor array
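
On a two-dimensional processor array, such a broadcast moves data along only one grid dimension. As a hedged illustration for ROW = B(I,:), each processor column broadcasts its chunk of row I from the owning processor row; COL_COMM, OWNER_ROW, ROW_CHUNK and CHUNK_LEN are illustrative names for a column communicator, the root rank, the local piece and its length:

      ! broadcast row I chunk-wise along the first processor dimension
      call MPI_BCAST (ROW_CHUNK, CHUNK_LEN, MPI_REAL, OWNER_ROW, &
                      COL_COMM, IERROR)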


6.2 Spreading

The intrinsic function SPREAD implies a broadcast if a non-local argument is spread along a distributed dimension of the left-hand side.

      real :: G                      ! replicated scalar
      real, dimension (M,N) :: B
      real, dimension (N)   :: ROW
      real, dimension (M)   :: COL
!hpf$ distribute B (block, block)
!hpf$ align ROW (J) with B(*,J)
!hpf$ align COL (I) with B(I,*)
      ...
      B = spread (spread (G, 1, N), 1, M)   !  local
      B = spread (ROW, dim=1, ncopies=M)    !  local
      B = spread (COL, dim=2, ncopies=N)    !  local

      B = spread (B(I,:), dim=1, ncopies=M)   ! implicit broadcast
      ! corresponds to ROW = B(I,:); B = spread (ROW, 1, M)

      B = spread (B(:,J), dim=2, ncopies=N)   ! implicit broadcast
      ! corresponds to COL = B(:,J); B = spread (COL, 2, N)

If the array argument of the SPREAD function is not local, ADAPTOR generates a temporary that is aligned with the left-hand side but replicated in the dimension used for the spreading.
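
For the first non-local case above, the effect corresponds to the following fragment; the temporary name TMP is illustrative, the internal temporary is not visible to the programmer:

      real, dimension (N) :: TMP
!hpf$ align TMP (J) with B(*,J)     ! aligned with B, replicated in dimension 1
      ...
      TMP = B(I,:)                          ! broadcast of row I
      B   = spread (TMP, dim=1, ncopies=M)  ! purely local spreading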

6.3 Reduction Functions

The following reduction functions are supported by ADAPTOR: ALL, ANY, COUNT, IALL, IANY, IPARITY, SUM, PRODUCT, PARITY, MINVAL, MAXVAL, MINLOC and MAXLOC.

If a reduction function is used without a "dim" argument, the reduction is over the whole array argument. Each processor performs the reduction on its local part, and the partial results are then combined by a global reduction into the final result.

      real, dimension (N,N) :: A
      real :: S
      integer, dimension (2) :: IJK
!hpf$ distribute (block, block) :: A

      S   = sum (A)
      S   = minval (A, A .gt. 0.0)
      S   = product (A(5:10,2:N-1))
      IJK = minloc (A)
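
For S = sum(A) this two-phase scheme corresponds conceptually to the following MPI-style fragment; A_LOCAL stands for the processor's local section and is an illustrative name:

      real :: S_LOCAL
      S_LOCAL = sum (A_LOCAL)           ! reduction on the local section
      call MPI_ALLREDUCE (S_LOCAL, S, 1, MPI_REAL, MPI_SUM, &
                          MPI_COMM_WORLD, IERROR)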

The reduction functions can also be used with a "dim" argument, but in this case its value must be known at compile time. The result array should be aligned with the source array and replicated along the reduction dimension; if it is not, ADAPTOR internally creates a corresponding temporary array with this layout.

      real, dimension (M,N) :: B
      real, dimension (N)   :: ROW
      real, dimension (M)   :: COL
!hpf$ distribute B (block, block)
!hpf$ align ROW (J) with B(*,J)
!hpf$ align COL (I) with B(I,*)
      ...
      ROW = sum (B, dim = 1)
      COL = sum (B, dim = 2)
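
If the result array does not have this layout, as for the hypothetical variant ROW2 below, the reduction is computed in an internal temporary with the layout of ROW and the temporary is then assigned to the result:

      real, dimension (N) :: ROW2
!hpf$ distribute ROW2 (block)    ! not replicated along the reduction dimension
      ...
      ROW2 = sum (B, dim=1)      ! internally: TMP = sum (B, dim=1); ROW2 = TMP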


6.4 Reduction Operations in Independent Loops

To use the global reduction functions of Fortran 90, the results of a parallel loop must first be collected in a temporary array. This array must have a size equal to the number of loop iterations.

      integer, parameter :: N = 1000000
      real, dimension (N) :: XA
!hpf$ distribute XA (block)
      real :: X
      ...
      forall (I=1:N) XA(I) = complicated_function(I)
      X = sum (XA)

As this temporary array may become excessively large, the REDUCTION clause has been proposed for HPF 2.0; it is fully supported by ADAPTOR.

      X = 0.0
!hpf$ independent, reduction (X)
      do I = 1, 1000000
         X = X + complicated_function (I)
      end do

ADAPTOR implements reduction variables as follows: on entry to an independent loop, every processor has its own incarnation (same type, same shape) of each variable in the REDUCTION clause of the INDEPENDENT directive. Its initial value is the value the variable had before entry to the loop. Each processor performs a subset of the loop iterations; when it encounters a reduction statement, it updates its own copy of the reduction variable. A processor is free to perform its loop iterations in any order.

The final value of the reduction variable is computed by combining the local values with the value of the global reduction variable on entry to the loop. The combining is done in the same way as for the Fortran 90 reduction functions.
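
For the scalar example above, one way to realize these semantics for a sum reduction is sketched below; the iteration bounds MY_FIRST and MY_LAST, the helper variables, and the use of MPI are illustrative and do not show ADAPTOR's actual generated code:

      real :: X_ENTRY, X_LOCAL, X_TOTAL
      X_ENTRY = X             ! value of the reduction variable on entry
      X_LOCAL = 0.0           ! this processor's contributions (sum: start at 0)
      do I = MY_FIRST, MY_LAST            ! this processor's iterations
         X_LOCAL = X_LOCAL + complicated_function (I)
      end do
      call MPI_ALLREDUCE (X_LOCAL, X_TOTAL, 1, MPI_REAL, MPI_SUM, &
                          MPI_COMM_WORLD, IERROR)
      X = X_ENTRY + X_TOTAL   ! combine contributions with the entry value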

Note: Currently, ADAPTOR does not combine the global reductions of different reduction variables. In the following code, two global reductions are generated.

      double precision, dimension (N) :: A, B, C, D
!hpf$ distribute (block) :: A
!hpf$ align (I) with A(I) :: B, C, D
      double precision :: G1, G2
      ...
!hpf$ independent, reduction (G1, G2)
      do I = 1, N
         G1 = G1 + A(I) * D(I)
         G2 = G2 + B(I) * C(I)
      end do

The variable in the REDUCTION clause can also be an array. In this way, it is possible to combine reductions of the same kind. In the following example, all elements of the reduction array G are globally reduced in one step.

      double precision, dimension (N) :: A, B, C, D
!hpf$ distribute (block) :: A
!hpf$ align (I) with A(I) :: B, C, D
      double precision :: G (2)
      ...
!hpf$ independent, reduction (G)
      do I = 1, N
         G(1) = G(1) + A(I) * D(I)
         G(2) = G(2) + B(I) * C(I)
      end do

The following example demonstrates the implementation of a matrix-vector multiplication Y = Y + A*X where the columns of the matrix A are block distributed. The result vector Y is used as a reduction variable. Though the inner loop is also independent, it does not provide additional parallelism, as it operates only on data mapped to the same processor.

      real, dimension (N,N) :: A
      real, dimension (N)   :: X, Y
!hpf$ distribute (*,block)  :: A
!hpf$ align X(J) with A(*,J)
!hpf$ align Y(I) with A(I,*)
      ...
!hpf$ independent, reduction(Y)   ! on home (A(:,J))
      do J = 1, N
        do I = 1, N
            Y(I) = Y(I) + A(I,J) * X(J)
        end do
      end do

