Next: 11 Pitfalls Up: ADAPTOR HPF Programmers Guide Previous: 9 Shadow Edges and Contents Index

10 Explicit and Implicit Remapping

10.1 About the Importance of Remapping

The following code contains two loop nests. In the first loop nest, every row can be computed independently; in the second loop nest, every column can be computed independently.

      real, dimension (N,N) :: A, B
!hpf$ distribute (block,*) :: A, B 
      do J = 2, N
         forall (I=1:N)
     &     A(I,J) = A(I,J) - A(I,J-1) * B(I,J)
      end do
      do I = 2, N
         forall (J=1:N)
     &     A(I,J) = A(I,J) - A(I-1,J) * B(I,J)
      end do

In the DO J loop, the FORALL statement ranges over the distributed first dimension; it is local and requires no communication. The DO I loop also contains a FORALL statement, but it is executed serially because it ranges over the serial (non-distributed) dimension.
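To make the cost of the DO I loop more concrete, the following sketch lists the iterations in which row I and row I-1 belong to different processors. It assumes that a BLOCK distribution of N rows onto NP processors uses blocks of size ceiling(N/NP); the values of N and NP and all names are chosen for illustration only.

```fortran
      program OWNERS
! Assumption: a BLOCK distribution of N rows onto NP processors
! uses blocks of size ceiling(N/NP); N and NP are example values.
      integer, parameter :: N = 8, NP = 4
      integer, parameter :: BS = (N + NP - 1) / NP
      integer :: I, IOWN, JOWN, NCROSS
      NCROSS = 0
      do I = 2, N
! owner of row I and owner of row I-1 (processors numbered from 1)
         IOWN = (I-1)/BS + 1
         JOWN = (I-2)/BS + 1
         if (IOWN /= JOWN) then
            NCROSS = NCROSS + 1
            print *, 'row', I, 'on proc', IOWN, 'reads row', I-1
         end if
      end do
      print *, 'remote rows read in the DO I loop:', NCROSS
      end program OWNERS
```

With these example values, rows 3, 5 and 7 each read a row owned by the neighbouring processor, and in every iteration only the owner of row I has work to do.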

If we transpose the arrays A and B before the DO I loop (see figure 7), this loop can be executed exactly like the DO J loop.

**Figure 7:** Redistribution of arrays

Generally speaking, the remapping of arrays can reduce the communication in data parallel computations.

10.2 Explicit Remapping via Array Assignments

The most general solution for remapping that should work with nearly every HPF compiler is the introduction of different arrays with different mappings.

      real, dimension (N,N) :: A, B, A1, B1
!hpf$ distribute (block,*) :: A, B
!hpf$ distribute (*,block) :: A1, B1
      ...
      do J = 2, N
         forall (I=1:N)
     &     A(I,J) = A(I,J) - A(I,J-1) * B(I,J)
      end do
      A1 = A; B1 = B      ! redistribution
      do I = 2, N
         forall (J=1:N)
     &     A1(I,J) = A1(I,J) - A1(I-1,J) * B1(I,J)
      end do
      A = A1; B = B1      ! redistribution
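The fragment above can be wrapped into a small self-checking program. The sketch below is only an illustration: the array REF, the program name and the initial values are assumptions, and with a plain Fortran compiler the HPF directives are treated as comments, so only the numerical equivalence is checked, not the data movement.

```fortran
      program REMAP2
      integer, parameter :: N = 4
      real, dimension (N,N) :: A, B, A1, B1, REF
!hpf$ distribute (block,*) :: A, B, REF
!hpf$ distribute (*,block) :: A1, B1
      integer :: I, J
      do J = 1, N
         do I = 1, N
            A(I,J) = real(I + 2*J)
            B(I,J) = 0.5
         end do
      end do
! reference result: both loop nests on the original mapping
      REF = A
      do J = 2, N
         forall (I=1:N) REF(I,J) = REF(I,J) - REF(I,J-1) * B(I,J)
      end do
      do I = 2, N
         forall (J=1:N) REF(I,J) = REF(I,J) - REF(I-1,J) * B(I,J)
      end do
! remapped version: second loop nest works on A1 and B1
      do J = 2, N
         forall (I=1:N) A(I,J) = A(I,J) - A(I,J-1) * B(I,J)
      end do
      A1 = A
      B1 = B
      do I = 2, N
         forall (J=1:N) A1(I,J) = A1(I,J) - A1(I-1,J) * B1(I,J)
      end do
! only A1 was modified, so only A is copied back here
      A = A1
      if (all(A == REF)) then
         print *, 'results agree'
      else
         print *, 'results differ'
      end if
      end program REMAP2
```

In this particular example B1 is only read, so the sketch omits the copy back into B; in general every array that is modified in its remapped incarnation has to be copied back.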

10.3 Explicit Remapping via Remapping Directives

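HPF allows explicit remapping of data at run time via the executable directives REDISTRIBUTE and REALIGN. The HPF language specification requires arrays that are remapped in this way to have the DYNAMIC attribute.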

      real, dimension (N,N) :: A, B
      ...
!hpf$ redistribute (block,*) :: A, B
      do J = 2, N
         forall (I=1:N)
     &     A(I,J) = A(I,J) - A(I,J-1) * B(I,J)
      end do
!hpf$ redistribute (*,block) :: A, B
      do I = 2, N
         forall (J=1:N)
     &     A(I,J) = A(I,J) - A(I-1,J) * B(I,J)
      end do

This solution offers certain advantages:

- It avoids the memory overhead of keeping additional incarnations of the arrays with different mappings.
- It avoids local data transfer in all situations where no data movement is necessary:
  - if the code is executed on a single processor, both mappings are identical;
  - if the arrays A and B are shared among the processors, the remapping only changes ownership and does not imply any reallocation.
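As a complete variant of the example in this section, the following sketch declares the arrays with the DYNAMIC attribute, which the HPF language specification requires for arrays named in a REDISTRIBUTE directive, and gives them an initial distribution. Whether ADAPTOR insists on the DYNAMIC directive is not stated here, so treat that line as an assumption; the program name and initial values are illustrative.

```fortran
      program REDIST
      integer, parameter :: N = 4
      real, dimension (N,N) :: A, B
! Assumption: DYNAMIC marks A and B as remappable (HPF specification)
!hpf$ dynamic :: A, B
!hpf$ distribute (block,*) :: A, B
      integer :: I, J
      A = 1.0
      B = 0.5
      do J = 2, N
         forall (I=1:N) A(I,J) = A(I,J) - A(I,J-1) * B(I,J)
      end do
! remap in place: no second pair of arrays is needed
!hpf$ redistribute (*,block) :: A, B
      do I = 2, N
         forall (J=1:N) A(I,J) = A(I,J) - A(I-1,J) * B(I,J)
      end do
      print *, A(N,N)
      end program REDIST
```

A compiler without HPF support ignores all three directives and runs the program serially, which corresponds to the single-processor case in the list above.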

10.4 Implicit Remapping at Subroutine Boundaries

10.4.1 Remapping in the Called Routine

With the exception of local routines, serial routines and pure procedures, every subprogram checks the distributions of its dummy arguments and redistributes them if necessary.

Attention: Descriptive directives are handled like prescriptive ones. Inherited distributions are not supported.

Because the subroutine is responsible for the redistribution, the user can take advantage of the INTENT attribute: when a redistribution is needed, it can avoid the copy-in or the copy-out of the data.
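A minimal sketch of a routine whose dummy arguments carry a mapping different from that of the actual arguments is given below. The routine SCALE2 and the chosen distributions are hypothetical. Our reading of the paragraph above is that the INTENT(IN) argument X needs no copy-out and the INTENT(OUT) argument Y needs no copy-in; the guide does not spell out this pairing.

```fortran
      program INTDEMO
      integer, parameter :: N = 8
      real, dimension (N) :: X, Y
!hpf$ distribute (block) :: X, Y
      integer :: I
      forall (I=1:N) X(I) = real(I)
      call SCALE2 (X, Y, N)
      print *, Y
      end program INTDEMO

      subroutine SCALE2 (X, Y, N)
      integer, intent(in) :: N
! X is only read: no need to copy it back after the call
      real, dimension (N), intent(in)  :: X
! Y is only written: no need to copy its old values in
      real, dimension (N), intent(out) :: Y
! mapping of the dummies differs from the caller's (block)
!hpf$ distribute (cyclic) :: X, Y
      Y = 2.0 * X
      end subroutine SCALE2
```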

10.4.2 Remapping in the Calling Routines

The following rules explain in which situations an interface block is not needed, in which it is necessary, and in which it is useful.

The most general rule is that any full array or any section of an array is simply passed by a descriptor, and the called subroutine is responsible for the redistribution. The calling routine does not have to do anything, and therefore no interface block is necessary.

   call SUB (A(1:N,1:N), B)

The next general rule is that an interface block must be available if the subroutine does not apply redistributions and cannot deal with the actual distribution. A subroutine cannot apply redistributions for the following reasons:

- A serial procedure is called by a single processor only. An interface block must be available in any case.
- Local routines do not redistribute their dummy arguments. An interface block must be available if the routine is called with any distribution other than the expected one.

The last rule is that an interface block should be specified if the calling routine should perform the redistribution for reasons of optimization. This can be the case if

- the actual argument requires a temporary array in any case, or
- the routine is called within a loop and the redistribution would otherwise have to be done in every iteration.
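The loop case can be sketched as follows. The routine STEP, its cyclic mapping and the iteration count are hypothetical; the interface block only makes the expected mapping of the dummy argument visible at the call site. How the compiler uses this information, for instance whether it moves the redistribution out of the loop, is not specified here.

```fortran
      program IFACE
      integer, parameter :: N = 8
      real, dimension (N) :: X
!hpf$ distribute (block) :: X
      integer :: K
! the interface tells the caller which mapping STEP expects
      interface
         subroutine STEP (V, N)
         integer, intent(in) :: N
         real, dimension (N), intent(inout) :: V
!hpf$    distribute (cyclic) :: V
         end subroutine STEP
      end interface
      X = 1.0
! without the interface, STEP itself would remap V on every call
      do K = 1, 3
         call STEP (X, N)
      end do
      print *, X
      end program IFACE

      subroutine STEP (V, N)
      integer, intent(in) :: N
      real, dimension (N), intent(inout) :: V
!hpf$ distribute (cyclic) :: V
      V = 2.0 * V
      end subroutine STEP
```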

Note: Whether a redistribution is needed within the called routine is decided at run time. If the interface block is available in the calling routine, the redistribution is decided at compile time. In some situations it can only be determined at run time that no redistribution is necessary, e.g. on a single processor the block and the cyclic distribution are the same. In that case at least the local copying of data is avoided.

10.5 Data Transfers with the ON Directive

If a statement or a block should be executed by a processor subset, the compiler must make sure that all data is mapped onto the corresponding active processors. This data transfer can involve other processors that are not part of the active processors.

All dummy arguments must be mapped onto the active processors. In this way, dummy arrays are local within a subprogram that is executed only by a processor subset.
```
      integer, parameter :: N = 100
!hpf$ processors PROCS(4)
      real, dimension (N) :: A
!hpf$ distribute A(block) onto PROCS
      ...
!hpf$ on (PROCS(1:2))
      call TASK (A,N)
```
While the dummy argument N is already available on the active processors, the array A must be redistributed. The compiler might implicitly create the following code:
```
!hpf$ redistribute A(block) onto PROCS(1:2)
!hpf$ on (PROCS(1:2))
      call TASK (A,N)
!hpf$ redistribute A(block) onto PROCS
```
All local objects must be mapped onto the active processors.

The compiler will create temporary data and insert copy-in and copy-out communication for non-local data. Any replicated data changed on a processor subset has to be made consistent afterwards: all processors that were not active must get copies of the new values.
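The amount of data moved by the implicit redistribution in the TASK example can be estimated with a short program. The sketch assumes that a BLOCK distribution onto P processors uses blocks of size ceiling(N/P) and that PROCS(1:2) are the first two of the four processors; all names are illustrative.

```fortran
      program ONMOVE
! Assumption: BLOCK onto P processors uses blocks of ceiling(N/P).
      integer, parameter :: N = 100, NALL = 4, NACT = 2
      integer, parameter :: BSALL = (N + NALL - 1) / NALL
      integer, parameter :: BSACT = (N + NACT - 1) / NACT
      integer :: I, OLD, NEW, MOVED, INACT
      MOVED = 0
      INACT = 0
      do I = 1, N
! owner before (4 processors) and after (2 processors) remapping
         OLD = (I-1)/BSALL + 1
         NEW = (I-1)/BSACT + 1
         if (OLD /= NEW) MOVED = MOVED + 1
! element currently lives on a processor that is not active
         if (OLD > NACT) INACT = INACT + 1
      end do
      print *, 'elements that change owner:     ', MOVED
      print *, 'of these from inactive procs:   ', INACT
      end program ONMOVE
```

Under these assumptions 75 of the 100 elements change their owner, and 50 of them come from PROCS(3:4), i.e. from processors that do not take part in the call of TASK. The same traffic occurs in the opposite direction when A is mapped back onto PROCS.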

Thomas Brandes 2004-03-18