Next: 11 Pitfalls Up: ADAPTOR HPF Programmers Guide Previous: 9 Shadow Edges and   Contents   Index


10 Explicit and Implicit Remapping

10.1 About the Importance of Remapping

The following code contains two loop nests. In the first loop nest, every row can be computed independently; in the second loop nest, every column can be computed independently.

      real, dimension (N,N) :: A, B
!hpf$ distribute (block,*) :: A, B 
      do J = 2, N
         forall (I=1:N)
     &     A(I,J) = A(I,J) - A(I,J-1) * B(I,J)
      end do
      do I = 2, N
         forall (J=1:N)
     &     A(I,J) = A(I,J) - A(I-1,J) * B(I,J)
      end do

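In the DO J loop, the FORALL statement is local and requires no communication: for a fixed J, each processor updates its own rows, and A(I,J-1) resides on the same processor as A(I,J). The DO I loop also contains a FORALL statement, but it is executed serially. Its index J runs along the non-distributed (serial) dimension, so the whole row A(I,1:N) belongs to a single processor, and the recurrence on A(I-1,J) forces the rows to be processed one after another.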
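To make the difference between the two loop nests concrete, the following Python sketch models which processors own the elements touched by one step of each loop when A is distributed (block,*). It is a simplified model, not ADAPTOR code. Processors are numbered from 0 and indices from 1, and BLOCK is assumed to use the standard HPF block size ceil(N/P).

```python
from math import ceil

def block_owner(i, n, p):
    """0-based processor owning 1-based index i under BLOCK (block size ceil(n/p))."""
    return (i - 1) // ceil(n / p)

def owner(i, j, n, p):
    """Owner of A(i,j) for distribute (block,*): the '*' dimension j is ignored."""
    return block_owner(i, n, p)

def active_in_do_j(j, n, p):
    """Processors holding column A(1:N,J), i.e. working in FORALL (I=1:N)."""
    return {owner(i, j, n, p) for i in range(1, n + 1)}

def active_in_do_i(i, n, p):
    """Processors holding row A(I,1:N), i.e. working in FORALL (J=1:N)."""
    return {owner(i, j, n, p) for j in range(1, n + 1)}

def remote_reads(n, p, di, dj):
    """Reads of A(I-di,J-dj) that live on another processor than A(I,J)."""
    return sum(1
               for i in range(1 + di, n + 1)
               for j in range(1 + dj, n + 1)
               if owner(i - di, j - dj, n, p) != owner(i, j, n, p))

if __name__ == "__main__":
    n, p = 8, 4
    # DO J loop: reads A(I,J-1)
    print(len(active_in_do_j(2, n, p)), remote_reads(n, p, 0, 1))  # 4 0
    # DO I loop: reads A(I-1,J)
    print(len(active_in_do_i(2, n, p)), remote_reads(n, p, 1, 0))  # 1 24
```

With four processors, each step of the DO J loop keeps all four processors busy and needs no remote reads. Each step of the DO I loop occupies a single processor, and 24 of its reads cross a block boundary.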

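If we transpose the distribution of the arrays A and B before the DO I loop, i.e. change it from (block,*) to (*,block) (see Figure 7), then this loop can be executed exactly like the DO J loop, with every processor working on its own block of columns.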

Figure 7: Redistribution of arrays

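Generally speaking, remapping arrays between program phases can reduce communication and increase parallelism in data parallel computations, as long as the cost of the remapping itself is outweighed by the savings in the phases that follow.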

10.2 Explicit Remapping via Array Assignments

The most general solution for remapping, which should work with nearly every HPF compiler, is to introduce additional arrays with different mappings and to copy the data between them with array assignments.

      real, dimension (N,N) :: A, B, A1, B1
!hpf$ distribute (block,*) :: A, B
!hpf$ distribute (*,block) :: A1, B1
      ...
      do J = 2, N
         forall (I=1:N)
     &     A(I,J) = A(I,J) - A(I,J-1) * B(I,J)
      end do
      A1 = A; B1 = B      ! redistribution
      do I = 2, N
         forall (J=1:N)
     &     A1(I,J) = A1(I,J) - A1(I-1,J) * B1(I,J)
      end do
      A = A1; B = B1      ! redistribution
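The assignment A1 = A hides an all-to-all exchange. The Python sketch below is a hypothetical model of that exchange. The helper names are made up for illustration and are not part of ADAPTOR. Each simulated processor holds its (block,*) piece of A as a dictionary, and every element is sent to the processor that owns its column block in A1. The matrix sends[src][dst] counts the elements moving from src to dst. Diagonal entries are purely local copies.

```python
from math import ceil

def block_owner(i, n, p):
    """0-based owner of 1-based index i under BLOCK (block size ceil(n/p))."""
    return (i - 1) // ceil(n / p)

def scatter_rows(a, p):
    """Local pieces of an n x n array distributed (block,*)."""
    n = len(a)
    local = [dict() for _ in range(p)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            local[block_owner(i, n, p)][(i, j)] = a[i - 1][j - 1]
    return local

def assign_rows_to_cols(row_local, n, p):
    """Model of A1 = A with A distributed (block,*) and A1 distributed (*,block)."""
    col_local = [dict() for _ in range(p)]
    sends = [[0] * p for _ in range(p)]
    for src, piece in enumerate(row_local):
        for (i, j), value in piece.items():
            dst = block_owner(j, n, p)   # owner of column j under (*,block)
            col_local[dst][(i, j)] = value
            sends[src][dst] += 1         # src == dst: purely local copy
    return col_local, sends

if __name__ == "__main__":
    n, p = 8, 4
    a = [[10 * i + j for j in range(1, n + 1)] for i in range(1, n + 1)]
    a1_local, sends = assign_rows_to_cols(scatter_rows(a, p), n, p)
    for row in sends:
        print(row)  # [4, 4, 4, 4] on every line
```

With N = 8 and P = 4, every entry of sends is 4 because each pair of processors exchanges one 2 x 2 block. As a result, 48 of the 64 elements cross processor boundaries. The copy back A = A1 after the DO I loop is the reverse exchange and moves the same amount of data.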

10.3 Explicit Remapping via Remapping Directives

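HPF allows explicit remapping of data via executable directives such as REDISTRIBUTE. Standard HPF requires an array that is remapped in this way to be declared with the DYNAMIC attribute.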

      real, dimension (N,N) :: A, B
!hpf$ dynamic A, B
      ...
!hpf$ redistribute (block,*) :: A, B
      do J = 2, N
         forall (I=1:N)
     &     A(I,J) = A(I,J) - A(I,J-1) * B(I,J)
      end do
!hpf$ redistribute (*,block) :: A, B
      do I = 2, N
         forall (J=1:N)
     &     A(I,J) = A(I,J) - A(I-1,J) * B(I,J)
      end do

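Compared with the explicit copies of Section 10.2, this solution offers certain advantages. For instance, no additional arrays such as A1 and B1 have to be declared, and both loop nests operate on the same arrays A and B.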
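With REDISTRIBUTE, the array keeps its name and only its mapping changes at run time. The toy Python model below illustrates this under stated assumptions. The class DynamicArray is hypothetical, only the two distributions of the example are supported, and the array is assumed to start as (block,*). Whether a real runtime skips a redistribution to the mapping an array already has is implementation-dependent. The model simply reports 0 moved elements in that case.

```python
from math import ceil

def block_owner(i, n, p):
    """0-based owner of 1-based index i under BLOCK (block size ceil(n/p))."""
    return (i - 1) // ceil(n / p)

class DynamicArray:
    """Hypothetical stand-in for an HPF DYNAMIC array: one name, a mutable mapping."""

    def __init__(self, name, n, p, dist):
        self.name, self.n, self.p, self.dist = name, n, p, dist

    def owner(self, i, j, dist=None):
        dist = dist or self.dist
        if dist == "(block,*)":
            return block_owner(i, self.n, self.p)
        if dist == "(*,block)":
            return block_owner(j, self.n, self.p)
        raise ValueError(f"unsupported distribution {dist}")

    def redistribute(self, new_dist):
        """Switch the mapping in place; return how many elements change owner."""
        if new_dist == self.dist:
            return 0  # assumed no-op: the mapping does not change
        moved = sum(1
                    for i in range(1, self.n + 1)
                    for j in range(1, self.n + 1)
                    if self.owner(i, j) != self.owner(i, j, new_dist))
        self.dist = new_dist
        return moved

if __name__ == "__main__":
    a = DynamicArray("A", 8, 4, "(block,*)")
    print(a.redistribute("(block,*)"))  # 0
    print(a.redistribute("(*,block)"))  # 48
```

Only the elements on the diagonal blocks keep their owner (16 of 64 for N = 8 and P = 4), which is the same volume as the explicit assignment A1 = A. The difference lies in the source code, not in the amount of data moved.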

10.4 Implicit Remapping at Subroutine Boundaries

10.4.1 Remapping in the Called Routine

With the exception of local routines, serial routines, and pure procedures, every subprogram checks the distributions of its dummy arguments at run time and redistributes them if necessary.

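Attention: Descriptive directives are also handled like prescriptive ones. If the actual argument does not have the described distribution, it is redistributed rather than assumed to match. Inherited distributions are not supported.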

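Since the called subroutine is responsible for the redistribution, the user can take advantage of the INTENT attribute to avoid unnecessary data movement. For an INTENT(IN) dummy argument, the data need not be copied back after the call (no copy-out). For an INTENT(OUT) dummy argument, the data need not be copied in before the call (no copy-in).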
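The effect of INTENT can be captured in a small decision function. The Python sketch below is a hypothetical model, not the ADAPTOR runtime. Distributions are plain strings taken from the actual argument's descriptor and from the dummy's directive. A missing INTENT is treated like INTENT(INOUT), since then both directions must be assumed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RemapPlan:
    copy_in: bool   # remap actual -> dummy layout on entry
    copy_out: bool  # remap dummy -> actual layout on return

def plan_dummy_remap(actual_dist, dummy_dist, intent="inout"):
    """Decide which data copies a redistribution of one dummy argument needs."""
    intent = intent.lower()
    if intent not in ("in", "out", "inout"):
        raise ValueError(f"unknown INTENT: {intent}")
    if actual_dist == dummy_dist:
        return RemapPlan(copy_in=False, copy_out=False)  # mappings match
    return RemapPlan(
        copy_in=intent in ("in", "inout"),    # OUT: old values are not needed
        copy_out=intent in ("out", "inout"),  # IN: values unchanged on return
    )

if __name__ == "__main__":
    for intent in ("in", "out", "inout"):
        print(intent, plan_dummy_remap("(block,*)", "(*,block)", intent))
```

In this model, an INTENT(OUT) dummy still gets storage with its own mapping. Only the copy of the old values into that storage is skipped.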

10.4.2 Remapping in the Calling Routines

The following rules explain when an interface block is not needed, when it is necessary, and when it is merely useful.

The most general rule is that a full array or an array section is simply passed via a descriptor, and the called subroutine is responsible for any redistribution. The calling routine does not have to do anything, so no interface block is necessary, e.g.:

      call SUB (A(1:N,1:N), B)

The second rule is that an interface block must be available if the called subroutine does not perform redistributions itself and cannot handle the actual distribution. As stated in Section 10.4.1, local routines, serial routines, and pure procedures do not redistribute their dummy arguments.

The last rule is that an interface block should be specified if the calling routine is to perform the redistribution for optimization reasons. This can be the case if

Note: Whether the called routine redistributes an argument is decided at run time. If an interface block is available in the calling routine, the redistribution is decided at compile time instead. In some situations, however, it can only be determined at run time that no redistribution is necessary. For example, on a single processor the block and cyclic distributions are identical. In such cases, at least the local copying of data is avoided.
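The note's example can be checked directly. A compile-time decision can only compare distribution names, whereas a run-time check can compare the actual owner maps. The Python sketch below models such a run-time check for a one-dimensional array. Processors are numbered from 0, and BLOCK is assumed to use block size ceil(N/P). It illustrates the idea and is not ADAPTOR's actual test.

```python
from math import ceil

def owners(dist, n, p):
    """0-based owner of each index 1..n under a 1-D BLOCK or CYCLIC distribution."""
    if dist == "block":
        b = ceil(n / p)
        return [(i - 1) // b for i in range(1, n + 1)]
    if dist == "cyclic":
        return [(i - 1) % p for i in range(1, n + 1)]
    raise ValueError(f"unsupported distribution {dist}")

def needs_redistribution(actual, dummy, n, p):
    """Run-time view: remap only if some element would change its owner."""
    return owners(actual, n, p) != owners(dummy, n, p)

if __name__ == "__main__":
    print(needs_redistribution("block", "cyclic", 8, 1))  # False: one processor
    print(needs_redistribution("block", "cyclic", 8, 4))  # True
```

On one processor, every element has owner 0 under both distributions, so no redistribution is needed. In this model, the same holds whenever N does not exceed P, because both distributions then place element i on processor i-1.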


10.5 Data Transfers with the ON Directive

If a statement or a block is to be executed by a subset of processors, the compiler must ensure that all data it accesses is available on the active processors. This data transfer can involve processors that are not themselves active.

The compiler creates temporaries and inserts copy-in and copy-out communication for non-local data. Any replicated data modified on a processor subset has to be made consistent afterwards, so all processors that were not active must receive copies of the new values.
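The bookkeeping the compiler has to do for an ON block can be sketched as a small planning function. The Python model below is hypothetical. The function name, its arguments, and the element notation are illustrative only. It lists the non-local elements that must be copied in and the non-local elements written inside the block that must be copied out. For each replicated variable updated on the subset, it also lists the inactive processors that need the new value.

```python
def plan_on_block(active, procs, owner, reads, writes, replicated_writes):
    """Data movement around a block executed ON the processor subset `active`.

    owner: maps each distributed element to its owning processor.
    reads / writes: distributed elements read / written inside the block.
    replicated_writes: replicated variables assigned inside the block.
    """
    active = set(active)
    inactive = sorted(set(procs) - active)
    copy_in = sorted(e for e in reads if owner[e] not in active)
    copy_out = sorted(e for e in writes if owner[e] not in active)
    # every processor outside the subset must receive the new replicated values
    refresh = {v: inactive for v in replicated_writes}
    return copy_in, copy_out, refresh

if __name__ == "__main__":
    owner = {"A(1)": 0, "A(2)": 1, "A(3)": 2, "A(4)": 3}
    print(plan_on_block(active={0, 1}, procs=range(4), owner=owner,
                        reads=owner.keys(), writes=["A(1)", "A(3)"],
                        replicated_writes=["S"]))
    # (['A(3)', 'A(4)'], ['A(3)'], {'S': [2, 3]})
```

In the example, processors 2 and 3 are inactive but still take part in the communication. They own A(3) and A(4), and they must receive the new value of the replicated variable S.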


Thomas Brandes 2004-03-18