

7 Structured Communication

Structured communication is generated for all data parallel statements where every processor can compute the corresponding communication schedule on its own. The schedule specifies which data has to be sent to other processors and which data has to be received.

7.1 Assignments with Regular Sections

A regular section of a mapped array can be assigned to another regular section of any other mapped array. If this assignment requires communication, it is usually very fast.

      real, dimension (N, N) :: A, A1
!hpf$ distribute (*,block) :: A, A1
      integer K
      ...
      A(1:N,1) = A1(1:N,K)     ! fast communication

For the multiprocessing execution model, the above statement results in a communication where the owner of the K-th column sends this column to the owner of the first column. There is no need for communication if both owners are the same, but the HPF compiler still generates a certain amount of overhead to verify this locality at runtime.

For the following two array assignments, ADAPTOR generates communication to send and receive the non-local data.

      A(1:N-1,1:N-1) = A1(2:N,2:N)      ! fast communication
      A(3:N,1:N-2)   = A1(1:N-2,3:N)    ! fast communication

The following examples show that this kind of assignment can also be used to replicate data.

      real, dimension (N,N) :: A, RA
      real, dimension (N)   :: RA1 
!hpf$ distribute (cyclic,block) :: A
!adp$ replicated :: RA1                ! is default
!adp$ replicated :: RA                 ! is default
      integer k

      RA = A          ! replication of distributed array A          
      RA1 = A(K,1:N)  ! replication of the k-th row of A
      RA1 = A(1:N,K)  ! replication of the k-th column of A

Furthermore, such an assignment can imply the redistribution of a whole array.

      integer N1, N2, N3, N4, N5, N6
      parameter (N1=7, N2=9, N3=12, N4=5, N5=5, N6=4)
      real, dimension (N1, N2, N3, N4, N5, N6) :: A, B
!hpf$ distribute A (block,*,block,*,*,*)
!hpf$ distribute B (block,CYCLIC,*,*,*,CYCLIC)
      ...
      A = B      ! redistribution of an entire array
      ...
      B(5,2:4,3:7,4,:,:) = A(4,2:4,2:6,5,:,:)  ! same for subsections

Depending on the mapping of the arrays, the assignment can imply completely different communication patterns.

      real, dimension (N) :: A, A1, A2
!hpf$ distribute (block) :: A
!adp$ single :: A1              ! mapped to master processor
!adp$ replicated :: A2          ! mapped to all processors
      ...
      A = A1                    ! master sends to all other processors
      A1 = A                    ! master receives from all other processors
      A = A2                    ! no communication 
      A2 = A                    ! all processors broadcast their part of A
      A1 = A2                   ! no communication
      A2 = A1                   ! broadcast from master processor

For example, in the above code the assignment A1 = A gathers the distributed data from all processors onto one processor; an equivalent MPI implementation would use the routine MPI_GATHER.
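
For illustration, a hand-coded MPI counterpart of the assignment A1 = A might look like the following sketch. This is not ADAPTOR-generated code; A_LOCAL and NLOCAL are hypothetical names for the local block of A and its size, and N is assumed to be a multiple of the number of processors.

      include 'mpif.h'
      integer ierr
      ...
!     every processor contributes its local block of A; the gathered
!     result arrives on the master processor (rank 0) only
      call MPI_GATHER (A_LOCAL, NLOCAL, MPI_REAL,
     &                 A1, NLOCAL, MPI_REAL, 0, MPI_COMM_WORLD, ierr)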

7.2 FORALL Statements with Structured Communication

A FORALL statement might also require structured communication between the available processors. In the following example, every processor needs the elements of B just beyond its block boundaries, which are owned by the neighboring processors, before the right-hand side can be evaluated.

      real, dimension (N) :: A, B
!hpf$ distribute (block) :: A, B
      ...
      forall (I=2:N-1)
         A(I) = (B(I+1) + B(I-1) + 2. * B(I)) * .25
      end forall


7.3 Shifting

ADAPTOR supports the intrinsic function CSHIFT, which may generate structured communication in the same way as array assignments do.

      real, dimension (M,N) :: A, B
!hpf$ distribute (*,block) :: A, B
      ...
      B = cshift (A, 1, dim=1)      ! no communication
      B = cshift (A, -1, dim=2)     ! efficient communication

The CSHIFT function also takes advantage of shadow edges where possible.
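
As a hedged illustration (the SHADOW directive is an HPF 2.0 approved extension; the width specification shown here is an assumption, not taken from this guide), shadow edges of width 1 let the shifted neighbor values be kept in a local overlap area:

      real, dimension (N) :: A, B
!hpf$ distribute (block) :: A, B
!hpf$ shadow (1:1) :: B         ! assumed syntax: shadow width 1 on both sides
      ...
      A = cshift (B, 1)         ! neighbor values are taken via the shadow edges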

The intrinsic function EOSHIFT is supported in the same way.
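
For example (a minimal sketch reusing the arrays declared above), an EOSHIFT along the distributed dimension generates the same kind of communication, but the vacated positions are filled with the boundary value instead of being wrapped around:

      B = eoshift (A, -1, boundary=0.0, dim=2)   ! communication as for CSHIFT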


7.4 Transpose

The intrinsic function TRANSPOSE causes no problems at all; if communication is involved, it is handled in the same efficient way.

      real, dimension (M,N) :: A
      real, dimension (N,M) :: B
!hpf$ distribute (*,block) :: A, B
      ...
      A = TRANSPOSE (B)
      A(2:M,1:N-1) = TRANSPOSE (B(2:N,1:M-1))

7.5 Matrix Multiplication

ADAPTOR uses a parallel implementation of matrix multiplication for the array intrinsic function MATMUL.
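
For example (a minimal sketch; the block distribution shown is only one possible choice), a MATMUL on distributed operands is executed by the parallel implementation:

      real, dimension (N,N) :: A, B, C
!hpf$ distribute (block,*) :: A, B, C
      ...
      C = matmul (A, B)       ! parallel matrix multiplication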

