E. Mapping of Data

The central idea of the data-parallel programming model is the mapping of the data (array elements) and of the corresponding work on the data to the processors of the parallel machine. This section describes the HPF directives and the ADAPTOR-specific directives that define the mapping of data to the processors. Although there are some explicit features to map work and computations to processors, the data mapping implicitly defines, by the "owner computes" rule, how the work on the data is mapped to the processors. The definition of the data mapping is therefore crucial for the performance of data-parallel programs. The mapping directives also ensure that related data is mapped to the same processor, so that data locality is exploited.
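As a minimal sketch of the owner computes rule, consider the following fragment (the names A, B, N, and P are assumed for illustration):

!hpf$ processors P (4)
      real, dimension (N) :: A, B
!hpf$ distribute (block) onto P :: A, B
      ...
      A(2:N) = B(1:N-1)    ! the owner of A(I) executes the assignment

Since A and B have the same mapping, only the values of B at the block boundaries have to be communicated to a neighboring processor.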

Compared with the HPF standard, the following characteristics of ADAPTOR should be pointed out:

ADAPTOR also supports the approved extensions of HPF 2.0 for dynamic remapping of data. These directives are discussed in Section 13.1.

E.1 Overview of Data Mapping

High Performance Fortran provides a two-level mapping of data objects to memory regions, referred to as "abstract processors". Data objects (typically array elements) are first aligned relative to one another. This group of arrays is then distributed onto a rectilinear arrangement of abstract processors. Furthermore, there is a machine-dependent mapping of abstract processors to physical processors. Figure 1 illustrates the model.

Figure 1: Directives of High Performance Fortran
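As a minimal sketch of this two-level mapping, the arrays A and B are first aligned with a template T, and the template is then distributed onto the abstract processors P (all names are assumed for illustration):

!hpf$ processors P (4)                 ! abstract processor array
!hpf$ template T (N)
      real, dimension (N) :: A, B
!hpf$ align A(I) with T(I)             ! first level: alignment
!hpf$ align B(I) with T(I)
!hpf$ distribute T (block) onto P      ! second level: distribution

The mapping of the abstract processors P to the physical processors is left to the compiler and the runtime system.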

E.2 Distribution of Arrays

A distribute directive partitions an object among the processors. The user can specify which dimensions of the object (array or template) should be distributed. A distribution guarantees that each processor owns only a part of the original array and therefore needs memory only for this local part.

E.2.1 One-Dimensional Distributions

A one-dimensional array with $N$ elements can be distributed onto $P$ processors in the following ways:

Figure 2: Block distribution of a one-dimensional array onto 4 processors
\includegraphics[height=48mm]{dist1_block.eps}

Figure 3: Cyclic distribution of a one-dimensional array onto 4 processors
\includegraphics[height=31mm]{dist1_cyclic.eps}

Figure 4: Arbitrary distribution of a one-dimensional array onto 3 processors
\includegraphics[height=13mm]{dist1_arbi.eps}

When a dimension is distributed, only the local part of it will be allocated on a single processor unless a shadow edge is specified. General block and indirect distributions are approved extensions of HPF 2.0. Arbitrary distributions have been used by Moreira et al. [MEKN96]; they are not standardized, but can be considered as a mixture of general block and indirect distributions.
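As a sketch, the distributions of Figures 2 and 3 and a general block distribution could be written as follows (the processor array P, the block sizes in SIZES, and the array length 16 are assumed for illustration):

!hpf$ processors P (4)
      integer, dimension (4) :: SIZES = (/ 2, 6, 4, 4 /)
      real, dimension (16) :: A, B, C
!hpf$ distribute A (block) onto P               ! Figure 2
!hpf$ distribute B (cyclic) onto P              ! Figure 3
!hpf$ distribute C (gen_block (SIZES)) onto P   ! approved HPF 2.0 extension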

E.2.2 Multi-Dimensional Distributions

Multi-dimensional arrays can also be distributed onto one-dimensional processor arrays. In this case only one dimension of the array is distributed, while the other dimensions become serial (see Figure 5).

Figure 5: Distributions of a two-dimensional array with a serial dimension
\includegraphics[height=40mm]{dist2a.eps}
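The two possibilities of Figure 5 might be written as follows (a sketch; the processor array P and the names A and B are assumed):

!hpf$ processors P (4)
      real, dimension (N, N) :: A, B
!hpf$ distribute A (block, *) onto P    ! first dimension distributed, second serial
!hpf$ distribute B (*, block) onto P    ! second dimension distributed, first serial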

Multi-dimensional arrays can also be distributed onto multi-dimensional processor arrays. For the distributed dimensions, the user can specify different distribution formats (see Figure 6).

Figure 6: Distributions of a two-dimensional array onto 4 processors.
\includegraphics[height=40mm]{dist2b.eps}
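A distribution in the spirit of Figure 6 might look like this (a sketch; a 2 x 2 processor array P3 is assumed):

!hpf$ processors P3 (2, 2)
      real, dimension (N, N) :: A
!hpf$ distribute A (block, cyclic) onto P3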

In the current version of ADAPTOR, up to four dimensions can be distributed:

      real, dimension (N,N,N,N) :: C
!hpf$ distribute C(block, block, *, cyclic)


E.3 Abstract Processor Arrays and Processor Subsets

Arrays are distributed onto a rectilinear arrangement of abstract processors (also called a torus). The following directives define a one-dimensional torus with 4 processors, a one-dimensional torus with 8 processors, and a two-dimensional torus with 4 processors.

!hpf$ processors P1 (4)
!hpf$ processors P2 (8)
!hpf$ processors P3 (2,2)

It is also possible to specify a scalar processor array that consists of one single processor.

!hpf$ processors P0

The definition of processor arrays is analogous to the definition of automatic arrays (see Section A.2.2). The sizes can be specified with values that are known only at runtime.

      subroutine SUB (N1, N2)
      integer N1, N2
!hpf$ processors TORUS1 (N1, N2)
!hpf$ processors TORUS2 (number_of_processors()/N1,N1)

Processor arrays are mainly used in the ONTO clause of the DISTRIBUTE directive. They can also appear in the ON directive to specify the home of computations.

      real, dimension (M, N) :: A0, A1, A2, A3
!hpf$ distribute A0 (*,*) onto P0
!hpf$ distribute A1 (*,block) onto P1
!hpf$ distribute A2 (*,block) onto P2
!hpf$ distribute A3 (cyclic,block) onto P3

Arrays can also be mapped to explicitly specified processor subsets. This is very useful in conjunction with the tasking construct (see Section 10.1).

!hpf$ processors P (10)
      real, dimension (N, N) :: A1, A2
!hpf$ distribute A1 (*,block) onto P(1:5)
!hpf$ distribute A2 (*,block) onto P(6:10)

In contrast to the HPF 2.0 standard, ADAPTOR allows vector subscripts for processor subsets. This gives more flexibility in mapping data to certain processors.

!hpf$ processors P (6)
      integer, dimension (4) :: IND = (/ 1, 3, 4, 6 /)
      real, dimension (N)    :: A1, A2
!hpf$ distribute A1 (block) onto P
!hpf$ distribute A2 (block) onto P(IND)

Figure 7: Irregular Processor Subset
\includegraphics[height=16mm]{psubset.eps}

E.4 Alignment of Arrays

ADAPTOR supports the alignment directives of HPF. The following examples illustrate their use.

E.4.1 Alignment to a Template

The first example shows a typical use of a template. If the distribution of the template is changed (a single directive), the distribution of all aligned arrays changes with it.

!hpf$ template T (NQPX,NQPX,NKPX)
!hpf$ distribute T (*,*,block)

      double precision, dimension (5,NQPX,NQPX,NKPX) :: UO, U, UP
      double precision, dimension (5,NQPX,NQPX,NKPX) :: FU, GU, HU, KU
      double precision, dimension   (NQPX,NQPX,NKPX) :: TMP, TMP1

!hpf$ align (*,:,:,:) with T(:,:,:) :: UO, U, UP
!hpf$ align (*,i,j,k) with T(i,j,k) :: FU, GU, HU, KU
!hpf$ align (:,:,:) with T(:,:,:) :: TMP
!hpf$ align with t :: TMP1

It is recommended to use align dummies in the alignment directive. The use of ':' can cause problems for allocatable arrays, as their bounds are not known until allocation.

      integer, parameter :: N = 100
      real, dimension (:), allocatable :: A, B
!hpf$ distribute A(block)
!hpf$ align B(i) with A(i-1)
!     the same as align B(:) with A(:)
      allocate (A(1:N), B(2:N))

Attention: The HPF standard, and hence also ADAPTOR, requires that the target of an alignment be allocated before the alignee.

E.4.2 Serial Dimensions

Looking at the source of the alignment, a source dimension can either be mapped to a dimension of the target or not. In the latter case, the dimension of the source array is collapsed; it becomes a serial dimension.

Figure 8: Serial/collapsed dimensions.
!hpf$ template T(N)
      real A(N,N), B(N,N)
!hpf$ align A(I,J) with T(J)
!hpf$ align B(I,J) with T(I)
\includegraphics[height=40mm]{align1.eps}

Collapsing a dimension is equivalent to aligning it with a serial template dimension.

!hpf$ template T (N)
      real, dimension (N,N) :: A
!hpf$ align A(*,J) with T(J)
!hpf$ distribute T(block)

is equivalent to

!hpf$ template T(N,N)
      real, dimension (N,N) :: A
!hpf$ align A(I,J) with T(I,J)
!hpf$ distribute T(*,block)

Serial dimensions are very important because the compiler already knows at compile time that no communication is involved if the accesses differ only in the serial dimensions.

!hpf$ template T (N)
      real, dimension (N,N) :: A, B
!hpf$ align A(*,I) with T(I)
!hpf$ align B(I,*) with T(I)

      A(I,K) = A(J,K)    ! no communication
      B(I,K) = B(I,J)    ! no communication
      B(1,3) = A(5,1)    ! no communication


E.4.3 Permutations

The HPF alignment directive also allows interchanging dimensions between the alignee and the align target.

!hpf$ template T3 (N,N,N)
      real A3(N,N,N)                      ! permutation of dimensions
!hpf$ align A3(I,J,K) with T3(K,I,J)

This kind of mapping is fully supported by ADAPTOR and should not cause any problems.

E.4.4 Linear Embeddings

An alignment represents an embedding if the specified mapping is injective. The embedding can be given by an injective mapping for every dimension.

!hpf$ template T2 (N,N)
      real A2 (N2,N2)                   ! embedding with stride 2
!hpf$ align A2(I,J) with T2(2*I,2*J-1)

!hpf$ template T2 (0:N+1,0:N+1)
      real A2 (N,N)                     ! embedding into the interior
!hpf$ align A2(I,J) with T2(I,J)

Figure 9: Linear embedding of an array into a template.
\includegraphics[height=50mm]{align2.eps}


E.4.5 Embedded Arrays

If the alignee has fewer dimensions than the align target, the array can be mapped to certain positions in the align target.

Figure 10: Embedding of an array into a template.
!hpf$ template T(N,N)
      real A(N), B(N)
!hpf$ align A(I) with T(I,3)
!hpf$ align B(I) with T(3,I)
\includegraphics[height=40mm]{align_emb.eps}

      integer, parameter :: N = 100, M = 50
      real, dimension (M,N) :: A
      real, dimension (M)   :: T1, TN
!hpf$ distribute A(block,block)
!hpf$ align T1(i) with A(i,1)
!hpf$ align TN(i) with A(i,N)
      ...
      TN(1:M) = A(1:M,N)      ! no communication
      A(1:M,1) = T1(1:M)      ! no communication


E.4.6 Replication of Arrays

The ALIGN directive can also be used to replicate arrays onto the available processors. In this way, data can have more than one owning processor.

Figure 11: Replication of arrays via the ALIGN directive.
!hpf$ template T(N,N)
      real A(N), B(N)
!hpf$ align A(I) with T(I,*)
!hpf$ align B(I) with T(*,I)
\includegraphics[height=40mm]{align_rep.eps}

The following mapping (see also Figure 11) specifies replication for certain dimensions of the align target. If the corresponding dimension is distributed onto processors, all of these processors become owners and allocate their own incarnation of the data.

!hpf$ template T (N,N)
      real, dimension (N) :: A, A1, A2
!hpf$ align A     with T(*,*)           ! replication along both dimensions
!hpf$ align A1(I) with T(I,*)           ! replication along second dimension
!hpf$ align A2(I) with T(*,I)           ! replication along first dimension

The replication of data guarantees that the corresponding processors will have a copy of the data.


E.5 Replication and Mapping of Serial Data

ADAPTOR supports two compiler-specific directives, REPLICATED and SINGLE, that allow the mapping of scalar variables and serial arrays. Both directives can be used in the following ways:

!adp$ replicated <id> [onto <processor-spec>]
!adp$ replicated [onto <processor-spec>] :: <id_list>
!adp$ single <id> [onto <processor-spec>]
!adp$ single [onto <processor-spec>] :: <id_list>

Though HPF directives can be used for the same mappings, these directives avoid the restrictions implicitly imposed by the rank of the abstract processor arrays.

The REPLICATED directive replicates scalar variables or serial arrays onto a set of processors in such a way that every processor becomes an owner of the full data. A missing ONTO clause implies the mapping to all (active) processors.

!hpf$ processors P(5)
      real, dimension (M,N) :: B
      real :: X
!adp$ replicated onto P :: B, X

Every processor of the processor array P becomes an owner of the variables X and B. Expressing the same mapping with HPF directives requires the introduction of a template, which might not be very convenient.

      real, dimension (M,M) :: B
      real :: X
!hpf$ processors P(5)

!adp$ replicated onto P :: X, B    

!hpf$ template T(5)
!hpf$ distribute T (block) onto P
!hpf$ align B (*,*) with T(*)
!hpf$ align X with T(*)

On the other hand, serial data can also be mapped in such a way that only the first processor (host) is the owner of the data.

!hpf$ processors P(5)
      real, dimension (M,N) :: B, C
      real :: X
!adp$ single onto P(3) :: B
!adp$ single :: C, X

The processor reference in the ONTO clause must denote a single processor. A missing ONTO clause implies that only the first of all (active) processors becomes an owner.

      real, dimension (M,N) :: B, C
      real :: X
!hpf$ processors P(5)
!hpf$ distribute B(*,*) onto P(3)
!hpf$ distribute C(*,*)
!hpf$ align X with C(*,*)


E.6 Default Mappings

There are some rules that define a default mapping for arrays. A default mapping is in this sense a fully specified mapping, as the compiler does not choose the mapping itself but applies certain rules.

E.6.1 Defaults for Implicit Mappings

If an array has no explicit mapping, a default distribution is assumed for it. At the moment, the user can choose between two possibilities:

In the following situations, an array will not be distributed by default:

E.6.2 Missing ONTO Clauses and Default Processor Arrays

A missing ONTO clause implies a default processor arrangement. This arrangement is identical for distributees that have identical shapes and identical explicit mappings.

      REAL A(N,N), B(N)
!hpf$ distribute A(block,block)
!hpf$ distribute B(block)

By default, there is an abstract processor array defined for every possible rank. While the scalar processor array contains one single processor, also called the host processor, the size of every default processor array of rank $N$, $N \ge 1$, is identical to the number of available processors. The shape of the processor array is chosen in such a way that the extents of its dimensions are as equal as possible.

Table 1 shows some examples of default abstract processor arrays.


Table 1: Default abstract processor arrays in ADAPTOR.

  NP   Torus 1   Torus 2   Torus 3
   5      5      1 x 5     1 x 1 x 5
   6      6      2 x 3     1 x 2 x 3
   8      8      2 x 4     2 x 2 x 2
  12     12      3 x 4     2 x 2 x 3
  16     16      4 x 4     2 x 2 x 4
  32     32      4 x 8     2 x 4 x 4
  33     33      3 x 11    1 x 3 x 11


If the ONTO clause is not specified in a distribute directive, the distribution is onto the default processor array whose rank is given by the number of distributed dimensions.

Attention: A missing ONTO clause does not imply an underspecified mapping. Only ONTO * stands for an underspecified mapping.

E.6.3 Missing Distribution Format

If no distribution format is specified in a distribute directive, the compiler chooses block distributions for the last $n$ dimensions, where $n$ is the rank of the processor array.

!hpf$ processors P(3)
      real, dimension (N, N) :: A
!hpf$ distribute onto P :: A

For this example, ADAPTOR takes the following distribution:

!hpf$ distribute (*,block) onto P :: A

Attention: It is an error to specify a processor arrangement that has a greater rank than the rank of the distributed object.

It should also be observed that a missing distribution format is different from an unspecified distribution format.

!hpf$ processors P(3)
      real, dimension (N, N) :: A, B
!hpf$ distribute onto P   :: A
!hpf$ distribute * onto P :: B       !  underspecified mapping


Thomas Brandes 2004-03-18