E. Mapping of Data

The central idea of the data-parallel programming model is the mapping of the data (array elements) and of the corresponding work on the data to the processors of the parallel machine. This section describes the HPF directives and the ADAPTOR-specific directives that define the mapping of data to the processors. Although there are some explicit features to map work and computations to processors, the data mapping implicitly defines, by the "owner computes" rule, how the work on the data is mapped to the processors. The definition of the data mapping is therefore crucial for the performance of data-parallel programs. The mapping directives also ensure that related data is mapped to the same processor, so that data locality is exploited.
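As a minimal sketch of the owner computes rule, consider the following fragment (the names A, B, N, and P are assumed for illustration):

!hpf$ processors P (4)
      real, dimension (N) :: A, B
!hpf$ distribute (block) onto P :: A, B
      ...
      A(2:N) = B(1:N-1)    ! the owner of A(I) executes the assignment

Since A and B have the same mapping, only the values of B at the block boundaries have to be communicated to a neighboring processor.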

Compared with the HPF standard, the following characteristics of ADAPTOR should be pointed out:

ADAPTOR also supports the approved extensions of HPF 2.0 for dynamic remapping of data. These directives are discussed in Section 13.1.

E.1 Overview of Data Mapping

High Performance Fortran provides a two-level mapping of data objects to memory regions, referred to as "abstract processors". Data objects (typically array elements) are first aligned relative to one another. This group of arrays is then distributed onto a rectilinear arrangement of abstract processors. Furthermore, there is a machine-dependent mapping of abstract processors to physical processors. Figure 1 illustrates the model.

Figure 1: Directives of High Performance Fortran
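As a minimal sketch of this two-level mapping, the arrays A and B are first aligned with a template T, and the template is then distributed onto the abstract processors P (all names are assumed for illustration):

!hpf$ processors P (4)                 ! abstract processor array
!hpf$ template T (N)
      real, dimension (N) :: A, B
!hpf$ align A(I) with T(I)             ! first level: alignment
!hpf$ align B(I) with T(I)
!hpf$ distribute T (block) onto P      ! second level: distribution

The mapping of the abstract processors P to the physical processors is left to the compiler and the runtime system.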

E.2 Distribution of Arrays

A distribute directive partitions an object among the processors. The user can specify which dimensions of the object (array or template) should be distributed. A distribution guarantees that each processor owns only a part of the original array and therefore needs memory only for this local part.

E.2.1 One-Dimensional Distributions

A one-dimensional array with $N$ elements can be distributed onto $P$ processors in the following ways:

Figure 2: Block distribution of a one-dimensional array onto 4 processors
\includegraphics[height=48mm]{dist1_block.eps}

Figure 3: Cyclic distribution of a one-dimensional array onto 4 processors
\includegraphics[height=31mm]{dist1_cyclic.eps}

Figure 4: Arbitrary distribution of a one-dimensional array onto 3 processors
\includegraphics[height=13mm]{dist1_arbi.eps}

When a dimension is distributed, only the local part of it will be allocated on a single processor unless a shadow edge is specified. General block and indirect distributions are approved extensions of HPF 2.0. Arbitrary distributions have been used by Moreira et al. [MEKN96]; they are not standardized, but can be considered as a mixture of general block and indirect distributions.
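As a sketch, the distributions of Figures 2 and 3 and a general block distribution could be written as follows (the processor array P, the block sizes in SIZES, and the array length 16 are assumed for illustration):

!hpf$ processors P (4)
      integer, dimension (4) :: SIZES = (/ 2, 6, 4, 4 /)
      real, dimension (16) :: A, B, C
!hpf$ distribute A (block) onto P               ! Figure 2
!hpf$ distribute B (cyclic) onto P              ! Figure 3
!hpf$ distribute C (gen_block (SIZES)) onto P   ! approved HPF 2.0 extension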

E.2.2 Multi-Dimensional Distributions

Multi-dimensional arrays can also be distributed onto one-dimensional processor arrays. In this case only one dimension of the array is distributed, while the other dimensions become serial (see Figure 5).

Figure 5: Distributions of a two-dimensional array with a serial dimension
\includegraphics[height=40mm]{dist2a.eps}
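The two possibilities of Figure 5 might be written as follows (a sketch; the processor array P and the names A and B are assumed):

!hpf$ processors P (4)
      real, dimension (N, N) :: A, B
!hpf$ distribute A (block, *) onto P    ! first dimension distributed, second serial
!hpf$ distribute B (*, block) onto P    ! second dimension distributed, first serial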

Multi-dimensional arrays can also be distributed onto multi-dimensional processor arrays. For the distributed dimensions, the user can specify different distribution formats (see Figure 6).

Figure 6: Distributions of a two-dimensional array onto 4 processors.
\includegraphics[height=40mm]{dist2b.eps}
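A distribution in the spirit of Figure 6 might look like this (a sketch; a 2 x 2 processor array P3 is assumed):

!hpf$ processors P3 (2, 2)
      real, dimension (N, N) :: A
!hpf$ distribute A (block, cyclic) onto P3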

In the current version of ADAPTOR, up to four dimensions can be distributed:

      real, dimension (N,N,N,N) :: C
!hpf$ distribute C(block, block, *, cyclic)


E.3 Abstract Processor Arrays and Processor Subsets

Arrays are distributed onto a rectilinear arrangement of abstract processors (also called a torus). The following directives define a one-dimensional torus with 4 processors, a one-dimensional torus with 8 processors, and a two-dimensional torus with 4 processors.

!hpf$ processors P1 (4)
!hpf$ processors P2 (8)
!hpf$ processors P3 (2,2)

It is also possible to specify a scalar processor array that consists of one single processor.

!hpf$ processors P0

The definition of processor arrays is analogous to the definition of automatic arrays (see Section A.2.2). The sizes can be specified with values that are known only at runtime.

      subroutine SUB (N1, N2)
      integer N1, N2
!hpf$ processors TORUS1 (N1, N2)
!hpf$ processors TORUS2 (number_of_processors()/N1,N1)

Processor arrays are mainly used in the ONTO clause of the DISTRIBUTE directive. They can also appear in the ON directive to specify the home of computations.

      real, dimension (M, N) :: A0, A1, A2, A3
!hpf$ distribute A0 (*,*) onto P0
!hpf$ distribute A1 (*,block) onto P1
!hpf$ distribute A2 (*,block) onto P2
!hpf$ distribute A3 (cyclic,block) onto P3

Arrays can also be mapped to explicitly specified processor subsets. This is very useful in conjunction with the tasking construct (see Section 10.1).

!hpf$ processors P (10)
      real, dimension (N, N) :: A1, A2
!hpf$ distribute A1 (*,block) onto P(1:5)
!hpf$ distribute A2 (*,block) onto P(6:10)

In contrast to the HPF 2.0 standard, ADAPTOR allows vector subscripts for processor subsets. This gives more flexibility in mapping data to certain processors.

!hpf$ processors P (6)
      integer, dimension (4) :: IND = (/ 1, 3, 4, 6 /)
      real, dimension (N)    :: A1, A2
!hpf$ distribute A1 (block) onto P
!hpf$ distribute A2 (block) onto P(IND)

Figure 7: Irregular Processor Subset
\includegraphics[height=16mm]{psubset.eps}

E.4 Alignment of Arrays

ADAPTOR supports the alignment directives of HPF. The following examples illustrate their use.

E.4.1 Alignment to a Template

The first example shows a typical use of a template. If the distribution of the template is changed (a single directive), the distribution of all aligned arrays changes with it.

!hpf$ template T (NQPX,NQPX,NKPX)
!hpf$ distribute T (*,*,block)

      double precision, dimension (5,NQPX,NQPX,NKPX) :: UO, U, UP
      double precision, dimension (5,NQPX,NQPX,NKPX) :: FU, GU, HU, KU
      double precision, dimension   (NQPX,NQPX,NKPX) :: TMP, TMP1

!hpf$ align (*,:,:,:) with T(:,:,:) :: UO, U, UP
!hpf$ align (*,i,j,k) with T(i,j,k) :: FU, GU, HU, KU
!hpf$ align (:,:,:) with T(:,:,:) :: TMP
!hpf$ align with t :: TMP1

It is recommended to use align dummies in the alignment directive. The use of ':' can cause problems for allocatable arrays, as their bounds are not known until allocation.

      integer, parameter :: N = 100
      real, dimension (:), allocatable :: A, B
!hpf$ distribute A(block)
!hpf$ align B(i) with A(i-1)
!     the same as align B(:) with A(:)
      allocate (A(1:N), B(2:N))

Attention: The HPF standard, and hence also ADAPTOR, requires that the target of an alignment be allocated before the alignee.

E.4.2 Serial Dimensions

Looking at the source of the alignment, a source dimension can either be mapped to a dimension of the target or not. In the latter case, the dimension of the source array is collapsed; it becomes a serial dimension.

Figure 8: Serial/collapsed dimensions.
!hpf$ template T(N)
      real A(N,N), B(N,N)
!hpf$ align A(I,J) with T(J)
!hpf$ align B(I,J) with T(I)
\includegraphics[height=40mm]{align1.eps}

Collapsing a dimension is equivalent to aligning it with a serial template dimension.

!hpf$ template T (N)
      real, dimension (N,N) :: A
!hpf$ align A(*,J) with T(J)
!hpf$ distribute T(block)

is equivalent to

!hpf$ template T(N,N)
      real, dimension (N,N) :: A
!hpf$ align A(I,J) with T(I,J)
!hpf$ distribute T(*,block)

Serial dimensions are very important because the compiler already knows at compile time that no communication is involved if the accesses differ only in the serial dimensions.

!hpf$ template T (N)
      real, dimension (N,N) :: A, B
!hpf$ align A(*,I) with T(I)
!hpf$ align B(I,*) with T(I)

      A(I,K) = A(J,K)    ! no communication
      B(I,K) = B(I,J)    ! no communication
      B(1,3) = A(5,1)    ! no communication


E.4.3 Permutations

The HPF alignment directive also allows interchanging dimensions between the alignee and the align target.

!hpf$ template T3 (N,N,N)
      real A3(N,N,N)                      ! permutation of dimensions
!hpf$ align A3(I,J,K) with T3(K,I,J)

This kind of mapping is fully supported by ADAPTOR and should not cause any problems.

E.4.4 Linear Embeddings

An alignment represents an embedding if the specified mapping is injective. The embedding can be given by an injective mapping for every dimension.

!hpf$ template T2 (N,N)
      real A2 (N2,N2)                   ! embedding with stride 2
!hpf$ align A2(I,J) with T2(2*I,2*J-1)

!hpf$ template T2 (0:N+1,0:N+1)
      real A2 (N,N)                     ! embedding into the interior
!hpf$ align A2(I,J) with T2(I,J)

Figure 9: Linear embedding of an array into a template.
\includegraphics[height=50mm]{align2.eps}


E.4.5 Embedded Arrays

If the alignee has fewer dimensions than the align target, the array can be mapped to certain positions in the align target.

Figure 10: Embedding of an array into a template.
!hpf$ template T(N,N)
      real A(N), B(N)
!hpf$ align A(I) with T(I,3)
!hpf$ align B(I) with T(3,I)
\includegraphics[height=40mm]{align_emb.eps}

      integer, parameter :: N = 100, M = 50
      real, dimension (M,N) :: A
      real, dimension (M)   :: T1, TN
!hpf$ distribute A(block,block)
!hpf$ align T1(i) with A(i,1)
!hpf$ align TN(i) with A(i,N)
      ...
      TN(1:M) = A(1:M,N)      ! no communication
      A(1:M,1) = T1(1:M)      ! no communication


E.4.6 Replication of Arrays

The ALIGN directive can also be used to replicate arrays onto the available processors. In this way, data can have more than one owning processor.

Figure 11: Replication of arrays via the ALIGN directive.
!hpf$ template T(N,N)
      real A(N), B(N)
!hpf$ align A(I) with T(I,*)
!hpf$ align B(I) with T(*,I)
\includegraphics[height=40mm]{align_rep.eps}

The following mapping (see also Figure 11) specifies replication for certain dimensions of the align target. If the corresponding dimension is distributed onto processors, all of these processors become owners and allocate their own incarnation of the data.

!hpf$ template T (N,N)
      real, dimension (N) :: A, A1, A2
!hpf$ align A     with T(*,*)           ! replication along both dimensions
!hpf$ align A1(I) with T(I,*)           ! replication along second dimension
!hpf$ align A2(I) with T(*,I)           ! replication along first dimension

The replication of data guarantees that the corresponding processors will have a copy of the data.


E.5 Replication and Mapping of Serial Data

ADAPTOR supports two compiler-specific directives, REPLICATED and SINGLE, that allow the mapping of scalar variables and serial arrays. Both directives can be used in the following ways:

!adp$ replicated <id> [onto <processor-spec>]
!adp$ replicated [onto <processor-spec>] :: <id_list>
!adp$ single <id> [onto <processor-spec>]
!adp$ single [onto <processor-spec>] :: <id_list>

Though HPF directives can be used for the same mappings, these directives avoid the restrictions implicitly imposed by the rank of the abstract processor arrays.

The REPLICATED directive replicates scalar variables or serial arrays onto a set of processors in such a way that every processor becomes an owner of the full data. A missing ONTO clause implies the mapping to all (active) processors.

!hpf$ processors P(5)
      real, dimension (M,N) :: B
      real :: X
!adp$ replicated onto P :: B, X

Every processor of the processor array P becomes an owner of the variables X and B. Expressing the same mapping with HPF directives requires the introduction of a template, which might not be very convenient.

      real, dimension (M,M) :: B
      real :: X
!hpf$ processors P(5)

!adp$ replicated onto P :: X, B    

!hpf$ template T(5)
!hpf$ distribute T (block) onto P
!hpf$ align B (*,*) with T(*)
!hpf$ align X with T(*)

On the other hand, serial data can also be mapped in such a way that only the first processor (host) is the owner of the data.

!hpf$ processors P(5)
      real, dimension (M,N) :: B, C
      real :: X
!adp$ single onto P(3) :: B
!adp$ single :: C, X

The processor reference in the ONTO clause must denote a single processor. A missing ONTO clause implies that only the first of all (active) processors becomes an owner.

      real, dimension (M,N) :: B, C
      real :: X
!hpf$ processors P(5)
!hpf$ distribute B(*,*) onto P(3)
!hpf$ distribute C(*,*)
!hpf$ align X with C(*,*)


E.6 Default Mappings

There are some rules that define a default mapping for arrays. A default mapping is in this sense a fully specified mapping, as the compiler does not choose the mapping itself but applies certain rules.

E.6.1 Defaults for Implicit Mappings

If an array has no explicit mapping, a default distribution is assumed for it. At the moment, the user can choose between two possibilities:

In the following situations, an array will not be distributed by default:

E.6.2 Missing ONTO Clauses and Default Processor Arrays

A missing ONTO clause implies a default processor arrangement. This arrangement is identical for distributees that have identical shapes and identical explicit mappings.

      REAL A(N,N), B(N)
!hpf$ distribute A(block,block)
!hpf$ distribute B(block)

By default, there is an abstract processor array defined for every possible rank. While the scalar processor array contains one single processor, also called the host processor, the size of every default processor array of rank $N$, $N \ge 1$, is identical to the number of available processors. The shape of the processor array is chosen in such a way that the extents of its dimensions are as equal as possible.

Table 1 shows some examples of default abstract processor arrays.


Table 1: Default abstract processor arrays in ADAPTOR.

  NP   Torus 1   Torus 2   Torus 3
   5      5      1 x 5     1 x 1 x 5
   6      6      2 x 3     1 x 2 x 3
   8      8      2 x 4     2 x 2 x 2
  12     12      3 x 4     2 x 2 x 3
  16     16      4 x 4     2 x 2 x 4
  32     32      4 x 8     2 x 4 x 4
  33     33      3 x 11    1 x 3 x 11


If the ONTO clause is not specified in a distribute directive, the distribution is onto the default processor array whose rank is given by the number of distributed dimensions.

Attention: A missing ONTO clause does not imply an underspecified mapping. Only ONTO * stands for an underspecified mapping.

E.6.3 Missing Distribution Format

If no distribution format is specified in a distribute directive, the compiler chooses block distributions for the last $n$ dimensions, where $n$ is the rank of the processor array.

!hpf$ processors P(3)
      real, dimension (N, N) :: A
!hpf$ distribute onto P :: A

For this example, ADAPTOR takes the following distribution:

!hpf$ distribute (*,block) onto P :: A

Attention: It is an error to specify a processor arrangement that has a greater rank than the rank of the distributed object.

It should also be observed that a missing distribution format is different from an unspecified distribution format.

!hpf$ processors P(3)
      real, dimension (N, N) :: A, B
!hpf$ distribute onto P   :: A
!hpf$ distribute * onto P :: B       !  underspecified mapping


Thomas Brandes 2004-03-18