next up previous contents index
Next: 4 Local Computations Up: ADAPTOR HPF Programmers Guide Previous: 2 Execution Model of   Contents   Index

3 Home of Computations and Work Distribution

Parallel execution is achieved by distributing the computations among the processors. This task, referred to as work distribution, is usually performed automatically by the compiler based on the user-specified data distribution. The default work distribution follows the owner computes rule: each processor executes only those assignments whose left-hand-side elements of a distributed array it owns.
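The owner computes rule can be pictured with a small simulation (Python used here purely for illustration; the function names and the block distribution are assumptions, not ADAPTOR code). Each "processor" scans all assignments but executes only those whose left-hand-side element it owns:

```python
# Sketch of the owner computes rule for A(i) = B(i) + C(i)
# under a block distribution (illustration only; names are hypothetical).

def block_owner(i, n, nprocs):
    """Return the processor that owns element i (0-based) of a
    block-distributed array of length n."""
    block = (n + nprocs - 1) // nprocs   # ceiling division: block size
    return i // block

def owner_computes(a, b, c, nprocs):
    """Each processor executes only the assignments it owns."""
    n = len(a)
    for p in range(nprocs):              # conceptually runs in parallel
        for i in range(n):
            if block_owner(i, n, nprocs) == p:
                a[i] = b[i] + c[i]
    return a

A = [0] * 8
B = list(range(8))
C = [1] * 8
print(owner_computes(A, B, C, nprocs=4))   # each processor updates its 2 owned elements
```

In a real HPF compilation the outer loop over processors disappears: every processor runs the same program and evaluates only its own ownership test.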

Alternatively, the work distribution can be controlled explicitly by the user by means of the ON directive and its HOME clause.

Since the mapping of the data determines both the work distribution of the computations and the necessary communication, it is very important to pay attention to this mapping first.

Note: ADAPTOR can generate an intermediate file that contains all information about the home of computations chosen by the HPF compiler.

3.1 Importance of Work Distribution

Two issues are the most important factors for the work distribution:

The home of computations should be chosen in such a way that high data locality is achieved.

3.2 Default Work Distribution in ADAPTOR

The main criteria for the load distribution are:

For optimization reasons there may be exceptions to these rules.


3.3 The ON Directive

The ON directive allows the user to control explicitly the distribution of computations among the processors of a parallel machine.

!hpf$ on home (A(I))
      A(I) = B(I) + C(I)

    !hpf$ processors PROCS(4)
          real, dimension (N) :: A
    !hpf$ distribute A(block) onto PROCS(3:4)
          ...
    !hpf$ on (PROCS(1:2))
            call SUB1()
    !hpf$ on home (A) begin
            call SUB2()
            call SUB3()
    !hpf$ end on

In the HOME clause, the user can specify a processor array, a processor subset, an array or template, or a section of an array or template.

The ON directive restricts the active processor set for a computation to the processors named in the home, or to the processors that own at least one element of the specified array or template. Note that the ON directive only advises the compiler to use the corresponding processors for the ON statement or block. In most situations, ADAPTOR accepts the user's home specification, but it informs the user whenever it overrides this advice.
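The active set selected by an ON HOME clause can be sketched as follows (an illustrative Python helper with assumed names, not the compiler's implementation): for a block-distributed home, the active processors are exactly those that own at least one element of the named section.

```python
def block_owner(i, n, nprocs):
    """Owner of element i (0-based) under a block distribution of length n."""
    block = (n + nprocs - 1) // nprocs
    return i // block

def active_processors(section, n, nprocs):
    """Processors owning at least one element of the given index section.
    'section' is an iterable of 0-based indices, e.g. range(0, 4)."""
    return sorted({block_owner(i, n, nprocs) for i in section})

# ON HOME (A(1:4)) for A of length 16 distributed blockwise onto 4 processors:
print(active_processors(range(0, 4), n=16, nprocs=4))   # only processor 0 owns A(1:4)

# ON HOME (A) activates all owners:
print(active_processors(range(0, 16), n=16, nprocs=4))  # all four processors
```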

All HOME specifications proposed in the HPF 2.0 standard can be used within ADAPTOR. In contrast to the HPF 2.0 standard, ADAPTOR also allows vector subscripts for processor subsets. This gives more flexibility in mapping data to certain processors and in selecting the processors that execute a task.

    !hpf$ processors P (6)
          integer, dimension (4) :: IND = (/ 1, 3, 4, 6 /)
          real, dimension (N)    :: A1, A2
    !hpf$ distribute A1 (block) onto P
    !hpf$ distribute A2 (block) onto P(IND)
          ...
    !hpf$ on (P(IND))
             call TASK (A2, N)
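The effect of the vector subscript P(IND) above can be viewed as an indirection table: element i of A2 lands on the processor whose number is IND applied to the block owner of i. A small sketch (hypothetical Python with 0-based indices, so IND = (/ 1, 3, 4, 6 /) becomes [0, 2, 3, 5]):

```python
def block_owner(i, n, nprocs):
    """Owner position within the subset, under a block distribution."""
    block = (n + nprocs - 1) // nprocs
    return i // block

def owner_via_subset(i, n, ind):
    """Owner of element i when a length-n array is block-distributed
    onto the processor subset selected by the vector subscript 'ind'."""
    return ind[block_owner(i, n, len(ind))]

IND = [0, 2, 3, 5]        # 0-based version of (/ 1, 3, 4, 6 /)
N = 8
print([owner_via_subset(i, N, IND) for i in range(N)])
# blocks of 2 land on processors 0, 2, 3, 5; processors 1 and 4 hold no data
```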

3.4 Active Processors

An active processor is one that executes an HPF statement or block. Usually, an HPF program begins execution with all processors active, but execution can be restricted to a subset of processors or to a single processor in the following cases:

3.5 Restrictions for the ON Directive

Certain statements cannot be executed by a given processor subset, e.g.:

3.6 Execution of Subroutines

Usually, every user subroutine and every user function is entered by all processors. These are the exceptions:

From the caller's standpoint, an invocation of a local procedure from a "global" HPF program has the same semantics as an invocation of a regular procedure.


3.7 Execution of I/O Statements

ADAPTOR does not yet support parallel I/O. Currently, I/O statements are translated in such a way that the I/O operations are executed by one dedicated processor. In the following, this processor is called the master processor.

Because only one processor executes I/O statements, some inconveniences should be observed.

Attention: Since the full array is allocated on a single node, memory problems may arise with large arrays. In such a case it is recommended to read single columns or rows of the matrix.
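The translation of an I/O statement on a distributed array can be pictured as follows (a simplified simulation with assumed helper names, not the code ADAPTOR generates): for reading, the master processor reads the full array and scatters the blocks to their owners; for writing, the blocks are gathered on the master first.

```python
def scatter_blocks(data, nprocs):
    """Master-I/O sketch: one process holds the full array (as read from
    file) and distributes contiguous blocks to the owning processes."""
    n = len(data)
    block = (n + nprocs - 1) // nprocs
    return [data[p * block:(p + 1) * block] for p in range(nprocs)]

def gather_blocks(blocks):
    """Inverse operation used for output: collect the blocks on the master."""
    return [x for b in blocks for x in b]

full = list(range(8))                 # array as read by the master processor
local = scatter_blocks(full, 4)       # local parts after the scatter
assert gather_blocks(local) == full   # a write gathers them back
print(local)                          # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

This also makes the memory caveat above concrete: `full` exists in its entirety on the master node before the scatter.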


3.8 Serial Procedures

A serial routine is called by only a single processor. The data of the actual arguments is mapped automatically to the processor executing the routine. Serial routines are not primarily intended for parallelism but to guarantee that certain operations (e.g. for steering external devices) are really executed only once.

      program WORK
      real, dimension (N,N) :: A
!hpf$ distribute A(*,block)
      ...
      call SUB (A(1,:),N)
      do J = 1, N
         call SUB (A(:,J),N)
      end do

      extrinsic (HPF_SERIAL) subroutine SUB (A, N)
      real, dimension (:) :: A
      integer, intent(in) :: N
      ...
      end subroutine SUB

In the example, the first call of the subroutine SUB is executed by a single processor; in ADAPTOR this is the first processor. The data of the distributed row is automatically gathered onto this processor and sent back after the call.

The second call can be executed directly by the processor that owns column J. This avoids any communication, and all other processors can skip the call. The execution of the loop thus results in parallel execution.
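The parallel execution of the loop can be sketched as follows (an illustrative Python simulation with assumed names): under the (*,block) distribution each column J resides entirely on one processor, so that processor executes the serial call while all others skip it.

```python
def block_owner(j, n, nprocs):
    """Owner of column j (0-based) under a block distribution of n columns."""
    block = (n + nprocs - 1) // nprocs
    return j // block

def run_loop(n, nprocs):
    """Record which processor executes the serial call for each column J."""
    executed_by = []
    for j in range(n):                     # do J = 1, N
        p = block_owner(j, n, nprocs)      # owner of column J
        executed_by.append(p)              # only p calls SUB; others skip it
    return executed_by

print(run_loop(8, 4))   # [0, 0, 1, 1, 2, 2, 3, 3]: columns handled by their owners
```

Since different iterations are executed by different processors without communication, the iterations effectively run concurrently.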


Thomas Brandes 2004-03-18