Next: 3 Home of Computations Up: ADAPTOR HPF Programmers Guide Previous: 1 Overview Contents Index


2 Execution Model of HPF Programs

This section describes the different execution models of High Performance Fortran. Though the description is related to the ADAPTOR compilation system, most of the techniques might also be applied within other HPF compilers. This is especially true for the distributed memory execution model based on message passing.

The HPF mapping directives define a mapping of data objects (arrays) to abstract processors. Data objects that have been mapped to a certain abstract processor are said to be owned by that processor. Ownership of data is the central concept for the execution of HPF programs. Based on the ownership of data, the distribution of computations to the abstract processors and the necessary communication and/or synchronization between processors are derived automatically.
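
To make the notion of ownership concrete, the following sketch declares a BLOCK-distributed array with standard HPF directives and computes which abstract processor owns a given element. The module name `hpf_owner`, the functions `block_owner` and `cyclic_owner`, and the sizes `N` and `NP` are assumptions made for this illustration and are not part of ADAPTOR. The block size follows the HPF definition for BLOCK distributions, i.e. the array extent divided by the number of processors, rounded up.

```fortran
module hpf_owner
   implicit none
   integer, parameter :: N  = 10     ! array size (assumed for the example)
   integer, parameter :: NP = 4      ! number of abstract processors (assumed)
   real :: A(N)
!hpf$ processors P(NP)
!hpf$ distribute A(block) onto P
contains
   ! Abstract processor (1..nprocs) that owns element idx of an array
   ! with bounds 1:extent distributed BLOCK onto nprocs processors.
   pure integer function block_owner(idx, extent, nprocs) result(owner)
      integer, intent(in) :: idx, extent, nprocs
      integer :: bs
      bs = (extent + nprocs - 1) / nprocs   ! ceiling(extent/nprocs)
      owner = (idx - 1) / bs + 1
   end function block_owner

   ! Same question for a CYCLIC distribution: elements are dealt out
   ! round-robin, one element at a time.
   pure integer function cyclic_owner(idx, nprocs) result(owner)
      integer, intent(in) :: idx, nprocs
      owner = mod(idx - 1, nprocs) + 1
   end function cyclic_owner
end module hpf_owner
```

With N = 10 and NP = 4 the block size is 3, so P(1) owns A(1:3), P(2) owns A(4:6), P(3) owns A(7:9), and P(4) owns A(10). A Fortran compiler without HPF support treats the directives as comments.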

**Figure 1:** Mapping directives of High Performance Fortran.

While the concept of ownership and work distribution is followed within all HPF execution models, the layout of distributed data and the mapping of abstract processors to processes or threads vary between the different HPF execution models.

2.1 Serial Execution of HPF Programs

HPF programs can also run as serial programs. For this purpose, ADAPTOR treats them as follows:

Mapping directives are completely ignored; every array has exactly one full incarnation.
All independent loops are executed serially.

The FORALL statement and construct are serialized (this might cause the introduction of temporary data).

       forall (I=2:N-1) A(I) = f(A(I-1),A(I+1))

is translated to

       allocate (TMP(2:N-1))
       do I = 2, N-1
          TMP(I) = f(A(I-1),A(I+1))
       end do
       do I = 2, N-1
          A(I) = TMP(I)
       end do
       deallocate (TMP)

Calls to INTRINSIC or HPF Library routines are replaced with calls to a special library version that is available for serial execution.
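
The temporary in the serialized FORALL is needed because a FORALL evaluates all right-hand sides before any assignment takes place. The following sketch compares the FORALL, its serialization with a temporary, and a naive in-place loop. The module name `forall_serial`, the routine names, and the averaging function `f` are hypothetical stand-ins chosen for this example.

```fortran
module forall_serial
   implicit none
contains
   ! Hypothetical stand-in for the function f used in the text.
   pure real function f(left, right)
      real, intent(in) :: left, right
      f = 0.5 * (left + right)
   end function f

   ! Reference semantics: all right-hand sides use the old values of a.
   subroutine update_forall(a)
      real, intent(inout) :: a(:)
      integer :: i, n
      n = size(a)
      forall (i = 2:n-1) a(i) = f(a(i-1), a(i+1))
   end subroutine update_forall

   ! Serialization with a temporary, as in the translation shown above.
   subroutine update_with_tmp(a)
      real, intent(inout) :: a(:)
      real, allocatable :: tmp(:)
      integer :: i, n
      n = size(a)
      allocate (tmp(2:n-1))
      do i = 2, n-1
         tmp(i) = f(a(i-1), a(i+1))
      end do
      do i = 2, n-1
         a(i) = tmp(i)
      end do
      deallocate (tmp)
   end subroutine update_with_tmp

   ! Naive in-place loop: a(i-1) has already been overwritten when a(i)
   ! is computed, so this is NOT equivalent to the FORALL.
   subroutine update_in_place(a)
      real, intent(inout) :: a(:)
      integer :: i, n
      n = size(a)
      do i = 2, n-1
         a(i) = f(a(i-1), a(i+1))
      end do
   end subroutine update_in_place
end module forall_serial
```

For A = (1, 2, 4, 8, 16) the first two routines produce (1, 2.5, 5, 10, 16), whereas the in-place loop produces (1, 2.5, 5.25, 10.625, 16).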

2.2 The Multiprocessing Execution Model for HPF

In the multiprocessing execution model, every abstract HPF processor becomes a separate process with its own local address space. Each process executes the same program but operates only on its own data. Any two processes communicate by exchanging messages. In accordance with the SPMD paradigm, an HPF compiler has to ensure that all processes executing the target program follow the same control flow in a loosely synchronous style. For parallel execution, each process is usually mapped to one processor.

**Figure 2:** The multiprocessing HPF execution model.

The multiprocessing execution model for HPF, illustrated in Figure 2, has the following main characteristics:

Every process only allocates that portion of a distributed array that is owned by it (local section).
Scalar data and data without mapping directives have their own incarnation on each process, i.e. they are replicated.
Control flow is replicated (SPMD paradigm).
Computation partitioning (work sharing) is based on the owner-computes rule, but can be redefined by the ON directive of HPF.
All accesses to non-local data are implemented by means of message passing.
Shadow areas and halos are used to minimize the need for additional memory for non-local data.
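
The characteristics above can be sketched for a one-dimensional BLOCK distribution. The routines below compute, for process p, the owned index range (local section), the range actually allocated when a shadow area is added, and the iterations executed under the owner-computes rule for a stencil update of the elements 2 to N-1. The module and routine names are assumptions for this illustration; the local layout that ADAPTOR really generates may differ.

```fortran
module local_section
   implicit none
contains
   ! Global index range lb:ub owned by process p (1..nprocs) for an
   ! array 1:extent distributed BLOCK. An empty range has ub < lb.
   pure subroutine owned_range(extent, nprocs, p, lb, ub)
      integer, intent(in)  :: extent, nprocs, p
      integer, intent(out) :: lb, ub
      integer :: bs
      bs = (extent + nprocs - 1) / nprocs   ! ceiling(extent/nprocs)
      lb = (p - 1) * bs + 1
      ub = min(p * bs, extent)
   end subroutine owned_range

   ! Range lo:hi allocated by process p when a shadow area of the given
   ! width is added on both sides, clipped at the global array bounds.
   pure subroutine allocated_range(extent, nprocs, p, width, lo, hi)
      integer, intent(in)  :: extent, nprocs, p, width
      integer, intent(out) :: lo, hi
      call owned_range(extent, nprocs, p, lo, hi)
      if (hi < lo) return                   ! nothing owned, no shadow
      lo = max(1, lo - width)
      hi = min(extent, hi + width)
   end subroutine allocated_range

   ! Owner-computes rule for an update of A(2:extent-1): process p
   ! executes only the iterations whose left-hand side it owns.
   pure subroutine local_iterations(extent, nprocs, p, first, last)
      integer, intent(in)  :: extent, nprocs, p
      integer, intent(out) :: first, last
      integer :: lb, ub
      call owned_range(extent, nprocs, p, lb, ub)
      first = max(2, lb)
      last  = min(extent - 1, ub)
   end subroutine local_iterations
end module local_section
```

For an array of 10 elements on 4 processes, process 2 owns the elements 4:6, allocates 3:7 with a shadow width of 1, and executes the iterations 4:6. Only the two shadow elements 3 and 7 have to be received from the neighbouring processes.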

2.3 The Multithreading Execution Model for HPF

The multithreading execution model for HPF, targeted towards shared memory parallel architectures only, is fundamentally different from the multiprocessing execution model. With the multithreading execution model, an HPF program is compiled into a shared memory parallel program which utilizes thread parallelism only.

The multithreading execution model employs a set of threads which execute concurrently in a shared address space. All data objects of an HPF program are allocated in an unpartitioned way in the shared memory, regardless of HPF mapping directives. The information provided by the HPF mapping directives is utilized to achieve parallel execution by distributing the computations among the threads. Consistency of shared data objects is guaranteed by automatically generating the required synchronization between threads.

**Figure 3:** The multithreading HPF execution model.

The multithreading execution model for HPF has the following main characteristics [BB00]:

All distributed arrays are allocated in the global Fortran layout in a shared memory. HPF mapping directives only determine the ownership of data to drive the work distribution and loop scheduling for the parallel execution.
Scalar data and data objects without mapping directives have a single incarnation only and are shared among the threads. The master thread becomes the owner of such data.
Control flow outside of independent computations is executed by the master thread.
Computation partitioning (work sharing) is based on the owner-computes rule, but can be redefined by the ON directive of HPF.
Consistency of shared data is ensured by appropriate synchronization primitives.
The HPF compiler compiles the data parallel HPF program into an equivalent Fortran program with OpenMP directives [The97]. The HPF runtime system takes advantage of the OpenMP runtime library routines.
Calls to INTRINSIC or HPF Library routines are replaced with calls to a special library version that is available for thread parallel execution.

REDUCTION and NEW clauses are handled similarly to the REDUCTION and PRIVATE clauses of OpenMP.

!hpf$ independent, new(X), reduction(SUM)
      do I = 1, N
         X = W * (I -0.5d0)
         SUM = SUM + F(X)
      end do

is translated to

!$omp parallel do private(X) reduction(+:SUM)
      do I = 1, N
         X = W * (I -0.5d0)
         SUM = SUM + F(X)
      end do
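
For experimentation, the translated loop can be embedded in a complete function. The sketch below assumes that the loop is a midpoint-rule integration over the interval [0,1] and uses the integrand 4/(1+x*x), whose integral is pi. The text does not specify F or W, so both choices, as well as the names `pi_reduction` and `integrate`, are assumptions. The directives use the standard OpenMP sentinel `!$omp`. Without OpenMP support they are treated as comments and the loop runs serially.

```fortran
module pi_reduction
   implicit none
contains
   ! Hypothetical integrand; the text leaves F unspecified.
   pure double precision function F(X)
      double precision, intent(in) :: X
      F = 4.0d0 / (1.0d0 + X*X)
   end function F

   ! Midpoint-rule approximation of the integral of F over [0,1]
   ! using N intervals of width W = 1/N.
   double precision function integrate(N) result(res)
      integer, intent(in) :: N
      double precision :: W, X, SUM
      integer :: I
      W = 1.0d0 / N
      SUM = 0.0d0
      ! X is private to each thread; the partial sums of all threads
      ! are combined into SUM at the end of the loop.
!$omp parallel do private(X) reduction(+:SUM)
      do I = 1, N
         X = W * (I - 0.5d0)
         SUM = SUM + F(X)
      end do
!$omp end parallel do
      res = W * SUM
   end function integrate
end module pi_reduction
```

Because the reduction combines partial sums in an unspecified order, results of runs with different numbers of threads may differ in the last digits.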

The use of this thread-based shared memory programming model for HPF can be very convenient for porting applications to HPF. In this model, data locality does not play such an important role, and the HPF compiler does not have to solve the problem of generating efficient communication.

2.4 The Hierarchical Execution Model

Within the hierarchical execution model, an HPF program is executed by a set of parallel processes, each of which runs on a separate node of an SMP cluster within its own local address space. Each of these processes employs a set of threads which execute concurrently in the shared address space of the node. Process parallelism, data partitioning, and message-passing communication are utilized between the nodes of a cluster in much the same way as in the multiprocessing execution model, while within a node additional parallelism is exploited, in analogy to the multithreading execution model, by means of threads concurrently executing in a shared address space.

**Figure 4:** The hierarchical HPF execution model.

The hierarchical execution model, illustrated in Figure 4, has the following main characteristics:

Every node-process only allocates that portion of an array that has been mapped to it by means of the HPF mapping directives. Data within a node is shared between the processors in a node.
Scalar data and data without mapping directives is replicated across the nodes of a cluster, but has only a single incarnation within a node.
Work distribution among nodes and work sharing within nodes is based on the owner-computes rule.
All accesses to non-local data on a different node require communication via message-passing. Concurrent accesses by multiple threads to shared data within a node may require synchronization.
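
The two levels of work distribution can be sketched as a nested block partitioning: the iteration space is first divided among the node processes according to ownership, and the range of each node is then divided among its threads. The routine names below are hypothetical, and the static block scheduling inside a node is an assumption made for this illustration; other schedules are possible within a node.

```fortran
module two_level
   implicit none
contains
   ! k-th (1..parts) block of the index range lo:hi.
   ! An empty block has last < first.
   pure subroutine block_of(lo, hi, parts, k, first, last)
      integer, intent(in)  :: lo, hi, parts, k
      integer, intent(out) :: first, last
      integer :: bs
      bs = (hi - lo + parts) / parts        ! ceiling((hi-lo+1)/parts)
      first = lo + (k - 1) * bs
      last  = min(lo + k * bs - 1, hi)
   end subroutine block_of

   ! Iterations of "do i = 1, n" executed by thread t (1..threads)
   ! of the process running on the given node (1..nodes).
   pure subroutine my_iterations(n, nodes, threads, node, t, first, last)
      integer, intent(in)  :: n, nodes, threads, node, t
      integer, intent(out) :: first, last
      integer :: nlb, nub
      call block_of(1, n, nodes, node, nlb, nub)        ! between nodes
      call block_of(nlb, nub, threads, t, first, last)  ! within a node
   end subroutine my_iterations
end module two_level
```

For n = 10 on 2 nodes with 2 threads each, node 1 handles the iterations 1:5 (thread 1: 1:3, thread 2: 4:5) and node 2 handles 6:10 (thread 1: 6:8, thread 2: 9:10).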

The hierarchical execution model offers certain advantages over the multiprocessing execution model:

Data without mapping directives have only one incarnation on a node instead of p incarnations, where p is the number of processes that the multiprocessing execution model would run on that node.
Communication that requires replication of data uses less memory and can be more efficient.
All-to-all communication between the processors on two nodes is avoided.
The thread execution model offers more flexibility regarding load balancing.


Thomas Brandes 2004-03-18