Next: 2 The HPF Execution Up: Blocking Techniques for HPF Previous: Blocking Techniques for HPF

1 Introduction

High Performance Fortran (HPF) [2] is a high-level programming language that offers compiler directives for mapping the data of a user program to abstract processors. This mapping has two levels (see Figure 1). On the first level, related data items are aligned with each other; on the second level, the data is distributed onto abstract processors.

Figure 1: Mapping directives of High Performance Fortran.
\includegraphics[height=60mm]{hpf_map.eps}
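The two-level mapping can be made concrete with a small model. The following is only a sketch in Python, not HPF code: the function names, the shift-style alignment, and the choice of a BLOCK distribution are assumptions made for illustration, loosely mirroring what HPF's ALIGN and DISTRIBUTE directives express.

```python
# Hypothetical model of the two-level HPF mapping; all names are illustrative.

def align(i, offset=0):
    """Level 1: align array element i with template cell i + offset."""
    return i + offset

def distribute_block(cell, n_cells, n_procs):
    """Level 2: BLOCK distribution of template cells onto abstract processors."""
    block = -(-n_cells // n_procs)  # ceiling division gives the block size
    return cell // block

def owner(i, n_cells, n_procs, offset=0):
    """Compose both levels: the abstract processor that owns element i."""
    return distribute_block(align(i, offset), n_cells, n_procs)

N, P = 16, 4
# Array A is aligned identically with the template.
owners_a = [owner(i, N, P) for i in range(N)]
# Array B is aligned with a shift of one cell, so B(i) lives where A(i+1) lives.
owners_b = [owner(i, N, P, offset=1) for i in range(N - 1)]
```

Because the distribution is applied to the template rather than to each array, changing the distribution moves all aligned arrays together, which is the point of separating the two levels.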

In traditional HPF compilers, this mapping information is used to generate a parallel program in which each abstract processor is identified with a physical processor that operates on all the data mapped to it, following the owner-computes rule. On parallel machines with distributed memory, the HPF compiler generates an equivalent SPMD program in which each processor works on its local data and non-local data is exchanged via message passing (see Figure 2).

Figure 2: The SPMD HPF execution model for distributed machines.
\includegraphics[height=60mm]{dm_model.eps}
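The owner-computes rule and the resulting communication can be illustrated with a sequential simulation. This is a minimal sketch under stated assumptions: the stencil, the BLOCK distribution, and the names `block_owner` and `spmd_stencil` are hypothetical, and real message passing is replaced by a counter of non-local reads.

```python
def block_owner(i, n, p):
    """Processor owning element i under a BLOCK distribution (assumed)."""
    block = -(-n // p)  # ceiling division
    return i // block

def spmd_stencil(a, p):
    """Simulate p processors computing b[i] = a[i-1] + a[i+1]."""
    n = len(a)
    b = list(a)
    nonlocal_reads = [0] * p
    for rank in range(p):  # each iteration plays the role of one processor
        for i in range(1, n - 1):
            if block_owner(i, n, p) != rank:
                continue  # owner computes: skip elements owned elsewhere
            for j in (i - 1, i + 1):
                if block_owner(j, n, p) != rank:
                    nonlocal_reads[rank] += 1  # would arrive as a message
            b[i] = a[i - 1] + a[i + 1]
    return b, nonlocal_reads

result, reads = spmd_stencil(list(range(8)), 2)
```

With eight elements on two processors, only the elements at the block boundary need a value from the neighbouring processor; everything else is computed from local data.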

The HPF mapping of data can also be used to increase the spatial and temporal locality of data within a program and thus to use the cache more effectively. To this end, the mapping is chosen so that the data mapped to one abstract processor fits in the cache. Executing the local operations implicitly mapped to that abstract processor should then cause no further capacity misses. Abstract processors are no longer identified with physical processors, so one physical processor has to emulate many abstract processors. The compiler techniques already used in HPF compilers then result in blocking and tiling of data and iteration spaces, which in turn leads to more effective cache usage.
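The emulation of many abstract processors by one physical processor can be sketched as follows. This is an illustrative Python model, not the code generated by an HPF compiler: the two loops, the function names, and the `block` parameter are assumptions, and Python lists do not exhibit the cache behaviour itself. The transformation is valid here only because neither loop reads data from another block.

```python
def run_unblocked(a):
    """Two loops, each sweeping the whole array."""
    n = len(a)
    b = [0] * n
    c = [0] * n
    for i in range(n):        # loop 1 touches all of a and b
        b[i] = 2 * a[i]
    for i in range(n):        # loop 2 touches all of a, b and c again
        c[i] = a[i] + b[i]
    return c

def run_blocked(a, block):
    """One physical processor emulates ceil(n / block) abstract processors."""
    n = len(a)
    b = [0] * n
    c = [0] * n
    for lo in range(0, n, block):     # one abstract processor at a time
        hi = min(lo + block, n)
        for i in range(lo, hi):       # loop 1, restricted to local data
            b[i] = 2 * a[i]
        for i in range(lo, hi):       # loop 2 reuses the same block
            c[i] = a[i] + b[i]
    return c

data = list(range(10))
blocked = run_blocked(data, 4)
unblocked = run_unblocked(data)
```

In the blocked version, both loops finish their work on one block before moving on, so a block that fits in the cache is loaded once and reused across loops. In practice the block size would be derived from the cache size rather than passed in by hand.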

Compared with existing loop blocking techniques, this approach has certain advantages. First, blocking is no longer restricted to single loop nests but applies to larger regions of the program, which makes it more effective. Second, the data mapping, especially the alignment, can also be used to optimize the data layout (e.g. by merging arrays).

This paper describes the HPF blocking execution model in more detail and presents some example programs and performance results. The HPF compilation system ADAPTOR [1], provided by SCAI, has been extended to implement this execution model.


Thomas Brandes 2004-03-18