Next: 5 HPF Blocking with Up: Blocking Techniques for HPF Previous: 3 Explicitly Blocked Computations


4 Execution of Subprograms in Blocking Mode

The compiler must know at compile time whether a subprogram is called in blocking mode or in serial mode.

By default, a subprogram is assumed to be called in serial mode.

Subprograms that are to be executed in blocking mode must be marked with an explicit HPF keyword. Such subprograms are entered by every abstract processor, and there are two kinds of them: HPF global routines and HPF local routines.

4.1 HPF Global Routines

By default, the HPF compiler assumes that a subroutine is called in master mode and therefore enables blocking mode for the independent computations inside it. This must not happen if the subroutine is itself called in blocking mode. The extrinsic (HPF_GLOBAL) prefix marks a subroutine that is called in blocking mode:

      extrinsic (HPF_GLOBAL) subroutine SUB (A, B, N)
      real, dimension (N) :: A, B, C
!hpf$ distribute (block) :: A, B, C
      ...

In contrast to HPF_LOCAL routines, the global view of the computations and the data remains, and work sharing is applied as in a parallel region.

The main pitfall of an HPF_GLOBAL routine is that local arrays and newly allocated arrays are allocated for one abstract processor only. In the example above, every processor allocates the array C (or rather its local part) during its own execution, and that data is not accessible to the other processors. In other words, arrays that are shared between the processors can only be passed to the subroutine as dummy arguments.
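
The following is a minimal sketch of that last point, not code from the original example: the work array C is moved into the dummy argument list so that it is created by the caller and shared by all processors. The loop body and the position of C in the argument list are assumptions made only for illustration.

```fortran
!     Sketch: C is now a dummy argument instead of a local array,
!     so it is not allocated separately by each processor.
      extrinsic (HPF_GLOBAL) subroutine SUB (A, B, C, N)
      integer :: N
      real, dimension (N) :: A, B, C
      integer :: I
!hpf$ distribute (block) :: A, B, C
!     Illustrative computation only (not part of the original)
!hpf$ independent
      do I = 1, N
         C(I) = A(I) + B(I)
      end do
      end subroutine SUB
```

With this variant the caller has to declare C itself and pass it along with A and B.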

4.2 HPF Local Routines

An HPF local routine makes it possible to write single-processor code that works only on the data mapped to a given abstract processor. In this sense, a local routine contains only local computations and can be executed independently by every processor on which it is invoked.

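!     Calling code: A is distributed blockwise in both dimensions
      integer, parameter :: N = 100
      real, dimension (N,N) :: A
!hpf$ distribute A (block, block)
      call SUB (A)

!     Local routine: each processor sees only its own part of A
      extrinsic (HPF_LOCAL) subroutine SUB (A)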
      use HPF_LOCAL_LIBRARY
      real, dimension (:,:) :: A
      print *, my_processor(), ' has ', lbound(A), ubound(A)
      print *, my_processor(), ' has local ', size(A), ' elements'
      print *, my_processor(), ' has global ',
     &                         global_size(A), ' elements'
      end

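The execution of this code on an abstract processor array of $3 \times 2$ processors gives the following output. For each processor, the first line lists the lower bounds of both dimensions of its part of A, followed by the upper bounds: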

           0  has            1           1          34          50
           0  has local         1700  elements
           0  has global        10000  elements
           1  has           35           1          68          50
           1  has local         1700  elements
           1  has global        10000  elements
           2  has           69           1         100          50
           2  has local         1600  elements
           2  has global        10000  elements
           3  has            1          51          34         100
           3  has local         1700  elements
           3  has global        10000  elements
           4  has           35          51          68         100
           4  has local         1700  elements
           4  has global        10000  elements
           5  has           69          51         100         100
           5  has local         1600  elements
           5  has global        10000  elements


Thomas Brandes 2004-03-18