It is very important to know at compile time whether a subroutine is called in the blocking mode or in the serial mode.
By default, a subprogram will be called in the serial mode.
Subprograms that should be executed in the blocking mode must be given an explicit HPF keyword. There are two possibilities for such subprograms that will be entered by each abstract processor:
    extrinsic (HPF_GLOBAL) subroutine SUB (A, B, N)
      real, dimension (N) :: A, B, C
    !hpf$ distribute (block) :: A, B, C
      ...

    extrinsic (HPF_LOCAL) subroutine SUB (A, B)
      real, dimension (:) :: A, B
    !hpf$ distribute (block) :: A, B
      ...
By default, the HPF compiler assumes that a subroutine is called in the master mode and therefore enables the blocking mode for independent computations. It must not do so if the subroutine is itself called in the blocking mode.
    extrinsic (HPF_GLOBAL) subroutine SUB (A, B, N)
      real, dimension (N) :: A, B, C
    !hpf$ distribute (block) :: A, B, C
      ...
In contrast to HPF_LOCAL routines, the global view of the computations and data remains, and work sharing is applied as in a parallel region.
The main pitfall of an HPF_GLOBAL routine is that local arrays and newly allocated arrays are allocated privately by each abstract processor. In the example program, every processor allocates the array C (or rather its local part) during execution, and the data are not accessible to the other processors. In other words, arrays shared between the processors can only be passed to the subroutine as dummy arguments.
An HPF local routine allows one to write single-processor code that works only on the data mapped to a given abstract processor. In this sense, a local routine contains only local computations and can be executed independently by all processors on which the routine is invoked.
    integer, parameter :: N = 100
    real, dimension (N,N) :: A
    !hpf$ distribute A (block, block)
    call SUB (A)

    extrinsic (HPF_LOCAL) subroutine SUB (A)
      use HPF_LOCAL_LIBRARY
      real, dimension (:,:) :: A
      print *, my_processor(), ' has ', lbound(A), ubound(A)
      print *, my_processor(), ' has local ', size(A), ' elements'
      print *, my_processor(), ' has global ', &
               global_size(A), ' elements'
    end
The execution of this code on an abstract processor array of 3 x 2 processors gives the following output:
    0 has 1 1 34 50
    0 has local 1700 elements
    0 has global 10000 elements
    1 has 35 1 68 50
    1 has local 1700 elements
    1 has global 10000 elements
    2 has 69 1 100 50
    2 has local 1600 elements
    2 has global 10000 elements
    3 has 1 51 34 100
    3 has local 1700 elements
    3 has global 10000 elements
    4 has 35 51 68 100
    4 has local 1700 elements
    4 has global 10000 elements
    5 has 69 51 100 100
    5 has local 1600 elements
    5 has global 10000 elements
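The bounds printed above can be checked by hand. The following plain Fortran program (no HPF features, so any Fortran 90 compiler should accept it) is a minimal sketch of that arithmetic. It assumes the usual BLOCK rule with a block size of ceiling(N/P) and a processor numbering in which the first processor dimension varies fastest. Both assumptions are inferred from the output shown, not taken from the compiler's documentation, and the names block_bounds and bounds are hypothetical.

```fortran
program block_bounds
  implicit none
  integer, parameter :: n  = 100         ! extent of A in each dimension
  integer, parameter :: p1 = 3, p2 = 2   ! assumed processor array shape
  integer :: me, i1, i2, lb1, ub1, lb2, ub2

  do me = 0, p1*p2 - 1
     ! Assumed numbering: the first processor dimension varies fastest.
     i1 = mod(me, p1)
     i2 = me / p1
     call bounds(n, p1, i1, lb1, ub1)
     call bounds(n, p2, i2, lb2, ub2)
     print *, me, ' has ', lb1, lb2, ub1, ub2
     print *, me, ' has local ', (ub1 - lb1 + 1)*(ub2 - lb2 + 1), ' elements'
  end do

contains

  ! Bounds of block number ip (0-based) when nelem elements are
  ! distributed blockwise over np processors.
  subroutine bounds(nelem, np, ip, lb, ub)
    integer, intent(in)  :: nelem, np, ip
    integer, intent(out) :: lb, ub
    integer :: bs
    bs = (nelem + np - 1) / np     ! block size = ceiling(nelem/np)
    lb = ip*bs + 1
    ub = min((ip + 1)*bs, nelem)   ! the last block may be shorter
  end subroutine bounds

end program block_bounds
```

With N = 100, the block size is 34 in the first dimension (blocks 1:34, 35:68, 69:100) and 50 in the second (blocks 1:50, 51:100). This gives 34 x 50 = 1700 local elements on most processors and 32 x 50 = 1600 on processors 2 and 5, in agreement with the output listed.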