Communication is an important issue for the optimization of parallel programs. ADAPTOR supports profiling of communication for parallel HPF programs compiled for distributed memory architectures by ADAPTOR itself.
Note: ADAPTOR does not support profiling of communication for MPI programs; this can be done by other tools like VAMPIR. As ADAPTOR generates MPI programs from data parallel HPF programs, such tools for profiling MPI communication can also be applied to the generated MPI programs.
In the following example program the distributed array A is assigned to the replicated array B (replication is the default for arrays without a distribution directive). This replication requires communication, as every processor needs the values of all non-local data.
      program TEST
      integer, parameter :: N = 100
      integer, dimension (N) :: A, B
!hpf$ distribute (block) :: A
      B = A
      end program TEST
The compilation does not need any special flag to enable profiling of communication.
adaptor -hpf_mpi -o example example.hpf
Currently, ADAPTOR supports the following counters within its HPF runtime system:
These counters can be enabled to get information about communication in the program.
mpirun -np 4 example -pm=pm_config
Table 10 shows a more complex example for counting communication.
mpirun -np 2 executable -comm
The flag -comm will give a communication statistics file at the end of the program. In contrast to the communication counters (see Section 8.1), the communication statistics give more detailed information about how the processors interact with each other, but do not relate this information to the different subprograms (regions).
By running the program with the flag -comm, a communication statistics summary is printed at the end of the program.
mpirun -np 2 example -comm

Communication volume summary for program example (2 procs):

Number of Bytes sent by all procs:

 to     all      1      2
 all    400    200    200
 1      200      0    200
 2      200    200      0
The replication of the array A implies a communication of the non-local part from processor 1 to processor 2 and vice versa. The number of bytes for the non-local part is 50 x 4 = 200 bytes (50 non-local integer elements of 4 bytes each).
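The byte counts in the summary above can be reproduced by a small back-of-the-envelope calculation. The following Python sketch (the helper name replication_volume and the element size of 4 bytes for default integers are illustrative assumptions, consistent with the table) computes the volume implied by replicating a block-distributed array:

```python
# Estimate the communication volume of replicating a block-distributed
# integer array of n elements on p processors (assuming 4-byte integers).
def replication_volume(n, p, elem_bytes=4):
    local = n // p                  # elements owned by each processor
    non_local = n - local           # elements that must be received
    # every processor ships its local part to the p-1 other processors
    sent_per_proc = local * (p - 1) * elem_bytes
    received_per_proc = non_local * elem_bytes
    return sent_per_proc, received_per_proc

# N = 100 on 2 processors: each processor sends its 50 local elements
# (200 bytes) to the other one, matching the summary table.
print(replication_volume(100, 2))   # (200, 200)
```

For four processors the same formula gives 300 bytes sent and received per processor, which matches the four-processor statistics shown later in this section.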
export COMMSTAT=1
mpirun -np 4 example -comm
The communication statistics of the program running on four processors show that every processor sends 300 bytes to its right neighbor. By looking at the more detailed information in the file example.commstat one can see that the replication is actually done in three (p - 1 = 3) steps.
Communication Summary for program example (4 procs):

total number of sends by all procs:

 to    all     1     2     3     4
 all    12     3     3     3     3
 1       3     0     3     0     0
 2       3     0     0     3     0
 3       3     0     0     0     3
 4       3     3     0     0     0

...

Communication statistics of proc # 1:

 SEND   total   total     avg     min     max
 PID      num   Bytes   Bytes   Bytes   Bytes
 all        3     300     100     100     100
 1          0       0       0       0       0
 2          3     300     100     100     100
 3          0       0       0       0       0
 4          0       0       0       0       0
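The three-step pattern visible in the detailed statistics can be modeled as follows. This Python sketch assumes the replication is implemented as p - 1 cyclic shifts, each forwarding one block of 25 integers (100 bytes) to the right neighbor; the function name ring_replication is illustrative, not part of ADAPTOR:

```python
# Model a ring (cyclic shift) replication on p processors: in each of the
# p-1 steps, every processor forwards one block to its right neighbor, so
# after p-1 steps every processor has seen all blocks.
def ring_replication(p, block_bytes):
    sends = {src: {} for src in range(p)}   # bytes sent per (src, dest) pair
    for step in range(p - 1):
        for src in range(p):
            dest = (src + 1) % p            # right neighbor
            sends[src][dest] = sends[src].get(dest, 0) + block_bytes
    return sends

# 4 processors, 100-byte blocks: every processor sends 3 x 100 = 300 bytes,
# all of it to its right neighbor, as in the statistics above.
print(ring_replication(4, 100))
```

This reproduces the statistics of processor 1 above: three sends of 100 bytes each, all directed to processor 2.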
The communication statistics print a summary for the whole program and do not give any information related to the regions of the program. Nevertheless, it is possible to use communication counters that give information related to the regions of the program.
Vampir provides the possibility to visualize MPI communication. This can also be used for HPF programs compiled for distributed memory architectures using MPI.
adaptor -vt -hpf_mpi -o example example.hpf
Important: The flag -vt should be set before the flag -hpf_mpi.
Even when the application is run without any profiling by the ADAPTOR runtime, the profiling for VAMPIR is still performed.
mpirun -np 4 example
Enabling the tracing additionally guarantees that regions of the source code can be identified in the VAMPIR visualization.
mpirun -np 4 example -trace
Therefore, if the (parallel) program is linked with -trace=vt and executed with -trace, a tracefile <exec>.stf will be generated that can afterwards be visualized with VAMPIR. In any case, the tracefile contains information about all profiled regions of the program (this includes timestamps for every enter and exit of the profiled region, but also source handles). Furthermore, if performance monitoring is enabled (flag -pm at runtime), the tracefile will also contain the performance counter values that can be visualized with VAMPIR.
Vampirtrace allows its own sampling of performance counters. To use this functionality, the program should be linked with -trace=vtsample. Vampirtrace reads the (hardware) performance counters as specified in a configuration file (the environment variable VT_CONFIG is used to specify this configuration file). The performance counters are read at every trace event. As the flag -trace generates VampirTrace calls for every enter and exit of a region, performance counter values related to the regions of the program will be available. Using the ADAPTOR runtime flag -pm is still possible, but then performance counter values will appear twice. The differences between the two approaches of performance counter monitoring are the following:
Currently, ADAPTOR supports performance counters for the communication volume (number of bytes sent and received for every process) and can give a communication summary at the end of the program (number of bytes exchanged between the different processors). But ADAPTOR does not generate any trace call for the communication.
Nevertheless, it is possible to use VampirTrace for tracing the MPI communication generated for HPF programs (MPI must have been set as communication model for the distributed memory model by the ADAPTOR link flag -dm=mpi). This is possible because linking with the MPI library only has to be replaced by linking with the corresponding VampirTrace library. Therefore, compiling and linking a parallel HPF program with the following ADAPTOR command enables the generation of a VampirTrace file for the MPI communication (no ADAPTOR runtime flag has to be set).
adaptor -hpf_mpi -vt <program>.hpf
It is possible to combine the MPI visualization with the ADAPTOR supported trace capabilities for VampirTrace. The link step is as follows:
adaptor -hpf_mpi -trace=vt <program>.hpf
If now the runtime flag -trace is used, trace calls will not only be generated for the MPI calls (done automatically by VampirTrace) but also for the entering and leaving of regions (done by the ADAPTOR runtime system using the VampirTrace interface).