
Subsections

8.1 Counters for Communication
8.2 Communication Statistics
8.3 Tracing of Communication

8 Profiling of Communication

Communication is an important issue for the optimization of parallel programs. ADAPTOR supports profiling of communication for parallel HPF programs compiled for distributed memory architectures by ADAPTOR itself.

Note: ADAPTOR does not support profiling of communication for MPI programs; this can be done by other tools such as VAMPIR. Since ADAPTOR generates MPI programs from data-parallel HPF programs, such tools for profiling MPI communication can also be applied to the generated MPI programs.

In the following example program, the distributed array A is assigned to the array B, which is replicated (the default for array B). This replication requires communication, as every processor needs the values of all non-local data.

      program TEST
      integer, parameter :: N = 100
      integer, dimension (N) :: A, B
!hpf$ distribute (block) :: A
      B = A
      end program TEST
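
For comparison, the following minimal variant (an illustration only, not part of the original example) distributes B block-wise as well; with identical distributions the assignment B = A involves only local data and should not require any replication communication:

      program TEST_LOCAL
      integer, parameter :: N = 100
      integer, dimension (N) :: A, B
!hpf$ distribute (block) :: A, B
!     both arrays use the same block distribution, so B = A is local
      B = A
      end program TEST_LOCAL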

The compilation does not need any special flag to enable profiling of communication.

    adaptor -hpf_mpi -o example example.hpf


8.1 Counters for Communication

Currently, ADAPTOR supports the following counters within its HPF runtime system:

SEND_CALLS - number of send operations of a processor
RECV_CALLS - number of receive operations of a processor
SEND_BYTES - number of bytes sent by a processor
RECV_BYTES - number of bytes received by a processor

These counters can be enabled to get information about the communication in the program.

    mpirun -np 4 example -pm=pm_config


Table 9: Counting communication.
proc   SEND_CALLS   RECV_CALLS   SEND_BYTES   RECV_BYTES
   1            3            3          300          300
   2            3            3          300          300
   3            3            3          300          300
   4            3            3          300          300
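
The numbers in Table 9 can be checked against the example: with N = 100 integers of 4 bytes distributed block-wise over np = 4 processors, each processor owns 25 elements, and the replication is carried out in np - 1 = 3 steps (see Section 8.2), each transferring one such block:

\[
  \mathrm{SEND\_BYTES} = \mathrm{RECV\_BYTES} = 3 \times 25 \times 4\ \mathrm{bytes} = 300\ \mathrm{bytes}
\]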


Table 10 shows a more complex example of counting communication for the individual regions of a program.


Table 10: Counting communication for regions (all values are net values of the region, i.e. exclusive of nested regions).

region        calls  walltime   SEND    SEND    SEND    RECV    RECV    RECV
                     [s]        CALLS   [MB]    [MB/s]  CALLS   [MB]    [MB/s]
HYDFLO            1     3.648       2   0.000    0.000   2305   0.018    0.005
INITIAL           1     0.037      52   0.861   23.464     25   0.387   10.535
GHOST5          101     0.286     121  10.635   37.244    101   8.877   31.088
SEEDRANF          1     0.000       0       -        -      0       -        -
MPPRANS           1     0.088       0       -        -      0       -        -
RANFK             1     0.000       0       -        -      0       -        -
IRANFEVEN         1     0.000       0       -        -      0       -        -
RANFKBINARY       1     0.000       0       -        -      0       -        -
IRANFODD         48     0.000       0       -        -      0       -        -
RANFATOK          1     0.000       0       -        -      0       -        -
RANFMODMULT  125075     0.329       0       -        -      0       -        -
MPPRANF         360     2.121       0       -        -      0       -        -
GHOST3           10     0.061      20   1.144   18.689     10   0.572    9.344
FLUX             20     4.175     140   3.516    0.842    140   3.516    0.842
HYDRO            20     0.559      30   2.637    4.718     30   2.637    4.718
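
The [MB/s] columns relate the transferred volume to the net walltime of the region; as a check, for the send traffic of the region GHOST5 (numbers taken from the table above):

\[
  \frac{10.635\ \mathrm{MB}}{0.286\ \mathrm{s}} \approx 37.2\ \mathrm{MB/s}
\]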


8.2 Communication Statistics

    mpirun -np 2 executable -comm

The flag -comm will generate a communication statistics file at the end of the program. In contrast to the communication counters (see Section 8.1), the communication statistics give more detailed information about how the processors interact with each other, but do not relate this information to the different subprograms (regions).

Running the example program with the flag -comm prints the following communication statistics at the end of the run:

    mpirun -np 2 example -comm

   Communication volume summary for program example (2 procs):
    Number of  Bytes sent by all procs:
        to       all         1         2
       all       400       200       200
         1       200         0       200
         2       200       200         0

The replication of the array A implies that processor 1 sends its local part to processor 2 and vice versa, since each processor needs the non-local data. The size of this non-local part is $200 = 50 \times 4$ bytes (50 integer elements of 4 bytes each).

For a run with four processors:

    export COMMSTAT=1
    mpirun -np 4 example -comm


Table 11: Communication statistics (bytes sent; rows: sending processor, columns: receiving processor).

from \ to     all       1       2       3       4
      all    1200     300     300     300     300
        1     300       0     300       0       0
        2     300       0       0     300       0
        3     300       0       0       0     300
        4     300     300       0       0       0


The communication statistics of the program running on four processors show that every processor sends $300 = 3 \times 25 \times 4$ bytes to its right neighbor. By looking at the more detailed information in the file example.commstat, one can see that the replication is actually done in three ($np-1$) steps.

Communication Summary for program example (4 procs):
 total number of sends by all procs:
     to    all      1      2      3      4
    all     12      3      3      3      3
      1      3      0      3      0      0
      2      3      0      0      3      0
      3      3      0      0      0      3
      4      3      3      0      0      0
...
Communication statistics of proc # 1:

 SEND  total     total       avg       min       max
  PID    num     Bytes     Bytes     Bytes     Bytes
  all      3       300       100       100       100
    1      0         0         0         0         0
    2      3       300       100       100       100
    3      0         0         0         0         0
    4      0         0         0         0         0
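
These per-processor numbers follow a simple pattern for the block-wise replication (a worked check, assuming 4-byte integers as in the example): in each of the $np-1$ steps a processor sends a block of $N/np$ elements, so

\[
  \text{bytes per send} = \frac{N}{np} \times 4 = 25 \times 4 = 100, \qquad
  \text{total bytes sent} = (np-1) \times 100 = 300 .
\]

For the run with two processors the same reasoning gives $1 \times 50 \times 4 = 200$ bytes, in agreement with the summary shown before.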

The communication statistics print a summary for the whole program and do not give any information related to the regions of the program. Nevertheless, it is possible to use the communication counters (see Section 8.1) to obtain information related to the regions.

8.3 Tracing of Communication

Vampir provides the possibility to visualize MPI communication. This can also be used for HPF programs compiled for distributed memory architectures using MPI.

    adaptor -vt -hpf_mpi -o example example.hpf

Important: The flag -vt should be set before the flag -hpf_mpi.

Even when the application is run without any profiling by the ADAPTOR runtime, the profiling with VAMPIR is still performed.

    mpirun -np 4 example

But enabling the tracing guarantees that regions of the source code can be identified in the VAMPIR visualization.

    mpirun -np 4 example -trace

In summary, if the (parallel) program is linked with -trace=vt and executed with -trace, a tracefile <exec>.stf will be generated that can afterwards be visualized with VAMPIR. In any case, the tracefile contains information about all profiled regions of the program (timestamps for every entry and exit of a profiled region as well as source handles). Furthermore, if performance monitoring is enabled (flag -pm at runtime), the tracefile will also contain the performance counter values, which can be visualized with VAMPIR.

VampirTrace allows its own sampling of performance counters. To use this functionality, the program should be linked with -trace=vtsample. VampirTrace then reads the (hardware) performance counters as specified in its configuration file (the environment variable VT_CONFIG is used to specify this configuration file). The performance counters are read at every trace event. As the flag -trace generates VampirTrace calls for every entry and exit of a region, performance counter values related to the regions of the program will be available. Using the ADAPTOR runtime flag -pm is still possible, but then performance counter values will appear twice. The difference between the two approaches is that the ADAPTOR runtime (flag -pm) reads the counters only at entry and exit of profiled regions, whereas VampirTrace (link flag -trace=vtsample) reads them itself at every trace event as specified via VT_CONFIG.

Currently, ADAPTOR supports performance counters for the communication volume (number of bytes sent and received for every process) and can give a communication summary at the end of the program (number of bytes exchanged between the different processors). But ADAPTOR does not generate any trace call for the communication.

Nevertheless, it is possible to use VampirTrace for tracing the MPI communication generated for HPF programs (MPI must have been selected as communication model for the distributed memory model by the ADAPTOR link flag -dm=mpi). This is possible because linking with the MPI library only has to be replaced by linking with the corresponding VampirTrace library. Therefore, compiling and linking a parallel HPF program with the following ADAPTOR command enables the generation of a VampirTrace file for the MPI communication (no ADAPTOR runtime flag has to be set).

adaptor -hpf_mpi -vt <program>.hpf

It is possible to combine the MPI visualization with the ADAPTOR supported trace capabilities for VampirTrace. The link step is as follows:

adaptor -hpf_mpi -trace=vt <program>.hpf

If now the runtime flag -trace is used, trace calls will not only be generated for the MPI calls (done automatically by VampirTrace) but also for the entering and leaving of regions (done by the ADAPTOR runtime system using the VampirTrace interface).


Thomas Brandes 2004-03-19