Next: 6 Counting Performance Events Up: ADAPTOR Profiling Guide Previous: 4 Source Code Instrumentation   Contents

5 Compiling and Linking of Instrumented Programs

Instrumented programs are compiled in the usual way but must be linked with the ADAPTOR runtime system, which provides the functionality to gather profiling information for the regions at runtime. Linking should be done with the ADAPTOR compiler driver adaptor (see the ADAPTOR Users Guide [Bra04]).

    adaptor -o executable file1.o ... [options]

Figure 4: ADAPTOR runtime system.

Figure 4 gives an overview of the ADAPTOR runtime system. While the modules REGIONS and COUNTERS are always linked with the instrumented program, the following modules require particular attention:

If no flag is set for linking, the compiler driver uses dummy versions of all these libraries. The dummy libraries for performance counting and tracing offer only minimal profiling support, but they still make it possible to obtain the wall time for the regions of the program.
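To make concrete what wall-time-only profiling of a region amounts to, here is a minimal sketch in C. It is not taken from the ADAPTOR runtime: the type region_timer and the functions region_timer_enter and region_timer_exit are hypothetical names chosen for illustration, and the C11 function timespec_get is used as the clock.

```c
#include <time.h>

/* Hypothetical region record; not an ADAPTOR data structure. */
typedef struct {
    double entered;   /* wall-clock time at the last entry, in seconds */
    double total;     /* wall time accumulated over all visits, in seconds */
    long   calls;     /* number of completed visits */
} region_timer;

/* Current wall-clock time in seconds (C11 timespec_get). */
static double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Called when the program enters the region. */
void region_timer_enter(region_timer *r)
{
    r->entered = wall_seconds();
}

/* Called when the program leaves the region. */
void region_timer_exit(region_timer *r)
{
    r->total += wall_seconds() - r->entered;
    r->calls++;
}
```

With such a record per region, a summary at program end needs only total and calls; everything beyond that (hardware events, trace records) requires one of the interfaces described in the following sections.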

5.1 Using the ADAPTOR Compiler Driver

Fortran programs are instrumented by the source-to-source translator fadapt. With the flag -Wa, the compiler driver adaptor passes the corresponding option to fadapt.

    adaptor -Wa"-INSTR" ...
    adaptor -Wa"-DINSTR" ...

    -INSTR   = -instr -prof:SUB -prof:PAR -dsp=no -args=single
    -DINSTR  = -instr -prof:SUB -prof:PAR -prof:DATA -dsp=array -args=single
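The two macro flags differ only in that -DINSTR adds -prof:DATA and uses -dsp=array instead of -dsp=no. The following sketch records the two expansions in a small C table. The function name expand_instr_macro is hypothetical and only serves the illustration; the expansion itself is performed by the ADAPTOR tools, not by user code.

```c
#include <stddef.h>
#include <string.h>

/* Expansions copied verbatim from the two definitions above. */
static const struct {
    const char *macro;
    const char *expansion;
} instr_macros[] = {
    { "-INSTR",  "-instr -prof:SUB -prof:PAR -dsp=no -args=single" },
    { "-DINSTR", "-instr -prof:SUB -prof:PAR -prof:DATA -dsp=array -args=single" },
};

/* Hypothetical helper: return the fadapt options a macro flag stands for,
   or NULL if the flag is not one of the two macros. */
const char *expand_instr_macro(const char *flag)
{
    for (size_t i = 0; i < sizeof instr_macros / sizeof instr_macros[0]; i++) {
        if (strcmp(flag, instr_macros[i].macro) == 0)
            return instr_macros[i].expansion;
    }
    return NULL;
}
```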

5.1.1 Instrumented Compilation of Serial Programs

Serial programs are compiled as follows:

    adaptor -Wa"-INSTR" serial.f -o serial [-trace=vtsp] [-pm=...]

The possibilities for profiling depend on the corresponding interfaces for performance counting and tracing.


5.1.2 Instrumented Compilation of OpenMP Programs

Instrumented compilation of OpenMP programs is slightly more involved:

    adaptor -Wa"-INSTR" -Wf"-openmp" -Wl"-openmp" -sm=omp parallel.f -o parallel

Figure 5: ADAPTOR interfaces for OpenMP (shared memory) runtime support.

It is possible to define a new option in the configuration file .adaptorrc

    instr_openmp = -Wa"-INSTR" -Wf"-openmp" -Wl"-openmp" -sm=omp

that makes OpenMP compilation with instrumentation easier to handle:

    adaptor -instr_openmp parallel.f -o parallel

A parallel OpenMP program can also be compiled by the ADAPTOR compilation system itself, in which case instrumentation is the default. The flag -openmp selects the ADAPTOR system for OpenMP compilation. The ADAPTOR OpenMP compiler uses the Pthreads library for thread parallelism, so the flag -openmp enables -sm=pthreads (see Figure 5).

    adaptor -openmp parallel.f -o parallel


5.1.3 Instrumentation of MPI Programs

Instrumentation and compilation of MPI programs can be done as follows:

  -instr_mpi  = -I$(MPI_HOME)/include -Wa"-INSTR" -hpf -dm=mpi

    adaptor -instr_mpi mpi_example.f -o mpi_example


5.2 Linking for Performance Monitoring

By default, the ADAPTOR runtime system has no access to any hardware performance counters. It can, however, use one of its interfaces to PAPI, PCL or Perfctr to access them.

Figure 6: ADAPTOR interfaces for performance monitoring.

5.2.1 The Performance Monitoring Interface

The performance monitoring interface must implement the following routines (see file pm.h of the ADAPTOR runtime system DALIB):

Thread private data is needed for the definition of counters and for reading the values of the counters:

    typedef struct {
        ...
        void      *HPM_Handle;    /* used for Hardware Performance Counters */
        int        HPM_Started;   /* used for Hardware Performance Counters */
        long long  HPM_Counters [MAX_HPM_COUNTERS];
        ...
    } thread_private_data;
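One plausible way to use such per-thread counter storage is to snapshot the counter values at region entry and add the difference to per-region totals at region exit. The sketch below illustrates this under stated assumptions: the value 8 for MAX_HPM_COUNTERS is made up, the structure is a reduced copy without the elided members, and the helpers region_enter and region_exit are hypothetical and not part of DALIB.

```c
#include <string.h>

#define MAX_HPM_COUNTERS 8   /* assumed value; the real one is defined in DALIB */

/* Reduced copy of the structure above; the elided members are left out. */
typedef struct {
    void      *HPM_Handle;
    int        HPM_Started;
    long long  HPM_Counters[MAX_HPM_COUNTERS];
} thread_private_data;

/* Hypothetical helper: remember the counter values at region entry. */
void region_enter(const thread_private_data *tpd,
                  long long start[MAX_HPM_COUNTERS])
{
    memcpy(start, tpd->HPM_Counters, sizeof tpd->HPM_Counters);
}

/* Hypothetical helper: add the events counted since region entry
   to the per-region totals. Does nothing if counting is not active. */
void region_exit(const thread_private_data *tpd,
                 const long long start[MAX_HPM_COUNTERS],
                 long long total[MAX_HPM_COUNTERS])
{
    if (!tpd->HPM_Started)
        return;
    for (int i = 0; i < MAX_HPM_COUNTERS; i++)
        total[i] += tpd->HPM_Counters[i] - start[i];
}
```

Because each thread owns its own thread_private_data, such a scheme needs no locking while a region is being measured.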

5.2.2 Performance Monitoring With PAPI

PAPI (Performance Application Programming Interface) [Muc01] aims to provide the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors. ADAPTOR can use PAPI as a high level API for accessing hardware performance counters. There is an interface pm_papi implementing the PM routines by using the PAPI library. If PAPI is available, ADAPTOR profiling can be used to count performance events accessible by PAPI.

Linking PAPI performance monitoring support is done as follows:

    adaptor -o executable -pm=papi

The flag -pm=papi links the ADAPTOR interface for PAPI and the corresponding PAPI library. During the installation of ADAPTOR (see [Bra04]) the PAPI interface must have been generated and enabled.

Because the available PAPI performance counters differ from machine to machine, the following command should be used to get an overview of the supported counters:

    executable -pm -help

 ...
 16: HPM             L1_DCM (Level 1 data cache misses)
 17: HPM             L2_DCM (Level 2 data cache misses)
 ...                 ...
 34: HPM             L1_DCA (Level 1 data cache accesses)
 35: HPM             FP_OPS (Floating point operations)
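If the listing is to be processed by a script or tool, each entry can be split into its fields. The sketch below assumes that every entry has the shape "index: kind name (description)", which is inferred only from the excerpt above; the type pm_help_entry and the function parse_pm_help_line are hypothetical names.

```c
#include <stdio.h>

/* One entry of the counter listing; field sizes are arbitrary choices. */
typedef struct {
    int  index;
    char kind[16];
    char name[32];
    char description[128];
} pm_help_entry;

/* Returns 1 if the line has the assumed shape
   "<index>: <kind> <name> (<description>)", otherwise 0.
   A description that itself contains ')' is cut at that character. */
int parse_pm_help_line(const char *line, pm_help_entry *e)
{
    return sscanf(line, " %d: %15s %31s (%127[^)])",
                  &e->index, e->kind, e->name, e->description) == 4;
}
```

Lines that do not match, such as the elision rows in the excerpt, are rejected rather than guessed at.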

Table 3 shows how the default ADAPTOR performance counters are mapped to corresponding PAPI counters.


Table 3: Mapping of ADAPTOR default performance counters to PAPI counters.

    ADAPTOR event   PAPI event
    -------------   -------------
    TOT_CYC         PAPI_TOT_CYC
    TOT_INS         PAPI_TOT_INS
    LD_INS          not available
    SR_INS          not available
    FP_INS          PAPI_FP_INS
    L1_MISS         PAPI_L1_DCM
    L2_MISS         PAPI_L2_DCM
    TLB_MISS        PAPI_TLB_TL


5.2.3 Performance Monitoring With PCL

The PCL interface [BM03] is another software package that provides unified access to performance counters on different platforms. Linking PCL performance monitoring support is done as follows:

    adaptor -o executable -pm=pcl

The flag -pm=pcl links the ADAPTOR interface for PCL and the corresponding PCL library. During the installation of ADAPTOR (see [Bra04]) the PCL interface must have been generated and enabled.

Because the available performance counters differ from machine to machine, the following command should be used to get an overview of the supported counters:

    executable -pm -help

Table 4 shows how the default ADAPTOR performance counters are mapped to corresponding PCL counters.


Table 4: Mapping of ADAPTOR default performance counters to PCL counters.

    ADAPTOR event   PCL event
    -------------   -----------------
    TOT_CYC         PCL_CYCLES
    TOT_INS         PCL_INSTR
    LD_INS          PCL_LOAD_INSTR
    SR_INS          PCL_STORE_INSTR
    FP_INS          PCL_FP_INSTR
    L1_MISS         PCL_L1DCACHE_MISS
    L2_MISS         PCL_L2DCACHE_MISS
    TLB_MISS        PCL_DTLB_MISS



5.2.4 Performance Monitoring With Perfctr

Perfctr is a package that adds support to the Linux kernel for using the Performance-Monitoring Counters (PMCs) found in many modern x86-class processors. This package and future versions of this package can be downloaded from

http://www.csd.uu.se/~mikpe/linux/perfctr/

PAPI and PCL use this package on Linux platforms with Intel processors as a low-level API for accessing the PMCs. On Linux platforms with an Intel Pentium 4, ADAPTOR can use this package directly instead of going through the PAPI or PCL interface. This makes more machine-specific performance counters available and avoids some of the overhead introduced by PAPI or PCL.

    adaptor -pm=perfctr

The following command gives an overview of the supported counters:

    executable -pm -help


Table 5: Mapping of ADAPTOR default performance counters to Perfctr counters.

    ADAPTOR event   Perfctr event
    -------------   -------------
    TOT_CYC         TOT_CYCLES
    TOT_INS         TOT_INS_C
    LD_INS          LD_INS
    SR_INS          SR_INS
    FP_INS          FP_INS_C
    L1_MISS         L2_READ
    L2_MISS         L2_MISS
    TLB_MISS        DTLB_MISS


Appendix D lists all performance events that can be used with this PM interface.
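The mappings of Tables 3, 4 and 5 can be read side by side as one lookup table. The following sketch encodes them in C; the event names are copied from the tables, whereas the type pm_backend, the table pm_table and the function pm_native_event are hypothetical names and do not reflect how the ADAPTOR interfaces store the mapping internally.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers for the three PM interfaces. */
typedef enum { PM_PAPI, PM_PCL, PM_PERFCTR } pm_backend;

typedef struct {
    const char *adaptor_event;
    const char *native[3];   /* indexed by pm_backend; NULL = not available */
} pm_mapping;

/* Rows copied from Tables 3, 4 and 5. */
static const pm_mapping pm_table[] = {
    { "TOT_CYC",  { "PAPI_TOT_CYC", "PCL_CYCLES",        "TOT_CYCLES" } },
    { "TOT_INS",  { "PAPI_TOT_INS", "PCL_INSTR",         "TOT_INS_C"  } },
    { "LD_INS",   { NULL,           "PCL_LOAD_INSTR",    "LD_INS"     } },
    { "SR_INS",   { NULL,           "PCL_STORE_INSTR",   "SR_INS"     } },
    { "FP_INS",   { "PAPI_FP_INS",  "PCL_FP_INSTR",      "FP_INS_C"   } },
    { "L1_MISS",  { "PAPI_L1_DCM",  "PCL_L1DCACHE_MISS", "L2_READ"    } },
    { "L2_MISS",  { "PAPI_L2_DCM",  "PCL_L2DCACHE_MISS", "L2_MISS"    } },
    { "TLB_MISS", { "PAPI_TLB_TL",  "PCL_DTLB_MISS",     "DTLB_MISS"  } },
};

/* Return the backend-specific event for an ADAPTOR default event,
   or NULL if the event is unknown or not available on that backend. */
const char *pm_native_event(pm_backend backend, const char *adaptor_event)
{
    for (size_t i = 0; i < sizeof pm_table / sizeof pm_table[0]; i++) {
        if (strcmp(pm_table[i].adaptor_event, adaptor_event) == 0)
            return pm_table[i].native[backend];
    }
    return NULL;
}
```

For example, pm_native_event(PM_PAPI, "LD_INS") yields NULL, reflecting that Table 3 lists no PAPI counterpart for the load and store instruction counters.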


5.3 Linking for Tracing

Tracing means that certain events of the program are recorded individually as they occur, rather than only being summarized at the end of the program.
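The difference between a trace and a summary can be sketched in a few lines of C. A trace keeps one record per event, and a summary can be derived from it afterwards, but not the other way round. All names here (trace_record, trace_buffer, trace_event, summarize_region) are hypothetical and unrelated to the actual DALIB or Vampirtrace interfaces; the fixed buffer size is an arbitrary choice.

```c
#include <stddef.h>

typedef enum { EV_ENTER, EV_EXIT } event_kind;

/* One trace record: which region, what happened, and when (seconds). */
typedef struct {
    int        region;
    event_kind kind;
    double     time;
} trace_record;

#define TRACE_CAPACITY 1024   /* arbitrary size for this sketch */

typedef struct {
    trace_record records[TRACE_CAPACITY];
    size_t       count;
} trace_buffer;

/* Tracing: append one record per event. Returns 0 if the buffer is full. */
int trace_event(trace_buffer *tb, int region, event_kind kind, double time)
{
    if (tb->count == TRACE_CAPACITY)
        return 0;
    tb->records[tb->count++] = (trace_record){ region, kind, time };
    return 1;
}

/* Summarizing: total time spent in one region, derived from the trace.
   Assumes matching, non-recursive enter/exit pairs for that region. */
double summarize_region(const trace_buffer *tb, int region)
{
    double total = 0.0, entered = 0.0;
    for (size_t i = 0; i < tb->count; i++) {
        const trace_record *r = &tb->records[i];
        if (r->region != region)
            continue;
        if (r->kind == EV_ENTER)
            entered = r->time;
        else
            total += r->time - entered;
    }
    return total;
}
```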

5.3.1 Tracing Routines

What happens to the trace calls depends on the DALIB interface in use. If the dummy interface is used (no flag specified for linking with adaptor), the events are handled in the following way:

5.3.2 Linking with Vampirtrace

The Vampirtrace profiling tool produces trace files that can be analyzed with the Vampir performance analysis tool. By default, it records only calls to the MPI library, but it also allows arbitrary user-defined events to be recorded. The ADAPTOR runtime system uses this feature to generate trace records as follows:

In this way, trace records are available for the definition, entry and exit of a region and for the definition and actual values of the performance counters.

The Vampirtrace interface and the Vampirtrace library are linked as follows:

    adaptor -trace=vt             ! for MPI programs
    adaptor -trace=vtsp           ! for serial programs

With this option (e.g. -trace=vt), the compiler driver links the DALIB interface (trace_vt) and the Vampirtrace library specified by the entry TRACE:vt = ... in the adaptor configuration file .adaptorrc.

Vampirtrace also provides a library that traces just one process and works without MPI. This library should be used for serial programs (-trace=vtsp).


Thomas Brandes 2004-03-19