Instrumented programs are compiled in the usual way, but must be linked with the ADAPTOR runtime system, which provides the functionality to gather profiling information for the regions at runtime. Linking should be done with the ADAPTOR compiler driver adaptor (see ADAPTOR Users Guide [Bra04]).
adaptor -o executable file1.o ... [options]
Figure 4 gives an overview of the ADAPTOR runtime system. While the modules REGIONS and COUNTERS are always linked with the instrumented program, the following modules require special attention:
If no flag has been set for linking, the compiler driver takes dummy versions of all these libraries. The dummy libraries for performance counting and tracing provide only minimal profiling support, but at least the wall time for the regions of the program is still available.
The instrumentation of Fortran programs is done by the source-to-source translator fadapt. With the flag -Wa, the compiler driver adaptor passes the corresponding option on to fadapt.
adaptor -Wa"-INSTR" ...
adaptor -Wa"-DINSTR" ...

-INSTR  = -instr -prof:SUB -prof:PAR -dsp=no -args=single
-DINSTR = -instr -prof:SUB -prof:PAR -prof:DATA -dsp=array -args=single
Serial programs are compiled as follows:
adaptor -Wa"-INSTR" serial.f -o serial [-trace=vtsp] [-pm=...]
The possibilities for profiling depend on the corresponding interfaces for performance counting and tracing.
Instrumented compilation of OpenMP programs is slightly more involved:
adaptor -Wa"-INSTR" -Wf"-openmp" -Wl"-openmp" -sm=omp parallel.f -o parallel
It is possible to define a new option in the configuration file .adaptorrc
instr_openmp = -Wa"-INSTR" -Wf"-openmp" -Wl"-openmp" -sm=omp
that makes OpenMP compilation with instrumentation easier to handle:
adaptor -instr_openmp parallel.f -o parallel
A parallel OpenMP program can also be compiled by the ADAPTOR compilation system (instrumentation is the default). The flag -openmp selects the ADAPTOR system for OpenMP compilation. The ADAPTOR OpenMP compiler uses the Pthreads library for thread parallelism, and therefore the flag -openmp implies -sm=pthreads (see Figure 5).
adaptor -openmp parallel.f -o parallel
Instrumentation and compilation of MPI programs can be done analogously, e.g. with the following entry in the configuration file .adaptorrc:

instr_mpi = -I$(MPI_HOME)/include -Wa"-INSTR" -hpf -dm=mpi
adaptor -instr_mpi mpi_example.f -o mpi_example
By default, the ADAPTOR runtime system has no access to any hardware performance counters. However, one of its interfaces to PAPI, PCL or Perfctr can be used to gain access to the performance counters.
The performance monitoring interface must implement the following routines (see file pm.h of the ADAPTOR runtime system DALIB):
int dalib_hpm_max_events ()

The number of supported performance events can be higher than the number of counters that can actually be used to count events.
char *dalib_hpm_event_string (int i)
char *dalib_hpm_event_info (int i)
int dalib_hpm_kind_map (int kind)
void dalib_hpm_define_counters (int hpm_event_kind[], int no_counters);
void dalib_hpm_get_counters (thread_private_data *TPD)
Thread private data is needed for the definition of counters and for reading the values of the counters:
typedef struct {
    ...
    void *HPM_Handle;      /* used for Hardware Performance Counters */
    int HPM_Started;       /* used for Hardware Performance Counters */
    long long HPM_Counters [MAX_HPM_COUNTERS];
    ...
} thread_private_data;
PAPI (Performance Application Programming Interface) [Muc01] aims to provide the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors. ADAPTOR can use PAPI as a high level API for accessing hardware performance counters. There is an interface pm_papi implementing the PM routines by using the PAPI library. If PAPI is available, ADAPTOR profiling can be used to count performance events accessible by PAPI.
Linking PAPI performance monitoring support is done as follows:
adaptor -o executable -pm=papi
The flag -pm=papi links the ADAPTOR interface for PAPI and the corresponding PAPI library. During the installation of ADAPTOR (see [Bra04]) the PAPI interface must have been generated and enabled.
As the availability of PAPI performance counters differs from machine to machine, the following command should be used to get an overview of the supported counters:
executable -pm -help
...
16: HPM L1_DCM   (Level 1 data cache misses)
17: HPM L2_DCM   (Level 2 data cache misses)
...
34: HPM L1_DCA   (Level 1 data cache accesses)
35: HPM FP_OPS   (Floating point operations)
Table 3 shows how the default ADAPTOR performance counters are mapped to corresponding PAPI counters.
The PCL interface [BM03] is another software package that provides unified access to performance counters on different platforms.
adaptor -o executable -pm=pcl
The flag -pm=pcl links the ADAPTOR interface for PCL and the corresponding PCL library. During the installation of ADAPTOR (see [Bra04]) the PCL interface must have been generated and enabled.
As the availability of performance counters differs from machine to machine, the following command should be used to get an overview of the supported counters:
executable -pm -help
Table 4 shows how the default ADAPTOR performance counters are mapped to corresponding PCL counters.
Perfctr is a package that adds support to the Linux kernel for using the Performance-Monitoring Counters (PMCs) found in many modern x86-class processors. Current and future versions of this package can be downloaded from
http://www.csd.uu.se/~mikpe/linux/perfctr/
PAPI and PCL use this package on Linux platforms with Intel processors as a low-level API to get access to the Performance Monitoring Counters (PMCs). On Linux platforms with an Intel Pentium 4, ADAPTOR can use this package directly instead of going through the PAPI or PCL interface. In this way, it is possible to use more machine-specific performance counters and to avoid some of the overhead introduced by PAPI or PCL.
adaptor -pm=perfctr
The following command gives an overview of the supported counters:
executable -pm -help
Appendix D lists all performance events that can be used with this PM interface.
Tracing means that certain events of the program are recorded as they occur, rather than only summarized at the end of the program.
What happens with the trace calls depends on the DALIB interface used. If the dummy interface is used (no flag specified for linking with adaptor), the events are handled in the following way:
The Vampirtrace profiling tool produces trace files that can be analyzed with the Vampir performance analysis tool. By default, it records all calls to the MPI library, but it also allows arbitrary user-defined events to be recorded. The ADAPTOR runtime system uses this feature to generate trace records as follows:
In this way, trace records are available for the definition, entry and exit of a region, and for the definition and actual values of the performance counters.
The VampirTrace interface and the VampirTrace library are linked as follows:
adaptor -trace=vt      ! for MPI programs
adaptor -trace=vtsp    ! for serial programs
With this option (e.g. -trace=vt), the compiler driver links the corresponding interface (trace_vt) of the DALIB and the Vampirtrace library as specified in the ADAPTOR configuration file .adaptorrc by the entry TRACE:vt = ....
Vampirtrace also provides a library that traces just one process and works without MPI. This library should be used for serial programs (-trace=vtsp).