2 Quick Start

2.1 Setting the Environment

Let <install-dir> be the directory on the machine where ADAPTOR is installed. Every user should set the environment variable PHOME to this directory, using the first form in csh-style shells and the second form in sh-style shells:

     setenv PHOME <install-dir>   or    export PHOME=<install-dir>

Furthermore, the bin directory of ADAPTOR should be added to the PATH variable and the man directory to the MANPATH variable.

    setenv PATH $PHOME/bin:$PATH          export PATH=$PHOME/bin:$PATH
    setenv MANPATH $PHOME/man:$MANPATH    export MANPATH=$PHOME/man:$MANPATH

The ADAPTOR commands can now be called; for example, the following commands should work correctly:

    fadapt -help
    adaptor -help
    man fadapt
    man adaptor
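
The check that the commands are reachable can also be scripted. The following is a minimal sketch, assuming a POSIX shell; the function name check_adaptor_commands is hypothetical and not part of ADAPTOR. It only tests whether fadapt and adaptor can be found in the current PATH and does not run them.

```shell
# Hypothetical helper (not part of ADAPTOR): report which ADAPTOR
# commands cannot be found in the current PATH.
# The return value is the number of missing commands (0 = all found).
check_adaptor_commands() {
    missing=0
    for cmd in fadapt adaptor; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd (is \$PHOME/bin in PATH?)"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

A typical call would be `check_adaptor_commands || echo "please fix PATH first"`.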

Furthermore, make sure that the ADAPTOR runtime systems are available.

    ls $PHOME/dalib/lib*.a

The following libraries must be available:

    $PHOME/dalib/libadp_hpf.a           ! HPF runtime for parallel machines
    $PHOME/dalib/libadp_hpf_null.a      ! HPF runtime for serial machines
    $PHOME/dalib/libadp_dm_null.a       ! dummy HPF distributed memory interface
    $PHOME/dalib/libadp_sm_null.a       ! dummy HPF shared memory interface
    $PHOME/dalib/libadp_pm_null.a       ! dummy performance monitoring interface
    $PHOME/dalib/libadp_trace_null.a    ! dummy trace interface
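
Instead of inspecting the output of ls by eye, the six libraries listed above can be checked in a loop. This is a sketch under the assumption of a POSIX shell; check_adaptor_libs is a hypothetical name, and the list of library names is copied from the list above.

```shell
# Hypothetical helper (not part of ADAPTOR): check that the six runtime
# libraries listed above exist in <install-dir>/dalib.
# The install directory is the first argument, or $PHOME if omitted.
check_adaptor_libs() {
    dir=${1:-${PHOME:-}}
    rc=0
    for name in hpf hpf_null dm_null sm_null pm_null trace_null; do
        lib="$dir/dalib/libadp_$name.a"
        if [ ! -f "$lib" ]; then
            echo "missing: $lib"
            rc=1
        fi
    done
    return "$rc"
}
```

The function prints one line per missing library and returns a non-zero status if any library is missing, so it can be used as `check_adaptor_libs && echo "runtime libraries complete"`.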

To use HPF parallelism on distributed memory machines via MPI and on shared memory machines via PThreads, the following libraries are needed:

    $PHOME/dalib/libadp_dm_mpi.a
    $PHOME/dalib/libadp_sm_pthreads.a

Assuming that you have PAPI or PCL available on your machine, you can use the ADAPTOR instrumentation to get performance data. The following interfaces are necessary:

    $PHOME/dalib/libadp_pm_papi.a
    $PHOME/dalib/libadp_pm_pcl.a

VampirTrace can be utilized to collect runtime data to be visualized by the VAMPIR tool. The following interfaces are needed:

    $PHOME/dalib/libadp_trace_vt.a         ! traces Adaptor + MPI
    $PHOME/dalib/libadp_trace_vtsample.a   ! traces Adaptor + MPI + performance
    $PHOME/dalib/libadp_trace_vtsp.a       ! traces Adaptor for single process

2.2 HPF Compilation with ADAPTOR

The main purpose of the ADAPTOR system is the compilation of HPF programs to parallel programs.

The following data parallel HPF program prime.hpf (assumed to be in the directory $PHOME/hpf_examples/) computes the number of primes in the range from 2 to n. The program uses dynamic arrays and array syntax. Timing functions are used to measure the run time of the program.

Note: This program is used to demonstrate the functionality of HPF and ADAPTOR and not the efficiency of HPF.

There is exactly one array in the program. This array is block distributed among all processors; the number of processors is fixed at runtime.

      program PRIME
      integer N, S, K
      logical, allocatable :: A(:)
!hpf$ distribute A(block)
      integer TICKSTART, TICKSTOP, TICKRATE
      print *, 'Input n for counting primes in range 2 to n : '
      read *, N
      call system_clock (TICKSTART, TICKRATE)
      allocate (A(1:N))
      A    = .true.
      A(1) = .false.
      K = 2
      do while (K*K <= N)
         A(K*K:N:K) = .false.    ! sieve all multiples of k 
         K = K + 1
         do while (.not. A(K))   ! find next prime k 
           K = K + 1
         end do
      end do
      S = count (A)
      call system_clock (TICKSTOP)
      TICKSTOP = TICKSTOP - TICKSTART
      print *, 'There are ', S, ' primes until ', N
      print *, 'Time needed : ', float(TICKSTOP)/float(TICKRATE)
      deallocate (A)
      end

Here are some example input values and their results.

    Input value:           Result
    ============           ======
             100               25
            1000              168
           10000             1229
          100000             9592
         1000000            78498
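
The values in this table can be cross-checked without ADAPTOR. The following sketch, assuming a POSIX shell with awk, runs the same sieve serially; count_primes is a hypothetical name, and the script is independent of the HPF program above.

```shell
# Hypothetical helper: count the primes in the range 2..n with a serial
# sieve of Eratosthenes, mirroring the logic of prime.hpf.
count_primes() {
    awk -v n="$1" 'BEGIN {
        # c[i] is set for every composite number i that has been sieved out
        for (k = 2; k * k <= n; k++)
            if (!(k in c))
                for (i = k * k; i <= n; i += k)
                    c[i] = 1
        s = 0
        for (i = 2; i <= n; i++)
            if (!(i in c))
                s++
        print s
    }'
}
```

For example, `count_primes 100` prints 25 and `count_primes 1000` prints 168, in agreement with the table.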

2.2.2 HPF Compilation for Distributed Memory Machines using MPI

The compile driver adaptor controls the whole compilation and generates the corresponding executable. The compile driver is described in detail in Section 5.

    adaptor -hpf -dm -o <executable> <file>.hpf

This call will generate the executable in three steps:

1. The data parallel program is translated to a Fortran program by the source-to-source translator fadapt (see Section 4).
2. The generated Fortran program is compiled by a native Fortran compiler. Since the options for this compilation are compiler dependent, we refer to the corresponding user's guide.
3. The compiled program is linked with the HPF runtime system DALIB and the corresponding message passing library.

The following commands generate a parallel SPMD program and run it on 4 processors (an MPI installation must be available).

     adaptor -hpf -dm -o prime prime.hpf
     mpirun -np 4 prime

The basic idea of the HPF compilation for distributed memory machines is that the abstract HPF processors are identified with MPI processes, where usually one MPI process runs on one processor. The number of abstract HPF processors is given implicitly at runtime by the number of available MPI processes (MPI_Comm_size).
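
To make the mapping concrete, the following sketch prints which part of the array A(1:n) each abstract processor would own under a BLOCK distribution. It assumes a POSIX shell and the block size ceiling(n/p) defined by the HPF standard; block_ranges is a hypothetical name, and whether the ADAPTOR runtime computes exactly these bounds internally is an assumption.

```shell
# Hypothetical helper: print the index range of A(1:n) owned by each of
# p abstract processors under an HPF BLOCK distribution.
# ASSUMPTION: block size = ceiling(n/p), processors numbered from 1.
block_ranges() {
    n=$1
    p=$2
    b=$(( (n + p - 1) / p ))          # ceiling(n/p)
    proc=1
    while [ "$proc" -le "$p" ]; do
        lo=$(( (proc - 1) * b + 1 ))
        hi=$(( proc * b ))
        if [ "$hi" -gt "$n" ]; then
            hi=$n                     # last block may be shorter
        fi
        if [ "$lo" -le "$n" ]; then
            echo "processor $proc: A($lo:$hi)"
        else
            echo "processor $proc: (no elements)"
        fi
        proc=$((proc + 1))
    done
}
```

With `block_ranges 10 4`, which corresponds to running with mpirun -np 4 and n = 10, the four lines show A(1:3), A(4:6), A(7:9) and A(10:10).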

2.2.3 HPF Compilation for a Single Node

The following command translates the data parallel program to a serial Fortran program, and then compiles and links it.

     adaptor -hpf_1 -o prime prime.hpf

The flag -hpf_1 maps all abstract HPF processors to one single processor. The program runs only on a single node and does not use any explicit process or thread parallelism.

     ./prime           ! runs the executable prime

Note: Compiling for a single node gives slightly better performance than running the MPI program on a single node, because it is already known at compile time that only one processor will be used and certain overhead can be avoided.

2.2.4 HPF Compilation for Shared Memory Machines using PThreads

The following command translates the data parallel HPF program to a Fortran program using shared memory parallelism via PThreads.

     adaptor -hpf -sm -o prime prime.hpf

The flag -sm tells the compiler to identify the abstract HPF processors with threads.

     setenv OMP_NUM_THREADS 4    !  export OMP_NUM_THREADS=4
     ./prime                     !  run it with 4 threads
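
To compare run times for several thread counts, the two commands above can be wrapped in a loop. This is a minimal sketch assuming a POSIX shell; run_with_threads is a hypothetical name. The problem size is passed on standard input because the example program reads n interactively.

```shell
# Hypothetical helper: run a program once per thread count, feeding the
# problem size n on standard input (prime.hpf reads n with "read *, N").
# usage: run_with_threads <program> <n> <threads>...
run_with_threads() {
    prog=$1
    n=$2
    shift 2
    for t in "$@"; do
        echo "--- OMP_NUM_THREADS=$t"
        # the variable is set only for this one invocation
        echo "$n" | OMP_NUM_THREADS=$t "$prog"
    done
}
```

For example, `run_with_threads ./prime 1000000 1 2 4` runs the executable with 1, 2 and 4 threads in turn.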

2.2.5 HPF Compilation for Cache Architectures

This new execution model of HPF is based on the idea that one physical processor emulates a certain number of abstract processors. Because the HPF mapping directives implicitly block both the data and the computational load, the data of one abstract processor fits better into the cache and is reused more effectively.

     adaptor -hpf -cb -o prime prime.hpf

The number of abstract processors (number of blocks) is specified at runtime by the flag -nb.

     ./prime -nb 100             !  emulation of 100 abstract processors
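
A rough estimate of the data volume per abstract processor helps when choosing a value for -nb. The sketch below assumes a POSIX shell; block_bytes is a hypothetical name, and the default of 4 bytes per array element is an assumption about the storage size of a default Fortran logical, which depends on the compiler.

```shell
# Hypothetical helper: approximate number of bytes per abstract processor
# when an array of n elements is split into nb blocks.
# ASSUMPTION: elem_bytes defaults to 4 (compiler dependent for logicals).
# usage: block_bytes <n> <nb> [elem_bytes]
block_bytes() {
    n=$1
    nb=$2
    elem_bytes=${3:-4}
    # ceiling(n/nb) elements per block, times the element size
    echo $(( (n + nb - 1) / nb * elem_bytes ))
}
```

For n = 1000000 and -nb 100, `block_bytes 1000000 100` gives 40000 bytes per block, compared with 4000000 bytes for the whole array under the same assumption.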

2.2.6 HPF Compilation for a Cluster of SMP Nodes using MPI and PThreads

The following command translates the data parallel HPF program to a parallel program using process parallelism via MPI and thread parallelism via PThreads.

     adaptor -hpf -dm -sm -o prime prime.hpf

The following commands will run the program on four nodes with two threads on each node.

     export OMP_NUM_THREADS=2
     mpirun -np 4 prime

2.3 OpenMP Compilation with ADAPTOR

Besides HPF compilation, ADAPTOR can also be used to translate parallel OpenMP Fortran programs to programs using explicit thread parallelism.

2.3.1 The OpenMP Example Program

      program PRIME
      integer n, s, k
      logical, allocatable :: a(:)
!hpf$ distribute a(block)

      integer tickstart, tickstop, tickrate
      integer nt, omp_get_max_threads
      print *, 'Input n for counting primes in range 2 to n : '
      read *, n
      call system_clock (tickstart, tickrate)
      allocate (a(1:n))
      a = .true.
      a(1) = .false.
      s = 0
      nt = omp_get_max_threads ()
!$omp parallel private (i,k), reduction (+:s)
      k = 2
      do while (k*k .le. n)
c        /* sieve all multiples of k */
!$omp do
         do i = k*k, n, k
            a(i) = .false.
         end do
         k = k + 1
c        /* find next prime k */
         do while (.not. a(k))
           k = k + 1
         end do
      end do
!$omp do
      do i = 1, n
         if (a(i)) s = s + 1
      end do
!$omp end parallel
      call system_clock (tickstop)
      tickstop = tickstop - tickstart
      print *, 'Program runs on ', nt, ' threads'
      print *, 'There are ',s,' primes until ', n
      print *, 'Time needed : ', float(tickstop)/float(tickrate)
      deallocate (a)
      end

2.3.2 OpenMP Compilation for Shared Memory Machines using PThreads

The following command translates the parallel OpenMP program to a Fortran program using shared memory parallelism via PThreads.

     adaptor -openmp -o prime prime.f

The flag -openmp tells the compiler to translate the OpenMP directives and to bind the correct runtime system.

     setenv OMP_NUM_THREADS 4    !  export OMP_NUM_THREADS=4
     ./prime                     !  run it with 4 threads

2.4 Profiling with ADAPTOR

ADAPTOR provides support for profiling all kinds of Fortran programs (a detailed description can be found in the ADAPTOR profiling guide [Bra04b]).

2.4.1 Instrumentation

Every OpenMP and HPF program that has been translated by ADAPTOR will be instrumented automatically.

Serial Fortran programs can be instrumented by the following command:

      adaptor -instr prime.f

OpenMP Fortran programs:

      adaptor -instr_openmp prime.f

MPI Fortran programs:

      adaptor -instr_mpi prime.f

Every program that has been translated by the adaptor compile driver and has passed through the source-to-source translator fadapt is instrumented automatically. At a minimum, there is a runtime system call at every entry and exit of a subprogram. The runtime system can therefore react to user-specified commands during execution and provide useful information.

With the switch -trace, or by setting the environment variable TRACE, every entry and exit of an instrumented region (by default all subprograms) is traced. By default, each event is written as one line to standard output.

                                      export TRACE=1
  prime -trace                        prime

  call of PRIME (file=prime.hpf,line=3)
  end call of PRIME (file=prime.hpf,line=31)

For programs compiled for parallel execution, every processor reports its own regions.

                                         export TRACE=1
     mpirun -np 2 prime -trace           mpirun -np 2 prime

1/2: call of PRIME (file=prime.hpf,line=3)
1/2: end call of PRIME (file=prime.hpf,line=31)
2/2: call of PRIME (file=prime.hpf,line=3)
2/2: end call of PRIME (file=prime.hpf,line=31)
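
Trace output of this kind can be summarized with standard tools. The following sketch, assuming a POSIX shell with awk and sort, counts the region entries per processor; trace_calls_per_rank is a hypothetical name, and the parser assumes exactly the line format of the sample output above.

```shell
# Hypothetical helper: count region entries per processor in trace output.
# ASSUMPTION: lines look like the sample above, i.e.
#   <rank>/<size>: call of <NAME> (file=...,line=...)
# Lines starting with "end call of" and all other output are ignored.
# Reads the files given as arguments, or standard input if there are none.
trace_calls_per_rank() {
    awk '$2 == "call" && $3 == "of" {
             sub(/:$/, "", $1)          # "1/2:" -> "1/2"
             calls[$1 " " $4]++
         }
         END { for (key in calls) print key, calls[key] }' "$@" | sort
}
```

Applied to the four sample lines above, the function prints one line for each processor, "1/2 PRIME 1" and "2/2 PRIME 1".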

With the switch -pm, or by setting the environment variable PM, the runtime system records how much time (wall time) has been spent in every region.

                                      export PM=1
  prime -pm                           prime

  profile data in pm.out for tid=0, nid=0

Counters for all regions
========================
PRIME (SUBPROGRAM,file=prime.hpf,lines=3:31)
                 calls     :              1
           walltime(B)  s  :          1.470
           walltime(N)  s  :          1.470
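
For larger programs it can be convenient to reduce such a profile to one line per region. The sketch below assumes a POSIX shell with awk and assumes that the profile has exactly the layout of the listing above; pm_walltimes is a hypothetical name and not part of ADAPTOR.

```shell
# Hypothetical helper: print "<region> <walltime(B) in seconds>" for every
# region in a profile listing.
# ASSUMPTION: the layout matches the listing above, i.e. a region header
#   NAME (KIND,file=...,lines=...)
# followed by indented counter lines such as "walltime(B)  s  :  1.470".
# Reads the files given as arguments, or standard input if there are none.
pm_walltimes() {
    awk '/^[A-Za-z_][A-Za-z0-9_]* \(/ { region = $1 }
         $1 == "walltime(B)"          { print region, $NF }' "$@"
}
```

For the listing above, the function prints the single line "PRIME 1.470".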

Thomas Brandes 2004-03-19