
6 Execution of Parallel Programs

How an executable generated by ADAPTOR is executed depends on the chosen execution model.

6.1 Runtime Options

ADAPTOR provides several runtime options that give additional information about the parallel HPF program at runtime. This information can be very helpful for performance tuning of the HPF program. Table 9 in the appendix gives a summary of all available options. The flag -help also prints a summary of these runtime options.

  a.out -help

  <executable> [-info] [-trace] [-comm] [-pm[=file]]
               [-nn |-nt |-nb  n1[xn2[xn3]] ]

Instead of using command line arguments it is also possible to set corresponding environment variables.

    setenv INFO     1|on|ON
    setenv PM       1|on|ON|<filename>
    setenv COMMSTAT 1|on|ON
    setenv TRACE    1|on|ON

6.1.1 Number of Node Processes

The default number of node processes is usually the number of processors that have been reserved for running the parallel application. If no such reservation exists, there is no default value.

The number of nodes can also be specified by giving an explicit argument or by setting the environment variable NP.

     a.out -nn 12    ! will start 12 node processes

     setenv NP 12
     a.out           ! will start 12 node processes

If there is no default value and no explicit specification of the number of processes, the user is prompted interactively until a legal value has been entered. The same happens if the explicit specification is illegal, e.g. if the requested number of node processes exceeds the number of available nodes.
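
For illustration, an illegal request like the following (the value 1000 is only an example, assuming fewer nodes are available) leads to the interactive prompt:

     a.out -nn 1000  ! illegal if fewer than 1000 nodes are available,
                     ! the user is then asked for a valid number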

Instead of a single number, it is also possible to define the number of processors as a product of two or three numbers. The corresponding processor shape will then be two- or three-dimensional.

    a.out -nn 3x4       ! starts 12 nodes as a 3x4 processor topology
    a.out -nn 2x2x2     ! starts 8 node processes as a 2x2x2 processor topology

The actual mapping of arrays that are distributed in two or three dimensions onto the default processor array can also be controlled at runtime by specifying the number of processors as such a product.

      real A(N,N)
!hpf$ distribute A(block,block)

      a.out -nn 1x4       ! only last dimension of A is distributed
      a.out -nn 4x1       ! only first dimension of A is distributed
      a.out -nn 2x2       ! both dimensions of A are distributed

6.1.2 Number of Threads

By default, the number of threads used for the SM parallelism is the number of processors of the SM machine or, within a cluster, the number of processors of one node.

The number of threads can also be specified by giving an explicit argument or by setting the environment variable NT.

     a.out -nt 2x2   ! uses 2x2 threads for SM parallelism

     setenv NT 4
     a.out           ! uses 4 threads for SM parallelism

The runtime option -nt overrides the value of the environment variable NT. The value of the environment variable NT overrides the value of the OpenMP environment variable OMP_NUM_THREADS. The value of the OpenMP environment variable overrides the default number of threads, which is usually the number of processors on the node.
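
For illustration, the following combination (the values are only examples) runs with 2 threads, since -nt takes precedence over NT, which in turn takes precedence over OMP_NUM_THREADS:

     setenv OMP_NUM_THREADS 8
     setenv NT 4
     a.out -nt 2     ! runs with 2 threads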

6.1.3 Number of Blocks

HPF and OpenMP programs compiled with ADAPTOR can also utilize blocking (emulation of processes or threads on a single processor). By default, the number of blocks is only 1.

The number of blocks can be specified by giving an explicit argument or by setting the environment variable NB.

     a.out -nb 4     ! emulate 4 processes/threads        

     setenv NB 4
     a.out           ! emulate 4 processes/threads

The runtime option -nb overrides the value of the environment variable NB.

6.1.4 Communication Statistics

The communication statistics (enabled by the option -comm or by setting the environment variable COMMSTAT) show how many bytes of data have been sent and received between all processes. More detailed information is printed to the file <executable>.commstat.
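
For example, the statistics could be enabled as follows (an illustrative invocation with four node processes; the numbers in the summary below depend entirely on the application):

       setenv COMMSTAT on
       a.out -nn 4

A summary of the following form is then produced: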

       STATISTICS SUMMARY:
       Number of bytes sent by all procs:
        to:     P 1     P 2     P 3     P 4       all
        P 1    8000   31600   31200     400     71200
        P 2    8400       0     400       0      8800
        P 3    8000     400       0   31600     40000
        P 4    7600       0     400       0      8000


6.2 Running Executables for DM Machines

The Message Passing Interface (MPI) was introduced as a common standard for the message passing programming paradigm in order to achieve portability between different parallel machines. Since almost all major hardware vendors were actively involved in MPI, implementations are now available on virtually every parallel computer, including networks of workstations.

MPI programs are usually started with the mpirun command:

     mpirun -np <np> a.out
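
ADAPTOR runtime options (see Section 6.1) are specified after the name of the executable; mpirun typically passes them on to the program. For example (an illustrative invocation):

     mpirun -np 4 a.out -info -comm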


6.3 Running Executables for SM Machines

Parallel HPF executables generated by ADAPTOR for SM machines take advantage of thread parallelism via the PThreads library. They are started by directly calling the executable. The number of threads can be set via the OpenMP environment variable OMP_NUM_THREADS, which is taken as the default for the ADAPTOR-generated executable.

   setenv OMP_NUM_THREADS 2      export OMP_NUM_THREADS=2
   a.out

ADAPTOR also has its own runtime environment variable NT, which overrides the OpenMP environment variable.

   setenv NT 2                   export NT=2
   a.out

Instead of an environment variable, the runtime option -nt can be used; it also overrides the value of the environment variable.

   a.out -nt 2


6.4 Running Executables for Clusters of SMPs

Executables that take advantage of DM and SM parallelism via message passing and threads are started like parallel programs for DM machines (see Section 6.2). The number of threads used for the SM parallelism should be set via the ADAPTOR-specific environment variable NT or via the runtime option -nt.

   setenv NP 4                  export NP=4
   setenv NT 2                  export NT=2
   a.out

   a.out -nn 4 -nt 2


6.5 Parallel Execution of Coupled HPF Programs

ADAPTOR allows the concurrent execution of multiple HPF tasks that can be coupled via the new HPF_TASK_LIBRARY (see Figure 4). Every HPF task can be an HPF program of its own.

Figure 4: Concurrent execution of data parallel tasks.

The coupling of multiple HPF programs is only possible if all node processes are started as one application (MPMD execution model). The node processes belonging to one HPF task build a new subgroup that becomes the actual context of this task program. Every task and every processor knows all subgroups and therefore all available HPF tasks; this is needed for the communication between the HPF tasks. The corresponding communication routines are provided by the HPF_TASK_LIBRARY.

6.5.1 Generation of HPF Tasks

HPF programs that should become HPF tasks are compiled in the usual way. No special compilation flag is needed, as the multiple execution of HPF tasks is completely hidden in the DALIB runtime system.

   adaptor -o STAGE1 STAGE1.HPF
   adaptor -o STAGE2 STAGE2.HPF
   adaptor -o STAGE3 STAGE3.HPF

The usual way to guarantee the parallel execution of HPF tasks is to create a task file (e.g. TASKS) that contains the corresponding task names and task sizes. Comment lines are allowed in this file. The environment variable TASK_FILE must be set to the name of the task file; it is checked at runtime and the corresponding HPF tasks are then created.

   setenv TASK_FILE TASKS

   # content of task file TASKS
   STAGE1  2
   STAGE2  3
   STAGE2  3
   STAGE3  2

For the given example, four HPF tasks are executed simultaneously, using 2+3+3+2 = 10 node processes in total. There are two different HPF tasks for STAGE2, with three processors for each task. The other two tasks, STAGE1 and STAGE3, each run on two processors.

The parallel machine itself must provide a mechanism to start the different executables on its nodes. The numbering and execution of the executables must correspond to the entries in the task file.

If the environment variable TASK_FILE has not been set, the DALIB also builds subgroups of different HPF tasks as soon as the loaded executables have different names. The algorithm for building subgroups is based on the following idea: all processors on which an executable with the same name has been loaded form a new subgroup, and every such subgroup forms one HPF task. This allows the multiple execution of different HPF programs. If the same HPF program should run as different tasks, the executable must be renamed.
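
For illustration, two copies of the same HPF program could be given different names before they are started on the parallel machine (the names STAGE2A and STAGE2B are only examples):

   cp STAGE2 STAGE2A    # first task runs the STAGE2 program
   cp STAGE2 STAGE2B    # second task runs the same program under a different name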

6.5.2 Running Coupled HPF Programs with MPI

Some MPI implementations allow the execution of different SPMD programs in the MPMD model. For example, MPICH (a portable implementation of MPI) is based on P4 and allows the definition of a processor group file (e.g. PGFILE):

  Machine01 0 STAGE1
  Machine02 1 STAGE1
  Machine03 1 STAGE2
  Machine04 1 STAGE2
  Machine05 1 STAGE2
  Machine06 1 STAGE2
  Machine07 1 STAGE2
  Machine08 1 STAGE2
  Machine08 1 STAGE3
  Machine09 1 STAGE3

The task file and the processor group file must define the tasks in the same order.

  setenv TASK_FILE TASKS
  mpirun -p4pg PGFILE STAGE1

Note: if the same processes are started without setting the environment variable TASK_FILE, the different executables are still executed as HPF tasks. In this case, however, all processes running STAGE2 form one single HPF task running on six processors.

