This section describes in more detail how ADAPTOR compiles OpenMP programs.
The following example shows a parallel region.
    !$OMP parallel private (INODE)
    INODE = omp_get_thread_num()
    write (6,'('' hello world from '',I3)') INODE
    !$OMP end parallel
The general idea of the ADAPTOR translation is to create a new subroutine that will be called by all the threads of a team. The team itself will be created by the runtime function DALIB_pthreads.
    ! runtime call to create threads executing HELLO1
    call DALIB_pthreads (HELLO1, ...)

    subroutine HELLO1 ()
      integer INODE
      external DALIB_get_thread_num
      integer DALIB_get_thread_num
      INODE = DALIB_get_thread_num()
      write (6,'('' hello world from '',I3)') INODE
    end subroutine HELLO1
Hint: ADAPTOR supports nested parallelism.
    C$OMP PARALLEL DO PRIVATE (v), SHARED(w)
    C$OMP+REDUCTION (+:gsum)
          DO i = 1, n
             v = (i - 0.5d0 ) * w
             v = 4.0d0 / (1.0d0 + v * v)
             gsum = gsum + v
          END DO
    call DALIB_pthreads (CALC_PI1,DALIB_0,DALIB_0,6,GSUM,W,N,DALIB_0,DALIB_0,DALIB_0)
    subroutine CALC_PI1 (GSUM, W, N, ...)
      double precision GSUM_TMP, V, W, GSUM
      integer I, N
      integer IK_STOP1
      integer IK_START1
      GSUM_TMP = 0.0
      call DALIB_do_static_bsched (1,N,1,IK_START1,IK_STOP1)
      do I=IK_START1,IK_STOP1
        V = (REAL(I,8)-0.5d0)*W
        V = 4.0d0/(1.0d0+V*V)
        GSUM_TMP = GSUM_TMP+V
      end do
      call DALIB_enter_critical ()
      GSUM = GSUM+GSUM_TMP
      call DALIB_leave_critical ()
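The reduction scheme of the generated subroutine above (a private partial sum, a block of iterations per thread, and one guarded update of the shared variable) can be reproduced with plain OpenMP directives. The following is a minimal sketch, not ADAPTOR output: the program name `pi_manual`, the variable `gsum_tmp`, the size `n = 1000000` and the step width `w = 1/n` are assumptions chosen so that the sum approximates pi.

```fortran
program pi_manual
  implicit none
  integer, parameter :: n = 1000000        ! assumed problem size
  integer :: i
  double precision :: w, v, gsum, gsum_tmp

  w = 1.0d0 / n                            ! assumed step width
  gsum = 0.0d0

  !$omp parallel private(i, v, gsum_tmp) shared(w, gsum)
  gsum_tmp = 0.0d0                         ! per-thread partial sum
  !$omp do schedule(static)
  do i = 1, n
     v = (i - 0.5d0) * w
     v = 4.0d0 / (1.0d0 + v * v)
     gsum_tmp = gsum_tmp + v
  end do
  !$omp end do nowait
  !$omp critical
  gsum = gsum + gsum_tmp                   ! one guarded update per thread
  !$omp end critical
  !$omp end parallel

  gsum = gsum * w
  if (abs(gsum - 4.0d0 * atan(1.0d0)) > 1.0d-8) error stop 'unexpected sum'
  print '(a, f12.9)', ' pi is approximately ', gsum
end program pi_manual
```

Each thread enters the critical region only once, so the cost of the lock does not depend on the number of loop iterations.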
    !$omp do schedule (STATIC)
    do I = 1, N, IS
      A(I) = IT + 1
    end do
    call DALIB_do_static_bsched (1,N,IS,IK_START1,IK_STOP1)
    do I=IK_START1,IK_STOP1,IS
      A(A_ZERO+I) = IT+1
    end do
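How a block schedule might split the iteration space `1, N, IS` among the threads can be sketched as follows. This is an illustration only: the source does not show how DALIB_do_static_bsched computes its bounds, and the routine `block_bounds` with its explicit thread-id and thread-count arguments is hypothetical. A positive stride is assumed.

```fortran
module block_sched
  implicit none
contains
  ! Hypothetical helper: give thread tid (0-based) of nthreads one
  ! contiguous block of the iterations lb, lb+stride, ..., ub.
  ! An empty block is returned as ifirst > ilast.
  subroutine block_bounds(lb, ub, stride, tid, nthreads, ifirst, ilast)
    integer, intent(in)  :: lb, ub, stride, tid, nthreads
    integer, intent(out) :: ifirst, ilast
    integer :: niter, blk
    niter  = max(0, (ub - lb + stride) / stride)   ! total iterations
    blk    = (niter + nthreads - 1) / nthreads     ! ceiling division
    ifirst = lb + tid * blk * stride
    ilast  = min(ub, ifirst + (blk - 1) * stride)
  end subroutine block_bounds
end module block_sched

program block_demo
  use block_sched
  implicit none
  integer :: tid, i, ifirst, ilast, hits(10)

  hits = 0
  do tid = 0, 3                       ! simulate a team of 4 threads
     call block_bounds(1, 10, 3, tid, 4, ifirst, ilast)
     do i = ifirst, ilast, 3
        hits(i) = hits(i) + 1
     end do
     print '(a, i2, a, i3, a, i3)', ' thread', tid, ':', ifirst, ' ..', ilast
  end do
  ! iterations 1, 4, 7 and 10 must each be executed exactly once
  if (any(hits /= [1,0,0,1,0,0,1,0,0,1])) error stop 'bad partition'
end program block_demo
```

Because every block starts on a multiple of the stride, the translated loop can keep the original stride `IS` unchanged.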
    !$omp do schedule (dynamic, ICHUNK), lastprivate (PLAST)
    do I = 1, N, IS
      A(I) = IT + 1
      PLAST = I
    end do
    call DALIB_do_dynamic_sched_init (1,N,IS,ICHUNK)
    do while (DALIB_do_dynamic_sched_next(IK_START1,IK_STOP1))
      do I=IK_START1,IK_STOP1,IS
        A(A_ZERO+I) = IT+1
        PLAST_TMP = I
      end do
    end do
    if (DALIB_is_mp_last()) then
      PLAST = PLAST_TMP
    end if
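The init/next pair used for dynamic scheduling behaves like a chunk dispenser: every call to the next-function hands out the following ICHUNK iterations until the loop range is exhausted. A serial sketch under stated assumptions follows. The module `chunk_sched`, its routines `sched_init` and `sched_next`, and its state variables are hypothetical, only a positive stride is handled, and a real runtime would have to protect the shared position with a lock. DALIB's actual implementation is not shown in the source.

```fortran
module chunk_sched
  implicit none
  private
  public :: sched_init, sched_next
  integer, save :: next_lb, last_ub, step, chunk   ! dispenser state
contains
  subroutine sched_init(lb, ub, stride, ichunk)
    integer, intent(in) :: lb, ub, stride, ichunk
    next_lb = lb
    last_ub = ub
    step    = stride
    chunk   = max(1, ichunk)
  end subroutine sched_init

  ! Returns .true. and the bounds of the next chunk, or .false. when done.
  logical function sched_next(ifirst, ilast)
    integer, intent(out) :: ifirst, ilast
    sched_next = next_lb <= last_ub
    ifirst = next_lb
    ilast  = min(last_ub, next_lb + (chunk - 1) * step)
    if (sched_next) next_lb = next_lb + chunk * step
  end function sched_next
end module chunk_sched

program chunk_demo
  use chunk_sched
  implicit none
  integer :: i, ifirst, ilast, nchunks, total

  nchunks = 0
  total = 0
  call sched_init(1, 20, 2, 3)           ! iterations 1, 3, ..., 19
  do while (sched_next(ifirst, ilast))
     nchunks = nchunks + 1
     do i = ifirst, ilast, 2
        total = total + i
     end do
  end do
  ! 10 iterations in chunks of 3 give 4 chunks; 1 + 3 + ... + 19 = 100
  if (nchunks /= 4 .or. total /= 100) error stop 'bad schedule'
  print *, 'chunks:', nchunks, ' sum:', total
end program chunk_demo
```

In the translated loop, the thread that receives the final chunk is the one whose PLAST_TMP holds the value of the sequentially last iteration, which is why the copy to PLAST is guarded by DALIB_is_mp_last.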
The sections are distributed among the threads of the team in a dynamic manner. Internally, a sections construct is handled as a dynamically scheduled loop with one iteration per section:
    !$omp parallel private (IT)
    IT = OMP_GET_THREAD_NUM ()
    !$omp sections
    !$omp section
    call XAXIS (AXIS)
    !$omp section
    call YAXIS (AXIS)
    !$omp section
    call ZAXIS (AXIS)
    !$omp end sections
    !$omp end parallel
    !$omp do schedule (dynamic)
    do I0 = 1, 3
      if (I0 .eq. 1) then
        call XAXIS (AXIS,AXIS_DSP)
      else if (I0 .eq. 2) then
        call YAXIS (AXIS,AXIS_DSP)
      else if (I0 .eq. 3) then
        call ZAXIS (AXIS,AXIS_DSP)
      end if
    end do
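The loop form of a sections construct can be tried out as a complete program. In this sketch the three axis routines are hypothetical stand-ins that each update their own array element, so that the result shows whether every section ran exactly once. The program name `sections_as_loop` is also an assumption.

```fortran
program sections_as_loop
  implicit none
  integer :: axis(3), i0

  axis = 0
  !$omp parallel shared(axis)
  !$omp do schedule(dynamic)
  do i0 = 1, 3                      ! one iteration per section
     if (i0 == 1) then
        call xaxis(axis)
     else if (i0 == 2) then
        call yaxis(axis)
     else if (i0 == 3) then
        call zaxis(axis)
     end if
  end do
  !$omp end do
  !$omp end parallel

  if (any(axis /= [1, 2, 3])) error stop 'a section was skipped or repeated'
  print *, 'axis =', axis

contains

  ! Stand-ins for the real routines: each touches a different element.
  subroutine xaxis(a)
    integer, intent(inout) :: a(3)
    a(1) = a(1) + 1
  end subroutine xaxis

  subroutine yaxis(a)
    integer, intent(inout) :: a(3)
    a(2) = a(2) + 2
  end subroutine yaxis

  subroutine zaxis(a)
    integer, intent(inout) :: a(3)
    a(3) = a(3) + 3
  end subroutine zaxis
end program sections_as_loop
```

With fewer than three threads, one thread simply picks up more than one iteration, so the program gives the same result for any team size.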
The code within a SINGLE region will be executed by only one thread.
    !$omp single
    X = X + 10
    Y = Y + 100
    !$omp end single

    call DALIB_mp_barrier ()
    if (DALIB_is_mp_single()) then
      X = X+10
      Y = Y+100
    end if
    call DALIB_mp_barrier ()
Only one thread does the work, but it is unspecified which one.
    !$omp critical
    X(INDEX(I)) = X(INDEX(I)) + XLOCAL
    !$omp end critical
A critical section is enclosed with lock primitives.
    !$omp atomic
    X(INDEX(I)) = X(INDEX(I)) + XLOCAL
The general strategy is similar to that for the critical section. The assignment is enclosed by lock primitives that are less restrictive than those of a critical section; they take the updated variable as an argument.
    call DALIB_atomic_lock (X(INDEX(I)))
    X(INDEX(I)) = X(INDEX(I))+XLOCAL
    call DALIB_atomic_unlock (X(INDEX(I)))
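Why the update needs protection at all can be seen in a small test program in which many iterations accumulate into few array elements through an index array. This is a sketch with assumed names and sizes (`atomic_demo`, `idx`, `n = 1000`, `m = 4`). It uses only the standard `atomic` directive and says nothing about how the DALIB lock routines are implemented.

```fortran
program atomic_demo
  implicit none
  integer, parameter :: n = 1000, m = 4    ! assumed sizes
  integer :: i, idx(n)
  double precision :: x(m), xlocal

  do i = 1, n
     idx(i) = mod(i, m) + 1                ! every element is hit n/m times
  end do
  x = 0.0d0
  xlocal = 1.0d0

  !$omp parallel do shared(x, idx, xlocal)
  do i = 1, n
     !$omp atomic
     x(idx(i)) = x(idx(i)) + xlocal        ! concurrent updates of one element
  end do
  !$omp end parallel do

  ! each of the m elements receives n/m = 250 increments of 1.0
  if (any(x /= 250.0d0)) error stop 'lost update'
  print *, 'x =', x
end program atomic_demo
```

Without the `atomic` directive, two threads could read the same old value of an element and one of the increments would be lost.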
    !$omp do ordered
    do I = 2, N
      call WORK (I, X, N)
    end do
    call DALIB_init_ordered (2,1)
    call DALIB_do_static_bsched (2,N,1,IK_START1,IK_STOP1)
    do I=IK_START1,IK_STOP1
      call DALIB_set_loop_id (I)
      call WORK (I,X(X_ZERO+1),N,DALIB_0,X_DSP,DALIB_0)
    end do
    subroutine WORK (I, X, N)
      integer I
      integer N
      integer X (N)
    !$omp ordered
      X (I) = X(I-1) + I
    !$omp end ordered
    end
    call DALIB_enter_ordered ()
    X(X_ZERO+I) = X(X_ZERO+(I-1))+I
    call DALIB_leave_ordered ()
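What the ordered translation has to guarantee can be checked with a small program built around the WORK routine above: the recurrence X(I) = X(I-1) + I only yields the serial result if the ordered regions run in iteration order. The following is a sketch. The driver program `ordered_demo`, the size `n = 100`, the `schedule(static, 1)` clause and the comparison against a serial loop are additions for illustration. It should be compiled with OpenMP enabled, for example with `gfortran -fopenmp`.

```fortran
subroutine work(i, x, n)
  implicit none
  integer :: i, n
  integer :: x(n)
  !$omp ordered
  x(i) = x(i-1) + i            ! needs the value written by iteration i-1
  !$omp end ordered
end subroutine work

program ordered_demo
  implicit none
  integer, parameter :: n = 100   ! assumed problem size
  integer :: x(n), ref(n), i
  external work

  x = 0
  x(1) = 1
  ref = 0
  ref(1) = 1
  do i = 2, n                  ! serial reference result
     ref(i) = ref(i-1) + i
  end do

  !$omp parallel shared(x)
  !$omp do ordered schedule(static, 1)
  do i = 2, n
     call work(i, x, n)
  end do
  !$omp end do
  !$omp end parallel

  if (any(x /= ref)) error stop 'ordered region violated'
  print *, 'x(n) =', x(n)      ! n*(n+1)/2 = 5050 for n = 100
end program ordered_demo
```

The `schedule(static, 1)` clause deals neighbouring iterations to different threads, so the ordered region is the only thing that keeps the recurrence correct.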