Nested OpenMP support in pgf90

I have a large code that I am parallelizing with MPI + OpenMP + CUDA-Fortran. I am currently using v12.10 of pgf90, though other versions starting from 10.x have been used.

Several parts of the code use nested OpenMP. This works fine with other compilers (ifort, gfortran, xlf), but pgf90 seems to refuse any nested OpenMP.

For example, here is a snippet illustrating nested loop parallelization:

REAL FUNCTION wallclock()
  integer, save:: count(2), count_rate=0
  real, save:: norm, offset=0.
  if (count_rate == 0) then
    call system_clock(count=count(1), count_rate=count_rate)
    norm=1./real(count_rate)
  end if
  call system_clock(count=count(2))
  wallclock = (count(2)-count(1))*norm
  if (wallclock < 0.) then
    offset = offset + 24.*3600.
    wallclock = wallclock + 24.*3600.
  end if
END FUNCTION wallclock

Program Test_Nested_OpenMP
  implicit none
  integer, parameter     :: n=80000000
  integer                :: i, j
  integer, dimension(:,:), allocatable :: a, b
  real                   :: t0,t1,t2
  real, external         :: wallclock

  allocate(a(n,2), b(n,2))
  a=0; b=0
  t0 = wallclock()
  !$omp parallel do collapse(2)
  do j=1,2
  do i=1,n
    a(i,j)=sin(real(i+j))
  enddo
  enddo

  t1 = wallclock()  
  print *, 'Number of elements                  :', n
  print *, 'Time to initialize array            :', t1-t0  
  print *, '----------------------------------------------------' 

  !$omp parallel do num_threads(2) shared(a,b) private(i,j)
  do j=1,2

    !$omp parallel shared(a,b,j) private(i)
    !$omp do
    do i=1,n
      b(i,j) = sin(real(i+j))
    enddo
    !$omp enddo nowait
    !$omp end parallel

  enddo
  !$omp end parallel do

  t2 = wallclock()  
  print *, 'Time to do nested region            :', t2-t1  
END

Compiling and executing with:

$ pgf90 -O2 -mp -Minfo test_nested_openmp.f90
test_nested_openmp:
26, Memory zero idiom, array assignment replaced by call to pgf90_mzero4
28, Parallel region activated
30, Parallel loop activated with static block schedule
33, Parallel region terminated
40, Parallel region activated
41, Parallel loop activated with static block schedule
43, Parallel region activated
47, Parallel region terminated
51, Parallel region terminated
$ env OMP_NUM_THREADS=4 OMP_MAX_ACTIVE_LEVELS=2 OMP_NESTED=true OMP_DYNAMC=true OMP_THREAD_LIMIT=4 taskset -c 0-3 ./a.out

I get

Number of elements : 80000000
Time to initialize array : 2.112338

Time to do nested region : 3.674445

With other compilers (xlf, ifort, gfortran) the two times are equal. I have tried almost every variation of the OMP environment variables, to no avail.

Is nested OpenMP unsupported, or only partially supported, by the PGI compilers?

best,

Troels

Hi Troels,

Nested parallelism is supported when the parallel regions are not lexically nested, i.e. when the second region is inside a called function or subroutine.
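
As a rough illustration of that restructuring, here is a minimal sketch based on the second loop of the test program above. The module name fill_mod and the subroutine fill_column are hypothetical names introduced only for this example, and the print inside the inner region is an added diagnostic, not something from the original code. It assumes nesting is enabled at run time (for example with OMP_NESTED=true, as in the command line above); whether the inner region actually gets more than one thread still depends on the compiler version and runtime settings.

```fortran
! Sketch only: the inner parallel region lives in a subroutine, so it is
! nested dynamically (at run time) rather than lexically.
module fill_mod
  use omp_lib
  implicit none
contains
  ! fill_column is a hypothetical helper, not part of the original code.
  subroutine fill_column(col, j)
    integer, intent(out) :: col(:)
    integer, intent(in)  :: j
    integer :: i
    !$omp parallel shared(col, j) private(i)
    !$omp single
    ! Report what the runtime actually granted to this inner region.
    print *, 'column', j, ' level', omp_get_level(), &
             ' inner threads', omp_get_num_threads()
    !$omp end single nowait
    !$omp do
    do i = 1, size(col)
      col(i) = sin(real(i+j))   ! same expression as in the original test
    end do
    !$omp end do
    !$omp end parallel
  end subroutine fill_column
end module fill_mod

program test_nested_call
  use fill_mod
  implicit none
  integer, parameter   :: n = 80000000
  integer, allocatable :: b(:,:)
  integer :: j

  allocate(b(n,2))
  ! Outer region: one thread per column, as in the original test.
  !$omp parallel do num_threads(2) shared(b) private(j)
  do j = 1, 2
    call fill_column(b(:,j), j)
  end do
  !$omp end parallel do
  print *, 'b(1,1), b(n,2) :', b(1,1), b(n,2)
end program test_nested_call
```

This can be compiled with the same flags as before (pgf90 -O2 -mp -Minfo) and run with the same environment. If the diagnostic line reports a level of 2 but only one inner thread, the inner region is being serialized, which would point at the runtime settings rather than at the code structure.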

- Mat