nvfortran -stdpar also compiles OpenACC directives

The -stdpar option of nvfortran generates GPU code for OpenACC regions even though no -acc option is specified. I have a test code that uses three different parallelization methods:

program test
  integer, parameter :: N = 1000
  integer, dimension(N) :: x
  integer :: i

!$omp target teams distribute parallel do
  do i = 1, N
    x(i) = 1
  end do

!$acc parallel loop
  do i = 1, N
    x(i) = 1
  end do

  do concurrent (i=1:N)
    x(i) = 1
  end do

  print *, x(1), x(N/2)
end program test

The print statement is apparently necessary to keep the loops from being optimized away. I compile it in three different ways:

(ninja) [cweiss@gpu001 tmp]$ nvfortran -stdpar=gpu -Minfo=all test_stdpar.f90 -o test.x
test:
      7, Recognized memory set idiom
     11, Generating NVIDIA GPU code
         12, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     11, Generating implicit copyout(x(:)) [if not already present]
     16, Generating NVIDIA GPU code
         16, Loop parallelized across CUDA thread blocks, CUDA threads(128) blockidx%x threadidx%x
     16, Generating implicit copyout(x(:)) [if not already present]
(ninja) [cweiss@gpu001 tmp]$ nvfortran -acc=gpu -Minfo=all test_stdpar.f90 -o test.x
test:
      7, Recognized memory set idiom
     11, Generating NVIDIA GPU code
         12, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     11, Generating implicit copyout(x(:)) [if not already present]
(ninja) [cweiss@gpu001 tmp]$ nvfortran -mp=gpu -Minfo=all test_stdpar.f90 -o test.x
test:
      6, !$omp target teams distribute parallel do
          6, Generating "nvkernel_MAIN__F1L6_2" GPU kernel
      6, Generating implicit map(tofrom:x(:)) 

For reference, compiling with none of these options produces no -Minfo output.
The -stdpar build should only offload the last loop (the do concurrent loop), but -Minfo shows that GPU code is also generated for the second loop (the OpenACC one). The -acc=gpu build, on the other hand, behaves as expected. However, both of these builds report "Recognized memory set idiom" for the OpenMP loop, which does not happen in the pure CPU build. Finally, the OpenMP build shows nothing unusual.
As I understand it, -stdpar=gpu should leave the OpenACC regions unaffected. Am I misunderstanding something?

Our Fortran STDPAR implementation is built on top of OpenACC, so OpenACC is enabled by default when you compile with -stdpar.

However, both compilations give the message Recognized memory set idiom for the OpenMP region, which does not happen in the pure CPU build.

This is due to the optimization level. OpenACC and STDPAR use -O2 to enable the auto-parallelization analysis, and -O2 also enables idiom recognition.

If you add -O2 to the last build, you’ll see the idiom message for the second loop.
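A minimal sketch for reproducing this comparison, assuming nvfortran from the NVIDIA HPC SDK is on PATH: the script recreates the test program from the question (with the same line layout, so the -Minfo line numbers match) and rebuilds it with and without an explicit -O2. The output file names test_cpu.x and test_cpu_O2.x are illustrative only. The -Minfo messages are not reproduced here; compare them yourself.

```shell
#!/bin/sh
# Sketch: rebuild the test program with and without -O2 and compare -Minfo output.
# Assumes nvfortran (NVIDIA HPC SDK) is on PATH; exits quietly if it is not.
set -e

if ! command -v nvfortran >/dev/null 2>&1; then
  echo "nvfortran not found on PATH; skipping" >&2
  exit 0
fi

# Recreate the test program (same line layout as in the question).
cat > test_stdpar.f90 <<'EOF'
program test
  integer, parameter :: N = 1000
  integer, dimension(N) :: x
  integer :: i

!$omp target teams distribute parallel do
  do i = 1, N
    x(i) = 1
  end do

!$acc parallel loop
  do i = 1, N
    x(i) = 1
  end do

  do concurrent (i=1:N)
    x(i) = 1
  end do

  print *, x(1), x(N/2)
end program test
EOF

# Pure CPU build without an explicit -O flag, then the same build at -O2.
nvfortran -Minfo=all test_stdpar.f90 -o test_cpu.x
nvfortran -O2 -Minfo=all test_stdpar.f90 -o test_cpu_O2.x

# OpenMP offload build with -O2 added, as suggested above.
nvfortran -mp=gpu -O2 -Minfo=all test_stdpar.f90 -o test.x
```

Because -acc and -stdpar raise the optimization level themselves, adding -O2 explicitly is what puts the CPU and OpenMP builds on the same footing for idiom recognition.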
