Nvfortran -stdpar triggers OpenACC directives to be evaluated

christian.weiss · November 22, 2024, 12:30pm

The -stdpar option of nvfortran generates code for OpenACC regions even though no -acc option is specified. I have a test code that uses three different parallelization methods (OpenMP target offload, OpenACC, and do concurrent):

program test
  integer, parameter :: N = 1000
  integer, dimension(N) :: x
  integer :: i

!$omp target teams distribute parallel do
  do i = 1, N
    x(i) = 1
  end do

!$acc parallel loop
  do i = 1, N
    x(i) = 1
  end do

  do concurrent (i=1:N)
    x(i) = 1
  end do

  print *, x(1), x(N/2)
end program test

The print statement is apparently necessary to keep the loops from being optimized away. I compile the code in three different ways:

(ninja) [cweiss@gpu001 tmp]$ nvfortran -stdpar=gpu -Minfo=all test_stdpar.f90 -o test.x
test:
      7, Recognized memory set idiom
     11, Generating NVIDIA GPU code
         12, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     11, Generating implicit copyout(x(:)) [if not already present]
     16, Generating NVIDIA GPU code
         16, Loop parallelized across CUDA thread blocks, CUDA threads(128) blockidx%x threadidx%x
     16, Generating implicit copyout(x(:)) [if not already present]
(ninja) [cweiss@gpu001 tmp]$ nvfortran -acc=gpu -Minfo=all test_stdpar.f90 -o test.x
test:
      7, Recognized memory set idiom
     11, Generating NVIDIA GPU code
         12, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     11, Generating implicit copyout(x(:)) [if not already present]
(ninja) [cweiss@gpu001 tmp]$ nvfortran -mp=gpu -Minfo=all test_stdpar.f90 -o test.x
test:
      6, !$omp target teams distribute parallel do
          6, Generating "nvkernel_MAIN__F1L6_2" GPU kernel
      6, Generating implicit map(tofrom:x(:))

For reference, compiling with none of these options produces no -Minfo output.
Compiling with -stdpar should only affect the last loop, but -Minfo shows that the second loop is also used to generate OpenACC code. -acc=gpu, on the other hand, behaves as expected. However, both compilations give the message Recognized memory set idiom for the OpenMP region, which does not happen in the pure CPU build. Finally, the OpenMP compilation shows nothing unusual.
As I understand it, -stdpar=gpu should leave the OpenACC regions untouched. Or am I misunderstanding something?

MatColgrove · November 22, 2024, 6:31pm

Our Fortran STDPAR implementation is built on top of OpenACC, so OpenACC is enabled by default when -stdpar is used.

However, both compilations give the message Recognized memory set idiom for the OpenMP region, which does not happen in the pure CPU build.

This is due to the optimization level being applied. OpenACC/STDPAR uses -O2 to enable the auto-parallelization analysis. -O2 also enables idiom recognition.

If you add -O2 to the last build, you’ll see the idiom message for the second loop.
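
To try this, a minimal sketch along the following lines should work. It assumes the reproducer from the first post is saved as test_stdpar.f90 and that nvfortran is on the PATH; the `command -v` guard is only an addition so the script does not fail on machines without the compiler. The flags are the ones already used in this thread, plus -O2.

```shell
# Recreate the reproducer from the first post (same three loops, same line layout).
cat > test_stdpar.f90 <<'EOF'
program test
  integer, parameter :: N = 1000
  integer, dimension(N) :: x
  integer :: i

!$omp target teams distribute parallel do
  do i = 1, N
    x(i) = 1
  end do

!$acc parallel loop
  do i = 1, N
    x(i) = 1
  end do

  do concurrent (i=1:N)
    x(i) = 1
  end do

  print *, x(1), x(N/2)
end program test
EOF

# Rebuild the OpenMP variant ("the last build") with -O2 added, so that
# idiom recognition is active as it is in the -stdpar and -acc builds.
if command -v nvfortran >/dev/null 2>&1; then
  nvfortran -mp=gpu -O2 -Minfo=all test_stdpar.f90 -o test.x
else
  # Assumption: no NVIDIA HPC SDK on this machine, so only the source is written.
  echo "nvfortran not found; skipping build"
fi
```

Comparing the -Minfo output of this build with the plain -mp=gpu build above should show whether the idiom message follows the optimization level rather than the parallelization flag.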
