The -stdpar option of nvfortran generates code from OpenACC regions even though no -acc option is specified. I have a test code that uses three different parallelization methods:
program test
  integer, parameter :: N = 1000
  integer, dimension(N) :: x
  integer :: i

  !$omp target teams distribute parallel do
  do i = 1, N
    x(i) = 1
  end do

  !$acc parallel loop
  do i = 1, N
    x(i) = 1
  end do

  do concurrent (i=1:N)
    x(i) = 1
  end do

  print *, x(1), x(N/2)
end program test
The print statement is apparently necessary to keep the loops from being optimized away. I compile the code in three different ways:
(ninja) [cweiss@gpu001 tmp]$ nvfortran -stdpar=gpu -Minfo=all test_stdpar.f90 -o test.x
test:
7, Recognized memory set idiom
11, Generating NVIDIA GPU code
12, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
11, Generating implicit copyout(x(:)) [if not already present]
16, Generating NVIDIA GPU code
16, Loop parallelized across CUDA thread blocks, CUDA threads(128) blockidx%x threadidx%x
16, Generating implicit copyout(x(:)) [if not already present]
(ninja) [cweiss@gpu001 tmp]$ nvfortran -acc=gpu -Minfo=all test_stdpar.f90 -o test.x
test:
7, Recognized memory set idiom
11, Generating NVIDIA GPU code
12, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
11, Generating implicit copyout(x(:)) [if not already present]
(ninja) [cweiss@gpu001 tmp]$ nvfortran -mp=gpu -Minfo=all test_stdpar.f90 -o test.x
test:
6, !$omp target teams distribute parallel do
6, Generating "nvkernel_MAIN__F1L6_2" GPU kernel
6, Generating implicit map(tofrom:x(:))
For reference, compiling with none of these options produces no output from -Minfo.
Compiling with -stdpar should only affect the last loop, but -Minfo shows that the second loop is also used to generate OpenACC code. In contrast, -acc=gpu behaves as expected. However, both of these compilations report "Recognized memory set idiom" for the OpenMP region, which does not happen in the pure CPU build. Finally, the OpenMP compilation shows nothing unusual.
As I understand it, -stdpar=gpu should leave the OpenACC regions unaffected. Or am I misunderstanding something?