NVFORTRAN/OpenACC fails to update declare create variables with CUDA 11

I am using nvfortran to develop code with OpenACC. The global variables in a module are placed on the device with a declare create directive. With the latest CUDA (11.4), I find that these variables can be updated only on the first device (device 0) and not on the others.

Here is an example to reproduce this issue.

foo.f90

module foomod
integer :: var
! var gets a device copy on every device the program uses
!$acc declare create(var)

contains

! Print the copy of var held by device 0
subroutine foo
!$acc set device_num(0)
!$acc parallel loop
do i=1,2
   write(0,*) var
enddo
end subroutine

! Print the copy of var held by device 1
subroutine foo2
!$acc set device_num(1)
!$acc parallel loop
do i=1,2
   write(0,*) var
enddo
end subroutine

end module

test.f90

program test
use foomod

var=3
! Copy the host value of var to device 0, then to device 1
!$acc set device_num(0)
!$acc update device(var)
!$acc set device_num(1)
!$acc update device(var)

call foo
call foo2
write(0,*) var

end program

compile

nvfortran -c -fast -Minfo=accel -Mcuda -acc -ta=tesla:cuda11.4 foo.f90
nvfortran -c -fast -Minfo=accel -Mcuda -acc -ta=tesla:cuda11.4 test.f90
nvfortran -fast -Minfo=accel -Mcuda -acc -ta=tesla:cuda11.4 test.o foo.o

run

./a.out

Result (the third and fourth lines are printed by foo2 on device 1)

            3
            3
            0
            0
            3

With CUDA 10.2, the result is

            3
            3
            3
            3
            3

which is the correct output.

Thanks goducks777,

I was able to reproduce the error here and have filed a problem report, TPR #30548.

My best guess is that something is amiss in the CUDA Fortran device initialization when multiple devices are used, though engineering will investigate.

Does your application use CUDA Fortran? If not, the workaround is to remove the “-Mcuda” flag. Note that “-Mcuda” and “-ta=tesla” are deprecated; the current equivalents are “-cuda” and “-gpu”.

% nvfortran -fast -acc foo.f90 -cuda -gpu=cuda11.0 -V21.7 ; a.out
            3
            3
            0
            0
            3
% nvfortran -fast -acc foo.f90 -gpu=cuda11.0 -V21.7 ; a.out
            3
            3
            3
            3
            3
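
For anyone who wants to check this on a machine with a different number of GPUs, here is a minimal sketch of a single-file variant that loops over every device instead of hard-coding device 0 and 1. It is not part of the original reproducer: the program name check_devices, the file name in the build comment, and the variable seen are made up for illustration, and it assumes NVIDIA devices and a compiler that supports the OpenACC serial construct. It uses the standard OpenACC runtime routines acc_get_num_devices and acc_set_device_num.

```fortran
! Hypothetical diagnostic, not from the original post.
! Assumed build lines, to compare with and without CUDA Fortran enabled:
!   nvfortran -acc -cuda check_devices.f90
!   nvfortran -acc check_devices.f90
module foomod
  implicit none
  integer :: var
  !$acc declare create(var)
end module foomod

program check_devices
  use openacc
  use foomod
  implicit none
  integer :: ndev, dev, seen

  ndev = acc_get_num_devices(acc_device_nvidia)
  var = 3

  ! Push the host value of var to each device's copy.
  do dev = 0, ndev - 1
     call acc_set_device_num(dev, acc_device_nvidia)
     !$acc update device(var)
  end do

  ! Read each device's copy back through a scalar set on the device.
  do dev = 0, ndev - 1
     call acc_set_device_num(dev, acc_device_nvidia)
     seen = -1
     !$acc serial copyout(seen)
     seen = var
     !$acc end serial
     write(0,*) 'device', dev, 'sees var =', seen
  end do
end program check_devices
```

If the behavior matches the report above, the build with -cuda would be expected to show a stale value on devices other than 0, while the build without it should show 3 everywhere. I have not run this particular sketch, so treat it as a starting point.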

-Mat

Thank you, Mat. My code does not use CUDA Fortran, so this workaround works for me. I hope this can be fixed soon.