I am developing OpenACC code with nvfortran. Global variables in a module are declared on the device with !$acc declare create. I find that with the latest CUDA (11.4), !$acc update device only updates these variables on the first device (device 0), not on the other devices.
Here is an example that reproduces the issue.
foo.f90:
module foomod
  integer :: var
  ! Create a device copy of the module variable var
  !$acc declare create(var)
contains
  ! Print var from device 0
  subroutine foo
    !$acc set device_num(0)
    !$acc parallel loop
    do i = 1, 2
      write(0,*) var
    enddo
  end subroutine
  ! Print var from device 1
  subroutine foo2
    !$acc set device_num(1)
    !$acc parallel loop
    do i = 1, 2
      write(0,*) var
    enddo
  end subroutine
end module
test.f90:
program test
  use foomod
  ! Set the host value, then copy it to device 0 and to device 1
  var = 3
  !$acc set device_num(0)
  !$acc update device(var)
  !$acc set device_num(1)
  !$acc update device(var)
  call foo
  call foo2
  ! Print the host value
  write(0,*) var
end program
Compile:
nvfortran -c -fast -Minfo=accel -Mcuda -acc -ta=tesla:cuda11.4 test.f90
nvfortran -c -fast -Minfo=accel -Mcuda -acc -ta=tesla:cuda11.4 foo.f90
nvfortran -fast -Minfo=accel -Mcuda -acc -ta=tesla:cuda11.4 test.o foo.o
Run:
./a.out
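Result with CUDA 11.4 (the first two lines come from foo on device 0, the next two from foo2 on device 1, and the last from the host):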
3
3
0
0
3
With CUDA 10.2, the result is
3
3
3
3
3
which is correct: var is 3 on both devices and on the host.