Hi,
I have recently been debugging an OpenACC application with the HPC-SDK 21.3 package and later. My development is usually done with 20.11, so I never came across this issue until now. Fortunately, I managed to isolate the bug to an atomic operation performed on an integer array with the pointer attribute.
A simple reproducer can be found below:
```fortran
subroutine atomic_int(add)
  implicit none
  integer add
  integer i,j,k
  integer n,n2,ns
  integer,pointer :: idat(:)
  n = 1000
  n2 = 64
  ns = 20
  allocate (idat(ns))
  idat = 0                      ! initialize on the host before it is copied to the device
  !$acc parallel loop default(present) copy(idat)
  do i = 1,n
     j = mod(i,10)              ! note: j = 0 when i is a multiple of 10, outside idat(1:ns)
     !$acc loop vector
     do k = 1,n2
        !$acc atomic
        idat(j) = idat(j) + add
     end do
  end do
12 format(I7,$)                 ! '$' (nonstandard extension) suppresses the newline
  write(*,12) (idat(i),i=1,ns)
  write(*,*)
  deallocate (idat)
end subroutine

program main
  implicit none
  call atomic_int(2)
end program
```
This program behaves as follows with the HPC-SDK 20.11 package:
```
user@host:~/train nvfortran -acc -Minfo=accel -o test_int test.int.f90
test_int:
22, Accelerator serial kernel generated
Generating Tesla code
atomic_int:
63, Generating copy(idat(:)) [if not already present]
Generating Tesla code
64, !$acc loop gang ! blockidx%x
67, !$acc loop vector(128) ! threadidx%x
67, Loop is parallelizable
user@host:~/train ./test_int
12800 12800 12800 12800 12800 12800 12800 12800 12800 0 0 0 0 0 0 0 0 0 0 0
```
However, with HPC-SDK 21.3 through 21.5, I end up with:
```
user@host:~/train nvfortran -acc -Minfo=accel -o test_int test.int.f90
test_int:
22, Accelerator serial kernel generated
Generating Tesla code
atomic_int:
63, Generating implicit create(idat) [if not already present]
Generating copy(idat(:)) [if not already present]
Generating Tesla code
64, !$acc loop gang ! blockidx%x
67, !$acc loop vector(64) ! threadidx%x
67, Loop is parallelizable
user@host:~/train ./test_int
hostptr=0x7ffce56a6758,eltsize=4,name=idat$p,flags=0x20000200=present+implicit,async=-1,threadid=1
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 7.0, threadid=1
host:0xed0030 device:0x2ada6fafa000 size:80 presentcount:1+0 line:63 name:idat
allocated block device:0x2ada6fafa000 size:512 thread:1
FATAL ERROR: data in PRESENT clause was not found on device 1: name=idat$p host:0x7ffce56a6758
file:/home/adjoua/train/test.int.f90 atomic_int line:12
```
I have found that the way to work around this issue is to change the idat attribute from pointer to allocatable; declaring it allocatable,target does not work either. Now I am wondering why this restriction was introduced in the latest versions of the package, or whether this is a bug, in which case could it be reported to the developers and fixed in the next release? There seems to be an issue with the OpenACC data manager: the compiler reports "63, Generating implicit create(idat) [if not already present]", but idat should not be processed implicitly, since it already appears in the explicit copy clause.
Changing the variable attributes is easy here, but I cannot do this in our application: idat must be a pointer because the allocation is done by the MPI library through MPI_Win_allocate_shared. Also, this pattern is used widely throughout the code.
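For context, here is a simplified sketch of that allocation pattern (not our actual application code). It assumes an MPI-3 library that provides the mpi_f08 module; the program name, node_comm, and the choice that node rank 0 owns the whole segment are illustrative assumptions. MPI_Win_allocate_shared returns the segment address as a type(c_ptr), and c_f_pointer, which maps that address onto a Fortran array, requires its result argument to be a POINTER, so an ALLOCATABLE declaration is not an option here.

```fortran
! Sketch only: associates a Fortran POINTER with memory allocated by
! MPI_Win_allocate_shared. Assumes an MPI-3 library providing mpi_f08;
! the program and variable names are hypothetical.
program shared_window_sketch
  use mpi_f08
  use, intrinsic :: iso_c_binding, only: c_ptr, c_f_pointer
  implicit none
  integer, parameter :: ns = 20              ! same length as in the reproducer
  type(MPI_Comm) :: node_comm
  type(MPI_Win)  :: win
  type(c_ptr)    :: baseptr
  integer(kind=MPI_ADDRESS_KIND) :: winsize
  integer :: node_rank, disp_unit
  integer, pointer :: idat(:)                ! c_f_pointer needs a POINTER, not an ALLOCATABLE

  call MPI_Init()

  ! Communicator of the ranks that can share memory (typically one node)
  call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                           MPI_INFO_NULL, node_comm)
  call MPI_Comm_rank(node_comm, node_rank)

  ! Assumption: node rank 0 owns the whole array, other ranks contribute 0 bytes
  disp_unit = storage_size(0) / 8            ! bytes per default INTEGER
  winsize = 0
  if (node_rank == 0) winsize = int(ns, MPI_ADDRESS_KIND) * disp_unit

  call MPI_Win_allocate_shared(winsize, disp_unit, MPI_INFO_NULL, &
                               node_comm, baseptr, win)

  ! Every rank looks up the base address of rank 0's segment
  call MPI_Win_shared_query(win, 0, winsize, disp_unit, baseptr)

  ! Map the C address onto a Fortran array pointer with ns elements
  call c_f_pointer(baseptr, idat, [ns])
  if (.not. associated(idat)) error stop 'idat not associated'
  if (size(idat) /= ns) error stop 'unexpected size of idat'

  if (node_rank == 0) print *, 'idat associated with', size(idat), 'elements'

  ! The memory belongs to the MPI window: nullify, never DEALLOCATE
  nullify(idat)
  call MPI_Win_free(win)
  call MPI_Comm_free(node_comm)
  call MPI_Finalize()
end program shared_window_sketch
```

It can be launched with something like `mpiexec -n 2 ./shared_window_sketch`. An idat obtained this way is the kind of pointer array that then ends up in OpenACC regions such as the one in the reproducer.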
Any advice on this matter would be appreciated.
Regards