ATOMIC operations on pointer arrays (OpenACC Fortran)

Hi,

I have recently been debugging an OpenACC application with the HPC-SDK 21.3 package and above. My development is usually done on 20.11, so I had never come across this issue until now. Fortunately, I managed to isolate the bug to an atomic operation performed on an integer array with the pointer attribute.
A simple reproducer can be found below:

   subroutine atomic_int(add)
   implicit none
   integer add
   integer i,j,k
   integer n,n2,ns
   integer,pointer :: idat(:)
 
   n  = 1000
   n2 = 64
   ns = 20
   allocate (idat(ns))
   idat = 0   ! initialize on the host so the result printed below is deterministic
 
!$acc parallel loop default(present) copy(idat)
   do i = 1,n
      j = mod(i,10)
!$acc loop vector
      do k = 1,n2
!$acc atomic
         idat(j) = idat(j) + add
      end do
   end do

12 format(I7,$)
   write(*,12) (idat(i),i=1,ns)
   write(*,*)

   deallocate (idat)
   end subroutine

   program main
   implicit none
   call atomic_int(2)
   end program

This program behaves as follows with the HPC-SDK 20.11 package:

user@host:~/train nvfortran -acc -Minfo=accel -o test_int test.int.f90
test_int:
     22, Accelerator serial kernel generated
         Generating Tesla code
atomic_int:
     63, Generating copy(idat(:)) [if not already present]
         Generating Tesla code
         64, !$acc loop gang ! blockidx%x
         67, !$acc loop vector(128) ! threadidx%x
     67, Loop is parallelizable
user@host:~/train ./test_int 
12800  12800  12800  12800  12800  12800  12800  12800  12800      0      0      0      0      0      0      0      0      0      0      0

However, with HPC-SDK 21.3 through 21.5, I end up with:

user@host:~/train nvfortran -acc -Minfo=accel -o test_int test.int.f90                                                                            
test_int:                                                                      
     22, Accelerator serial kernel generated                                   
         Generating Tesla code                                                 
atomic_int:                                                                    
     63, Generating implicit create(idat) [if not already present]             
         Generating copy(idat(:)) [if not already present]                     
         Generating Tesla code                                                 
         64, !$acc loop gang ! blockidx%x                                      
         67, !$acc loop vector(64) ! threadidx%x                               
     67, Loop is parallelizable                                                
user@host:~/train ./test_int                                       
hostptr=0x7ffce56a6758,eltsize=4,name=idat$p,flags=0x20000200=present+implicit,async=-1,threadid=1                                                            
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 7.0, threadid=1                                                                      
host:0xed0030 device:0x2ada6fafa000 size:80 presentcount:1+0 line:63 name:idat                                                                                
allocated block device:0x2ada6fafa000 size:512 thread:1                        
FATAL ERROR: data in PRESENT clause was not found on device 1: name=idat$p host:0x7ffce56a6758                                                                
 file:/home/adjoua/train/test.int.f90 atomic_int line:12                       

I have discovered that the way to overcome this issue is to change the idat attribute from pointer to allocatable. Combining allocatable with target does not work either, however. Now I am wondering why this restriction was introduced in the latest versions of the package, or whether this is simply a bug, in which case can it be sent to the developers so it is fixed in a future release? There seems to be an issue with the OpenACC data manager: the message "63, Generating implicit create(idat) [if not already present]" suggests that idat is being processed when it should not be.
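For reference, the only change needed to make the reproducer run correctly on 21.3 and later is the declaration; everything else stays the same:

   integer,allocatable :: idat(:)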
Changing the variable attributes is easy in this small reproducer, but I cannot do it in our application: idat must be a pointer because the allocation is performed by the MPI library through MPI_Win_allocate_shared, and this pattern is used widely throughout the code.
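Roughly, the allocation pattern in our code looks like the sketch below (a simplified illustration, not the real code; names such as alloc_shared, wsize and win are placeholders). MPI_Win_allocate_shared returns a C base address, and c_f_pointer can only associate that address with a Fortran pointer, which is why the pointer attribute cannot be dropped:

   subroutine alloc_shared(idat, ns, comm)
   use mpi
   use iso_c_binding, only: c_ptr, c_f_pointer
   implicit none
   integer,pointer :: idat(:)
   integer ns, comm
   integer disp_unit, win, ierr
   integer(kind=MPI_ADDRESS_KIND) :: wsize
   type(c_ptr) :: baseptr

   disp_unit = 4                      ! bytes per default integer
   wsize     = ns * disp_unit         ! window size in bytes on this rank
   ! shared-memory window allocated by MPI, not by a Fortran ALLOCATE
   call MPI_Win_allocate_shared(wsize, disp_unit, MPI_INFO_NULL, comm, &
                                baseptr, win, ierr)
   ! associate the C base address with the Fortran pointer idat
   call c_f_pointer(baseptr, idat, [ns])
   end subroutine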

Any advice regarding this matter is appreciated.
Regards

Hi passi1849,

We updated our atomics implementation in 21.3, and that has caused a few issues. Most have been fixed in later releases, but it looks like this one is still outstanding, so I filed a problem report, TPR #30472, and sent it to engineering for review.

As a workaround, you can use the internal compiler flag “-Mx,231,0x1” to revert to the older atomics. Once the fix is in a release, though, you should remove this flag, since the meaning of internal flags can change between releases.
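For example, adding it to the compile line from your report:

user@host:~/train nvfortran -acc -Minfo=accel -Mx,231,0x1 -o test_int test.int.f90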

Thanks for the report!
Mat

Hi Mat

Thank you for your very quick and helpful reply. The compiler flag fixes the issue, and I hope it will be addressed in future releases. In the meantime, I will notify our users about this workaround.

Best regards
passi1849