Out-of-bounds issue when using `lbound`/`ubound` inside an OpenACC kernel with `nvfortran`

Dear all,

I reported this issue sometime in 2021/2022 in the OpenACC Slack channel and was told that a ticket had been opened at NVIDIA. I am posting it here as well to make sure it is a known issue and, if possible, to get an update on its status, since a collaborator was bitten by this problem today.

The second loop in the code below does not run correctly: compute-sanitizer reports out-of-bounds writes when the program is executed.

! cat test.f90                      
program p
  integer, parameter, dimension(3) :: n = [120,120,33]
  real, allocatable, dimension(:,:,:) :: arr_d
  integer :: i,j,k
  integer :: lx,ly,lz,hx,hy,hz
  allocate(arr_d(n(1),n(2),n(3)))
  lx=lbound(arr_d,1)
  ly=lbound(arr_d,2)
  lz=lbound(arr_d,3)
  hx=ubound(arr_d,1)
  hy=ubound(arr_d,2)
  hz=ubound(arr_d,3)
  print*,"works"
  !$acc parallel loop collapse(3)
  do k=lz,hz
    do j=ly,hy
      do i=lx,hx
        arr_d(i,j,k) = 1.*i*j*k
      enddo
    enddo
  enddo
  print*,"fails"
  !$acc parallel loop collapse(3)
  do k=lbound(arr_d,3),ubound(arr_d,3)
    do j=lbound(arr_d,2),ubound(arr_d,2)
      do i=lbound(arr_d,1),ubound(arr_d,1)
        arr_d(i,j,k) = 1.*i*j*k
      enddo
    enddo
  enddo
end program p

nvfortran --version && nvfortran -acc -Minfo=accel test.f90 -o test && compute-sanitizer ./test

nvfortran 24.3-0 64-bit target on x86-64 Linux -tp cascadelake 
NVIDIA Compilers and Tools
Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
p:
     14, Generating implicit firstprivate(hz,hx,lx,hy,ly,lz)
         Generating NVIDIA GPU code
         15, !$acc loop gang, vector(128) collapse(3) ! blockidx%x threadidx%x
         16,   ! blockidx%x threadidx%x collapsed
         17,   ! blockidx%x threadidx%x collapsed
     14, Generating implicit copyout(arr_d(1:hx,1:hy,1:hz)) [if not already present]
     23, Generating NVIDIA GPU code
         24, !$acc loop gang, vector(128) collapse(3) ! blockidx%x threadidx%x
         25,   ! blockidx%x threadidx%x collapsed
         26,   ! blockidx%x threadidx%x collapsed
     23, Generating implicit copyout(arr_d(:,:,1:?)) [if not already present]
/usr/bin/ld: warning: /tmp/pgcudafatIC4icMdrzrktf.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
========= COMPUTE-SANITIZER
 works
 fails
========= Invalid __global__ write of size 4 bytes
=========     at p_23+0x220 in /home/pedro/Desktop/test.f90:27
=========     by thread (64,0,0) in block (0,0,0)
=========     Address 0x79a55eff1d1c is out of bounds
=========     and is 58,084 bytes before the nearest allocation at 0x79a55f000000 of size 1,901,056 bytes
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x332560]
=========                in /usr/lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:__pgi_uacc_cuda_launch3 in ../../src/cuda_launch.c:840 [0x190e1]
=========                in /home/pedro/software/nvidia/hpc_sdk/Linux_x86_64/2024/compilers/lib/libaccdevice.so
=========     Host Frame:__pgi_uacc_cuda_launch in ../../src/cuda_launche.c:197 [0x1b850]
=========                in /home/pedro/software/nvidia/hpc_sdk/Linux_x86_64/2024/compilers/lib/libaccdevice.so
=========     Host Frame:__pgi_uacc_launch in ../../src/launch.c:55 [0x3b741]
=========                in /home/pedro/software/nvidia/hpc_sdk/Linux_x86_64/2024/compilers/lib/libacchost.so
=========     Host Frame:MAIN_ in /home/pedro/Desktop/test.f90:23 [0x2bbe]
=========                in /home/pedro/Desktop/./test
=========     Host Frame:main [0x2331]
=========                in /home/pedro/Desktop/./test
=========     Host Frame:__libc_start_call_main in ../sysdeps/nptl/libc_start_call_main.h:74 [0x2a1ca]
=========                in /usr/lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:__libc_start_main in ../csu/libc-start.c:347 [0x2a28b]
=========                in /usr/lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:_start [0x2225]
=========                in /home/pedro/Desktop/./test
========= 
========= Invalid __global__ write of size 4 bytes
=========     at p_23+0x220 in /home/pedro/Desktop/test.f90:27
=========     by thread (65,0,0) in block (0,0,0)
=========     Address 0x79a55eff1d1c is out of bounds
=========     and is 58,084 bytes before the nearest allocation at 0x79a55f000000 of size 1,901,056 bytes
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x332560]
=========                in /usr/lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:__pgi_uacc_cuda_launch3 in ../../src/cuda_launch.c:840 [0x190e1]
=========                in /home/pedro/software/nvidia/hpc_sdk/Linux_x86_64/2024/compilers/lib/libaccdevice.so
=========     Host Frame:__pgi_uacc_cuda_launch in ../../src/cuda_launche.c:197 [0x1b850]
=========                in /home/pedro/software/nvidia/hpc_sdk/Linux_x86_64/2024/compilers/lib/libaccdevice.so
=========     Host Frame:__pgi_uacc_launch in ../../src/launch.c:55 [0x3b741]
=========                in /home/pedro/software/nvidia/hpc_sdk/Linux_x86_64/2024/compilers/lib/libacchost.so
=========     Host Frame:MAIN_ in /home/pedro/Desktop/test.f90:23 [0x2bbe]
=========                in /home/pedro/Desktop/./test
=========     Host Frame:main [0x2331]
=========                in /home/pedro/Desktop/./test
=========     Host Frame:__libc_start_call_main in ../sysdeps/nptl/libc_start_call_main.h:74 [0x2a1ca]
=========                in /usr/lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:__libc_start_main in ../csu/libc-start.c:347 [0x2a28b]
=========                in /usr/lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:_start [0x2225]
=========                in /home/pedro/Desktop/./test

(...)

Thanks in advance!

Pedro


Hi Pedro,

I found the conversation on Slack and see that Matt did file this as NVBUG 3615772, which is being tracked by our team as TPR#32567.

It looks like management made it a low-priority item, given that there’s an easy workaround and it’s an uncommon pattern. Given the age of the issue, though, I’ve asked if we can bump up the priority.
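
For readers landing on this thread: the workaround is presumably the pattern already used in the first loop of the original post, i.e. evaluate `lbound`/`ubound` on the host, store the results in scalar variables, and use only those scalars in the loop headers. A minimal sketch under that assumption; the program name `workaround` and the final host-side check are illustrative additions, not part of the original report:

```fortran
program workaround
  implicit none
  integer, parameter, dimension(3) :: n = [120,120,33]
  real, allocatable, dimension(:,:,:) :: arr_d
  integer :: i,j,k
  integer :: lx,ly,lz,hx,hy,hz
  allocate(arr_d(n(1),n(2),n(3)))
  ! Evaluate the array bounds on the host, outside the compute region
  lx=lbound(arr_d,1)
  ly=lbound(arr_d,2)
  lz=lbound(arr_d,3)
  hx=ubound(arr_d,1)
  hy=ubound(arr_d,2)
  hz=ubound(arr_d,3)
  ! Only scalar variables appear in the loop bounds
  !$acc parallel loop collapse(3)
  do k=lz,hz
    do j=ly,hy
      do i=lx,hx
        arr_d(i,j,k) = 1.*i*j*k
      enddo
    enddo
  enddo
  ! Host-side sanity check on the last element (illustrative addition)
  if (arr_d(hx,hy,hz) /= 1.*hx*hy*hz) error stop "unexpected value"
  print*,"done"
end program workaround
```

This mirrors the loop reported as working in the original post, so it can be built with the same `nvfortran -acc` command line shown there.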

Thanks,
Mat


Hi Mat,

Thanks for confirming and for the update. Nice to know it may be given higher priority.

Best,
Pedro