Dear all,
I reported this issue sometime in 2021/2022 in the OpenACC Slack channel and was told that a ticket had been opened at NVIDIA. I am posting it here as well to make sure it is a known issue and, if possible, to get an update on its status, since a collaborator was bitten by this problem today.
The second loop in the code below does not seem to run correctly: compute-sanitizer reports out-of-bounds writes when the code is run under it:
! cat test.f90
program p
  integer, parameter, dimension(3) :: n = [120,120,33]
  real, allocatable, dimension(:,:,:) :: arr_d
  integer :: i,j,k
  integer :: lx,ly,lz,hx,hy,hz
  allocate(arr_d(n(1),n(2),n(3)))
  lx=lbound(arr_d,1)
  ly=lbound(arr_d,2)
  lz=lbound(arr_d,3)
  hx=ubound(arr_d,1)
  hy=ubound(arr_d,2)
  hz=ubound(arr_d,3)
  print*,"works"
  !$acc parallel loop collapse(3)
  do k=lz,hz
    do j=ly,hy
      do i=lx,hx
        arr_d(i,j,k) = 1.*i*j*k
      enddo
    enddo
  enddo
  print*,"fails"
  !$acc parallel loop collapse(3)
  do k=lbound(arr_d,3),ubound(arr_d,3)
    do j=lbound(arr_d,2),ubound(arr_d,2)
      do i=lbound(arr_d,1),ubound(arr_d,1)
        arr_d(i,j,k) = 1.*i*j*k
      enddo
    enddo
  enddo
end program p
nvfortran --version && nvfortran -acc -Minfo=accel test.f90 -o test && compute-sanitizer ./test
nvfortran 24.3-0 64-bit target on x86-64 Linux -tp cascadelake
NVIDIA Compilers and Tools
Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
p:
     14, Generating implicit firstprivate(hz,hx,lx,hy,ly,lz)
         Generating NVIDIA GPU code
         15, !$acc loop gang, vector(128) collapse(3) ! blockidx%x threadidx%x
         16,   ! blockidx%x threadidx%x collapsed
         17,   ! blockidx%x threadidx%x collapsed
     14, Generating implicit copyout(arr_d(1:hx,1:hy,1:hz)) [if not already present]
     23, Generating NVIDIA GPU code
         24, !$acc loop gang, vector(128) collapse(3) ! blockidx%x threadidx%x
         25,   ! blockidx%x threadidx%x collapsed
         26,   ! blockidx%x threadidx%x collapsed
     23, Generating implicit copyout(arr_d(:,:,1:?)) [if not already present]
/usr/bin/ld: warning: /tmp/pgcudafatIC4icMdrzrktf.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
========= COMPUTE-SANITIZER
works
fails
========= Invalid __global__ write of size 4 bytes
========= at p_23+0x220 in /home/pedro/Desktop/test.f90:27
========= by thread (64,0,0) in block (0,0,0)
========= Address 0x79a55eff1d1c is out of bounds
========= and is 58,084 bytes before the nearest allocation at 0x79a55f000000 of size 1,901,056 bytes
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame: [0x332560]
========= in /usr/lib/x86_64-linux-gnu/libcuda.so.1
========= Host Frame:__pgi_uacc_cuda_launch3 in ../../src/cuda_launch.c:840 [0x190e1]
========= in /home/pedro/software/nvidia/hpc_sdk/Linux_x86_64/2024/compilers/lib/libaccdevice.so
========= Host Frame:__pgi_uacc_cuda_launch in ../../src/cuda_launche.c:197 [0x1b850]
========= in /home/pedro/software/nvidia/hpc_sdk/Linux_x86_64/2024/compilers/lib/libaccdevice.so
========= Host Frame:__pgi_uacc_launch in ../../src/launch.c:55 [0x3b741]
========= in /home/pedro/software/nvidia/hpc_sdk/Linux_x86_64/2024/compilers/lib/libacchost.so
========= Host Frame:MAIN_ in /home/pedro/Desktop/test.f90:23 [0x2bbe]
========= in /home/pedro/Desktop/./test
========= Host Frame:main [0x2331]
========= in /home/pedro/Desktop/./test
========= Host Frame:__libc_start_call_main in ../sysdeps/nptl/libc_start_call_main.h:74 [0x2a1ca]
========= in /usr/lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:__libc_start_main in ../csu/libc-start.c:347 [0x2a28b]
========= in /usr/lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:_start [0x2225]
========= in /home/pedro/Desktop/./test
=========
========= Invalid __global__ write of size 4 bytes
========= at p_23+0x220 in /home/pedro/Desktop/test.f90:27
========= by thread (65,0,0) in block (0,0,0)
========= Address 0x79a55eff1d1c is out of bounds
========= and is 58,084 bytes before the nearest allocation at 0x79a55f000000 of size 1,901,056 bytes
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame: [0x332560]
========= in /usr/lib/x86_64-linux-gnu/libcuda.so.1
========= Host Frame:__pgi_uacc_cuda_launch3 in ../../src/cuda_launch.c:840 [0x190e1]
========= in /home/pedro/software/nvidia/hpc_sdk/Linux_x86_64/2024/compilers/lib/libaccdevice.so
========= Host Frame:__pgi_uacc_cuda_launch in ../../src/cuda_launche.c:197 [0x1b850]
========= in /home/pedro/software/nvidia/hpc_sdk/Linux_x86_64/2024/compilers/lib/libaccdevice.so
========= Host Frame:__pgi_uacc_launch in ../../src/launch.c:55 [0x3b741]
========= in /home/pedro/software/nvidia/hpc_sdk/Linux_x86_64/2024/compilers/lib/libacchost.so
========= Host Frame:MAIN_ in /home/pedro/Desktop/test.f90:23 [0x2bbe]
========= in /home/pedro/Desktop/./test
========= Host Frame:main [0x2331]
========= in /home/pedro/Desktop/./test
========= Host Frame:__libc_start_call_main in ../sysdeps/nptl/libc_start_call_main.h:74 [0x2a1ca]
========= in /usr/lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:__libc_start_main in ../csu/libc-start.c:347 [0x2a28b]
========= in /usr/lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:_start [0x2225]
========= in /home/pedro/Desktop/./test
(...)
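As a sanity check on the numbers in the sanitizer report, here is a minimal sketch of the offset arithmetic. It assumes that arr_d(1,1,1) sits at the very start of the device allocation, which the output above does not confirm, and that elements are 4 bytes (consistent with the reported 4-byte write). The program name, variable names, and the tested index triple (0,0,0) are mine and purely illustrative. Under those assumptions, the reported distance of 58,084 bytes before the allocation corresponds to element (0,0,0), i.e. one element below the lower bound in every dimension:

```fortran
program offset_check
  implicit none
  integer, parameter :: n(3) = [120, 120, 33]   ! array extents from test.f90
  integer, parameter :: elem_bytes = 4           ! matches the reported 4-byte write
  integer, parameter :: bytes_before = 58084     ! distance reported by compute-sanitizer
  integer :: i, j, k, offset_bytes

  ! Assumption: arr_d(1,1,1) is located at the start of the device allocation.
  ! Hypothetical index triple being tested: (0,0,0).
  i = 0; j = 0; k = 0

  ! Column-major byte offset of arr_d(i,j,k) relative to arr_d(1,1,1):
  ! (-1 - 120 - 14400) elements * 4 bytes = -58084 bytes.
  offset_bytes = ((i - 1) + (j - 1)*n(1) + (k - 1)*n(1)*n(2)) * elem_bytes

  print *, "offset of arr_d(0,0,0) in bytes:", offset_bytes
  print *, "matches sanitizer report:       ", offset_bytes == -bytes_before
end program offset_check
```

Both threads shown above (64 and 65) report the same address, which would also be consistent with every thread computing the same index triple in the second kernel. I have not verified this, so please take it only as a hint.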
Thanks in advance!
Pedro