Hi, I’m using OpenACC’s routine directive in a more complicated piece of code and I’m hitting several runtime errors. Going back to basics, I came across this toy code and built it with the latest version of nvhpc I have available, but it also fails at runtime. What could be causing this?
$ cat repro.F90
program openacc_subroutine
  implicit none
  integer, parameter :: N = 256, M = 128
  real, allocatable :: A(:,:), B(:,:), C(:,:)
  integer :: i
  allocate(A(N,M), B(N,M), C(N,M))
  do i = 1, N
    A(i,:) = 1.0 * i
    B(i,:) = 2.0 * i
  end do
  C = 0.0
  !$acc data copyin(A, B) copyout(C)
  !$acc parallel loop gang
  do i = 1, N
    call row_add(A(i,:), B(i,:), C(i,:), M)
  end do
  !$acc end parallel loop
  !$acc end data
  deallocate(A, B, C)
end program openacc_subroutine
subroutine row_add(x, y, z, m)
  !$acc routine seq
  implicit none
  integer, intent(in) :: m
  real, intent(in) :: x(m), y(m)
  real, intent(out) :: z(m)
  integer :: j
  !$acc data present(x,y,z)
  do j = 1, m
    z(j) = x(j) + y(j)
  end do
  !$acc end data
end subroutine row_add
$ nvfortran -O2 -acc=gpu -Minfo=acc repro.F90 -o repro
openacc_subroutine:
     15, Generating copyin(a(:,:)) [if not already present]
         Generating copyout(c(:,:)) [if not already present]
         Generating copyin(b(:,:)) [if not already present]
     16, Generating NVIDIA GPU code
         17, !$acc loop gang ! blockidx%x
         18, !$acc loop vector(128) ! threadidx%x
     18, Loop is parallelizable
row_add:
     26, Generating acc routine seq
         Generating NVIDIA GPU code
$ ./repro
Failing in Thread:1
Accelerator Fatal Error: call to cuStreamSynchronize returned error 700 (CUDA_ERROR_ILLEGAL_ADDRESS): Illegal address during kernel execution
File: /path/to/bin/repro.F90
Function: openacc_subroutine:1
Line: 16
$ nvfortran --version
nvfortran 25.3-0 64-bit target on x86-64 Linux -tp sapphirerapids
Thanks in advance!