Hello!
I have a loop that looks as follows:
    !$acc parallel loop collapse(2) &
    !$acc present(out, G, G%scale) &
    !$acc private(du, ddu, err, deriv, uadj, f, df, live, k, itt, i, j)
    do j = G%jsc, G%jec
      do i = G%isc-1, G%iec
        du = 0.0_dp; err = 1.0_dp; deriv = 1.0e6_dp
        do itt = 1, max_itt
          ddu = -err / deriv
          du = du + ddu
          if (abs(ddu) < 1.0e-15_dp * abs(du)) exit
          err = 0.0_dp; deriv = 0.0_dp
          do k = 1, nk
            uadj = 0.01_dp + du
            ! PATTERN: live set in if/else, consumed after via G%scale
            if (uadj > 0.0_dp) then
              f = G%scale(i,j) * uadj * 10.0_dp
              live = 10.0_dp   ! set in if-branch
            else
              f = -G%scale(i,j) * uadj * 10.0_dp
              live = -10.0_dp  ! set in else-branch
            end if
            df = G%scale(i,j) * live  ! live consumed after if/else
            err = err + f
            deriv = deriv + df
          end do
        end do
        out(i,j) = du
      end do
    end do
This pattern fails at run time with the following error:
Accelerator Fatal Error: call to cuStreamSynchronize returned error 700 (CUDA_ERROR_ILLEGAL_ADDRESS): Illegal address during kernel execution
However, this only happens when the code is compiled with -O2 or higher. It happens with both acc parallel loop and omp target teams loop, but not with omp target teams distribute parallel do. I have a small reproducer in the following repo. I have tested multiple nvhpc versions, from 25.5 through 26.1, and all of them show this error.