I’ll try to keep this as simple as possible: two Fortran subroutines, one of which is called from within a kernel, compiled with PGI 20.4.

    SUBROUTINE SUB1
      !$acc routine(sub2) vector
      !$acc update device(…)
      !$acc parallel loop independent
      DO I = 1,10
        CALL SUB2
      END DO
    END SUBROUTINE SUB1

    SUBROUTINE SUB2
      INTEGER :: I, N
      REAL, DIMENSION(1000) :: AA, BB, CC
      REAL :: MAG
      !$acc routine(sub2) vector

      N = 50 ! Some value computed earlier in routine, set to 50 here

      !$acc loop seq
      DO I = 1,N
        AA(I) = …
        BB(I) = …
        CC(I) = …
      END DO

      !$acc loop seq
      DO I = 1,N
        MAG = SQRT(AA(I)*AA(I) + BB(I)*BB(I) + CC(I)*CC(I))
      END DO
    END SUBROUTINE SUB2

If I comment out the line that sets MAG, cuda-memcheck reports zero errors. With that line uncommented, cuda-memcheck gives me this:

    ========= CUDA-MEMCHECK
    ========= Invalid __global__ write of size 8
    =========     at 0x000029b8 in sub2_
    =========     by thread (0,0,0) in block (51,0,0)
    =========     Address 0x00000000 is out of bounds
    =========     Device Frame:sub1_709_gpu (sub1_709_gpu : 0x458)

A bounds error doesn’t make sense to me since the second loop has exactly the same bounds as the previous loop.

Any help appreciated.