The following code fails at run time on the GPU:
    PROGRAM MAIN

    IMPLICIT NONE

    REAL, POINTER :: P (:,:,:)

    INTERFACE
    !$acc routine (SUB) vector
    SUBROUTINE SUB (IN, JN, P)
    REAL :: P (IN, JN)
    INTEGER :: IN, JN
    END SUBROUTINE SUB
    END INTERFACE

    INTEGER :: KN, K, IN, JN

    KN = 1
    IN = 128
    JN = 87

    ALLOCATE (P (IN, JN, KN))

    !$acc data copy (P)

    !$acc parallel loop gang
    DO K = 1, KN
      PRINT *, " ASSIGN IN PARALLEL LOOP "
      P (1, JN, K) = 999.
      PRINT *, LOC (P (1, JN, K))
      PRINT *, " DONE "
      CALL SUB (IN, JN, P (:, :, K))
    ENDDO
    !$acc end parallel loop

    !$acc end data

    END

    !$acc routine (SUB) vector
    SUBROUTINE SUB (IN, JN, P)

    IMPLICIT NONE

    REAL :: P (IN, JN)
    INTEGER :: IN, JN

    PRINT *, " ASSIGN IN VECTOR ROUTINE "
    PRINT *, LOC (P (1, JN))
    P (1, JN) = 999.
    PRINT *, " DONE "

    END SUBROUTINE SUB
I compile it with:
    $ pgf90 -Minfo=accel,all,intensity,ccff -o main.gpu.x -acc=gpu main.F90
    main:
         26, Generating copy(p(:,:,:)) [if not already present]
         28, Generating Tesla code
             29, !$acc loop gang ! blockidx%x
         29, Intensity = 20.00
         34, Possible copy in and copy out of p in call to sub
    sub:
         44, Generating Tesla code
Then I run it:
    $ ./main.gpu.x
      ASSIGN IN PARALLEL LOOP
      22513743580160
      DONE
      ASSIGN IN VECTOR ROUTINE
      22514285718528
    Failing in Thread:1
    call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
If I replace the POINTER with an ALLOCATABLE or a “POINTER, CONTIGUOUS”, then it works, and the two printed addresses are the same.
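For reference, here is a minimal sketch of the “POINTER, CONTIGUOUS” variant. It is the reproducer above with only the declaration of P changed and the text messages dropped. The program name MAIN_CONTIGUOUS and the explicit DEALLOCATE are additions for illustration and are not part of the original test case.

```fortran
PROGRAM MAIN_CONTIGUOUS

IMPLICIT NONE

! Only functional change from the failing reproducer: the CONTIGUOUS attribute.
REAL, POINTER, CONTIGUOUS :: P (:,:,:)

INTERFACE
!$acc routine (SUB) vector
SUBROUTINE SUB (IN, JN, P)
REAL :: P (IN, JN)
INTEGER :: IN, JN
END SUBROUTINE SUB
END INTERFACE

INTEGER :: KN, K, IN, JN

KN = 1
IN = 128
JN = 87

ALLOCATE (P (IN, JN, KN))

!$acc data copy (P)

!$acc parallel loop gang
DO K = 1, KN
  P (1, JN, K) = 999.
  ! Address of the element as seen in the parallel loop
  PRINT *, LOC (P (1, JN, K))
  CALL SUB (IN, JN, P (:, :, K))
ENDDO
!$acc end parallel loop

!$acc end data

! Added for tidiness; not in the original reproducer
DEALLOCATE (P)

END PROGRAM MAIN_CONTIGUOUS

!$acc routine (SUB) vector
SUBROUTINE SUB (IN, JN, P)

IMPLICIT NONE

REAL :: P (IN, JN)
INTEGER :: IN, JN

! Address of the same element as seen in the vector routine
PRINT *, LOC (P (1, JN))
P (1, JN) = 999.

END SUBROUTINE SUB
```

With this declaration the two LOC values should match, as described above. The ALLOCATABLE variant is the same apart from declaring P as REAL, ALLOCATABLE.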
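From the “Possible copy in and copy out” message (also issued when compiling for CPU), I understand that a temporary copy of P (:, :, K) is created and passed to SUB. But no copy should be made here, because P was created by ALLOCATE and P (:, :, K) is therefore a contiguous section. Please note that on CPU, the copy does not actually occur at run time.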
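A small CPU-only check of the contiguity claim, as a sketch: it uses the Fortran 2008 intrinsic IS_CONTIGUOUS on the same section that is passed to SUB. The program name CHECK_CONTIG is hypothetical. My assumption is that, without the CONTIGUOUS attribute, the compiler cannot prove contiguity of a pointer section at compile time, so it prepares a possible copy and has to decide at run time whether to use it.

```fortran
PROGRAM CHECK_CONTIG

IMPLICIT NONE

! Same declaration as in the failing reproducer (no CONTIGUOUS attribute)
REAL, POINTER :: P (:,:,:)
INTEGER :: KN, K, IN, JN

KN = 1
IN = 128
JN = 87

ALLOCATE (P (IN, JN, KN))

DO K = 1, KN
  ! Run-time contiguity of the section that the reproducer passes to SUB
  PRINT *, K, IS_CONTIGUOUS (P (:, :, K))
ENDDO

DEALLOCATE (P)

END PROGRAM CHECK_CONTIG
```

If the section is reported as contiguous, the run-time test should skip the copy, which would be consistent with what I see on CPU.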
I am using version 21.5; the problem did not occur with 20.11.