Hi,
The following code does not work:
PROGRAM MAIN
  IMPLICIT NONE
  REAL, POINTER :: P (:,:,:)
  INTERFACE
    !$acc routine (SUB) vector
    SUBROUTINE SUB (IN, JN, P)
      REAL :: P (IN, JN)
      INTEGER :: IN, JN
    END SUBROUTINE SUB
  END INTERFACE
  INTEGER :: KN, K, IN, JN
  KN = 1
  IN = 128
  JN = 87
  ALLOCATE (P (IN, JN, KN))
  !$acc data copy (P)
  !$acc parallel loop gang
  DO K = 1, KN
    PRINT *, " ASSIGN IN PARALLEL LOOP "
    P (1, JN, K) = 999.
    PRINT *, LOC (P (1, JN, K))
    PRINT *, " DONE "
    CALL SUB (IN, JN, P (:, :, K))
  ENDDO
  !$acc end parallel loop
  !$acc end data
END

!$acc routine (SUB) vector
SUBROUTINE SUB (IN, JN, P)
  IMPLICIT NONE
  REAL :: P (IN, JN)
  INTEGER :: IN, JN
  PRINT *, " ASSIGN IN VECTOR ROUTINE "
  PRINT *, LOC (P (1, JN))
  P (1, JN) = 999.
  PRINT *, " DONE "
END SUBROUTINE SUB
I compile it with:
$ pgf90 -Minfo=accel,all,intensity,ccff -o main.gpu.x -acc=gpu main.F90
main:
26, Generating copy(p(:,:,:)) [if not already present]
28, Generating Tesla code
29, !$acc loop gang ! blockidx%x
29, Intensity = 20.00
34, Possible copy in and copy out of p in call to sub
sub:
44, Generating Tesla code
Then I run it:
$ ./main.gpu.x
ASSIGN IN PARALLEL LOOP
22513743580160
DONE
ASSIGN IN VECTOR ROUTINE
22514285718528
Failing in Thread:1
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
If I replace the POINTER attribute with ALLOCATABLE or with “POINTER, CONTIGUOUS”, then it works, and the two printed addresses are the same.
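For reference, here is a minimal sketch of the working variant, assuming the only change that matters is the CONTIGUOUS attribute on P. I have dropped the PRINT statements inside the loop, initialised P, and written the routine directive in its unnamed form inside SUB, so treat it as an illustration rather than the exact file I ran. The program name MAIN_CONTIG is only for illustration.

```fortran
PROGRAM MAIN_CONTIG
  IMPLICIT NONE
  ! Only change to the declaration: CONTIGUOUS added to the pointer.
  ! REAL, ALLOCATABLE :: P (:,:,:) is the other variant that works for me.
  REAL, POINTER, CONTIGUOUS :: P (:,:,:)
  INTERFACE
    SUBROUTINE SUB (IN, JN, P)
      !$acc routine vector
      INTEGER :: IN, JN
      REAL :: P (IN, JN)
    END SUBROUTINE SUB
  END INTERFACE
  INTEGER :: KN, K, IN, JN
  KN = 1
  IN = 128
  JN = 87
  ALLOCATE (P (IN, JN, KN))
  P = 0.
  !$acc data copy (P)
  !$acc parallel loop gang
  DO K = 1, KN
    ! Same call as in the failing version: one 2D slice per gang.
    CALL SUB (IN, JN, P (:, :, K))
  ENDDO
  !$acc end parallel loop
  !$acc end data
  ! Value written by SUB, read back on the host after the data region.
  PRINT *, P (1, JN, 1)
  DEALLOCATE (P)
END PROGRAM MAIN_CONTIG

SUBROUTINE SUB (IN, JN, P)
  !$acc routine vector
  IMPLICIT NONE
  INTEGER :: IN, JN
  REAL :: P (IN, JN)
  P (1, JN) = 999.
END SUBROUTINE SUB
```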
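From the "Possible copy in and copy out" warning (also issued when compiling for CPU), I understand that a copy of P may be created and passed to SUB. But in this case no copy should be made: P was allocated with ALLOCATE, so the section P (:, :, K) is contiguous. Please note that on CPU, the copy does not occur.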
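One way to check on the CPU that no temporary is passed is to compare the address seen by the caller with the one seen by the callee, and to query IS_CONTIGUOUS on the section. A minimal sketch, assuming a compiler that supports the Fortran 2008 IS_CONTIGUOUS intrinsic and the non-standard LOC extension already used above; the names CHECK, SHOW and Q are only for illustration:

```fortran
PROGRAM CHECK
  IMPLICIT NONE
  REAL, POINTER :: P (:,:,:)
  INTEGER :: K, IN, JN
  IN = 128
  JN = 87
  K = 1
  ALLOCATE (P (IN, JN, 1))
  ! Run-time query: is the 2D slice stored contiguously?
  PRINT *, IS_CONTIGUOUS (P (:, :, K))
  ! Address of the element as seen by the caller.
  PRINT *, LOC (P (1, JN, K))
  CALL SHOW (IN, JN, P (:, :, K))
  DEALLOCATE (P)
CONTAINS
  SUBROUTINE SHOW (IN, JN, Q)
    INTEGER :: IN, JN
    ! Explicit-shape dummy, as in SUB, so a non-contiguous actual
    ! argument would be passed through a temporary copy.
    REAL :: Q (IN, JN)
    ! The same address as in the caller means no temporary was passed.
    PRINT *, LOC (Q (1, JN))
  END SUBROUTINE SHOW
END PROGRAM CHECK
```

If the two addresses differ, the callee received a temporary rather than the original slice, which is what the GPU run above suggests.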
I am using 21.5; the problem did not occur with 20.11.
Regards,
Philippe