Problem with OpenACC & pointer slice

Hi,

The following code compiles but fails at run time:

```fortran
PROGRAM MAIN

IMPLICIT NONE

REAL, POINTER :: P (:,:,:)   ! fails; works with ALLOCATABLE or POINTER, CONTIGUOUS

INTERFACE
!$acc routine (SUB) vector
SUBROUTINE SUB (IN, JN, P)

REAL :: P (IN, JN)
INTEGER :: IN, JN

END SUBROUTINE SUB

END INTERFACE

INTEGER :: KN, K, IN, JN

KN = 1
IN = 128
JN = 87

ALLOCATE (P (IN, JN, KN))

!$acc data copy (P)

  !$acc parallel loop gang
  DO K = 1, KN
    PRINT *, " ASSIGN IN PARALLEL LOOP "
    P (1, JN, K) = 999.
    PRINT *, LOC (P (1, JN, K))
    PRINT *, " DONE "
    CALL SUB (IN, JN, P (:, :, K))   ! compiler: "Possible copy in and copy out of p"

  ENDDO
  !$acc end parallel loop

!$acc end data

END

!$acc routine (SUB) vector
SUBROUTINE SUB (IN, JN, P)

IMPLICIT NONE

REAL :: P (IN, JN)
INTEGER :: IN, JN

PRINT *, " ASSIGN IN VECTOR ROUTINE "
PRINT *, LOC (P (1, JN))
P (1, JN) = 999.
PRINT *, " DONE "

END SUBROUTINE SUB
```

I compile it with:

```
$ pgf90 -Minfo=accel,all,intensity,ccff -o main.gpu.x -acc=gpu main.F90
main:
     26, Generating copy(p(:,:,:)) [if not already present]
     28, Generating Tesla code
         29, !$acc loop gang ! blockidx%x
     29, Intensity = 20.00
     34, Possible copy in and copy out of p in call to sub
sub:
     44, Generating Tesla code
```

Running it gives:

```
$ ./main.gpu.x
  ASSIGN IN PARALLEL LOOP
           22513743580160
  DONE
  ASSIGN IN VECTOR ROUTINE
           22514285718528
Failing in Thread:1
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
```

If I replace POINTER with ALLOCATABLE or with “POINTER, CONTIGUOUS”, the program works and the two printed addresses are identical.

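From the message at line 34 (“Possible copy in and copy out of p in call to sub”, also issued when compiling for the CPU), I understand that a temporary copy of P (:, :, K) is created and passed to SUB; the two different addresses printed above show that this copy is made on the GPU. Here, however, no copy should be needed: P was obtained from ALLOCATE, so P (:, :, K) is contiguous. Note that on the CPU the copy does not actually occur.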
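A small sketch of the point above: P (:, :, K) of an array obtained with ALLOCATE is contiguous at run time, even though a plain POINTER declaration does not let the compiler prove this at compile time (presumably the reason for the “Possible copy” message). The module and function names below (CONTIGUITY_CHECK, SECTION_IS_CONTIGUOUS) are hypothetical and only illustrate the check, which relies on the standard Fortran 2008 intrinsic IS_CONTIGUOUS. This is CPU-only Fortran and does not reproduce the OpenACC failure itself.

```fortran
! Hypothetical helper (not part of the reproducer above): run-time check of
! whether a rank-2 array section is contiguous, using the Fortran 2008
! intrinsic IS_CONTIGUOUS. Plain CPU Fortran; no OpenACC involved.
MODULE CONTIGUITY_CHECK

IMPLICIT NONE

CONTAINS

  ! Q is an assumed-shape dummy (no CONTIGUOUS attribute), so the actual
  ! argument is normally passed by descriptor without a copy, and
  ! IS_CONTIGUOUS reports on the caller's section itself.
  LOGICAL FUNCTION SECTION_IS_CONTIGUOUS (Q)
    REAL, INTENT (IN) :: Q (:,:)
    SECTION_IS_CONTIGUOUS = IS_CONTIGUOUS (Q)
  END FUNCTION SECTION_IS_CONTIGUOUS

END MODULE CONTIGUITY_CHECK
```

With the reproducer above, SECTION_IS_CONTIGUOUS (P (:, :, K)) would be expected to return .TRUE., whereas a strided section such as P (1:IN:2, :, K) returns .FALSE. and would genuinely require a copy before being passed to the explicit-shape dummy P (IN, JN) of SUB.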

I am using version 21.5; the problem did not occur with 20.11.

Regards,

Philippe