PGF90-F-0155-Compiler failed to translate accelerator region

Hello,

I am trying to associate a pointer in a Fortran kernel, but it does not seem to work:

$ cat main_simple2.F90 
PROGRAM MAIN_SIMPLE2
IMPLICIT NONE
INTEGER(KIND=4) , DEVICE :: KLON_D, KSTA_D
REAL(KIND=8), DEVICE, POINTER :: X_D (:,:)

INTERFACE
ATTRIBUTES(GLOBAL) &
SUBROUTINE GPU_SIMPLE2 (KLON, KSTA, P)
INTEGER(KIND=4),INTENT(IN) :: KLON, KSTA
REAL (KIND=8), TARGET :: P (KLON, KSTA)
END SUBROUTINE GPU_SIMPLE2
END INTERFACE

KLON_D = 32
KSTA_D = 100

ALLOCATE (X_D (KLON_D, KSTA_D))

CALL GPU_SIMPLE2 <<<600, 32>>> (KLON_D, KSTA_D, X_D)

END PROGRAM MAIN_SIMPLE2

ATTRIBUTES(GLOBAL) &
SUBROUTINE GPU_SIMPLE2 (KLON, KSTA, P)

IMPLICIT NONE

INTEGER(KIND=4),INTENT(IN)    :: KLON, KSTA
REAL (KIND=8), TARGET :: P (KLON,KSTA)

REAL (KIND=8), POINTER :: X (:,:)
 
X => P (:, 1:10)

END SUBROUTINE GPU_SIMPLE2
$ pgf90 -Mcuda=ptxinfo,fastmath -c main_simple2.F90 
PGF90-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unexpected runtime function call (main_simple2.F90: 1)
PGF90/x86-64 Linux 17.7-0: compilation aborted

Is this a limitation of CUDA Fortran? If so, is there a list of these kinds of limitations?

Regards,

Philippe

Hi Philippe,

Your code fails for me as well.
Some of the Fortran features that cannot appear in device code are listed in the CUDA Fortran Programming Guide, but the list is fluid: we are continually working to make more Fortran features available in device code, so that existing codes are easier to port to CUDA Fortran (and OpenACC Fortran).

F90 pointers fall into this category. A complicating factor (not purely technical) is that F90 pointers, even when they work, will likely not perform very well in device code. They can also lead to race conditions, depending on how different threads access the pointer descriptor and whether the descriptor is shared among the threads. Giving every thread its own descriptor adds a lot of overhead and register usage.

I’ve begun to think the best guidance we can give now, for simple cases like yours, is to use Cray pointers. We have not recommended this in the past, but they are lightweight and do what people need the majority of the time (mainly, pointing to a contiguous piece of memory).

For example, your kernel could be rewritten with Cray pointers like this:

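ATTRIBUTES(GLOBAL) &
SUBROUTINE GPU_SIMPLE2 (KLON, KSTA, P)

IMPLICIT NONE

INTEGER(KIND=4),INTENT(IN) :: KLON, KSTA
REAL (KIND=8), TARGET :: P (KLON,KSTA)
! X is a Cray pointee: it has no storage of its own and is addressed through XPTR
REAL (KIND=8) X(KLON, 10)
POINTER (XPTR, X)
! Point XPTR at the start of P, so that X aliases the contiguous slice P(:,1:10)
XPTR = LOC(P(1,1))
! Each thread writes one element of the first column (really P(threadIdx%x,1))
X(threadIdx%x,1) = -99.0d0
END SUBROUTINE GPU_SIMPLE2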
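
To check that a write through the Cray pointee really lands in P, a small host driver could look like the sketch below. It is only an illustration and makes several assumptions that are not in the original post: the kernel is moved into a module (SIMPLE2_MOD is a made-up name) so that no INTERFACE block is needed, KLON and KSTA are passed by VALUE from host integers rather than from DEVICE scalars, a bounds guard is added, and a single block of KLON threads is launched.

```fortran
MODULE SIMPLE2_MOD            ! hypothetical module name, not from the original post
USE CUDAFOR
IMPLICIT NONE
CONTAINS

ATTRIBUTES(GLOBAL) &
SUBROUTINE GPU_SIMPLE2 (KLON, KSTA, P)
! Assumption: scalars passed by value, so the host can pass ordinary integers
INTEGER(KIND=4), VALUE :: KLON, KSTA
REAL (KIND=8), TARGET :: P (KLON,KSTA)
REAL (KIND=8) X(KLON, 10)     ! Cray pointee, aliases P(:,1:10)
POINTER (XPTR, X)
XPTR = LOC(P(1,1))
! Guard added for the sketch: ignore threads beyond the first dimension
IF (threadIdx%x <= KLON) X(threadIdx%x,1) = -99.0d0
END SUBROUTINE GPU_SIMPLE2

END MODULE SIMPLE2_MOD

PROGRAM CHECK_SIMPLE2
USE CUDAFOR
USE SIMPLE2_MOD
IMPLICIT NONE
INTEGER(KIND=4), PARAMETER :: KLON = 32, KSTA = 100
REAL(KIND=8), DEVICE, ALLOCATABLE :: X_D (:,:)
REAL(KIND=8) :: X_H (KLON, KSTA)
INTEGER :: ISTAT

ALLOCATE (X_D (KLON, KSTA))
X_D = 0.0d0                   ! initialise the device array from the host

CALL GPU_SIMPLE2 <<<1, KLON>>> (KLON, KSTA, X_D)
ISTAT = cudaDeviceSynchronize ()

X_H = X_D                     ! copy the result back to the host

! The first column should have been overwritten, the rest left untouched
PRINT *, 'column 1 set      : ', ALL (X_H (:,1) == -99.0d0)
PRINT *, 'other columns zero: ', ALL (X_H (:,2:) == 0.0d0)

DEALLOCATE (X_D)
END PROGRAM CHECK_SIMPLE2
```

This should build with the same kind of command line as in the original post (pgf90 -Mcuda). If both lines print T, the pointee X and the dummy argument P refer to the same device memory.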

Thanks, I will see whether it works for me.