Dynamically allocating an array of structures in CUDA Fortran?

Hi, everyone.
I am using CUDA Fortran, and the compiler is PGI 11.8.
I want to allocate device memory for an array of structures (say STRUCT1) whose member is an array of another structure (say STRUCT2). The following is a skeleton of our code:
TYPE STRUCT2
  INTEGER, ALLOCATABLE :: A(:)
END TYPE

TYPE STRUCT1
  TYPE(STRUCT2), ALLOCATABLE :: B(:)
END TYPE

TYPE(STRUCT1), ALLOCATABLE, DEVICE :: C(:)

INTEGER :: NA=5, NB=5, NC=5
INTEGER :: I, J

ALLOCATE(C(NC))
DO I = 1, NC
  ALLOCATE(C(I)%B(NB))
  DO J = 1, NB
    ALLOCATE(C(I)%B(J)%A(NA))
  END DO
END DO

When compiling, it reports the following warning:

/tmp/pgcudafori18dyVySZDk7.gpu(58): Warning: Cannot tell what pointer points to, assuming global memory space

When running, it reports a "Segmentation fault" runtime error.

The problem seems to be that CUDA Fortran cannot allocate an array of structures whose size is unknown at compile time. But this is the usual case in real applications such as CFD, where the problem size (the dimensions of a grid block) is often determined at runtime.

Can I work around this using the runtime API, or are other alternative methods available?
Any suggestions would be appreciated. Thanks in advance.

Wow!

The problem is that after
ALLOCATE(C(NC))
the data C(1), C(2), … are in GPU memory. The next line
ALLOCATE(C(I)%B(NB))
is executed on the host, but C(I) lives on the GPU. We never even considered supporting allocation of dynamic data where the data pointer lives on the GPU. I’m somewhat surprised that it got as far as it did.

There’s no reasonable method to support this with the current compiler that I can think of. Essentially the compiler would have to generate the code to do the allocate from the host, then copy the pointer and array descriptor to the GPU memory. Let me put that in as a feature request.
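In the meantime, one possible workaround (a sketch only, not something confirmed for PGI 11.8 in this thread) is to avoid nested allocatable components altogether and flatten the data into a single device array whose sizes are still chosen at runtime. The names A_D and A_H and the index mapping below are hypothetical, and the sketch assumes every A has the same length NA and every B the same length NB, as in the skeleton above:

```fortran
PROGRAM FLAT_DEVICE_ARRAY
  USE CUDAFOR
  IMPLICIT NONE

  ! Hypothetical flattening: element A(K) of C(I)%B(J) becomes A_D(K, J, I).
  INTEGER, ALLOCATABLE, DEVICE :: A_D(:,:,:)
  INTEGER, ALLOCATABLE         :: A_H(:,:,:)
  INTEGER :: NA, NB, NC, ISTAT

  ! In a real code these would be read at runtime; fixed here for illustration.
  NA = 5
  NB = 5
  NC = 5

  ALLOCATE(A_H(NA, NB, NC))
  A_H = 0

  ! The array descriptor of A_D lives on the host, so this host-side
  ! ALLOCATE does not need to dereference anything stored in GPU memory.
  ALLOCATE(A_D(NA, NB, NC), STAT=ISTAT)
  IF (ISTAT /= 0) STOP 'device allocation failed'

  A_D = A_H      ! host-to-device copy by assignment
  ! ... launch kernels that receive A_D as a device dummy argument ...
  A_H = A_D      ! device-to-host copy by assignment

  DEALLOCATE(A_D)
  DEALLOCATE(A_H)
END PROGRAM FLAT_DEVICE_ARRAY
```

If the inner arrays have different lengths, the same idea should still work with a one-dimensional device array plus a separate integer array of starting offsets, at the cost of doing the index arithmetic by hand.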