CUDA Fortran cannot pass shared memory to a device subroutine

Hello, Mat!

I use PVF 13.9 on 64-bit Windows 7 to compile a program whose kernel uses shared memory. The program compiles, but when I pass the shared memory to device functions, it fails at run time.

I have sent the source code to

The problem occurs in file CudaKernel.cuf at line 129, when setting the shared memory element param_s(1)=x_d(iRow,1). The error message is

0: ALLOCATE: 60 bytes requested: status = 30(unknown error)

Please help me solve this. Thanks in advance!


Hi Nightwish,

The default for CUDA Fortran is to generate relocatable device code (RDC), which enables separate compilation and the ability to call device routines in external objects. However, this imposes a restriction that disallows passing shared memory as arguments.

You can pass shared memory when RDC is disabled with “-Mcuda=nordc”; however, all of your device routines must then be contained in the same module as the device routines that call them. In this case, the device routines are inlined and no call is actually made.
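As a rough illustration of that layout, here is a minimal sketch with a kernel and the device routine it calls in the same module, where the shared array is passed as an argument. All names here (shared_arg_m, scale_column, kernel, nparam, tpb, and the file name in the comment) are made up for the example and are not taken from your code; only the “-Mcuda=nordc” flag comes from the explanation above.

```fortran
! Sketch only: every name below is hypothetical.
! Assumed build line with RDC disabled:
!   pgfortran -Mcuda=nordc shared_arg.cuf
module shared_arg_m
  use cudafor
  implicit none
  integer, parameter :: nparam = 4    ! hypothetical parameters per row
  integer, parameter :: tpb    = 64   ! hypothetical threads per block
contains

  ! Device routine kept in the same module as the kernel that calls it,
  ! which is what nordc requires. It receives the shared array as an argument.
  attributes(device) subroutine scale_column(p, t, factor)
    real :: p(nparam, tpb)
    integer, value :: t
    real, value :: factor
    integer :: j
    do j = 1, nparam
      p(j, t) = p(j, t) * factor
    end do
  end subroutine scale_column

  attributes(global) subroutine kernel(x_d, y_d, n)
    integer, value :: n
    real :: x_d(n, nparam), y_d(n)
    real, shared :: param_s(nparam, tpb)   ! one column per thread, so no races
    integer :: iRow, t, j
    real :: s

    t    = threadIdx%x
    iRow = (blockIdx%x - 1) * blockDim%x + t
    if (iRow <= n) then
      ! Each thread copies its own row into its own column of shared memory.
      do j = 1, nparam
        param_s(j, t) = x_d(iRow, j)
      end do
      ! Pass the shared array to the device routine.
      call scale_column(param_s, t, 2.0)
      s = 0.0
      do j = 1, nparam
        s = s + param_s(j, t)
      end do
      y_d(iRow) = s
    end if
  end subroutine kernel
end module shared_arg_m

program main
  use cudafor
  use shared_arg_m
  implicit none
  integer, parameter :: n = 128
  real :: x(n, nparam), y(n)
  real, device :: x_d(n, nparam), y_d(n)
  integer :: istat

  x   = 1.0
  x_d = x
  call kernel<<<(n + tpb - 1) / tpb, tpb>>>(x_d, y_d, n)
  ! Check for a kernel failure before touching device memory again.
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) print *, 'kernel error: ', cudaGetErrorString(istat)
  y = y_d
  ! Each row holds nparam ones, doubled by scale_column, then summed.
  print *, 'max deviation from expected sum:', maxval(abs(y - 2.0 * nparam))
end program main
```

Checking the status right after the kernel launch, as in the host code here, may also help show whether the kernel itself is failing before a later allocation reports the error.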

Hope this helps,

Hi Mat!

I have followed all your instructions. However, I get the same error when I run the exe.

What else can I try?

Additionally, I wrote a simple example which passes a shared array from a global kernel to device code. I compiled it with RDC left at its default setting, and it compiles and runs successfully. Therefore I think this error is not related to the RDC setting, and there may be some other bug in my program. I have sent all the source code to the service e-mail.

Thank you!