Hi Nightwish,
The default for CUDA Fortran is to generate relocatable device code (RDC), which enables separate compilation and lets you call device routines defined in external objects. However, RDC imposes a restriction: shared memory cannot be passed as an argument to a device routine.
You can pass shared memory when RDC is disabled (“-Mcuda=nordc”). However, every device routine must then be contained in the same module as the device routines that call it. In this case the device routines are inlined, so no call is actually made.
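Something along these lines should illustrate the layout. Treat it as an untested sketch: the module, routine, and variable names (tile_kernels, scale_tile, scale_kernel, tile_size) and the file name in the build comment are made up for the example, and it assumes a single block of 64 threads.

```fortran
! Build with RDC disabled, for example (file name is hypothetical):
!   pgfortran -Mcuda=nordc shared_arg.cuf
module tile_kernels
  use cudafor
  implicit none
  integer, parameter :: tile_size = 64
contains

  ! Device routine that receives the shared array as a dummy argument.
  ! It lives in the same module as its caller so it can be inlined.
  attributes(device) subroutine scale_tile(tile, factor)
    real :: tile(tile_size)
    real, value :: factor
    integer :: i
    i = threadIdx%x
    tile(i) = tile(i) * factor
  end subroutine scale_tile

  ! Kernel that stages data in shared memory and passes it on.
  attributes(global) subroutine scale_kernel(a)
    real :: a(tile_size)
    real, shared :: tile(tile_size)
    integer :: i
    i = threadIdx%x
    tile(i) = a(i)
    call syncthreads()
    call scale_tile(tile, 2.0)   ! shared memory passed as an argument
    call syncthreads()
    a(i) = tile(i)
  end subroutine scale_kernel

end module tile_kernels

program main
  use cudafor
  use tile_kernels
  implicit none
  real :: a(tile_size)
  real, device :: a_d(tile_size)

  a = 1.0
  a_d = a                               ! copy host to device
  call scale_kernel<<<1, tile_size>>>(a_d)
  a = a_d                               ! copy device to host
  print *, a(1), a(tile_size)
end program main
```

If scale_tile were moved into a different module or a separately compiled object, it could no longer be inlined, and the shared-memory argument would run into the restriction described above.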
Hope this helps,
Mat