CUDA Fortran subroutines and modules

Is it necessary to encapsulate all CUDA Fortran subroutines within a module?

Can they instead be placed in the main program via a “contains” statement?


I ask because I noticed a significant decrease in OpenMP performance with the Intel compiler when I use modules. I’m wondering whether CUDA Fortran will retain or improve performance if the subroutines are in a module, and perhaps also make loading onto the GPU easier.

Hi srinath22,

CUDA Fortran “global” kernels only need an explicit interface, which you can define yourself. Putting the kernels in a module is more convenient, since the interface is then generated automatically.

However, in order to call “device” routines, both the “global” kernel and the “device” routine must be in a module.
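As a rough illustration of the module layout, here is a minimal sketch. The names scale_mod, twice and scale_kernel are made up for this example, and the launch configuration is just an assumption, so adapt it to your own code.

```fortran
! Minimal sketch; scale_mod, twice and scale_kernel are hypothetical names.
module scale_mod
  use cudafor
  implicit none
contains
  ! "device" routine: callable only from code running on the GPU.
  attributes(device) function twice(x) result(y)
    real, intent(in) :: x
    real :: y
    y = 2.0 * x
  end function twice

  ! "global" kernel: launched from the host, calls the device routine.
  attributes(global) subroutine scale_kernel(a, n)
    integer, value :: n
    real :: a(n)
    integer :: i
    ! One thread per element (1-based global thread index).
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = twice(a(i))
  end subroutine scale_kernel
end module scale_mod

program main
  use cudafor
  use scale_mod          ! makes the kernel's interface visible to the host
  implicit none
  integer, parameter :: n = 1024
  real :: a(n)
  real, device :: a_d(n)

  a = 1.0
  a_d = a                                        ! host-to-device copy
  call scale_kernel<<<(n + 255) / 256, 256>>>(a_d, n)
  a = a_d                                        ! device-to-host copy
  print *, all(a == 2.0)                         ! check every element was doubled
end program main
```

Saved with a .cuf extension, this should build with something like pgfortran -Mcuda (or nvfortran -cuda with the newer compilers). Because the kernel lives in a module, the main program gets its interface just from the use statement, with no hand-written interface block.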

  • Mat