How can I exclude the device initialization time in my CUDA Fortran program? In CUDA C, I can create a dummy context (e.g. by allocating one integer on the device) before the time measurement starts.
The problem is my program structure: I have a C program P that calls a C function X, and P also does the time measurement around X. X is called several times and all runtimes are aggregated. I cannot modify this part of the program.
The C-function X calls (more or less) a Fortran function Y. The Fortran function Y uses CUDA Fortran (one CUF-file comprising function Y and the kernel). It does the memory allocation and the kernel execution.
In C I could define a global object (in the .cu file) whose constructor just allocates some memory, so that it runs only once, before the time measurement starts. But how do I do this in Fortran? I tried this in my CUF file:
module my_globals
  integer, device :: cudaHolder = 1
end module
But it didn’t work. Any other suggestions? I am not a native Fortran programmer, so please forgive me if there is a really simple Fortran solution.