Hello CUDA community! When writing my CUDA Fortran program, I placed all the memory allocation and data transfer operations in the main program and encapsulated only the kernels within a module. The issue I encountered is that multiple CPU-to-GPU data transfers occur between two kernel calls. I suspect this is where the error occurs, but I’m not sure of the exact cause. So my question is: should I encapsulate memory allocation, data transfer, and the kernels all within the module?
Again, it’s difficult to say without a reproducing example. My best guess is that the transfers you’re seeing are the Fortran array descriptors being copied over to the device, as we’ve discussed in your other posts.
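
For illustration only, here is a minimal sketch of the kind of layout described in the question, not the actual code. All names (`kernels_m`, `scale`, `a_h`, `a_d`) are hypothetical. It assumes the descriptor guess is right: as far as I know, an assumed-shape dummy such as `a(:)` needs a descriptor on the device, while an explicit-shape dummy with its size passed by value generally does not. Allocation and transfers stay in the main program here, and only the kernel lives in the module.

```fortran
module kernels_m
  use cudafor
  implicit none
contains
  ! Hypothetical kernel. The dummy array is explicit-shape (a(n)), so only
  ! its base address and the by-value size n are passed at launch.
  attributes(global) subroutine scale(a, n, s)
    integer, value :: n
    real, value :: s
    real :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = s * a(i)
  end subroutine scale
end module kernels_m

program main
  use cudafor
  use kernels_m
  implicit none
  integer, parameter :: n = 1024
  real :: a_h(n)                      ! host copy
  real, device, allocatable :: a_d(:) ! device copy
  integer :: istat

  a_h = 1.0
  allocate(a_d(n))   ! device allocation in the main program
  a_d = a_h          ! single explicit host-to-device transfer

  ! Two back-to-back launches with no explicit transfers in between.
  call scale<<<(n + 255) / 256, 256>>>(a_d, n, 2.0)
  call scale<<<(n + 255) / 256, 256>>>(a_d, n, 0.5)
  istat = cudaDeviceSynchronize()

  a_h = a_d          ! device-to-host transfer of the result
  print *, 'max abs deviation from 1.0:', maxval(abs(a_h - 1.0))
  deallocate(a_d)
end program main
```

If a profile of something like this still shows host-to-device copies between the two launches, that would point away from the descriptor explanation, which is why a small reproducing example would help.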