Here’s a simple question:
When you invoke kernels, you have to be in the right host thread. But what about memory copy operations such as cudaMemcpy2D() and cudaMemcpy2DToArray()? Do these also have to be called from the same thread that created the CUDA resources?
This is also true for pinned (page-locked) host memory allocated with cudaMallocHost(): you have to call cudaMallocHost() and the cudaMemcpy*() functions from the same thread.
Or use context migration (cuCtxPopCurrent()/cuCtxPushCurrent()) to move the context to your current thread.
Good point, though that would require him to use the Driver API exclusively.
I’m in much the same situation, stuck with a codebase written for the Runtime API. What I do is forward memory allocation requests and frees to the GPU thread via a message queue.
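Roughly, that forwarding pattern looks like the sketch below. All the names here (GpuThread, post, etc.) are illustrative, not from my actual code, and plain malloc/free stand in for cudaMallocHost()/cudaFreeHost() so the sketch compiles and runs without a GPU; in the real thing the worker thread is the one that created the context and issues all the CUDA calls:

```cpp
#include <cstdlib>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// Single worker thread that "owns" the CUDA context; every allocation
// request from other threads is funneled to it through a message queue.
class GpuThread {
public:
    GpuThread() : done_(false), worker_([this] { run(); }) {}
    ~GpuThread() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();  // worker drains remaining requests before exiting
    }

    // Ask the GPU thread to allocate; block until the pointer comes back.
    void* alloc(std::size_t bytes) {
        std::promise<void*> result;
        auto f = result.get_future();
        post([bytes, &result] {
            // Stand-in for cudaMallocHost(&ptr, bytes); runs on the
            // context-owning thread, which is the whole point.
            result.set_value(std::malloc(bytes));
        });
        return f.get();  // safe: &result outlives the task because we block
    }

    // Forward a free to the GPU thread; fire-and-forget.
    void release(void* ptr) {
        post([ptr] { std::free(ptr); });  // stand-in for cudaFreeHost(ptr)
    }

private:
    void post(std::function<void()> task) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(task)); }
        cv_.notify_one();
    }

    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !q_.empty(); });
                if (q_.empty()) return;  // done_ set and queue drained
                task = std::move(q_.front());
                q_.pop();
            }
            task();  // executed on the context-owning thread
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
    bool done_;
    std::thread worker_;  // declared last so run() sees initialized members
};
```

Any thread can then call alloc()/release() without caring which thread owns the context; the same queue can carry kernel launches and memcpy requests too.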