I found in a previous discussion that CUDA currently does not support cudaMemcpy from host to device while the GPU is busy running kernel code (though it may in the future).
My question is: can we leverage the interoperability between CUDA and D3D/OpenGL to achieve similar functionality?
For example, map a D3D vertex buffer to a CUDA device pointer, and then continuously move data into this vertex buffer for CUDA to process?
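For context, the kind of mapping described above looks roughly like this with the `cudaGraphics*` Direct3D 9 interop API (a later API than the one contemporaneous with this thread); the kernel and buffer sizes here are placeholders, and error checking is omitted:

```cuda
#include <cuda_runtime.h>
#include <cuda_d3d9_interop.h>
#include <d3d9.h>

// Hypothetical kernel that processes the mapped vertex data in place.
__global__ void processVertices(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Sketch: map an existing D3D9 vertex buffer into CUDA's address space,
// run a kernel on it, then unmap so D3D can use the buffer again.
void runOnVertexBuffer(IDirect3DVertexBuffer9* vb, int nFloats) {
    cudaGraphicsResource* res = nullptr;
    cudaGraphicsD3D9RegisterResource(&res, vb, cudaGraphicsRegisterFlagsNone);

    cudaGraphicsMapResources(1, &res, 0);
    float* devPtr = nullptr;
    size_t size = 0;
    cudaGraphicsResourceGetMappedPointer((void**)&devPtr, &size, res);

    processVertices<<<(nFloats + 255) / 256, 256>>>(devPtr, nFloats);

    cudaGraphicsUnmapResources(1, &res, 0);  // blocks until CUDA work on the buffer is done
    cudaGraphicsUnregisterResource(res);
}
```

Note that while a resource is mapped for CUDA, D3D must not touch it, and map/unmap serializes against outstanding work on the resource, so this alone does not give copy/compute overlap.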
Short answer: it’s a hardware limitation. If it is a hardware limitation, D3D or OpenGL is unlikely to be able to work around it either. You can get much faster transfers with CUDA anyway, so why bother?
No problem. It is hard to find things with the forum search feature; you need to know just the right keywords. There have been rumors since that old post that it is not a hardware limitation and might show up in a future CUDA release, but no one from NVIDIA has confirmed the rumor on this board.
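For reference, the capability being rumored here is what stream-based asynchronous copies provide in later CUDA releases (gated on the `deviceOverlap` device property). A minimal sketch, with the kernel as a placeholder and error checking omitted:

```cuda
#include <cuda_runtime.h>

__global__ void process(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main() {
    const int N = 1 << 20;
    float *hA, *dA, *dB;
    cudaMallocHost(&hA, N * sizeof(float));  // pinned host memory, required for async copies
    cudaMalloc(&dA, N * sizeof(float));
    cudaMalloc(&dB, N * sizeof(float));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // On hardware that reports deviceOverlap, the host-to-device copy
    // in stream s0 can overlap the kernel running in stream s1.
    cudaMemcpyAsync(dA, hA, N * sizeof(float), cudaMemcpyHostToDevice, s0);
    process<<<(N + 255) / 256, 256, 0, s1>>>(dB, N);

    cudaDeviceSynchronize();

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(dA);
    cudaFree(dB);
    cudaFreeHost(hA);
    return 0;
}
```

The key requirements are that the host buffer be pinned (`cudaMallocHost`) and that the copy and kernel be issued to different non-default streams.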