I’m having an issue and I’m hoping someone can explain it to me, since I am not an expert on the differences between page-locked and pageable memory once it has been virtually mapped into user space.
Basically, I’ve developed software that manipulates video using CUDA. All memory copies, host to device and device to host, are asynchronous and tied to streams. I have no issue if I use host memory that is allocated by CUDA. Everything is happy.
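To make the working case concrete, here is a minimal sketch of that path, assuming the host buffers come from `cudaMallocHost` and a single stream carries both copies. The function name `roundTripPinned` and the fill pattern are made up for illustration, and the video-processing kernels that would run between the two copies are left out.

```cuda
#include <cuda_runtime.h>
#include <cstring>

// Hypothetical helper: pushes `bytes` of data to the GPU and back using
// CUDA-allocated pinned host memory and async copies on one stream.
// Returns true if every call succeeded and the data came back intact.
bool roundTripPinned(size_t bytes) {
    unsigned char *hSrc = nullptr, *hDst = nullptr, *dFrame = nullptr;
    cudaStream_t stream = nullptr;
    bool ok = false;

    if (cudaMallocHost((void**)&hSrc, bytes) == cudaSuccess &&
        cudaMallocHost((void**)&hDst, bytes) == cudaSuccess &&
        cudaMalloc((void**)&dFrame, bytes) == cudaSuccess &&
        cudaStreamCreate(&stream) == cudaSuccess) {
        memset(hSrc, 0x5A, bytes);  // stand-in for a video frame
        memset(hDst, 0, bytes);

        // Both copies are queued on the same stream, so they run in order.
        cudaError_t err = cudaMemcpyAsync(dFrame, hSrc, bytes,
                                          cudaMemcpyHostToDevice, stream);
        // (processing kernels would be launched on `stream` here)
        if (err == cudaSuccess)
            err = cudaMemcpyAsync(hDst, dFrame, bytes,
                                  cudaMemcpyDeviceToHost, stream);
        // Wait until the stream has drained before touching hDst.
        if (err == cudaSuccess)
            err = cudaStreamSynchronize(stream);

        ok = (err == cudaSuccess) && memcmp(hSrc, hDst, bytes) == 0;
    }

    // Release only what was actually created.
    if (stream) cudaStreamDestroy(stream);
    if (dFrame) cudaFree(dFrame);
    if (hDst) cudaFreeHost(hDst);
    if (hSrc) cudaFreeHost(hSrc);
    return ok;
}
```

Calling something like `roundTripPinned(1920 * 1080 * 4)` exercises the same pattern as one frame moving through the pipeline.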
The issue I’m experiencing occurs when I attempt an async memory transfer with host memory that was not allocated by CUDA.
Here’s an example. I manipulate a frame of video in CUDA and need to push it back to host memory so I can route it to a broadcast-quality SDI card. The drivers for this card provide me with page-locked physical memory that is virtually mapped into user space. Since this memory is page-locked, I assumed that CUDA would be able to DMA directly to it. Apparently I’m wrong: cudaMemcpyAsync will not transfer to this memory; only cudaMemcpy will.
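The situation can be sketched as below. The SDI driver's buffer cannot be reproduced here, so `externalHost` is a hypothetical stand-in for any host pointer that CUDA did not allocate, and `copyFrameToExternalHost` is an invented name. Whether the async call returns an error or quietly degrades probably depends on the CUDA version and on how the driver mapped the memory. The sketch does not assume either outcome; it just records which path ended up being used.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: copies a processed frame from device memory into a
// host buffer that CUDA did not allocate (e.g. one handed out by a capture
// card driver). Tries the async copy first and falls back to the blocking
// cudaMemcpy if the async call is rejected. `usedAsync` reports the path.
cudaError_t copyFrameToExternalHost(void* externalHost, const void* dFrame,
                                    size_t bytes, cudaStream_t stream,
                                    bool* usedAsync) {
    cudaError_t err = cudaMemcpyAsync(externalHost, dFrame, bytes,
                                      cudaMemcpyDeviceToHost, stream);
    if (err == cudaSuccess) {
        *usedAsync = true;
        // The data is only valid on the host once the stream has drained.
        return cudaStreamSynchronize(stream);
    }

    fprintf(stderr, "cudaMemcpyAsync rejected the buffer: %s\n",
            cudaGetErrorString(err));
    cudaGetLastError();  // clear the recorded error before retrying
    *usedAsync = false;

    // Make sure earlier work on the stream (the kernels that produced the
    // frame) has finished before issuing the blocking copy.
    err = cudaStreamSynchronize(stream);
    if (err != cudaSuccess) return err;
    return cudaMemcpy(externalHost, dFrame, bytes, cudaMemcpyDeviceToHost);
}
```

Passing a plain `malloc` buffer as `externalHost` gives a rough way to compare how the runtime treats memory it knows nothing about, though that buffer is pageable rather than driver-pinned, so it is not an exact match for the SDI case.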
Will cudaMemcpyAsync only transfer to host memory that CUDA allocates? If so, why? Isn’t all page-locked memory the same? What am I missing? Can CUDA not translate the virtual address to a physical address? Will CUDA async copies only work with addresses in its own virtual address range? Please help!
Thanks in advance.