Does anybody have experience with page-locked memory that was not allocated through one of the CUDA APIs on Windows, e.g. via the VirtualAlloc()/VirtualLock() mechanism?
- Is it possible to copy data from such host memory to device memory and vice versa?
- Is it as fast as using host memory that is allocated with cudaHostAlloc()?
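To make the question concrete, here is a minimal sketch of the setup I have in mind. It assumes that registering the buffer with cudaHostRegister() is the way to tell CUDA about memory it did not allocate itself; as far as I understand, cudaHostRegister() page-locks the range on its own, so the VirtualLock() call may be redundant and is shown only because it is the mechanism asked about. The names round_up_to_page() and pinned_round_trip() are made up for illustration, and I have not measured whether this reaches cudaHostAlloc() speed.

```cpp
// Sketch: lock a VirtualAlloc() buffer, register it with CUDA, copy it to the
// device and back. The Windows/CUDA part is only compiled on Windows.
#include <cstddef>
#include <cstring>

// VirtualLock() works on whole pages, so round the request up to a page multiple.
inline std::size_t round_up_to_page(std::size_t bytes, std::size_t page) {
    return (bytes + page - 1) / page * page;
}

#ifdef _WIN32
#include <windows.h>
#include <cuda_runtime.h>

// Hypothetical helper: returns true if the data survives host -> device -> host.
bool pinned_round_trip(std::size_t requested) {
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    const std::size_t bytes = round_up_to_page(requested, si.dwPageSize);

    // VirtualLock() is capped by the minimum working set size, so grow it first.
    SIZE_T ws_min = 0, ws_max = 0;
    HANDLE self = GetCurrentProcess();
    if (!GetProcessWorkingSetSize(self, &ws_min, &ws_max) ||
        !SetProcessWorkingSetSize(self, ws_min + bytes, ws_max + bytes))
        return false;

    void* host = VirtualAlloc(nullptr, bytes, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (!host) return false;

    void* dev = nullptr;
    bool ok = false;
    const bool locked = VirtualLock(host, bytes) != 0;
    // Assumption: this is what makes CUDA treat the range as pinned memory.
    const bool registered = locked &&
        cudaHostRegister(host, bytes, cudaHostRegisterDefault) == cudaSuccess;

    if (registered && cudaMalloc(&dev, bytes) == cudaSuccess) {
        std::memset(host, 0xAB, bytes);                      // known pattern
        ok = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
        std::memset(host, 0, bytes);                         // wipe host copy
        ok = ok && cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
        ok = ok && static_cast<unsigned char*>(host)[bytes - 1] == 0xAB;
    }

    // Release everything in reverse order of acquisition.
    if (dev) cudaFree(dev);
    if (registered) cudaHostUnregister(host);
    if (locked) VirtualUnlock(host, bytes);
    VirtualFree(host, 0, MEM_RELEASE);
    return ok;
}
#endif
```

Calling pinned_round_trip(1 << 20) from main() would exercise a 1 MiB buffer. Timing the two cudaMemcpy() calls against the same copies from a cudaHostAlloc() buffer would be one way to answer the speed question.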
One limitation of using cudaHostAlloc() or VirtualAlloc() is that the memory is only accessible from within the process that allocated it. Additionally, the memory may be swapped out by the OS if the process isn’t really active.
Does anybody have experience with page-locked host memory that is accessible across process boundaries, e.g. with the help of the Windows DDK (I think with something like IoAllocateIrp())?
Process A copies some data from somewhere to the page-locked host memory via DMA.
Then the computing thread of process B copies the data from the page-locked host memory to some computing device via cudaMemcpy().