I have a large portion of memory allocated on the host like this:
checkCudaErrors(cuMemAllocHost(&pCPUBuffer, cpubuffersize));
A device-to-host copy into the start of the buffer works correctly:
checkCudaErrors(cuMemcpyDtoH_v2(pCPUBuffer, pDevRGBBuffer, *width * *height * 3));
But if I copy a small amount of memory from the device into the large buffer at an offset, even an offset of zero, everything stops responding and I have to hard reset:
checkCudaErrors(cuMemcpyDtoH_v2(static_cast<unsigned char*>(pCPUBuffer) + offset, pDevRGBBuffer, *width * *height * 3));
I have to cast the void* pCPUBuffer to a byte pointer; otherwise I cannot apply the offset.
It does not matter how large pCPUBuffer is: if its size does not match the size of the data being transferred, everything stops responding.
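One way to rule out a plain overrun of the host buffer is to check the destination range before issuing the copy. The following is a minimal sketch, assuming offset and cpubuffersize are both byte counts; checkedDestination is a hypothetical helper written for illustration and is not part of the CUDA API.

```cpp
#include <cstddef>

// Hypothetical helper (not a CUDA call): returns the destination pointer for a
// copy of copySize bytes at byte offset `offset` into a host buffer of
// bufferSize bytes, or nullptr if the copy would run past the end of the buffer.
unsigned char* checkedDestination(void* base, std::size_t bufferSize,
                                  std::size_t offset, std::size_t copySize)
{
    // Compare against (bufferSize - offset) rather than computing
    // offset + copySize, so the check itself cannot overflow.
    if (base == nullptr || offset > bufferSize || copySize > bufferSize - offset)
        return nullptr;

    // Pointer arithmetic is not defined on void*, so step through the
    // buffer as bytes.
    return static_cast<unsigned char*>(base) + offset;
}
```

With this in place, the pointer returned by checkedDestination(pCPUBuffer, cpubuffersize, offset, *width * *height * 3) could be passed as the first argument of cuMemcpyDtoH_v2 whenever it is not nullptr. This only confirms that the host-side range is valid; it does not by itself explain the hang.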
Why does a larger host buffer cause the display driver to stop responding?
The device is a Titan V.
The OS is Windows 7.
The CUDA version is 10.2.