cuMemcpyDtoH causes device to stop responding with host buffer offset

I have a large portion of memory allocated on the host like this:

checkCudaErrors(cuMemAllocHost(&pCPUBuffer, cpubuffersize));

The device-to-host copy works correctly like this:

checkCudaErrors(cuMemcpyDtoH_v2(pCPUBuffer, pDevRGBBuffer, *width**height*3));

But if I copy a smaller amount of memory from the device into the large buffer at an offset, everything stops responding and I have to hard reset. This happens even if the offset is zero:

checkCudaErrors(cuMemcpyDtoH_v2(static_cast<unsigned char*>(pCPUBuffer)+offset, pDevRGBBuffer, *width**height*3));

I have to cast the void* pCPUBuffer, otherwise I cannot apply the offset.
It does not matter how large pCPUBuffer is: if its size does not match the size of the data being transferred, everything stops responding.

Why does a larger host buffer cause the display driver to stop responding?

The device is a Titan V
The OS is Windows 7
The CUDA version is 10.2

Anybody there?

I want to copy GPU memory to a section of a larger buffer on the host, but the driver hangs, my rig stops responding, and I have to press the reset button. Why is this happening?

Is there something wrong?

Can you please provide a minimal reproducible example? You are getting either a CPU out-of-bounds access or a GPU out-of-bounds access that is causing some form of corruption. The offset = 0 statement does seem odd. The function definitely allows you to copy from any valid address to any valid address.

If the system memory is not pinned, the driver will copy from the non-pinned pCPUBuffer to a staging buffer and then issue a DMA. If the size is really small, an alternative path may be used. Without a reproducible example that defines all variables, it is hard for anyone to provide additional help.
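
To rule out the CPU out-of-bounds case before the copy is issued, a host-side check along these lines may help. This is only a sketch: checkedHostDst and rgbFrameBytes are hypothetical helper names, not part of the CUDA API, and the 3-bytes-per-pixel layout is assumed from the width*height*3 size used in the original call.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a CUDA API call): returns the destination pointer
// for a copy of `copyBytes` bytes at `offset` into a host buffer that holds
// `bufferBytes` bytes, or nullptr if the copy would run past the end.
inline unsigned char* checkedHostDst(void* buffer, std::size_t bufferBytes,
                                     std::size_t offset, std::size_t copyBytes) {
    if (buffer == nullptr) return nullptr;
    // Compare against the remaining space so offset + copyBytes cannot overflow.
    if (offset > bufferBytes || copyBytes > bufferBytes - offset) return nullptr;
    return static_cast<unsigned char*>(buffer) + offset;
}

// Hypothetical helper: byte count of a tightly packed frame with 3 bytes per
// pixel, computed in size_t so the product does not overflow int.
inline std::size_t rgbFrameBytes(int width, int height) {
    if (width <= 0 || height <= 0) return 0;
    return static_cast<std::size_t>(width) * static_cast<std::size_t>(height) * 3u;
}
```

The returned pointer would then be passed as the destination of cuMemcpyDtoH_v2 with rgbFrameBytes(*width, *height) as the byte count. A null result means the copy would overrun the host allocation. A check like this only covers the host side; it says nothing about whether the device pointer is valid for that many bytes.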

Turns out it was because I was executing it in the wrong thread.
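
For anyone who lands here with the same symptom: in the driver API the current context is bound per host thread (cuCtxSetCurrent binds a context to the calling thread, cuCtxGetCurrent reads it back), so a pointer from one thread is only usable in another thread once that thread has the right context current. The threading code from this case is not shown above, so the following is just a plain C++ model of that per-thread behaviour, not real CUDA code. FakeContext, setCurrent, getCurrent and currentInWorker are all made-up names for illustration.

```cpp
#include <cassert>
#include <thread>

// Stand-in for a driver context; NOT the real CUcontext type.
struct FakeContext { int id; };

// Models the per-thread "current context" slot: each thread has its own copy.
thread_local FakeContext* g_current = nullptr;

void setCurrent(FakeContext* ctx) { g_current = ctx; }  // plays the role of cuCtxSetCurrent
FakeContext* getCurrent() { return g_current; }         // plays the role of cuCtxGetCurrent

// Starts a new thread and reports which context that thread sees.
// If `bind` is non-null, the new thread makes it current first.
FakeContext* currentInWorker(FakeContext* bind) {
    FakeContext* seen = nullptr;
    std::thread worker([&] {
        if (bind) setCurrent(bind);
        seen = getCurrent();
    });
    worker.join();
    return seen;
}
```

In this model, a thread that calls setCurrent(&ctx) sees &ctx from getCurrent(), while currentInWorker(nullptr) returns nullptr because the new thread never bound anything. currentInWorker(&ctx) returns &ctx, which corresponds to calling cuCtxSetCurrent at the start of the worker thread before any copies are issued there.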
