Can I page-lock / pin arbitrary memory for cudaMemcpyAsync()?

Hello everyone. I want to use cudaMemcpyAsync() to copy a LabVIEW image to the GPU.

Someone suggested using Windows’ VirtualLock() to page-lock the memory (with no guarantees):

cudaStream_t stream;

if (cudaStreamCreate(&stream) != cudaSuccess)
    return -1;

if (!GetProcessWorkingSetSize(GetCurrentProcess(), &minSize, &maxSize))
    return -2;

if (!SetProcessWorkingSetSize(GetCurrentProcess(), minSize + 50000000, maxSize + 50000000))
    return -3;

if (!VirtualLock(pIn, nInPitch * nHeight))
    return -4;

if (cudaMemcpy2DAsync(gpuImage.GetStart(), gpuImage.Pitch(), pIn, nInPitch,
                      LineSize(nWidth, nImageType), nHeight,
                      cudaMemcpyHostToDevice, stream) != cudaSuccess)
    return -5;

// implicitly wait for cudaMemcpy2DAsync() to complete, but is this guaranteed?
if (cudaStreamDestroy(stream) != cudaSuccess)
    return -6;

if (!VirtualUnlock(pIn, nInPitch * nHeight))
    return -7;

I tried using the above code and it crashes either in cudaMemcpy2DAsync() or cudaStreamDestroy(). Any ideas?

One risk I identified is that VirtualLock() doesn’t prevent memory from being swapped out when all threads of the process are idle, which can happen. However, the machine has 12 GiB of memory, so I don’t think swapping is the problem.

This doesn’t work.

Why? And do you mean that the idea cannot work at all, or just that VirtualLock() doesn’t work?

Pinning arbitrary memory so you can perform a DMA transfer to the GPU isn’t supported right now. Why? Lots of internal technical reasons!

You could allocate pinned memory the normal way (for example with cudaMallocHost()), then do a CPU memcpy from your unpinned buffer into the newly allocated pinned buffer.
Inefficient? Yes. But maybe that’s not a bottleneck… it depends on your application.

Hi SPWorley - you are right. On any fairly new PC, it is beneficial to copy data into CUDA-allocated pinned memory prior to the transfer. I wrote about this on my blog a couple of months ago:

http://visionexperts.blogspot.com/2010/05/…-transfers.html

Regards

Jason

If you actually benchmark this, I’m pretty sure you’ll see that there isn’t a benefit to manually staging the copy to a buffer yourself.