Possibility to do d2d memcpy w/o CPU or w/o PCIe?

ONeill · May 19, 2010, 6:45am

Hi!

I'd like to do some device-to-device memcopies without involving the CPU (I know that host-to-device transfers, or vice versa, aren't possible without it). As far as I know, you do that by calling cudaMemcpy with the cudaMemcpyDeviceToDevice parameter. But this function gets called by the CPU, which I would like to avoid, so is there a way to do a memcpy from the device only? If not, is it planned for future CUDA releases? An official statement, or a pointer to one, would be very nice here.
I'm also wondering whether you can copy from device to device over some physical connection, e.g. an SLI bridge between your CUDA cards (I know CUDA and SLI are different things), without using PCIe.

Simon_Green · May 19, 2010, 8:01am

No, it is not possible to make CUDA API calls (cudaMemcpy or anything else) from the device. The CPU always has to be involved to some extent.

You can’t transfer arbitrary data over the SLI bridge (it’s just a digital video connection).

Why do you want to avoid the CPU and PCIe bus?

ONeill · May 19, 2010, 11:56am

We were just hoping for a little performance gain when doing d2d copies. Could it be possible in future releases to do the whole transaction without the CPU in this case?

avidday · May 19, 2010, 12:10pm

I think you have fundamentally misunderstood what the cudaMemcpyDeviceToDevice flag means. In the runtime API it literally means “copy where source and destination memory reside in the same GPU context (i.e. in the memory of the same GPU)”. It is a convenient way for the CPU to move data around inside a GPU's memory without needing to run a kernel. In device code you have pointers and can move memory around any way you like without needing any API calls at all. This is in contrast to the other memcpy options, which literally mean “copy where only one of the source and destination memory resides in a GPU context, and the other in host memory”.
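To make the distinction concrete, here is a minimal sketch of the two ways to move data inside one GPU's memory: a host-issued cudaMemcpy with cudaMemcpyDeviceToDevice, and a kernel that copies through plain pointers. The names copy_kernel, check and run_copies are made up for this illustration, and the buffer size and launch configuration are arbitrary assumptions, not tuned values.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread copies one element inside device memory.
// No API call is involved here, only pointer reads and writes.
__global__ void copy_kernel(const float* src, float* dst, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) dst[i] = src[i];
}

// Hypothetical helper: abort with a message if a runtime call failed.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Copies n floats within one GPU in both ways and returns true if
// both destination buffers match the original host data.
bool run_copies(size_t n) {
    const size_t bytes = n * sizeof(float);
    std::vector<float> host(n), back_a(n), back_b(n);
    for (size_t i = 0; i < n; ++i) host[i] = (float)i;

    float *src = nullptr, *dst_a = nullptr, *dst_b = nullptr;
    check(cudaMalloc(&src, bytes), "cudaMalloc src");
    check(cudaMalloc(&dst_a, bytes), "cudaMalloc dst_a");
    check(cudaMalloc(&dst_b, bytes), "cudaMalloc dst_b");
    check(cudaMemcpy(src, host.data(), bytes, cudaMemcpyHostToDevice), "H2D");

    // Way 1: the host asks the runtime to copy within the same GPU.
    check(cudaMemcpy(dst_a, src, bytes, cudaMemcpyDeviceToDevice), "D2D");

    // Way 2: a kernel moves the data itself (launch config is an assumption).
    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);
    copy_kernel<<<blocks, threads>>>(src, dst_b, n);
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "kernel execution");

    check(cudaMemcpy(back_a.data(), dst_a, bytes, cudaMemcpyDeviceToHost), "D2H a");
    check(cudaMemcpy(back_b.data(), dst_b, bytes, cudaMemcpyDeviceToHost), "D2H b");
    cudaFree(src); cudaFree(dst_a); cudaFree(dst_b);

    return back_a == host && back_b == host;
}

int main() {
    printf("%s\n", run_copies(1 << 20) ? "copies match" : "mismatch");
    return 0;
}
```

Both paths stay inside a single GPU; in both cases the host still issues the work, either the memcpy call or the kernel launch.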

The basic premise of a CUDA context is that it is associated with exactly one GPU and one host thread at a time. None of this has anything to do with managing memory on multiple GPUs, which seems to be what you are implicitly asking about.

ONeill · May 19, 2010, 2:09pm

Thanks for your statement.

You are right, I'm looking for a way to transfer memory from one GPU to another without needing the CPU. In the app I'm going to write, the memcpy makes up a significant part of the total run time. I haven't looked into multi-GPU programming yet, because one GPU is enough for that app. But as far as I know, it would become a bigger bottleneck if I use a second GPU later on, because then I need to memcpy from host to device once per GPU. So it would be nice to have the option to copy directly from one GPU to another without needing the CPU to do any processing for it.
