cudaMemcpyDeviceToDevice

chrismc · September 15, 2008, 10:51am

Can cudaMemcpyDeviceToDevice be used to move data between devices?

Or do I have to copy from device 1 to host and then from host to device 2?

Tigga · September 15, 2008, 10:54am

I’m fairly certain DeviceToDevice can only copy within one device. It doesn’t work for copying from device 1 to device 2.

MisterAnderson42 · September 15, 2008, 12:15pm

Unfortunately, yes. NVIDIA has been promising fast device to device memory copies for a long time now…

E.D_Riedijk · September 15, 2008, 12:21pm

I believe that is not true. What I understand is that they currently pass by the host, and that NVIDIA is working on direct copies between two GPU devices, bypassing the host memory.

E.D_Riedijk · September 15, 2008, 12:42pm

Hmm, then I guess I remember wrong. I thought they were working, just not fast yet…

MisterAnderson42 · September 15, 2008, 3:25pm

NVIDIA provides nothing to automatically copy from one GPU to another; it is up to the user to do it.

The user must implement it as a two-step copy: GPU1 → host mem → GPU2. This is not as fast as it could be because pinned memory is only pinned to a particular GPU, so one of those two copies must be slow. It would be great if pinned memory worked for all GPUs, but it doesn’t.
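The staged copy described above (GPU1 → host mem → GPU2) can be sketched roughly as follows. This is a minimal illustration, not the code anyone in this thread used: it assumes a CUDA 4.0 or later runtime, where a single host thread can switch devices with cudaSetDevice (at the time of this post, each GPU needed its own host thread). The device numbers 0 and 1, the buffer size, and the CHECK macro are assumptions made for the example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) {
        printf("This sketch needs at least two GPUs.\n");
        return 0;
    }

    const size_t bytes = (1 << 20) * sizeof(float);  // arbitrary example size
    float *src = nullptr, *dst = nullptr, *staging = nullptr;

    // Source buffer on device 0, plus a pinned host staging buffer.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&src, bytes));
    CHECK(cudaMemset(src, 0, bytes));
    CHECK(cudaMallocHost(&staging, bytes));

    // Destination buffer on device 1.
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&dst, bytes));

    // Step 1: device 0 -> host staging buffer.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMemcpy(staging, src, bytes, cudaMemcpyDeviceToHost));

    // Step 2: host staging buffer -> device 1.
    CHECK(cudaSetDevice(1));
    CHECK(cudaMemcpy(dst, staging, bytes, cudaMemcpyHostToDevice));

    // Free each allocation with its owning device current.
    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(src));
    CHECK(cudaFreeHost(staging));
    return 0;
}
```

Both copies are synchronous here, so the second starts only after the first has finished. That keeps the example simple but leaves no overlap between the two transfers.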

Even more ideal would be if there was a way to copy from one GPU directly to the other (i.e. over SLI or over the PCIe link currently being advertised in the 790i chipset or by some other method using PCIe).

I have no idea what NVIDIA has in mind for this, only that there have been 2 or 3 forum posts stating that “fast gpu to gpu transfers are under consideration for a future version of CUDA” or some such.

Sidney_Lima · January 16, 2009, 11:17pm

Is there any news about copying from a GPU to another?

–
Sidney Lima
Recife Brasil
www.sidneylima.com
sidney@sidneylima.com

Tobbey · November 13, 2020, 1:59pm

Sorry to bring this back to life. Is there any guarantee that the copy will occur through NVLink when using 2 (or more) GPUs? Where is the documentation related to this aspect?

Robert_Crovella · November 13, 2020, 2:33pm

I don’t know what (or more) means.

It will take place over NVLink if there is a direct NVLink connection between the two GPUs in question, and you have properly enabled CUDA peer access between the 2 devices, and you use the cudaMemcpyPeer* family of functions.

See here and you can also just search for references to “peer” in the programming guide.

From that particular section:

Note that if peer-to-peer access is enabled between two devices via cudaDeviceEnablePeerAccess() as described in Peer-to-Peer Memory Access, peer-to-peer memory copy between these two devices no longer needs to be staged through the host and is therefore faster.
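The peer-to-peer route described in this reply might look like the sketch below. It is only an illustration under stated assumptions: device numbers 0 and 1, the buffer size, and the CHECK macro are made up for the example, and whether the transfer actually goes over NVLink or PCIe depends on how the two GPUs are connected, which this code does not inspect.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) {
        printf("This sketch needs at least two GPUs.\n");
        return 0;
    }

    // Ask the runtime whether each device can access the other's memory.
    int can01 = 0, can10 = 0;
    CHECK(cudaDeviceCanAccessPeer(&can01, 0, 1));
    CHECK(cudaDeviceCanAccessPeer(&can10, 1, 0));

    const size_t bytes = (1 << 20) * sizeof(float);  // arbitrary example size
    float *src = nullptr, *dst = nullptr;

    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&src, bytes));
    CHECK(cudaMemset(src, 0, bytes));
    if (can01) {
        // Enable access from device 0 to device 1 (flags must be 0).
        CHECK(cudaDeviceEnablePeerAccess(1, 0));
    }

    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&dst, bytes));
    if (can10) {
        // Enable access from device 1 to device 0.
        CHECK(cudaDeviceEnablePeerAccess(0, 0));
    }

    // Copy from device 0 to device 1. Without peer access enabled, the
    // copy still works but is staged through the host.
    CHECK(cudaMemcpyPeer(dst, 1, src, 0, bytes));
    printf("peer access 0->1: %d, 1->0: %d\n", can01, can10);

    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(src));
    return 0;
}
```

Checking cudaDeviceCanAccessPeer first avoids an error from cudaDeviceEnablePeerAccess on systems where the two GPUs cannot reach each other directly.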
