asyncEngineCount and peer-to-peer copies

venovako · March 3, 2012, 8:31pm

If I have two devices (A and B) in a Tesla S2050 box, with asyncEngineCount == 2, and these devices have peer access mutually enabled, can I perform simultaneous copies from A to B and from B to A?

Maybe I’m missing it in the documentation…

TIA, Vedran
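For reference, both preconditions in the question (two copy engines, mutual peer access) can be checked and set up through the CUDA runtime API. This is a minimal sketch that assumes a CUDA 4.0+ runtime. The helper names `asyncEngines` and `enableMutualPeerAccess` are hypothetical, not part of any library:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: returns asyncEngineCount for `device`, or -1 if the
// query fails (invalid ordinal, no CUDA-capable GPU, missing driver, ...).
int asyncEngines(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
    return prop.asyncEngineCount;  // 2 => copies in both directions can overlap
}

// Hypothetical helper: enables peer access a->b and b->a.
// Returns true only if both directions are supported and enabled.
// Note: leaves device b as the current device.
bool enableMutualPeerAccess(int a, int b) {
    int aToB = 0, bToA = 0;
    if (cudaDeviceCanAccessPeer(&aToB, a, b) != cudaSuccess ||
        cudaDeviceCanAccessPeer(&bToA, b, a) != cudaSuccess)
        return false;
    if (!aToB || !bToA) return false;

    // cudaDeviceEnablePeerAccess acts on the *current* device; flags must be 0.
    if (cudaSetDevice(a) != cudaSuccess) return false;
    cudaError_t e = cudaDeviceEnablePeerAccess(b, 0);
    if (e != cudaSuccess && e != cudaErrorPeerAccessAlreadyEnabled) return false;

    if (cudaSetDevice(b) != cudaSuccess) return false;
    e = cudaDeviceEnablePeerAccess(a, 0);
    if (e != cudaSuccess && e != cudaErrorPeerAccessAlreadyEnabled) return false;

    cudaGetLastError();  // clear a recorded "already enabled" error, if any
    return true;
}
```

Typical use would be to call `enableMutualPeerAccess(0, 1)` once at startup and check that `asyncEngines(d) == 2` for each device before relying on overlapping copies.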

tmurray · March 5, 2012, 8:28pm

The P2P interface supports that, although we intentionally make this kind of overlap hard to get so that P2P transfers behave more sanely the rest of the time. Basically, the rules for P2P are:

- one of the copy engines will always be used for P2P transfers (I forget which), regardless of whether you are reading from or writing to a remote GPU
- we will never ever use the other copy engine for a P2P transfer

So you probably have to do something like

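// Requires UVA; cudaMemcpyAsync takes (dst, src, count, kind, stream).
cudaSetDevice(0);
cudaMemcpyAsync(dstBuf1, srcBuf0, size, cudaMemcpyDefault, stream0);  // GPU 0 -> GPU 1, stream0 belongs to GPU 0
cudaSetDevice(1);
cudaMemcpyAsync(dstBuf0, srcBuf1, size, cudaMemcpyDefault, stream1);  // GPU 1 -> GPU 0, stream1 belongs to GPU 1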

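to get overlap. Each copy is issued from the GPU that owns the source buffer, so the two transfers are driven by the P2P copy engines of two different GPUs.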
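Here is a self-contained version of the pattern above, offered as a sketch. It uses `cudaMemcpyPeerAsync`, which names the source and destination devices explicitly, instead of `cudaMemcpyAsync` with `cudaMemcpyDefault`. Each copy is queued on a stream created on the source GPU. It assumes devices 0 and 1 are the two peers and that peer access has already been enabled in both directions. Without peer access, the runtime may stage the copies through host memory, and the same overlap should not be expected. `bidirectionalPeerCopy` is a hypothetical helper name:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: copies `bytes` GPU 0 -> GPU 1 and GPU 1 -> GPU 0
// concurrently. Each copy goes on a stream owned by its *source* GPU.
// Returns false if fewer than two devices exist or any CUDA call fails.
bool bidirectionalPeerCopy(size_t bytes) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) return false;

    void *src0 = nullptr, *dst0 = nullptr, *src1 = nullptr, *dst1 = nullptr;
    cudaStream_t s0 = nullptr, s1 = nullptr;
    bool ok = true;  // && short-circuits, so later calls are skipped after a failure

    // Buffers and a stream on GPU 0.
    ok = ok && cudaSetDevice(0) == cudaSuccess;
    ok = ok && cudaMalloc(&src0, bytes) == cudaSuccess;
    ok = ok && cudaMalloc(&dst0, bytes) == cudaSuccess;
    ok = ok && cudaStreamCreate(&s0) == cudaSuccess;

    // Buffers and a stream on GPU 1.
    ok = ok && cudaSetDevice(1) == cudaSuccess;
    ok = ok && cudaMalloc(&src1, bytes) == cudaSuccess;
    ok = ok && cudaMalloc(&dst1, bytes) == cudaSuccess;
    ok = ok && cudaStreamCreate(&s1) == cudaSuccess;

    // GPU 0 -> GPU 1 on GPU 0's stream; GPU 1 -> GPU 0 on GPU 1's stream.
    ok = ok && cudaMemcpyPeerAsync(dst1, 1, src0, 0, bytes, s0) == cudaSuccess;
    ok = ok && cudaMemcpyPeerAsync(dst0, 0, src1, 1, bytes, s1) == cudaSuccess;

    ok = ok && cudaStreamSynchronize(s0) == cudaSuccess;
    ok = ok && cudaStreamSynchronize(s1) == cudaSuccess;

    // Release resources on the device that owns them.
    cudaSetDevice(0);
    if (s0) cudaStreamDestroy(s0);
    if (src0) cudaFree(src0);
    if (dst0) cudaFree(dst0);
    cudaSetDevice(1);
    if (s1) cudaStreamDestroy(s1);
    if (src1) cudaFree(src1);
    if (dst1) cudaFree(dst1);
    return ok;
}
```

To check whether the copies actually overlap on a given system, time each direction on its own and then both together, for example with `cudaEvent_t` markers.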

venovako · March 5, 2012, 8:57pm

Thanks tmurray, that solves my problem!
And if I understand the copy-engine part of your answer correctly, two simultaneous transfers from one device (one to a peer and one to the host) should also be possible.

Vedran
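Following this reading of the engine rules, a device-to-peer copy and a device-to-host copy from GPU 0 could be issued on two separate streams of GPU 0. The thread itself does not confirm that these two copies overlap. That depends on which engine the driver assigns to each direction, so treat this as a sketch under that assumption. `peerAndHostCopy` is a hypothetical helper name:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: from GPU 0, copy `bytes` to GPU 1 and to pinned host
// memory at the same time, on two different streams of GPU 0.
// Returns false if fewer than two devices exist or any CUDA call fails.
bool peerAndHostCopy(size_t bytes) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) return false;

    void *src0 = nullptr, *dst1 = nullptr, *host = nullptr;
    cudaStream_t toPeer = nullptr, toHost = nullptr;
    bool ok = true;

    ok = ok && cudaSetDevice(1) == cudaSuccess;
    ok = ok && cudaMalloc(&dst1, bytes) == cudaSuccess;

    ok = ok && cudaSetDevice(0) == cudaSuccess;
    ok = ok && cudaMalloc(&src0, bytes) == cudaSuccess;
    // Pinned (page-locked) host memory lets the D2H copy run asynchronously.
    ok = ok && cudaMallocHost(&host, bytes) == cudaSuccess;
    ok = ok && cudaStreamCreate(&toPeer) == cudaSuccess;
    ok = ok && cudaStreamCreate(&toHost) == cudaSuccess;

    // Both copies read src0; they only need separate streams to be eligible to overlap.
    ok = ok && cudaMemcpyPeerAsync(dst1, 1, src0, 0, bytes, toPeer) == cudaSuccess;
    ok = ok && cudaMemcpyAsync(host, src0, bytes, cudaMemcpyDeviceToHost, toHost) == cudaSuccess;

    ok = ok && cudaStreamSynchronize(toPeer) == cudaSuccess;
    ok = ok && cudaStreamSynchronize(toHost) == cudaSuccess;

    // Clean up on GPU 0 (current device), then GPU 1.
    if (toPeer) cudaStreamDestroy(toPeer);
    if (toHost) cudaStreamDestroy(toHost);
    if (host) cudaFreeHost(host);
    if (src0) cudaFree(src0);
    cudaSetDevice(1);
    if (dst1) cudaFree(dst1);
    return ok;
}
```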
