How to define destination device stream in cudaMemcpyPeerAsync()?

cuda27 · September 22, 2013, 6:21am

I am doing an asynchronous memcpy from gpu0 to gpu1 using cudaMemcpyPeerAsync().

cudaMemcpyPeerAsync() takes a single stream argument, which I use for gpu0, but it has no stream argument for gpu1. Can I somehow define the stream of the receiving device too?

I am using OpenMP threads to manage each of the devices, so they are in separate contexts.

Visual Profiler shows the stream for the sending device, but for the receiving device this memcpy is shown only under MemCpy (PtoP) and not in any of the streams (not even the default stream).
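For reference, here is a minimal sketch of the call described above. It is not my actual code: the buffer size, the names s0, s1 and copied, and the single-threaded layout (no OpenMP) are placeholders for illustration. The event part is only one possible way to make a gpu1 stream wait for the copy, which I am assuming is valid because cudaStreamWaitEvent() accepts an event recorded on another device. It does not give the copy itself a second stream.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e));                           \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) {
        printf("This sketch needs at least two GPUs.\n");
        return 0;
    }

    const size_t bytes = 1 << 20;  // placeholder size
    void *src = nullptr, *dst = nullptr;
    cudaStream_t s0, s1;           // s0 belongs to gpu0, s1 to gpu1
    cudaEvent_t copied;            // recorded on gpu0 after the copy

    // Sending side (gpu0): source buffer, stream, and event.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&src, bytes));
    CHECK(cudaMemset(src, 1, bytes));
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaStreamCreate(&s0));
    CHECK(cudaEventCreateWithFlags(&copied, cudaEventDisableTiming));

    // Receiving side (gpu1): destination buffer and its own stream.
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&dst, bytes));
    CHECK(cudaStreamCreate(&s1));

    // The peer copy takes exactly one stream parameter (s0 here).
    CHECK(cudaSetDevice(0));
    CHECK(cudaMemcpyPeerAsync(dst, 1, src, 0, bytes, s0));
    CHECK(cudaEventRecord(copied, s0));

    // Assumption: make gpu1's stream wait for the copy via the event.
    CHECK(cudaSetDevice(1));
    CHECK(cudaStreamWaitEvent(s1, copied, 0));

    // Work queued on s1 from here on runs after the copy has finished.
    unsigned char firstByte = 0;
    CHECK(cudaMemcpyAsync(&firstByte, dst, 1, cudaMemcpyDeviceToHost, s1));
    CHECK(cudaStreamSynchronize(s1));
    printf("first byte on gpu1: %d\n", (int)firstByte);

    // Cleanup, switching to the owning device for each resource.
    CHECK(cudaStreamDestroy(s1));
    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(0));
    CHECK(cudaEventDestroy(copied));
    CHECK(cudaStreamDestroy(s0));
    CHECK(cudaFree(src));
    return 0;
}
```

The sketch does not call cudaDeviceEnablePeerAccess(). As far as I understand, the copy still works without it, but it may be staged through host memory.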

PS: My current implementation works fine. I just want to overlap the sending and receiving communication.
