cudaMemcpyAsync Question: Overlap HostToDevice and DeviceToHost transfers

kpg · April 2, 2009, 6:07am

For different streams, can we overlap DeviceToHost (for stream1) and HostToDevice (for stream2) using ‘cudaMemcpyAsync’ transfers?

I ask because of the following understanding:
“PCIe 1.x is often quoted to support a data rate of 250 MB/s in each direction, per lane… This means a sixteen lane (x16) PCIe card would then be theoretically capable of 250 MB/s * 16 = 4 GB/s in each direction.”

I know that either of these cudaMemcpyAsync transfers, on its own, can be overlapped with kernel execution on a third stream (stream3, say).
I tried modifying the simpleStreams sample code, but the DeviceToHost (stream1) and HostToDevice (stream2) transfers were serialized. I could be missing something.
Thank you for any insights.

kpg

theMarix · April 2, 2009, 9:20am

As I understand section 3.2.6 of the CUDA Programming Guide, you can only overlap kernel execution with memory copies, not two memory copies with each other.

tmurray · April 2, 2009, 4:15pm

You can only overlap one memcpy and one kernel; this is a hardware limitation.
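
For anyone who wants to check this on their own card, here is a minimal sketch of the experiment described in the question: time the DeviceToHost copy alone, the HostToDevice copy alone, and then both issued on separate streams. It is not the modified simpleStreams code from the original post. The buffer size, the `CHECK` macro, and the helpers `time_ms` and `overlap_fraction` are illustrative assumptions. The `asyncEngineCount` field of `cudaDeviceProp` comes from later CUDA toolkits than the one discussed in this thread; a value of 1 means copies in only one direction at a time can overlap with a kernel, and 2 means both directions can be in flight at once.

```cuda
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Illustrative error-check macro (not part of the CUDA API).
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "%s failed: %s\n", #call,                      \
                    cudaGetErrorString(err_));                             \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Hypothetical helper: 0.0 means the two copies ran back to back,
// 1.0 means the shorter copy was completely hidden behind the longer one.
double overlap_fraction(double t_d2h, double t_h2d, double t_both) {
    return (t_d2h + t_h2d - t_both) / std::min(t_d2h, t_h2d);
}

// Hypothetical helper: wall-clock time in ms from issuing the work
// until the device is idle again.
template <typename F>
double time_ms(F issue) {
    CHECK(cudaDeviceSynchronize());
    auto t0 = std::chrono::steady_clock::now();
    issue();
    CHECK(cudaDeviceSynchronize());
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const size_t bytes = 64u << 20;  // 64 MiB per direction (arbitrary choice)

    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    printf("asyncEngineCount = %d\n", prop.asyncEngineCount);

    // Async copies need page-locked (pinned) host memory to be truly async.
    void *h_in, *h_out, *d_in, *d_out;
    CHECK(cudaMallocHost(&h_in, bytes));
    CHECK(cudaMallocHost(&h_out, bytes));
    CHECK(cudaMalloc(&d_in, bytes));
    CHECK(cudaMalloc(&d_out, bytes));
    memset(h_in, 0, bytes);
    CHECK(cudaMemset(d_out, 0, bytes));

    cudaStream_t s1, s2;
    CHECK(cudaStreamCreate(&s1));
    CHECK(cudaStreamCreate(&s2));

    auto d2h = [&] {
        CHECK(cudaMemcpyAsync(h_out, d_out, bytes, cudaMemcpyDeviceToHost, s1));
    };
    auto h2d = [&] {
        CHECK(cudaMemcpyAsync(d_in, h_in, bytes, cudaMemcpyHostToDevice, s2));
    };

    time_ms([&] { d2h(); h2d(); });  // warm-up run, result discarded

    double t_d2h  = time_ms(d2h);
    double t_h2d  = time_ms(h2d);
    double t_both = time_ms([&] { d2h(); h2d(); });

    printf("D2H alone: %.2f ms, H2D alone: %.2f ms, both: %.2f ms\n",
           t_d2h, t_h2d, t_both);
    printf("overlap fraction: %.2f\n", overlap_fraction(t_d2h, t_h2d, t_both));

    CHECK(cudaStreamDestroy(s1));
    CHECK(cudaStreamDestroy(s2));
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_out));
    CHECK(cudaFreeHost(h_in));
    CHECK(cudaFreeHost(h_out));
    return 0;
}
```

If the combined time is close to the sum of the two individual times, the copies were serialized, which is what the original poster observed. If it is close to the larger of the two, the device is moving data in both directions at once.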
