cudaMemcpyAsync same direction overlap

matomic · June 28, 2023, 4:40am

Hi experts,

I was under the impression that cudaMemcpyAsync transfers can only overlap if

the host memory is page-locked,
they go in different directions (D2H vs. H2D), and
they are issued on different CUDA streams.
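For reference, here is a sketch of the pattern those three conditions describe, written against the CUDA runtime API. It is a minimal illustration and does not come from the thread. The function name `overlapOppositeDirections` is hypothetical. The buffers are left uninitialized because only the timeline matters. Actual concurrency also assumes the GPU has at least two copy engines (`cudaDeviceProp::asyncEngineCount` of 2 or more).

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: one H2D and one D2H copy that meet all three
// conditions (pinned host memory, opposite directions, separate streams).
// Buffers are left uninitialized; only the timeline matters here.
cudaError_t overlapOppositeDirections(size_t bytes) {
    void *hIn = nullptr, *hOut = nullptr, *dIn = nullptr, *dOut = nullptr;
    cudaStream_t sH2D = nullptr, sD2H = nullptr;

    // Condition 1: page-locked host buffers.
    cudaError_t err = cudaMallocHost(&hIn, bytes);
    if (err == cudaSuccess) err = cudaMallocHost(&hOut, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dIn, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dOut, bytes);
    // Condition 3: two distinct streams that do not sync with the legacy default stream.
    if (err == cudaSuccess) err = cudaStreamCreateWithFlags(&sH2D, cudaStreamNonBlocking);
    if (err == cudaSuccess) err = cudaStreamCreateWithFlags(&sD2H, cudaStreamNonBlocking);

    // Condition 2: opposite directions. All allocations happen first,
    // because allocating between the two copies can serialize them.
    if (err == cudaSuccess) err = cudaMemcpyAsync(dIn, hIn, bytes, cudaMemcpyHostToDevice, sH2D);
    if (err == cudaSuccess) err = cudaMemcpyAsync(hOut, dOut, bytes, cudaMemcpyDeviceToHost, sD2H);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    // Release whatever was successfully created.
    if (sD2H) cudaStreamDestroy(sD2H);
    if (sH2D) cudaStreamDestroy(sH2D);
    if (dOut) cudaFree(dOut);
    if (dIn) cudaFree(dIn);
    if (hOut) cudaFreeHost(hOut);
    if (hIn) cudaFreeHost(hIn);
    return err;
}
```

Calling it with, say, `64 << 20` bytes produces transfers long enough to spot easily on a profiler timeline.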

I was profiling a unit test and observed that Nsight Systems shows two D2H memcpys that overlap in time.
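A same-direction case like this one can be reproduced with something along the following lines, which may help when sharing a complete example. This is a hypothetical sketch, and the name `twoDeviceToHostCopies` is made up. It issues two D2H copies into pinned host buffers, each on its own non-blocking stream, and prints `asyncEngineCount`, the number of copy engines the device reports. You can profile the binary with Nsight Systems (for instance `nsys profile ./app`, where `./app` stands for your executable) to see how the two transfers land on the timeline. The sketch makes no claim about what that timeline will look like.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>

// Hypothetical reproducer: two device-to-host copies into pinned memory,
// each on its own stream. Profile the binary to inspect the timeline.
cudaError_t twoDeviceToHostCopies(size_t bytes) {
    int dev = 0;
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDevice(&dev);
    if (err == cudaSuccess) err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) return err;
    // Number of copy engines the device reports for concurrent transfers.
    std::printf("asyncEngineCount = %d\n", prop.asyncEngineCount);

    void *hA = nullptr, *hB = nullptr, *dA = nullptr, *dB = nullptr;
    cudaStream_t sA = nullptr, sB = nullptr;
    err = cudaMallocHost(&hA, bytes);
    if (err == cudaSuccess) err = cudaMallocHost(&hB, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) err = cudaStreamCreateWithFlags(&sA, cudaStreamNonBlocking);
    if (err == cudaSuccess) err = cudaStreamCreateWithFlags(&sB, cudaStreamNonBlocking);

    // Same direction (D2H), different streams, pinned destinations.
    if (err == cudaSuccess) err = cudaMemcpyAsync(hA, dA, bytes, cudaMemcpyDeviceToHost, sA);
    if (err == cudaSuccess) err = cudaMemcpyAsync(hB, dB, bytes, cudaMemcpyDeviceToHost, sB);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    // Release whatever was successfully created.
    if (sB) cudaStreamDestroy(sB);
    if (sA) cudaStreamDestroy(sA);
    if (dB) cudaFree(dB);
    if (dA) cudaFree(dA);
    if (hB) cudaFreeHost(hB);
    if (hA) cudaFreeHost(hA);
    return err;
}
```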

Is my impression incorrect?

ref: How to Overlap Data Transfers in CUDA C/C++ | NVIDIA Technical Blog

– M

Robert_Crovella · June 29, 2023, 3:32pm

At a high level, your interpretation and rubric are correct, and they are the right ones for CUDA developers to keep in mind (IMO). However, there is considerable complexity in the details, as well as facts that aren't fully reflected in your high-level rubric. For example, the tail end of one transfer can overlap with the head end of another transfer in certain cases involving pageable memory, as discussed here. That may or may not be applicable to your case. You haven't provided a complete example, so I can't give a definitive answer about what precisely is happening in your case (and I don't know if I would, anyway). But with respect to your question about the rubric: yes, it is not a perfectly accurate, complete, bullet-proof formula. It is, however, a good guide, and other than curiosity about what appears to be an oddity, I consider it a useful and sufficient one.
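The answer depends partly on whether the unit test's host buffers are pinned or pageable, so it may be worth checking that directly. Here is a minimal sketch that uses `cudaPointerGetAttributes`. It assumes CUDA 11 or later, where ordinary pageable memory is reported as `cudaMemoryTypeUnregistered` rather than as an error. The helper name `isPinnedHostPointer` is hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: true if ptr points into page-locked host memory known
// to the CUDA runtime (cudaMallocHost, cudaHostAlloc, or cudaHostRegister).
// Assumes CUDA 11+, where pageable memory yields cudaMemoryTypeUnregistered.
bool isPinnedHostPointer(const void* ptr) {
    cudaPointerAttributes attr{};
    if (cudaPointerGetAttributes(&attr, ptr) != cudaSuccess) {
        cudaGetLastError();  // reset the runtime's last-error state
        return false;
    }
    return attr.type == cudaMemoryTypeHost;
}
```

A buffer from `new`, `malloc`, or a `std::vector` would normally be reported as not pinned. A buffer from `cudaMallocHost` or `cudaHostAlloc` would be reported as pinned.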

(Also, although it doesn't apply to your case: in some cases, depending on system topology, two transfers in the same direction can overlap if they target different devices.)
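The multi-device case can be sketched as one D2H copy per GPU, with all copies issued before any of them is waited on. This illustration rests on a few assumptions:

- The name `sameDirectionAcrossDevices` is hypothetical.
- Host buffers are allocated with `cudaHostAllocPortable` so that every device treats them as pinned.
- All allocations happen before any copy is issued, since page-locked and device allocations can introduce implicit synchronization.

Whether the copies actually overlap depends on the system topology, as noted above.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical sketch: one device-to-host copy per GPU, all in flight at once.
cudaError_t sameDirectionAcrossDevices(size_t bytes) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) return err;

    std::vector<void*> host(count, nullptr), dev(count, nullptr);
    std::vector<cudaStream_t> streams(count, nullptr);

    // 1) Allocate everything up front: allocations issued between copies
    //    can introduce implicit synchronization.
    for (int d = 0; d < count && err == cudaSuccess; ++d) {
        err = cudaSetDevice(d);
        // Portable pinned memory: treated as page-locked by every device.
        if (err == cudaSuccess) err = cudaHostAlloc(&host[d], bytes, cudaHostAllocPortable);
        if (err == cudaSuccess) err = cudaMalloc(&dev[d], bytes);
        if (err == cudaSuccess) err = cudaStreamCreateWithFlags(&streams[d], cudaStreamNonBlocking);
    }

    // 2) Issue one D2H copy per device without waiting in between.
    for (int d = 0; d < count && err == cudaSuccess; ++d) {
        err = cudaSetDevice(d);
        if (err == cudaSuccess)
            err = cudaMemcpyAsync(host[d], dev[d], bytes, cudaMemcpyDeviceToHost, streams[d]);
    }

    // 3) Wait for every copy, then release resources device by device.
    for (int d = 0; d < count; ++d) {
        if (cudaSetDevice(d) != cudaSuccess) continue;
        if (streams[d]) {
            cudaError_t e = cudaStreamSynchronize(streams[d]);
            if (err == cudaSuccess) err = e;
            cudaStreamDestroy(streams[d]);
        }
        if (dev[d]) cudaFree(dev[d]);
        if (host[d]) cudaFreeHost(host[d]);
    }
    return err;
}
```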
