Performance of memcpyasync

511935201 · June 16, 2021, 7:24am

I found this statement about asynchronous copies in the CUDA documentation (https://docs.nvidia.com/cuda/cuda-runtime-api/api-sync-behavior.html#api-sync-behavior__memcpy-async): "For all other transfers, the function is fully asynchronous. If pageable memory must first be staged to pinned memory, this will be handled asynchronously with a worker thread." I then ran some tests with two streams, one for cudaMemcpyAsync host-to-device copies and the other for kernel computation, with no dependency between the two streams. The memcpy appears to behave synchronously rather than asynchronously, and I don't know why. Thanks.

Robert_Crovella · June 16, 2021, 1:50pm

cudaMemcpyAsync will be synchronous if the transfer is to or from pageable memory. See here:

Async memory copies will also be synchronous if they involve host memory that is not page-locked.
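
As a minimal sketch of what that means for a two-stream test like the one described (this is not the original poster's code; `busy_kernel`, `run_overlap_demo`, the buffer size, and the stream names are assumptions made up for illustration), the host buffer can be allocated with cudaMallocHost so that it is page-locked. The copy is then eligible to overlap with a kernel in another stream, provided the device has a copy engine available:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Return 1 from the enclosing function if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical stand-in for the real compute kernel.
__global__ void busy_kernel(float *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 1000; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

int run_overlap_demo() {
    const size_t n = 1 << 24;               // assumed size: 64 MiB of floats
    const size_t bytes = n * sizeof(float);

    float *h_pinned = nullptr, *d_copy = nullptr, *d_work = nullptr;
    // Page-locked (pinned) host memory instead of malloc/new.
    CHECK(cudaMallocHost((void **)&h_pinned, bytes));
    CHECK(cudaMalloc((void **)&d_copy, bytes));
    CHECK(cudaMalloc((void **)&d_work, bytes));
    CHECK(cudaMemset(d_work, 0, bytes));
    for (size_t i = 0; i < n; ++i) h_pinned[i] = 1.0f;

    cudaStream_t copy_stream, compute_stream;
    CHECK(cudaStreamCreate(&copy_stream));
    CHECK(cudaStreamCreate(&compute_stream));

    // Kernel in one stream, host-to-device copy in the other; no dependency.
    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);
    busy_kernel<<<blocks, threads, 0, compute_stream>>>(d_work, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(d_copy, h_pinned, bytes,
                          cudaMemcpyHostToDevice, copy_stream));

    CHECK(cudaDeviceSynchronize());

    CHECK(cudaStreamDestroy(copy_stream));
    CHECK(cudaStreamDestroy(compute_stream));
    CHECK(cudaFree(d_copy));
    CHECK(cudaFree(d_work));
    CHECK(cudaFreeHost(h_pinned));
    return 0;
}

int main() {
    return run_overlap_demo();
}
```

Whether the copy and the kernel actually overlap is best checked in a profiler timeline; the sketch only sets up the conditions for it.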

511935201 · June 17, 2021, 2:54am

Thanks. According to the CUDA Programming Guide, cudaMemcpyAsync is asynchronous for pageable memory when the data size is less than 64 KB. For other sizes it is synchronous.
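
One way to check this size dependence on a given machine is to time how long the host spends inside cudaMemcpyAsync for pageable buffers of different sizes, and compare that with the time until the stream has finished. The following is only a rough sketch: `run_size_sweep`, the chosen sizes, and the use of std::chrono on the host are assumptions for illustration, and the numbers will vary with GPU, driver, and platform.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Print an error and report failure if a CUDA runtime call did not succeed.
static bool ok(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int run_size_sweep() {
    using clock = std::chrono::steady_clock;
    using us = std::chrono::duration<double, std::micro>;

    // Assumed test sizes: below, at, and above 64 KB.
    const size_t sizes[] = {4u << 10, 64u << 10, 1u << 20, 64u << 20};
    const size_t max_bytes = 64u << 20;

    // Pageable host memory (plain malloc), touched so the pages are resident.
    char *h_pageable = static_cast<char *>(std::malloc(max_bytes));
    if (!h_pageable) return 1;
    std::memset(h_pageable, 1, max_bytes);

    char *d_buf = nullptr;
    cudaStream_t stream;
    if (!ok(cudaMalloc((void **)&d_buf, max_bytes), "cudaMalloc")) return 1;
    if (!ok(cudaStreamCreate(&stream), "cudaStreamCreate")) return 1;

    // Warm-up copy so one-time initialization cost is not measured.
    if (!ok(cudaMemcpyAsync(d_buf, h_pageable, sizes[0],
                            cudaMemcpyHostToDevice, stream), "warm-up copy"))
        return 1;
    if (!ok(cudaStreamSynchronize(stream), "cudaStreamSynchronize")) return 1;

    for (size_t bytes : sizes) {
        auto t0 = clock::now();
        cudaError_t err = cudaMemcpyAsync(d_buf, h_pageable, bytes,
                                          cudaMemcpyHostToDevice, stream);
        auto t1 = clock::now();  // host is back from the call
        if (!ok(err, "cudaMemcpyAsync")) return 1;
        if (!ok(cudaStreamSynchronize(stream), "cudaStreamSynchronize"))
            return 1;
        auto t2 = clock::now();  // transfer is complete

        std::printf("%10zu bytes: call %.1f us, call + sync %.1f us\n", bytes,
                    us(t1 - t0).count(), us(t2 - t0).count());
    }

    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    std::free(h_pageable);
    return 0;
}

int main() {
    return run_size_sweep();
}
```

If the time spent in the call grows with the buffer size and is close to the call + sync time, the call blocked the host for the duration of the transfer.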
