CUDA blocks all threads when doing a device-to-host cudaMemcpyAsync to a pageable host memory location

tanmaeygupta99 · October 28, 2023, 7:39am

The CUDA docs say that asynchronous memory copies to a pageable host location behave synchronously, which means that the host thread calling the memcpy will block. However, I'm seeing that all other host threads are also blocked and cannot complete whatever CUDA API call they were in the middle of. For example, one host thread gets stuck in cudaEventDestroy (checked with gdb) while another thread is in a cudaMemcpyAsync to a pageable location. Shouldn’t other threads be able to continue with their “interactions” with the CUDA driver?
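The scenario described above could be set up roughly as follows. This is only a hypothetical sketch, not the original poster's code: the 256 MiB buffer size, the loop count, and the timing helper `ms_since` are assumptions made for illustration. One thread issues a device-to-host cudaMemcpyAsync into malloc'd (pageable) memory while a second thread times cudaEventCreate/cudaEventDestroy pairs.

```cuda
// Hypothetical reproduction sketch; not the original poster's code.
#include <cuda_runtime.h>

#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>

using Clock = std::chrono::steady_clock;

// Milliseconds elapsed since t0 (helper introduced for this sketch).
static double ms_since(Clock::time_point t0) {
    return std::chrono::duration<double, std::milli>(Clock::now() - t0).count();
}

int main() {
    const size_t bytes = size_t(256) << 20;  // 256 MiB, arbitrary size for illustration

    void* d_buf = nullptr;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    void* h_pageable = std::malloc(bytes);  // plain malloc => pageable host memory
    if (h_pageable == nullptr) {
        std::fprintf(stderr, "malloc failed\n");
        cudaFree(d_buf);
        return 1;
    }

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Thread A: device-to-host copy into pageable memory.
    std::thread copier([&] {
        const auto t0 = Clock::now();
        cudaMemcpyAsync(h_pageable, d_buf, bytes, cudaMemcpyDeviceToHost, stream);
        std::printf("cudaMemcpyAsync returned after %.3f ms\n", ms_since(t0));
        cudaStreamSynchronize(stream);
    });

    // Thread B: unrelated API calls; record the slowest create/destroy pair.
    std::thread other([&] {
        double worst_ms = 0.0;
        for (int i = 0; i < 1000; ++i) {
            cudaEvent_t ev;
            const auto t0 = Clock::now();
            cudaEventCreate(&ev);
            cudaEventDestroy(ev);
            const double ms = ms_since(t0);
            if (ms > worst_ms) worst_ms = ms;
        }
        std::printf("slowest cudaEventCreate/Destroy pair: %.3f ms\n", worst_ms);
    });

    copier.join();
    other.join();

    cudaStreamDestroy(stream);
    std::free(h_pageable);
    cudaFree(d_buf);
    return 0;
}
```

The two threads are not otherwise coordinated, so whether the event calls actually overlap the copy, and how long they stall if they do, will depend on the GPU, driver version, and scheduling.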

Robert_Crovella · October 28, 2023, 6:51pm

As indicated in comments on your cross posting, that expectation is incorrect.

Threads do not have independent, unfettered access to the CUDA API in all cases. From here:

Any CUDA API call may block or synchronize for various reasons such as contention for or unavailability of internal resources. Such behavior is subject to change and undocumented behavior should not be relied upon.

tanmaeygupta99 · October 28, 2023, 8:28pm

I believe the term “synchronous behaviour” is also not uniformly defined across CUDA APIs. When a thread calls cudaStreamSynchronize, other threads are able to continue with their respective CUDA operations, such as cudaEventDestroy. However, the same is not observed when cudaMemcpyAsync behaves like a synchronous API (due to pageable copies).
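For comparison, here is a minimal sketch of the same device-to-host copy into page-locked (pinned) host memory, where the blocking wait happens in cudaStreamSynchronize rather than inside cudaMemcpyAsync. The buffer size and the `CHECK` macro are assumptions made for this example. Nothing in this thread confirms that this removes the stalls seen in other threads, and per the documentation quoted above any CUDA API call may still block.

```cuda
// Illustrative sketch only: D2H copy into pinned memory, then an explicit sync.
#include <cuda_runtime.h>

#include <cstdio>
#include <cstdlib>

// Hypothetical error-checking helper for this sketch.
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            std::fprintf(stderr, "%s failed: %s\n", #call,          \
                         cudaGetErrorString(err_));                 \
            std::exit(1);                                           \
        }                                                           \
    } while (0)

int main() {
    const size_t bytes = size_t(64) << 20;  // 64 MiB, arbitrary size for illustration

    void* d_buf = nullptr;
    void* h_pinned = nullptr;
    cudaStream_t stream;

    CHECK(cudaMalloc(&d_buf, bytes));
    CHECK(cudaMallocHost(&h_pinned, bytes));  // page-locked host allocation
    CHECK(cudaStreamCreate(&stream));

    // With a pinned destination the copy is only enqueued on the stream here.
    CHECK(cudaMemcpyAsync(h_pinned, d_buf, bytes, cudaMemcpyDeviceToHost, stream));

    // ... other host work could go here ...

    // The calling thread waits here until the copy has finished.
    CHECK(cudaStreamSynchronize(stream));

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFreeHost(h_pinned));
    CHECK(cudaFree(d_buf));
    return 0;
}
```

Pinned allocations are themselves relatively costly and reduce the memory available to the rest of the system, so they are usually allocated once and reused rather than created per copy.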
