Why does cudaMemcpyAsync need to acquire a lock?

user93209 · March 17, 2025, 11:47am

My CUDA version is 12.4. I have two host threads, each working in a different CUDA stream.
The first thread is stuck acquiring a lock when calling cudaMemcpyAsync.
The second thread is running an LLM forward pass. Part of the call stack of the two threads in gdb is shown below.

It appears to me that these two threads are independent of each other and should be able to run in parallel, because they are working with different memory addresses at this point. What is this lock used for?

striker159 · March 17, 2025, 12:07pm

Please do not post pictures of code, console output, etc.; paste the text directly.

The first thread appears to be in cudaLaunchKernel, not cudaMemcpyAsync.

The simple answer is given by the CUDA documentation:

Any CUDA API call may block or synchronize for various reasons such as contention for or unavailability of internal resources.
