CUDA caused frequently execution latencies up to 30 ms. Is there a hidden CUDA thread running?

cmtrhnn · January 12, 2022, 10:54am

I am having a similar problem. I'll add my observations in case they help.

With page-locked (pinned) memory, a synchronous memory copy waits for other kernel operations to complete. So if you are using multiple threads, they may be blocking each other. Robert said something about it:

When I switched to pinned memory + asynchronous memory copy, the host-to-device transfer delays disappeared, but the overhead has now moved to the stream synchronization or device synchronization call. I'm now trying to find out why synchronization takes so long.
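For anyone who wants to reproduce this, here is a rough sketch of how the two costs can be measured separately. It is not my actual code: `CopyTiming`, `ms_since` and `time_async_copy` are made-up names for illustration, and I am assuming a per-thread stream created with `cudaStreamNonBlocking` so it does not implicitly synchronize with the legacy default stream. Note that the time measured around `cudaStreamSynchronize` includes the copy itself plus any time the host spends waiting, so a large value there does not by itself say where the delay comes from.

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <cstddef>

// Hypothetical helper type: host-side timings for one pinned H2D copy.
struct CopyTiming {
    bool ok;            // false if any CUDA call failed (e.g. no device)
    double enqueue_ms;  // host time spent inside cudaMemcpyAsync
    double sync_ms;     // host time spent inside cudaStreamSynchronize
};

// Milliseconds elapsed on the host since t0.
static double ms_since(std::chrono::steady_clock::time_point t0) {
    return std::chrono::duration<double, std::milli>(
               std::chrono::steady_clock::now() - t0).count();
}

// Copies `bytes` from pinned host memory to the device on a private
// non-blocking stream and times the enqueue and the wait separately.
CopyTiming time_async_copy(size_t bytes) {
    CopyTiming r{false, 0.0, 0.0};
    void* h = nullptr;
    void* d = nullptr;
    cudaStream_t s = nullptr;

    if (cudaMallocHost(&h, bytes) == cudaSuccess &&   // pinned host buffer
        cudaMalloc(&d, bytes) == cudaSuccess &&
        cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking) == cudaSuccess) {

        auto t0 = std::chrono::steady_clock::now();
        cudaError_t e1 =
            cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, s);
        r.enqueue_ms = ms_since(t0);  // should be small if the call is truly async

        auto t1 = std::chrono::steady_clock::now();
        cudaError_t e2 = cudaStreamSynchronize(s);  // waits only for stream s
        r.sync_ms = ms_since(t1);

        r.ok = (e1 == cudaSuccess && e2 == cudaSuccess);
    }

    // Release whatever was successfully created.
    if (s) cudaStreamDestroy(s);
    if (d) cudaFree(d);
    if (h) cudaFreeHost(h);
    return r;
}
```

Calling `time_async_copy` from each worker thread and logging `enqueue_ms` and `sync_ms` should show whether the latency spikes sit in the enqueue or in the wait. Swapping `cudaStreamSynchronize(s)` for `cudaDeviceSynchronize()` would make the wait cover work submitted by every thread, which may be one reason the device-wide variant looks slower.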
