Launch kernel in advance?

nikita-aster · April 12, 2019, 10:10am

Hello all!

Sometimes I have to wait for a kernel to finish, so I call cudaDeviceSynchronize() or cudaStreamSynchronize(). At that point the host blocks until the kernel finishes and cannot prepare the next kernel launch. As a result, once the synchronization returns, the host spends an additional 10 to 15 microseconds launching the next kernel, and on fast GPUs this can be a really large relative overhead.

Is there a way to begin preparing the next kernel launch in advance, in order to avoid this overhead? Many thanks!

Robert_Crovella · April 12, 2019, 1:52pm

Launch the next kernel before calling cudaDeviceSynchronize().
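A minimal sketch of this idea, not taken from the thread: kernel launches are asynchronous and kernels issued to the same stream run in order, so the second kernel can be queued while the first is still running. If the host still needs to know when the first kernel has finished, it can wait on an event recorded between the two launches. The kernels `addOne` and `scale` and the function `runBackToBack` are hypothetical names used only for illustration, and the approach assumes the second launch does not need any host-side result from the first.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernels, used only to have two dependent steps.
__global__ void addOne(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Return false from the enclosing function on any CUDA error (sketch only:
// resources are not released on the error path).
#define CHECK(call) do { if ((call) != cudaSuccess) return false; } while (0)

// Queues both kernels before the host waits on anything.
bool runBackToBack(int n) {
    float* d = nullptr;
    cudaStream_t stream;
    cudaEvent_t firstDone;
    CHECK(cudaMalloc(&d, n * sizeof(float)));
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaEventCreate(&firstDone));
    CHECK(cudaMemsetAsync(d, 0, n * sizeof(float), stream));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    addOne<<<blocks, threads, 0, stream>>>(d, n);
    CHECK(cudaEventRecord(firstDone, stream));     // marks the end of addOne
    scale<<<blocks, threads, 0, stream>>>(d, n, 2.0f); // already queued behind addOne

    // The host waits only for the first kernel; the second one is already
    // in the stream and can start without a fresh launch from the host.
    CHECK(cudaEventSynchronize(firstDone));

    // ... host work that depends on the first kernel could go here ...

    CHECK(cudaStreamSynchronize(stream));          // wait for the second kernel
    CHECK(cudaGetLastError());

    float first = 0.0f;
    CHECK(cudaMemcpy(&first, d, sizeof(float), cudaMemcpyDeviceToHost));

    cudaEventDestroy(firstDone);
    cudaStreamDestroy(stream);
    cudaFree(d);
    return first == 2.0f;                          // (0 + 1) * 2
}
```

Calling `runBackToBack(1 << 20)` from `main` should return `true` on a working device. Whether this removes the whole 10 to 15 microsecond gap depends on the driver and GPU, so it is worth measuring with a profiler.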

CUDA graphs may also help to avoid launch latency.
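As a hedged illustration of the CUDA graphs suggestion, the sketch below captures two kernel launches from a stream into a graph once and then replays the whole sequence with a single `cudaGraphLaunch` call per iteration. The kernels and the function `runWithGraph` are hypothetical names, and the code assumes CUDA 11.4 or newer because it uses `cudaGraphInstantiateWithFlags`; older toolkits use a different `cudaGraphInstantiate` signature.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernels, used only to have a short repeated sequence.
__global__ void addOne(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Return false from the enclosing function on any CUDA error (sketch only:
// resources are not released on the error path).
#define CHECK(call) do { if ((call) != cudaSuccess) return false; } while (0)

// Captures addOne + scale into a graph and replays it `iterations` times.
bool runWithGraph(int n, int iterations) {
    float* d = nullptr;
    cudaStream_t stream;
    CHECK(cudaMalloc(&d, n * sizeof(float)));
    CHECK(cudaStreamCreate(&stream));              // capture needs a non-default stream
    CHECK(cudaMemsetAsync(d, 0, n * sizeof(float), stream));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Record the launches instead of executing them.
    cudaGraph_t graph;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    addOne<<<blocks, threads, 0, stream>>>(d, n);
    scale<<<blocks, threads, 0, stream>>>(d, n, 2.0f);
    CHECK(cudaStreamEndCapture(stream, &graph));

    // Instantiate once; this is where most of the setup cost is paid.
    cudaGraphExec_t exec;
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

    // Each replay submits both kernels with one host call.
    float expected = 0.0f;
    for (int it = 0; it < iterations; ++it) {
        CHECK(cudaGraphLaunch(exec, stream));
        expected = (expected + 1.0f) * 2.0f;       // host-side reference value
    }
    CHECK(cudaStreamSynchronize(stream));

    float first = 0.0f;
    CHECK(cudaMemcpy(&first, d, sizeof(float), cudaMemcpyDeviceToHost));

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(d);
    return first == expected;
}
```

Calling `runWithGraph(1 << 20, 3)` from `main` should return `true`. Graphs tend to pay off when the same short sequence of kernels is launched many times, since the capture and instantiation cost is paid only once.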
