Too much time for kernel launch latency

njuffa · November 12, 2022, 9:00pm

In a recent thread we established that the launch overhead of null kernels (kernels that don’t do anything) appears to have been reduced to about 3 microseconds with recent hardware and software, which constitutes a new “speed of light”.

As a general principle, any time multiple software instances attempt to access a single physical resource, latency is likely to increase, as some form of communication has to occur to negotiate access between these instances.

As GPU performance increases, launch overhead is more likely to have a noticeable negative impact on overall kernel performance. Programmers should therefore strive to pack a sufficient amount of work into each kernel launch. As a rule of thumb, one might target a minimum kernel runtime of around 1 millisecond on high-end GPUs, though that is obviously not always realizable.
