Concurrent Kernel Launching to Hide Kernel Launching Overhead (Not Only Kernel Execution)

gwk12291 · April 9, 2020, 8:38am

Recently I saw a StackOverflow post (https://stackoverflow.com/a/55898876) showing huge kernel launch overhead when launching relatively small kernels. The post includes profiling results that illustrate this.
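To reproduce the kind of overhead described in that post, a microbenchmark along these lines separates the host-side enqueue cost from the total time per launch. This is a minimal sketch, assuming the CUDA runtime API and a single GPU; `emptyKernel`, `LaunchTiming`, and `measureLaunchOverhead` are hypothetical names, not code from the linked post.

```cuda
#include <cassert>
#include <chrono>
#include <cuda_runtime.h>

// An intentionally empty kernel: any time it costs is pure launch overhead.
__global__ void emptyKernel() {}

// Hypothetical result type for this sketch.
struct LaunchTiming {
    double enqueueUsPerLaunch;  // host-side cost of each <<<...>>> call
    double totalUsPerLaunch;    // wall time until the GPU has finished
};

// Hypothetical helper: launches `numLaunches` empty kernels back to back on
// `stream` and returns average per-launch times in microseconds.
// Error checking is omitted for brevity.
LaunchTiming measureLaunchOverhead(int numLaunches, cudaStream_t stream) {
    assert(numLaunches > 0);
    using Clock = std::chrono::steady_clock;

    // Warm-up launch so one-time context/module setup is not timed.
    emptyKernel<<<1, 1, 0, stream>>>();
    cudaStreamSynchronize(stream);

    auto t0 = Clock::now();
    for (int i = 0; i < numLaunches; ++i) {
        emptyKernel<<<1, 1, 0, stream>>>();
    }
    auto t1 = Clock::now();  // all launches have been enqueued
    cudaStreamSynchronize(stream);
    auto t2 = Clock::now();  // all kernels have completed

    std::chrono::duration<double, std::micro> enqueue = t1 - t0;
    std::chrono::duration<double, std::micro> total = t2 - t0;
    return {enqueue.count() / numLaunches, total.count() / numLaunches};
}
```

For example, `measureLaunchOverhead(10000, 0)` times 10,000 launches on the default stream. Because kernel launches are asynchronous, `enqueueUsPerLaunch` approximates the CPU-side launch cost, while `totalUsPerLaunch` also includes GPU-side execution. If the driver's launch queue fills up, the enqueue loop may start blocking, which blurs the two numbers.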

I wonder whether the kernel launch overhead on the CPU thread can be hidden by launching these kernels from different host threads (all using the same CUDA context, of course).
Does this overhead occur only on the CPU, or is some special hardware component on the GPU also involved?
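The setup described in the question could be tested roughly like this: several host threads, each with its own non-blocking stream, all launching into the device's primary context, which threads in one process share when using the runtime API. This is a sketch under those assumptions, not a known fix; `tinyKernel` and `launchFromThreads` are hypothetical names, and error checking is omitted.

```cuda
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// A deliberately tiny kernel so that launch cost dominates execution time.
__global__ void tinyKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: each of `numThreads` host threads issues
// `launchesPerThread` tinyKernel launches into its own stream.
// All threads use device 0's primary context (the runtime API default).
// Returns wall-clock milliseconds for the whole batch.
double launchFromThreads(int numThreads, int launchesPerThread, int n) {
    assert(numThreads > 0 && launchesPerThread > 0 && n > 0);
    std::vector<float*> buffers(numThreads, nullptr);
    std::vector<cudaStream_t> streams(numThreads);
    for (int t = 0; t < numThreads; ++t) {
        cudaMalloc(&buffers[t], n * sizeof(float));
        cudaMemset(buffers[t], 0, n * sizeof(float));
        cudaStreamCreateWithFlags(&streams[t], cudaStreamNonBlocking);
    }
    cudaDeviceSynchronize();  // finish setup before timing starts

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            const int block = 128;
            const int grid = (n + block - 1) / block;
            for (int i = 0; i < launchesPerThread; ++i) {
                tinyKernel<<<grid, block, 0, streams[t]>>>(buffers[t], n);
            }
            cudaStreamSynchronize(streams[t]);  // wait for this thread's work
        });
    }
    for (auto& w : workers) w.join();
    auto stop = std::chrono::steady_clock::now();

    for (int t = 0; t < numThreads; ++t) {
        cudaStreamDestroy(streams[t]);
        cudaFree(buffers[t]);
    }
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Comparing `launchFromThreads(1, 4000, 256)` with `launchFromThreads(4, 1000, 256)` issues the same total number of launches from one thread versus four. Whether the multi-threaded version finishes sooner depends on the driver and platform: concurrent launches into one context may contend inside the driver, so the outcome should be measured rather than assumed.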
