What does the idle time between kernel functions in Nsight Systems mean?

The CUDA API call is made before the kernel function starts. Does the idle time represent only the kernel scheduling time?

What is the process of starting a kernel function? Can it be divided into CPU launch, GPU setup, and kernel scheduling? If so, can the GPU setup overlap with GPU kernel execution?

I am sorry that there was no response to this earlier. Your forum post was dropped in an orphaned category that the Nsys team was unaware of until this afternoon.

I wrote a blog post on this very thing. Please take a look at "Understanding the Visualization of Overhead and Latency in NVIDIA Nsight Systems" on the NVIDIA Developer Blog.
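
As a rough illustration of what an idle gap on the GPU timeline measures, here is a minimal Python sketch. It assumes you have already pulled kernel start and end timestamps out of a report by some means; the function name `idle_gaps`, the tuple layout, and the sample values are all hypothetical and are not part of any Nsight Systems API. The sketch only reports how long the GPU sat with no kernel running. It says nothing about why, so the split between CPU launch overhead, setup, and scheduling still has to be read from the CPU and CUDA API rows of the timeline.

```python
def idle_gaps(kernels):
    """Return (previous_kernel, next_kernel, gap_ns) for every idle gap.

    `kernels` is a list of (name, start_ns, end_ns) tuples. This layout is
    an assumption made for illustration, not an Nsight Systems export format.
    """
    gaps = []
    busy_until = None  # latest end time seen so far
    last_name = None   # kernel that set busy_until
    for name, start, end in sorted(kernels, key=lambda k: k[1]):
        # A gap exists only if this kernel starts after all earlier ones ended.
        if busy_until is not None and start > busy_until:
            gaps.append((last_name, name, start - busy_until))
        # Track the furthest end time so overlapping kernels are not counted as idle.
        if busy_until is None or end > busy_until:
            busy_until = end
            last_name = name
    return gaps


# Hypothetical timestamps in nanoseconds, made up for this example.
sample = [
    ("kernel_a", 0, 1000),
    ("kernel_b", 1500, 2500),
    ("kernel_c", 2400, 3000),  # overlaps kernel_b, so no gap before it
    ("kernel_d", 3200, 3300),
]

for prev_name, next_name, gap in idle_gaps(sample):
    print(f"{prev_name} -> {next_name}: {gap} ns idle")
```

Comparing each gap against the time at which the corresponding launch call returned on the CPU is one way to tell whether the GPU was waiting on the host or the work was already queued.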