The CUDA API call has already been made before the kernel function starts. Does the idle time represent only the kernel's scheduling time?
What does the kernel launch process look like? Can it be divided into three stages: CPU launch, GPU setup, and kernel scheduling? If so, can the GPU setup stage overlap with GPU kernel execution?
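One way to probe the overlap question empirically is a small timing experiment. The sketch below is only an illustration under stated assumptions, not a description of the driver's internals: `busy_kernel`, `measure_launches`, `LaunchTiming`, and the cycle count are hypothetical names and values introduced here. It relies on the fact that kernel launches are asynchronous with respect to the host, and compares how long the CPU takes to issue a batch of launches with how long the GPU takes to run them.

```cuda
#include <cassert>
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: spins for roughly `cycles` GPU clock cycles.
__global__ void busy_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical result type for this experiment.
struct LaunchTiming {
    double issue_ms;  // host time spent issuing all launches
    double total_ms;  // host time until the last kernel has finished
    float  gpu_ms;    // GPU time between the two recorded events
};

LaunchTiming measure_launches(int n, long long cycles) {
    using clk = std::chrono::steady_clock;
    assert(n > 0);

    // Warm-up launch so one-time context/module setup is not measured.
    busy_kernel<<<1, 1>>>(0);
    cudaDeviceSynchronize();

    cudaEvent_t beg, end;
    cudaEventCreate(&beg);
    cudaEventCreate(&end);

    auto t0 = clk::now();
    cudaEventRecord(beg);
    for (int i = 0; i < n; ++i)
        busy_kernel<<<1, 1>>>(cycles);  // returns before the kernel finishes
    cudaEventRecord(end);
    auto t1 = clk::now();               // host has finished issuing launches
    cudaEventSynchronize(end);          // wait for the GPU to reach `end`
    auto t2 = clk::now();

    LaunchTiming t{};
    cudaEventElapsedTime(&t.gpu_ms, beg, end);
    t.issue_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    t.total_ms = std::chrono::duration<double, std::milli>(t2 - t0).count();

    cudaEventDestroy(beg);
    cudaEventDestroy(end);
    return t;
}

int main() {
    LaunchTiming t = measure_launches(100, 1000000);
    printf("host issue: %.3f ms, host total: %.3f ms, GPU: %.3f ms\n",
           t.issue_ms, t.total_ms, t.gpu_ms);
    return 0;
}
```

If the host issue time comes out much smaller than the GPU time, the CPU-side launch work for later kernels was overlapping with the execution of earlier ones. This measurement alone does not separate GPU setup from scheduling; a timeline profiler would be needed to look at the gaps between consecutive kernels.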