CPU Kernel launch


I’m trying to profile my application on Xavier using Nsight Systems.
As I understand it, the CUDA API calls shown in the upper part of the Timeline view (in the threads section) are the CPU-side kernel launches.
I’m seeing much longer CPU-side kernel launches on the Xavier than on my desktop workstation, so my question is: what happens during the CPU launch of a kernel, and which system parameters could affect its duration?

Hi Yotam,

Can you provide some more information on how much the timing differences are and what type of operation you are trying to run on the CPU? Are you seeing it using a certain sample application to compare against?

Hi LukeNV,

I see the difference when comparing custom and CUDA kernel launches in the CUDA API view of the Nsight Systems timeline. I’m using my own application because the sample applications do not report CUDA API or NVTX data when profiled in Nsight Systems.

For example, running on an RTX 2080 desktop computer:

Running on Pegasus:

Hi yotam.nachmias,

Do you notice the same kernel launch time on both the iGPU and dGPU? Is there any chance you can share the sample code you are using to reproduce this?
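
If sharing the full application is not possible, a small standalone timing harness may be enough to compare the two machines. The following is only a sketch under assumptions: `LaunchStats` and `time_host_calls` are hypothetical names (they are not part of CUDA or Nsight Systems), and the callable passed in stands in for whatever launch call you actually make. It measures only how long the calling CPU thread spends inside each call, using `std::chrono::steady_clock`.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper type: host-side time spent inside a call, in microseconds.
struct LaunchStats {
    double min_us;
    double mean_us;
    double max_us;
};

// Calls `launch` `warmup` times untimed, then `iterations` times timed.
// Only the time the CPU thread spends inside each call is measured; nothing
// here waits for work that the call may have queued asynchronously.
template <typename Launch>
LaunchStats time_host_calls(Launch&& launch, int iterations, int warmup = 10) {
    using clock = std::chrono::steady_clock;

    if (iterations <= 0) {
        return LaunchStats{0.0, 0.0, 0.0};  // nothing to measure
    }

    // The first few calls often include one-time setup cost, so skip them.
    for (int i = 0; i < warmup; ++i) {
        launch();
    }

    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(iterations));
    for (int i = 0; i < iterations; ++i) {
        const auto t0 = clock::now();
        launch();
        const auto t1 = clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(t1 - t0).count());
    }

    double sum = 0.0;
    for (double s : samples) {
        sum += s;
    }

    return LaunchStats{
        *std::min_element(samples.begin(), samples.end()),
        sum / static_cast<double>(samples.size()),
        *std::max_element(samples.begin(), samples.end())};
}
```

In a `.cu` file the callable would wrap the real launch, for example `time_host_calls([&] { my_kernel<<<grid, block>>>(args); }, 1000)`, where `my_kernel`, `grid`, `block` and `args` are placeholders for your own code. Kernel launches are asynchronous with respect to the host, so the numbers should roughly correspond to the launch durations shown in the CUDA API row rather than to the kernel execution time. Running the same binary on both systems and posting the min/mean/max values would make the comparison concrete.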