Why is there a period of idle time between kernels or transfers, and what happens during this idle time?

I used nvprof to collect information such as startTime, Duration, and KernelName while training a model, using this command:

nvprof --csv --log-file log.csv --print-gpu-trace python test1.py

The result is as follows:

I find there is idle time between some transfers; for example, one transfer's start time plus its duration is less than the start time of the next transfer:

8.238898s + 0.264026s = 8.502924s < 8.534692s

There is also idle time between some kernels; for example:

8.602683s + 0.0217733s = 8.6244563s < 8.624457s

Usually the idle time between transfers is longer than the idle time between kernels, and I want to know what happens during this idle time. In addition, there is an entry named [CUDA memcpy DtoD]; what is that operation doing?

Any response would be greatly appreciated!

Hi, @lylyly6666

Sorry for the late response!
It would be good if you could move to Nsight Systems instead of nvprof.
It is easier to look at the timeline in Nsight Systems to identify why there are gaps between kernels or memory transfers.
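For example, a typical Nsight Systems invocation would be (the report name here is just a placeholder):

nsys profile --trace=cuda,nvtx -o report python test1.py

This produces a report file (.nsys-rep, or .qdrep on older versions) that you can open in the Nsight Systems GUI to inspect the timeline.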
These gaps could be due to various reasons, such as:
a) overhead of the CUDA APIs
b) other processing in the application code between the CUDA calls (see the NVTX sketch below)
c) profiler overhead
d) some synchronization
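To check for (b), you can annotate your script with NVTX ranges so that application-side work shows up as named regions on the Nsight Systems timeline, right next to the CUDA activity. A minimal sketch, assuming you are training with PyTorch (model and data_iter are hypothetical placeholders for your own objects):

import torch

# Wrap application-side work in NVTX ranges so it appears as a named
# region on the Nsight Systems timeline.
torch.cuda.nvtx.range_push("data_loading")
batch = next(data_iter)    # hypothetical: your input pipeline
torch.cuda.nvtx.range_pop()

torch.cuda.nvtx.range_push("forward")
output = model(batch)      # hypothetical: your model's forward pass
torch.cuda.nvtx.range_pop()

Regarding [CUDA memcpy DtoD]: that entry is a device-to-device copy, i.e. data moved from one GPU memory location to another without going through the host (it corresponds to cudaMemcpy/cudaMemcpyAsync calls with the cudaMemcpyDeviceToDevice kind). Frameworks issue these internally, for example when duplicating tensors. A rough sketch of the kind of PyTorch code that typically shows up as a DtoD copy in the trace:

import torch

x = torch.randn(1024, 1024, device="cuda")
# Copying a contiguous GPU tensor into fresh GPU memory is typically
# recorded in the trace as [CUDA memcpy DtoD].
y = x.clone()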
