I am profiling a PyTorch inference application with nsys. Here is a snapshot of the resulting report as seen in the NVIDIA Nsight Systems UI.
From around +800 ms to around +850 ms there are no events/entries. Does this mean that the GPU was idle during this interval?
I invoke nsys as follows:
nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas -s cpu -o nsight_report -f true --cudabacktrace=kernel:10000 --osrt-threshold=10000 python3 <python script name>
That is what it should mean. I can't tell from the screenshot, but if I were looking at this, I would check the CUDA API row on the CPU side to see if a forced synchronization is causing the gap. It could also be that the CPU is not supplying enough work. You could check the OSRT data and the CPU backtraces to see what was going on in that time frame.
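One way to make that investigation easier is to annotate the host-side phases of the inference loop with NVTX ranges, so the gap lines up with a named region in the NVTX row of the timeline. Here is a minimal sketch; the `preprocess`/`forward` step names and the toy workload are made up, and the fallback just makes the code run even where PyTorch is not installed:

```python
from contextlib import contextmanager

try:
    # PyTorch ships NVTX bindings; these markers appear in Nsight Systems' NVTX row.
    from torch.cuda import nvtx
except ImportError:
    nvtx = None  # fallback so the sketch still runs without PyTorch

@contextmanager
def nvtx_range(name):
    """Annotate a host-side region with an NVTX push/pop pair (no-op without torch)."""
    if nvtx is not None:
        nvtx.range_push(name)
    try:
        yield
    finally:
        if nvtx is not None:
            nvtx.range_pop()

# Hypothetical inference loop: wrap each phase so CPU-side stalls get a label.
with nvtx_range("preprocess"):
    data = list(range(4))          # stand-in for input preparation
with nvtx_range("forward"):
    result = sum(data)             # stand-in for model(input)
print(result)
```

With annotations like these, a 50 ms hole on the GPU rows is much easier to attribute: either it falls inside a named CPU region (the CPU was busy) or between regions (something else, e.g. a synchronization, was blocking).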
Apologies for the delay, and thanks for the insight. Interestingly, there are also “gaps” between consecutive CUDA API calls in the CUDA API trace. The average gap length is about 100 microseconds for the subset of the trace I examined. Do you have any suggestions as to the possible causes of this delay? I suspect the CPU is not issuing work fast enough.
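For reference, I computed the average gap roughly like this, from the start/end timestamps of consecutive CUDA API calls (the timestamps below are made up to mimic launches ~100 µs apart; the real ones come from the exported trace):

```python
def average_gap_us(intervals):
    """Average idle time between consecutive API-call intervals, in microseconds.

    intervals: list of (start_ns, end_ns) tuples, sorted by start time.
    """
    gaps = [nxt[0] - cur[1]                      # idle time from one call's end
            for cur, nxt in zip(intervals, intervals[1:])
            if nxt[0] > cur[1]]                  # ...to the next call's start
    return (sum(gaps) / len(gaps)) / 1000 if gaps else 0.0

# Hypothetical timestamps in nanoseconds: three 5 µs calls spaced 100 µs apart.
calls = [(0, 5_000), (105_000, 110_000), (210_000, 215_000)]
print(average_gap_us(calls))  # 100.0
```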
You are probably correct, but you know your application better than I do.
Yeah, it is difficult to say without the full context, which I have not given in the question. Apologies for that, and thanks a lot for the insights.