Gaps in CUDA Trace

puneethnaik · November 6, 2022, 11:49am

I am profiling a PyTorch inference application. When I profile it with NVIDIA Nsight Systems (nsys), here is the snapshot as seen in the Nsight Systems UI

From around +800 ms to around +850 ms there are no events/entries. Does this mean that the GPU was idle during this interval?

I run the following nsys command:

nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas -s cpu -o nsight_report -f true --cudabacktrace=kernel:10000 --osrt-threshold=10000 python3 <python script name>

Thanks

hwilper · November 6, 2022, 9:13pm

That is what it should mean. I can’t tell from the screenshot, but if I were looking at this, I would check the CUDA API calls on the CPU to see whether a forced synchronization was causing the gap. It could also be that the CPU is not supplying enough work. You could check the OSRT data and the CPU backtraces to see what was going on in this time frame.
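To make that check concrete, here is a rough sketch in Python. It assumes you have already pulled the kernel intervals and the CUDA API calls out of the report as plain tuples; the helper names (`find_idle_gaps`, `sync_calls_near_gap`), the 0.5 ms margin, and all timestamps are made up for illustration and are not part of nsys. It merges overlapping kernels, finds the windows where nothing ran, and lists any blocking synchronization call that overlaps a window or ends just before it.

```python
from typing import List, Tuple

Interval = Tuple[float, float]          # (start_ms, end_ms)
ApiCall = Tuple[str, float, float]      # (name, start_ms, end_ms)

# CUDA runtime calls that block the calling CPU thread until GPU work finishes.
SYNC_CALLS = {"cudaDeviceSynchronize", "cudaStreamSynchronize", "cudaEventSynchronize"}


def find_idle_gaps(kernels: List[Interval], min_gap_ms: float = 1.0) -> List[Interval]:
    """Return windows of at least min_gap_ms in which no kernel was running."""
    gaps: List[Interval] = []
    busy_until = None
    for start, end in sorted(kernels):
        if busy_until is not None and start - busy_until >= min_gap_ms:
            gaps.append((busy_until, start))
        # Track the latest end time so overlapping kernels are merged.
        busy_until = end if busy_until is None else max(busy_until, end)
    return gaps


def sync_calls_near_gap(gap: Interval, api_calls: List[ApiCall],
                        margin_ms: float = 0.5) -> List[str]:
    """Names of blocking sync calls that overlap the gap or end just before it."""
    gap_start, gap_end = gap
    return [name for name, start, end in api_calls
            if name in SYNC_CALLS and start < gap_end and end > gap_start - margin_ms]


# Made-up timestamps for illustration only (not taken from the report above).
kernels = [(700.0, 790.0), (780.0, 800.0), (850.0, 900.0)]
api_calls = [
    ("cudaLaunchKernel", 699.0, 699.1),
    ("cudaStreamSynchronize", 779.0, 800.2),
    ("cudaLaunchKernel", 849.5, 849.6),
]

for gap in find_idle_gaps(kernels):
    blockers = sync_calls_near_gap(gap, api_calls)
    print(gap, blockers or "no sync call nearby; the CPU may not be supplying work")
```

This is only a heuristic. A sync call next to a gap does not prove it caused the gap, and an empty result only suggests looking at the OSRT data and CPU backtraces for that window.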

puneethnaik · November 10, 2022, 5:28am

Apologies for the delay. Thanks for the insight. Interestingly enough, there are “gaps” between consecutive CUDA API calls in the CUDA API trace as well. The average “gap length” is about 100 microseconds for the subset of the trace I looked at. Do you have any suggestions as to the possible causes of this delay? I suspect the CPU is not sending work fast enough.
Thanks
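For reference, this is roughly how such an average gap length between API calls can be computed. It is a minimal sketch that assumes the calls of a single CPU thread are available as (start, end) pairs in microseconds; the function name `api_call_gaps_us` and the sample timestamps are hypothetical, not real trace data.

```python
from statistics import mean
from typing import List, Tuple


def api_call_gaps_us(calls: List[Tuple[float, float]]) -> List[float]:
    """Gaps between the end of one call and the start of the next, in microseconds.

    `calls` holds (start_us, end_us) pairs for a single CPU thread.
    Overlapping calls are clamped to a gap of 0.
    """
    ordered = sorted(calls)
    return [max(0.0, next_start - prev_end)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


# Made-up timestamps for illustration only.
calls = [(0.0, 10.0), (110.0, 125.0), (220.0, 230.0)]
gaps = api_call_gaps_us(calls)
print("gaps (us):", gaps)
print("mean gap (us):", mean(gaps))
```

Looking at the distribution of the gaps, not just the mean, may help separate steady per-call CPU overhead from a few long stalls.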

hwilper · November 10, 2022, 7:17pm

You are probably correct, but you know your application better than I do.

puneethnaik · November 10, 2022, 7:19pm

Yeah it is difficult to say unless full context is known, which I have not given in the question. Apologies for that. But thanks a lot for the insights.

