Is there any way to collect only the total duration of the CUDA kernels within each NVTX range?

Hello! I would like to profile the forward, backward, and optimizer phases of my model. I have added some NVTX annotations to my code so that I can observe the forward, backward, and optimizer in the Nsight Systems timeline. Although I can obtain the time consumed by the forward, backward, and optimizer on the CPU and on each CUDA stream, I am unable to exclude the GPU's idle time. Is there any way to collect only the total duration of the CUDA kernels within each NVTX range, so that I can determine the precise computation time of the forward, backward, and optimizer?
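For reference, the annotations look roughly like this (a minimal PyTorch sketch with a toy linear model, not my actual training code), using `torch.cuda.nvtx` to mark the three phases:

```python
import torch
import torch.cuda.nvtx as nvtx

# Minimal sketch (not the actual model): a tiny linear model with NVTX ranges
# marking the forward, backward, and optimizer phases so they appear as named
# ranges on the Nsight Systems timeline.
model = torch.nn.Linear(16, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 16), torch.randn(8, 1)

for _ in range(3):
    nvtx.range_push("forward")
    loss = torch.nn.functional.mse_loss(model(x), y)
    nvtx.range_pop()

    nvtx.range_push("backward")
    loss.backward()
    nvtx.range_pop()

    nvtx.range_push("optimizer")
    opt.step()
    opt.zero_grad()
    nvtx.range_pop()
```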

I think your best bet would be to export the data to SQLite and write a script that sums only the time taken by CUDA kernels inside those NVTX ranges. You can probably start from our provided cuda_api_gpu_sum script and modify it as needed. See User Guide :: Nsight Systems Documentation for details (the forum shows only the top-level page name, but the link points to the correct section).
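To make the idea concrete, here is a minimal sketch. It assumes the report has been exported with `nsys export --type sqlite report.nsys-rep`, and that the export contains the usual `NVTX_EVENTS`, `CUPTI_ACTIVITY_KIND_RUNTIME`, and `CUPTI_ACTIVITY_KIND_KERNEL` tables with `start`/`end` timestamps in nanoseconds (table and column names can vary between Nsight Systems versions, so verify them against your own `.sqlite` file). A kernel is attributed to an NVTX range when its launch API call began inside that range, and only kernel execution time is summed, so GPU idle gaps are excluded. The in-memory database at the bottom is a stand-in for a real export, for demonstration only:

```python
import sqlite3

# Sketch only: the table/column names below are assumptions based on a typical
# `nsys export --type sqlite` report -- verify them against your own file.
# A kernel is attributed to an NVTX range when its launch API call started
# inside that range; summing kernel durations excludes GPU idle time.
QUERY = """
SELECT n.text               AS range_name,
       SUM(k.end - k.start) AS gpu_ns
FROM NVTX_EVENTS n
JOIN CUPTI_ACTIVITY_KIND_RUNTIME r
  ON r.start BETWEEN n.start AND n.end      -- launch call issued inside range
JOIN CUPTI_ACTIVITY_KIND_KERNEL k
  ON k.correlationId = r.correlationId      -- launch call -> GPU kernel
GROUP BY n.text
"""

def kernel_time_per_nvtx_range(conn):
    """Map each NVTX range name to its summed kernel duration (ns)."""
    return dict(conn.execute(QUERY).fetchall())

# Tiny in-memory stand-in for an exported report, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE NVTX_EVENTS (start INT, end INT, text TEXT);
    CREATE TABLE CUPTI_ACTIVITY_KIND_RUNTIME (start INT, end INT, correlationId INT);
    CREATE TABLE CUPTI_ACTIVITY_KIND_KERNEL (start INT, end INT, correlationId INT);
    INSERT INTO NVTX_EVENTS VALUES (0, 100, 'forward');
    INSERT INTO CUPTI_ACTIVITY_KIND_RUNTIME VALUES (10, 12, 1), (20, 22, 2);
    -- Kernels run later on the GPU, with an idle gap (60..80) between them.
    INSERT INTO CUPTI_ACTIVITY_KIND_KERNEL VALUES (50, 60, 1), (80, 95, 2);
""")
print(kernel_time_per_nvtx_range(conn))  # {'forward': 25}
```

Against a real export you would open the `.sqlite` file instead of the demo database. Note that on a multi-threaded trainer you would also want to match the launching thread (e.g. a `globalTid` column) between the runtime call and the NVTX range, since ranges are per-thread.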

Also, just as a general note: if the gaps on the GPU side are large, you should probably try to remove those gaps before analyzing the individual phases. You might want to take a look at the memory transfers, for example. See https://developer.nvidia.com/blog/optimizing-cuda-memory-transfers-with-nsight-systems/ for some examples.
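The same SQLite export can be queried for transfer time as a first check on those gaps. The sketch below sums memcpy duration and bytes per copy kind from a `CUPTI_ACTIVITY_KIND_MEMCPY` table (the table/column names and the CUPTI `copyKind` codes, 1 = HtoD, 2 = DtoH, 8 = DtoD, are assumptions based on a typical export; verify against your own file). The in-memory database is again only a stand-in for a real report:

```python
import sqlite3

# Sketch only: table/column names and the copyKind codes (1 = HtoD, 2 = DtoH,
# 8 = DtoD in CUPTI) are assumptions based on a typical nsys SQLite export --
# verify them against your own file.
KIND = {1: "HtoD", 2: "DtoH", 8: "DtoD"}

def memcpy_time_by_kind(conn):
    """Map each copy kind to (total ns, total bytes) over all memcpys."""
    rows = conn.execute("""
        SELECT copyKind, SUM(end - start), SUM(bytes)
        FROM CUPTI_ACTIVITY_KIND_MEMCPY
        GROUP BY copyKind
    """)
    return {KIND.get(k, str(k)): (ns, nbytes) for k, ns, nbytes in rows}

# In-memory stand-in for a real export, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUPTI_ACTIVITY_KIND_MEMCPY
        (start INT, end INT, bytes INT, copyKind INT);
    INSERT INTO CUPTI_ACTIVITY_KIND_MEMCPY VALUES
        (0, 40, 1024, 1),   -- host-to-device
        (50, 70, 1024, 1),  -- host-to-device
        (90, 95, 256, 2);   -- device-to-host
""")
print(memcpy_time_by_kind(conn))
```

If host-to-device time dominates the gaps, the pinned-memory and overlap techniques from the blog post above are the usual next step.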