The issue is that there is a noticeable time gap between the completion of one kernel's execution and the launch of the next kernel. From the profile alone, I can't determine what is happening during this time.
How can I analyze what is consuming the time during this period? Note: the computation shown in the diagram is a simple encoder layer.
I found that part of the gap is caused by host time spent in torch.cuda.nvtx, but for the remaining part I still cannot identify the specific cause of the time consumption. Is there any method to profile the host-side time?
You may get better help with PyTorch questions by asking on a PyTorch forum, such as discuss.pytorch.org. There are NVIDIA experts who patrol that forum. For profiler questions, you can ask on one of the profiler forums.
From a development perspective, I would find out (via source-code inspection) what is happening around, or prior to, the kernel launch(es) in question, then start using NVTX myself to mark ranges of activity and see what shows up when I re-profile the code. You can use a hierarchical/binary-search approach to divide and conquer, and zero in on a particular set of activity fairly quickly.
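As a rough illustration of that approach, here is a minimal sketch of wrapping suspected host-side regions with NVTX ranges via `torch.cuda.nvtx.range_push`/`range_pop` and profiling only the marked iteration. The `nn.TransformerEncoderLayer` stand-in, the `forward_with_markers` helper, and the `nsys` invocation in the comments are assumptions for the example, not the original poster's code; narrow or split the ranges on each re-profile to zero in on the gap.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the encoder layer in question.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8).cuda()
x = torch.randn(32, 16, 512, device="cuda")

def forward_with_markers(inp):
    # Mark suspected host-side work so it shows up as an NVTX range in the timeline.
    torch.cuda.nvtx.range_push("pre_launch_host_work")
    # ... any Python/host bookkeeping that runs before the kernel launches ...
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("encoder_forward")
    out = layer(inp)
    torch.cuda.nvtx.range_pop()
    return out

# Warm up first so one-time initialization doesn't pollute the capture.
for _ in range(3):
    forward_with_markers(x)
torch.cuda.synchronize()

# Capture just one marked iteration, e.g. with:
#   nsys profile -c cudaProfilerApi python script.py
torch.cuda.profiler.start()   # cudaProfilerStart()
forward_with_markers(x)
torch.cuda.synchronize()
torch.cuda.profiler.stop()    # cudaProfilerStop()
```

If a marked range accounts for the gap, split it into smaller sub-ranges and repeat; if none do, the time is being spent outside your marked code (for example in the framework's launch path), which tells you where to inspect next.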