• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.2
• TensorRT Version: 8.5.2.2
• NVIDIA GPU Driver Version (valid for GPU only): 535
• Issue Type (questions, new requirements, bugs): questions
Profiling my DeepStream pipeline showed that there are recurring ~20-40 ms intervals during which GPU 0 does nothing. During those intervals, GstNvInfer UID 1 (my YOLOv8 primary detector) is processing a batch, yet the CUDA HW row for GPU 0 shows no activity:
As you can see, there are two large gaps with no activity: the first is around 40 ms, the second around 20 ms. I checked what the other threads are doing during those gaps. During the 40 ms one, GstNvInfer appears to receive the buffers for processing, and only then does a CUDA kernel become active (I assume that is the inference):
During the following 20 ms gap, nothing else seems to happen apart from some trivial tracker and muxer tasks:
(screenshots: Nsight Systems timeline views of the two gaps)
Do you have any idea why this happens? I read in this article that significant gaps in the CUDA HW row mean the pipeline has a bottleneck. So is my model the bottleneck here, and what could be done to resolve it?
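For context, the primary GIE is configured roughly along the lines of the sketch below (the engine path and values are illustrative placeholders, not my exact settings):

```
# nvinfer (PGIE) config sketch -- paths and values are placeholders
[property]
gpu-id=0
gie-unique-id=1
process-mode=1
# TensorRT engine built for the YOLOv8 detector
model-engine-file=yolov8_b1_fp16.engine
batch-size=1
# 0=FP32, 1=INT8, 2=FP16
network-mode=2
# run inference on every frame
interval=0
```

If anything there (batch size, precision, interval) looks like a likely cause of the gaps, please let me know.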
I am attaching the profiler output file as a Google Drive link; perhaps you could look into it (Nsight Systems 2024.2.1).
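For reference, the trace was captured roughly like this (the application name and config path below are placeholders):

```
# approximate Nsight Systems capture command; app and config names are placeholders
nsys profile --trace=cuda,nvtx,osrt \
     --output=deepstream_trace \
     --force-overwrite=true \
     ./deepstream-app -c pipeline_config.txt
```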
Huge thanks in advance!


