Profiling DS-6.2, CUDA HW inactivity

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.2
• TensorRT Version: 8.5.2.2
• NVIDIA GPU Driver Version (valid for GPU only): 535
• Issue Type (questions, new requirements, bugs)
Profiling my DeepStream pipeline showed constant ~20-40 ms intervals during which GPU-0 does nothing. During those intervals, GstNvInfer UID-1 (my YOLOv8 primary detector) is processing a batch, but the CUDA HW row for GPU-0 shows no activity:


As you can see, there are two large intervals with no activity: the first is around 40 ms, the second around 20 ms. I checked what the other threads are doing during those intervals. During the 40 ms period, it seems that GstNvInfer receives the buffers for processing, and then a CUDA kernel becomes active (I assume this is the inference):

As for the following 20 ms interval, it seems nothing else happens apart from some trivial tracker and muxer tasks:

Perhaps you have a clue as to why this happens? I read in this article that significant gaps in the CUDA HW section mean the pipeline has bottlenecks. So is my model the bottleneck here? What could be done to solve this?

I am attaching the profiler output file as a Google Drive link; perhaps you could look into it (Nsight Systems version 2024.2.1).

Huge thanks in advance!

Could you attach your config file?
You can also try adding NVTX traces in your postprocess code by referring to our FAQ: Use NVTX to trace any CUDA kernel function written by yourself, if necessary, and check whether that is what is causing the GPU idle issue.
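For illustration, a minimal sketch of such an NVTX range (the function name, range name, and placeholder loop below are made up; the real decode/NMS code would go inside the range). Build against the CUDA toolkit and link with -lnvToolsExt (CUDA 11.3+ also ships a header-only variant under nvtx3):

```cpp
// Minimal NVTX sketch, not the poster's actual code: wrap the custom postprocess
// step in a named range so it shows up as a span on the Nsight Systems timeline.
#include <nvToolsExt.h>
#include <cstdio>
#include <vector>

// Stand-in for the real YOLOv8 output parsing / NMS done in the nvinfer postprocess.
static void yolov8_postprocess(const std::vector<float> &raw_output)
{
    nvtxRangePushA("yolov8_postprocess");   // open a named range on the NVTX row
    float sum = 0.f;
    for (float v : raw_output)
        sum += v;                            // placeholder for decode + NMS work
    nvtxRangePop();                          // close the range
    std::printf("checksum: %f\n", sum);
}

int main()
{
    std::vector<float> fake_output(8400 * 84, 0.5f);  // dummy YOLOv8-sized output tensor
    yolov8_postprocess(fake_output);
    return 0;
}
```

Once the range is in place, re-profiling should show whether the "yolov8_postprocess" span overlaps the idle gap in the CUDA HW row.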

I run the sources, PGIE, and tracker on GPU-0, and the SGIE and ds-example on GPU-1. Here is my config file:
config.txt (7.3 KB)
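For reference, a rough sketch of that placement expressed as GStreamer element properties (illustrative only; the actual setup lives in the attached deepstream-app config.txt, so the element names and structure here are made up):

```cpp
// Illustrative sketch only: "gpu-id" is the Gst-nvinfer property that selects
// which GPU a given inference instance runs on.
#include <gst/gst.h>

int main(int argc, char *argv[])
{
    gst_init(&argc, &argv);

    GstElement *pgie = gst_element_factory_make("nvinfer", "primary-gie");
    GstElement *sgie = gst_element_factory_make("nvinfer", "secondary-gie");
    if (!pgie || !sgie)
        return 1;  // DeepStream plugins not found

    g_object_set(G_OBJECT(pgie), "gpu-id", 0, NULL);  // YOLOv8 PGIE on GPU-0
    g_object_set(G_OBJECT(sgie), "gpu-id", 1, NULL);  // SGIE on GPU-1

    gst_object_unref(pgie);
    gst_object_unref(sgie);
    return 0;
}
```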

I do have some custom code in the ds-example plugin; however, running the pipeline with ds-example disabled (profile file link) still showed frequent idle periods on GPU-0:

I will see what I can do about adding the NVTX trace in my PGIE postprocess code.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

OK. You can also get the pipeline graph and the latency by referring to the FAQ:
Generate GStreamer Pipeline Graph
Enable Latency measurement for deepstream sample apps
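For reference, a minimal sketch of applying those two items. The environment variables in the comments are the ones used with the DeepStream sample apps; the pipeline string below is just a placeholder for the real DeepStream pipeline:

```cpp
// Minimal sketch, assuming a pipeline built in code. For the deepstream sample apps
// it is enough to export the variables before launching:
//   export GST_DEBUG_DUMP_DOT_DIR=/tmp                  -> dump pipeline .dot graphs
//   export NVDS_ENABLE_LATENCY_MEASUREMENT=1            -> per-buffer latency logs
//   export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1  -> per-component latency logs
#include <gst/gst.h>

int main(int argc, char *argv[])
{
    // Setting the dump directory before gst_init() is the safe choice for .dot output.
    g_setenv("GST_DEBUG_DUMP_DOT_DIR", "/tmp", FALSE);
    gst_init(&argc, &argv);

    // Placeholder pipeline; the real DeepStream pipeline would go here.
    GstElement *pipeline =
        gst_parse_launch("videotestsrc num-buffers=60 ! fakesink", NULL);
    gst_element_set_state(pipeline, GST_STATE_PLAYING);
    gst_element_get_state(pipeline, NULL, NULL, GST_CLOCK_TIME_NONE);  // wait for PLAYING

    // Writes /tmp/pipeline-playing.dot; render it with:
    //   dot -Tpng /tmp/pipeline-playing.dot -o pipeline.png
    GST_DEBUG_BIN_TO_DOT_FILE(GST_BIN(pipeline), GST_DEBUG_GRAPH_SHOW_ALL,
                              "pipeline-playing");

    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(pipeline);
    return 0;
}
```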

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.