• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.2
• TensorRT Version: 8.5.2.2
• NVIDIA GPU Driver Version (valid for GPU only): 535
• Issue Type (questions, new requirements, bugs): questions
Profiling my DeepStream pipeline showed that there are recurring ~20-40 ms intervals during which GPU 0 does nothing. During those intervals, GstNvInfer UID 1 (my YOLOv8 primary detector) is processing a batch, yet the CUDA HW row for GPU 0 shows no activity:
As you can see, there are two large gaps with no activity: the first is around 40 ms, the second around 20 ms. I checked what the other threads are doing during those gaps. During the 40 ms one, GstNvInfer appears to receive the buffers for processing, and only then does a CUDA kernel become active (I assume that is the inference):
During the following 20 ms gap, nothing else seems to happen apart from some trivial tracker and muxer tasks:
(screenshots: Nsight Systems timeline views of the two gaps)
Do you have any idea why this happens? I read in this article that significant gaps in the CUDA HW row mean the pipeline has a bottleneck. So is my model the bottleneck here, and what could be done to resolve it?
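For context, the primary GIE is configured roughly along the lines of the sketch below (the engine path and values are illustrative placeholders, not my exact settings):

```
# nvinfer (PGIE) config sketch -- paths and values are placeholders
[property]
gpu-id=0
gie-unique-id=1
process-mode=1
# TensorRT engine built for the YOLOv8 detector
model-engine-file=yolov8_b1_fp16.engine
batch-size=1
# 0=FP32, 1=INT8, 2=FP16
network-mode=2
# run inference on every frame
interval=0
```

If anything there (batch size, precision, interval) looks like a likely cause of the gaps, please let me know.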
I am attaching the profiler output file as a Google Drive link; perhaps you could look into it (Nsight Systems 2024.2.1).
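For reference, the trace was captured roughly like this (the application name and config path below are placeholders):

```
# approximate Nsight Systems capture command; app and config names are placeholders
nsys profile --trace=cuda,nvtx,osrt \
     --output=deepstream_trace \
     --force-overwrite=true \
     ./deepstream-app -c pipeline_config.txt
```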
Huge thanks in advance!


