Delay with live inference

• Hardware Platform (Jetson / GPU) Jetson Orin Nano Dev kit
• DeepStream Version 7.0
• JetPack Version (valid for Jetson only) 6
• TensorRT Version 8.6.2
• Cuda Version 12.2

• Issue Type( questions, new requirements, bugs)
I’m having trouble running real-time inference on live camera video. I am using PeopleSegNet to segment people and am also trying to track them.
The main issue is that when I run my pipeline with the inference model and nvTracker, I get a large delay of about 4 s between the live scene and the frame displayed by the pipeline. When I turn off inference and tracking, the delay drops to an insignificant level. My question is: does all of this latency come from those two stages of my pipeline?

My pipeline sequence is:
videoconverter → captureFilters → nvStreammux → Inference → nvTracker → Tiler → OSD → converterSink → capsFilter → Sink

tracker_config.txt (7.9 KB)
(for any purpose)

Please run the commands below before running your pipeline:

  1. Make sure max power mode is enabled: $ sudo nvpmodel -m 0
  2. Make sure the GPU clocks are stepped to maximum: $ sudo jetson_clocks

Please share the log from the command below:
$ sudo tegrastats

Here’s the log of tegrastats while running the pipeline with inference and the tracker:
output_tegra.txt (33.8 KB)

The GPU utilization is 99%. You can run nsys to check if the GPU utilization is reasonable.
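To see whether the GPU is pegged for the whole run rather than in a few samples, the tegrastats log can be summarized with a short script. This is only a sketch: it assumes the log contains the usual `GR3D_FREQ <n>%` field for GPU load, and the sample lines are made up for illustration (they are not taken from output_tegra.txt). The function names are hypothetical.

```python
import re

# Matches the GPU load field, e.g. "GR3D_FREQ 99%" (a clock suffix may follow).
GR3D_PATTERN = re.compile(r"GR3D_FREQ (\d+)%")


def gpu_load_percent(lines):
    """Return the GPU load percentage from every line that reports one."""
    loads = []
    for line in lines:
        match = GR3D_PATTERN.search(line)
        if match:
            loads.append(int(match.group(1)))
    return loads


def summarize(loads, busy_threshold=95):
    """Return (mean load, fraction of samples at or above busy_threshold)."""
    if not loads:
        return 0.0, 0.0
    mean = sum(loads) / len(loads)
    busy = sum(1 for value in loads if value >= busy_threshold) / len(loads)
    return mean, busy


# Illustrative lines only, NOT copied from the attached log.
sample = [
    "RAM 5123/7620MB CPU [35%@1510,28%@1510] GR3D_FREQ 99% cpu@52C",
    "RAM 5130/7620MB CPU [33%@1510,30%@1510] GR3D_FREQ 97% cpu@52C",
    "RAM 5128/7620MB CPU [31%@1510,29%@1510] GR3D_FREQ 60% cpu@52C",
]
loads = gpu_load_percent(sample)
mean_load, busy_fraction = summarize(loads)
print(loads, round(mean_load, 1), round(busy_fraction, 2))
```

To use it on a real log, pass `open("output_tegra.txt")` instead of `sample`. A busy fraction close to 1.0 would suggest the GPU is saturated for the entire run.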

Yes, I know that. My question is about the delay between the displayed frame and the real-time scene. How does GPU usage relate to that delay?

Right now I am using a Jetson Orin Nano and I get 8 fps. If I run on a different device, such as a Jetson AGX Orin, I get 22 fps, but the delay is still there.

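It seems the GPU can’t process the frames in real time, which causes the delay. When frames arrive faster than the pipeline can process them, they queue up in upstream buffers, and that backlog shows up as latency. You need a more powerful Jetson or a more optimized model.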
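The link between throughput and delay can be illustrated with a toy simulation of a bounded buffer sitting between a camera and a slower consumer. This is a simplified model, not how DeepStream actually buffers frames: the 30 fps camera rate and the 32-frame buffer size are assumptions picked for illustration (the thread does not state either), and `simulate_latency` is a hypothetical helper. Only the 8 fps and 22 fps figures come from the posts above.

```python
from collections import deque


def simulate_latency(source_fps, process_fps, queue_capacity, duration_s,
                     drop_oldest=False):
    """Return the capture-to-display latency (s) of the last processed frame."""
    queue = deque()              # capture timestamps of frames waiting
    service_time = 1.0 / process_fps
    consumer_free_at = 0.0       # time at which the consumer is next idle
    last_latency = 0.0
    for i in range(int(duration_s * source_fps)):
        arrival = i / source_fps
        # Process every queued frame the consumer could start before now.
        while queue and consumer_free_at <= arrival:
            captured = queue.popleft()
            start = max(consumer_free_at, captured)
            consumer_free_at = start + service_time
            last_latency = consumer_free_at - captured
        if len(queue) >= queue_capacity:
            if not drop_oldest:
                continue         # buffer full: the new frame is lost
            queue.popleft()      # leaky buffer: discard the stalest frame
        queue.append(arrival)
    return last_latency


# Assumed 30 fps camera and 32-frame buffer; 8 and 22 fps are from the thread.
nano = simulate_latency(30, 8, 32, 20)
agx = simulate_latency(30, 22, 32, 20)
leaky = simulate_latency(30, 8, 1, 20, drop_oldest=True)
print(f"8 fps, deep buffer:  {nano:.2f} s")
print(f"22 fps, deep buffer: {agx:.2f} s")
print(f"8 fps, leaky buffer: {leaky:.2f} s")
```

In this model the steady-state delay is roughly the buffer depth divided by the processing rate, so a faster GPU shrinks the delay but does not remove it while processing still runs below the camera rate. Keeping only the newest frame trades dropped frames for low latency.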

I’m using PeopleSegNet with INT8 quantization
(PeopleSegNet | NVIDIA NGC) because, from what I have seen, it offers the best trade-off between accuracy and performance (frame rate). Is there a further optimization I could make, or another model I could try, to see whether the GPU is the main bottleneck in the pipeline?

You can use nsys to check which module is consuming the GPU.
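One optimization that may be worth trying alongside profiling is to run inference on only some frames, using the `interval` property of Gst-nvinfer (the number of consecutive batches to skip), and let the tracker carry objects across the skipped frames. The arithmetic below is a sketch under two assumptions that the thread does not confirm: a 30 fps camera, and that the measured 8 fps and 22 fps are limited mainly by inference. `min_skip_interval` is a hypothetical helper, and whether segmentation masks stay usable on skipped frames would need to be checked.

```python
import math


def min_skip_interval(source_fps, max_infer_fps):
    """Smallest k such that inferring on 1 of every (k + 1) frames keeps up."""
    if source_fps <= 0 or max_infer_fps <= 0:
        raise ValueError("frame rates must be positive")
    return max(0, math.ceil(source_fps / max_infer_fps) - 1)


SOURCE_FPS = 30  # assumed camera rate, not stated in the thread

# Measured pipeline rates reported in the thread.
for name, measured_fps in [("Orin Nano", 8), ("AGX Orin", 22)]:
    k = min_skip_interval(SOURCE_FPS, measured_fps)
    needed = SOURCE_FPS / (k + 1)
    print(f"{name}: interval={k} -> {needed:.1f} inferences/s needed")
```

Under these assumptions, the Orin Nano would need to skip three of every four frames and the AGX Orin one of every two for inference to keep pace with the camera.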