High latency in DeepStream pipeline

• Hardware Platform (Jetson / GPU): Jetson
• DeepStream Version: 7.1
• JetPack Version (valid for Jetson only): 6.1
• TensorRT Version: 10.3.0
• NVIDIA GPU Driver Version (valid for GPU only):
• Issue Type (questions, new requirements, bugs): Bug
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)

The DeepStream pipeline is running on an NVIDIA Jetson Orin NX (16 GB) device inside the nvcr.io/nvidia/deepstream-l4t:7.1-triton-multiarch Docker container.

The inputs are 3 RTSP sources, each at 15 FPS and 1280x720 resolution.
The outputs are an MQTT message broker and an RTSP stream.

I’m attaching the pipeline details here.

• Bug Description

I’m seeing a delay of 5-6 seconds on the RTSP output stream.

To be more methodical and data-driven about this issue, I followed this thread to log the latency of every component in the DeepStream pipeline. The output was as follows:

lvt-deepstream  | 2025-02-05T19:42:21.823109+0000 : INFO : Stream #=stream1, Frame #=2915, ts=1738784541456861000, pts=295.607393268, Person #=0, Vehicle #=0
lvt-deepstream  | 2025-02-05T19:42:21.835183+0000 : INFO : Stream #=stream2, Frame #=2945, ts=1738784541456997000, pts=295.693550023, Person #=0, Vehicle #=27
lvt-deepstream  | 2025-02-05T19:42:21.836813+0000 : INFO : Stream #=stream0, Frame #=2909, ts=1738784541457146000, pts=295.632545898, Person #=0, Vehicle #=0
lvt-deepstream  | Comp name = nvv4l2decoder1 in_system_timestamp = 1738784541169.537109 out_system_timestamp = 1738784541170.355957               component latency= 0.818848
lvt-deepstream  | Comp name = nvstreammux-Stream-muxer source_id = 1 pad_index = 1 frame_num = 2914               in_system_timestamp = 1738784541170.430908 out_system_timestamp = 1738784541368.047119               component_latency = 197.616211
lvt-deepstream  | Comp name = nvv4l2decoder0 in_system_timestamp = 1738784541255.647949 out_system_timestamp = 1738784541256.508057               component latency= 0.860107
lvt-deepstream  | Comp name = nvstreammux-Stream-muxer source_id = 2 pad_index = 2 frame_num = 2944               in_system_timestamp = 1738784541256.635010 out_system_timestamp = 1738784541368.047119               component_latency = 111.412109
lvt-deepstream  | Comp name = nvv4l2decoder2 in_system_timestamp = 1738784541194.051025 out_system_timestamp = 1738784541194.952881               component latency= 0.901855
lvt-deepstream  | Comp name = nvstreammux-Stream-muxer source_id = 0 pad_index = 0 frame_num = 2908               in_system_timestamp = 1738784541195.074951 out_system_timestamp = 1738784541368.048096               component_latency = 172.973145
lvt-deepstream  | Comp name = convertor-preinfer in_system_timestamp = 1738784541372.096924 out_system_timestamp = 1738784541381.657959               component latency= 9.561035
lvt-deepstream  | Comp name = object-detector in_system_timestamp = 1738784541381.789062 out_system_timestamp = 1738784541546.319092               component latency= 164.530029
lvt-deepstream  | Comp name = tracker in_system_timestamp = 1738784541546.342041 out_system_timestamp = 1738784541640.041992               component latency= 93.699951
lvt-deepstream  | Comp name = secondary-par in_system_timestamp = 1738784541640.093994 out_system_timestamp = 1738784541640.187988               component latency= 0.093994
lvt-deepstream  | Comp name = secondary-var in_system_timestamp = 1738784541640.813965 out_system_timestamp = 1738784541640.960938               component latency= 0.146973
lvt-deepstream  | Comp name = secondary-par4ppe in_system_timestamp = 1738784541643.323975 out_system_timestamp = 1738784541727.000977               component latency= 83.677002
lvt-deepstream  | Comp name = nvtiler in_system_timestamp = 1738784541742.602051 out_system_timestamp = 1738784541755.750977               component latency= 13.148926
lvt-deepstream  | Comp name = onscreendisplay in_system_timestamp = 1738784541756.118896 out_system_timestamp = 1738784541820.539062               component latency= 64.420166
lvt-deepstream  | Comp name = convertor2 in_system_timestamp = 1738784541821.517090 out_system_timestamp = 1738784541841.538086               component latency= 20.020996
lvt-deepstream  | 2025-02-05T19:42:21.841830+0000 : INFO : Source id = 1 Frame_num = 2914 Frame latency = 672.118896484375 (ms) 
lvt-deepstream  | 2025-02-05T19:42:21.842087+0000 : INFO : Source id = 2 Frame_num = 2944 Frame latency = 586.008056640625 (ms) 
lvt-deepstream  | 2025-02-05T19:42:21.842286+0000 : INFO : Source id = 0 Frame_num = 2908 Frame latency = 647.60498046875 (ms) 
lvt-deepstream  | Encode Latency = 37.943848 
lvt-deepstream  | [mosq_mqtt_log_callback] Client 123 sending PUBLISH (d0, q0, r0, m2848, '/tracks', ... (13551 bytes))
lvt-deepstream  | Publish callback with reason code: Success.
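For reference, the per-component numbers above come from DeepStream's built-in latency measurement. Below is a minimal sketch of how it can be enabled from the Python bindings (not my exact code; it assumes the bindings expose nvds_measure_buffer_latency the way the deepstream-test3 Python sample uses it):

```python
import os

# These must be set before the pipeline starts (exporting them in the shell
# before launching the app also works).
os.environ["NVDS_ENABLE_LATENCY_MEASUREMENT"] = "1"            # per-frame latency
os.environ["NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT"] = "1"  # per-plugin latency

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def latency_probe(pad, info, u_data):
    """Attach to a pad near the end of the pipeline (e.g. the OSD sink pad)."""
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    # Prints the per-component and per-frame latency lines shown above and
    # returns the number of sources in the batch.
    num_sources = pyds.nvds_measure_buffer_latency(hash(gst_buffer))
    if num_sources == 0:
        print("Unable to measure latency for this buffer")
    return Gst.PadProbeReturn.OK
```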

The processing time per batch is close to 650 ms, which seems too high. I need your support to identify the inefficiencies in my pipeline and drive the processing time (and latency) as low as possible.


From the log you attached, you can try setting the batched-push-timeout parameter to a smaller value, like 40000.
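For example, if the pipeline is built with the GStreamer Python bindings, the property can be set on nvstreammux like this (element and variable names are illustrative; the value is in microseconds):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# "stream-muxer" is just an illustrative element name.
streammux = Gst.ElementFactory.make("nvstreammux", "stream-muxer")
streammux.set_property("batch-size", 3)                 # one slot per RTSP source
# Value is in microseconds: 40000 us = 40 ms maximum wait to form a batch.
streammux.set_property("batched-push-timeout", 40000)   # down from 7000000 (7 s)
```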

@yuweiw Thanks for that suggestion. I changed batched-push-timeout from 7000000 to 40000, but I’m seeing a lot of dropped batches.
I adjusted it to 70000 (roughly one frame interval at 15 FPS) and it seems to be doing better in terms of dropped batches. However, I’m still seeing the same amount of latency on my output RTSP stream.
Do you have any other ways to optimize the pipeline as a whole?
Thanks!

Because there are so many plugins in your pipeline, you first need to find out which plugins are causing the latency, and then optimize each one.
You can try the steps below to confirm which plugin causes the latency.

  1. replace the rtspsink with nv3dsink (see the sketch after this list)
  2. remove the nvinfer elements one by one
  3. remove the tracker from the pipeline
  4. use only one video source
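For step 1, something like the following can be used to terminate the pipeline locally instead of streaming out over RTSP (a sketch assuming the pipeline is built with the GStreamer Python bindings; element and variable names are illustrative):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Temporarily replace the encode -> rtph264pay -> udpsink (RTSP) branch with a
# local display sink on Jetson, purely to rule out the RTSP server.
sink = Gst.ElementFactory.make("nv3dsink", "local-display-sink")
sink.set_property("sync", False)  # do not throttle on buffer timestamps while profiling
# ...then link the tail of the pipeline (e.g. nvdsosd) to this sink instead.
```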

Hey @yuweiw, thanks for the tips. However, these suggestions are not working out for me.

  1. I couldn’t find the components that you mentioned. It would be great if you could point me to examples of these components. Note that I need the output to be an RTSP stream.
    1. I’m not using any plugin called rtspsink in my pipeline. I’m using udpsink for RTSP output, as described in this DeepStream example (see the sketch after this list).
    2. I did not find any plugin called nvds3dsink in DeepStream. I searched for it on my base Jetson device and even in the DeepStream Docker container running on Jetson with gst-inspect-1.0 nvds3dsink.
    3. If you meant nv3dsink, it does not publish an RTSP stream, as per the deepstream-test3 example here.
  2. I’ve already done this exercise, but it didn’t help. I need all the nvinfer components in my final pipeline. I’ve even posted the latency of the individual nvinfer components in my first post.
  3. I’ve already done this exercise, but it didn’t help. I need the tracker component in my final pipeline. I’ve even posted the latency of the individual tracker component in my first post.
  4. Reducing the number of streams solves the latency issue, but all 3 streams are important in my final pipeline.
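For clarity, this is the udpsink + RTSP server pattern I mean in point 1.1 above: a minimal sketch along the lines of the deepstream-test1-rtsp-out sample (port numbers and the mount point are illustrative, not necessarily my exact values):

```python
import gi
gi.require_version("Gst", "1.0")
gi.require_version("GstRtspServer", "1.0")
from gi.repository import Gst, GstRtspServer

Gst.init(None)

# Pipeline side (not shown): ... -> nvv4l2h264enc -> rtph264pay -> udpsink port=5400

# Server side: an RTSP server that re-serves the local UDP/RTP stream to clients.
server = GstRtspServer.RTSPServer.new()
server.props.service = "8554"        # clients connect to rtsp://<jetson-ip>:8554/ds-test
server.attach(None)

factory = GstRtspServer.RTSPMediaFactory.new()
factory.set_launch(
    '( udpsrc name=pay0 port=5400 buffer-size=524288 '
    'caps="application/x-rtp, media=video, clock-rate=90000, '
    'encoding-name=(string)H264, payload=96" )'
)
factory.set_shared(True)
server.get_mount_points().add_factory("/ds-test", factory)
```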

Yes, it’s nv3dsink. It is recommended that you use it for comparison, in order to exclude the effects of your RTSP server.

If this works, you can try optimizing your nvstreammux configuration to improve its efficiency in live mode. You can refer to our FAQ.
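For example, with the legacy nvstreammux the live-mode related properties can be set like this (a sketch; the values are only a starting point based on your 3 sources at 1280x720 and 15 FPS):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

streammux = Gst.ElementFactory.make("nvstreammux", "stream-muxer")
streammux.set_property("live-source", 1)               # inputs are live RTSP cameras
streammux.set_property("batch-size", 3)                # match the number of sources
streammux.set_property("batched-push-timeout", 70000)  # ~ one frame interval at 15 FPS
streammux.set_property("width", 1280)                  # match the source resolution so
streammux.set_property("height", 720)                  # the muxer does not rescale
```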

Please consider running a latency analysis to determine which plugins are the culprits in your pipeline.
As mentioned, the RTSP output can be one of them.

However, the pipeline shared has multiple models running.

You can find details on how to do it here.