Latency Issue in DeepStream 6.3 when doing batched inference

Hello,

I am currently working on a DeepStream 6.3 C++ project that processes two RTSP streams as input. My application is similar to the DeepStream reference application, where each stream is decoded and converted individually before being muxed together and propagated through various networks.
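
For context, here is a rough gst-launch-1.0 sketch of the topology (the element names are the standard DeepStream ones; the RTSP URLs and the nvinfer config path are placeholders, and my actual C++ pipeline differs in details):

gst-launch-1.0 \
  nvstreammux name=mux batch-size=2 width=1280 height=720 live-source=1 ! \
  nvinfer config-file-path=<pgie_config.txt> ! fakesink \
  rtspsrc location=rtsp://<camera_1> ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvideoconvert ! mux.sink_0 \
  rtspsrc location=rtsp://<camera_2> ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvideoconvert ! mux.sink_1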

I have been experimenting with batch sizes and noticed the following:

  1. When batching the two streams together, the latency for my first neural network is 10ms.
  2. When not batching the streams, the latency per stream is 4.5ms. The total latency from the start of inference for the first stream to the finish for the slower stream is 7ms.

In both cases I used the same engine file, which was built with a batch size of 2.

These results seem counterintuitive to me: a single batched inference over both frames takes 10ms, whereas running the frames separately costs at most 2 × 4.5ms = 9ms of inference time and only 7ms of wall-clock time. Could you provide an explanation for this discrepancy, or is there something I might be doing wrong?

Thank you in advance for your assistance.

Best regards,
David

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Jetson Orin NX
• DeepStream Version: 6.3
• JetPack Version (valid for Jetson only): 5.1.2
• TensorRT Version: 8.5.2.2
• NVIDIA GPU Driver Version (valid for GPU only): 11.4.315
• Issue Type (questions, new requirements, bugs): questions/bug
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file content, the command line used, and other details for reproducing.)
• Requirement details (This is for new requirements. Include the module name, i.e. which plugin or sample application, and the function description.)

Please refer to DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums to make sure the nvstreammux is configured correctly.

Please provide the complete pipeline and configurations you are using.

Hello @Fiona.Chen,

Thanks for your reply. I read through the FAQ and checked whether my nvstreammux is configured correctly; everything was fine with my configuration.

Regarding reproducibility, I have recreated my scenario inside the DeepStream reference application. There I observed the same phenomenon, although in a far less drastic way: the batched version was only slightly slower.

The steps to reproduce are:

  1. Re-encode the sample video to have no B-frames:
cp /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 .
ffmpeg -i sample_720p.mp4 -c:v libx264 -profile:v main -bf 0 -an sample_720p_new.mp4
  2. Stream the new video:
#!/bin/bash

# Start the rtsp-simple-server in the background
./rtsp-simple-server rtsp-simple-server.yml &

# Give the server a few seconds to start up
sleep 5

ffmpeg -re -stream_loop -1 -i sample_720p_new.mp4 -r 30 -c copy  -f rtsp rtsp://localhost:8554/teststream1 &

# Wait for all background processes to complete
wait
  3. Run the DeepStream reference application, once with batching and once without (set batch-size=1 inside the streammux):
cd /opt/nvidia/deepstream/deepstream-6.3/sources/apps/sample_apps/deepstream-app
sudo NVDS_ENABLE_LATENCY_MEASUREMENT=1 NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 ./deepstream-app -c /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/batching_test.txt > ~/performance_with_batching_rtsp.txt
-> then set batch-size=1 in the [streammux] section of batching_test.txt and rerun:
sudo NVDS_ENABLE_LATENCY_MEASUREMENT=1 NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 ./deepstream-app -c /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/batching_test.txt > ~/performance_without_batching_rtsp.txt
  4. Use a Python script to collect the timings:
$ python performance_calculator.py performance_with_batching_rtsp.txt
Average time difference: 8.693900 ms
Median time difference: 8.223145 ms
Quantiles time difference: [6.7068359375, 6.72021484375, 6.7470703125, 7.22158203125, 8.22314453125, 9.25849609375, 10.255859375, 11.316259765625, 11.43271484375] ms
$ python performance_calculator.py performance_without_batching_rtsp.txt
Average time difference: 7.884207 ms
Median time difference: 7.291992 ms
Quantiles time difference: [6.01806640625, 6.028076171875, 6.0439453125, 6.27353515625, 7.2919921875, 8.215673828125, 9.295703125, 10.253515625, 10.760009765625] ms

Note that this script heavily favors the batched version, since it calculates the per-frame time as follows:
time of frame x = max(out_time_pgie_source_0, out_time_pgie_source_1) - min(in_time_pgie_source_0, in_time_pgie_source_1)
With this calculation, the time the non-batched version takes for both input sources per frame is therefore overestimated.
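
For reference, the core of the script is essentially the following (a minimal sketch; parsing the NVDS latency log lines into per-frame, per-source pgie timestamps is omitted here):

import statistics

def frame_time_ms(in_times_ms, out_times_ms):
    # pgie in/out system timestamps (ms) for one frame, one entry per source.
    # This is the max(out) - min(in) formula described above.
    return max(out_times_ms) - min(in_times_ms)

def summarize(per_frame_ms):
    print(f"Average time difference: {statistics.mean(per_frame_ms):.6f} ms")
    print(f"Median time difference: {statistics.median(per_frame_ms):.6f} ms")
    # n=10 yields the nine decile cut points printed above.
    print(f"Quantiles time difference: {statistics.quantiles(per_frame_ms, n=10)} ms")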

All files used are attached:
performance_with_batching_rtsp.txt (3.2 MB)
performance_without_batching_rtsp.txt (3.2 MB)
performance_without_batching_rtsp_err.txt (5.2 KB)
performance_with_batching_rtsp_err.txt (5.2 KB)
performance_calculator_py.txt (2.6 KB)
batching_test.txt (4.3 KB)

I suspect that the problem lies within my RTSP stream, since the original sample stream works, while with the no-B-frame stream I get the error NVDEC_COMMON: NvDecGetSurfPinHandle : Surface not registered, as can be seen in the performance_without_batching_rtsp_err.txt and performance_with_batching_rtsp_err.txt files. However, this is just a guess.

Do you have an idea of how I could fix this error, and why the non-batched version runs faster overall than the batched version?

Thank you in advance.

What did you change for the “non-batching” version?

I just modified the batch-size parameter of the streammux from 2 to 1 inside the batching_test.txt config file.
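
For reference, the [streammux] section of batching_test.txt looks roughly like this (all other values are unchanged from the sample config):

[streammux]
gpu-id=0
# set to 1 for the non-batched run
batch-size=2
batched-push-timeout=40000
width=1280
height=720
live-source=1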

For your case, it seems the model is fast enough, so batch-size=1 will be better for a live stream, since nvstreammux will not wait to fill the batch.

Thank you for your reply. I was already aware of the waiting time and began timing only after the streammux had completed, specifically measuring from the start to the end of nvinfer. Even with this adjustment, the non-batched version still outperformed the batched one, which shouldn’t be happening.

Interestingly, this issue only occurs when I remove the B-frames from my video; otherwise it works as expected and the batched version is faster.

Hi @Fiona.Chen, just checking in. Any updates on this?

Appreciate your help!

Can you measure with the local mp4 file?
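
E.g. by switching the sources in batching_test.txt to the local file, something like (the path is a placeholder for wherever your re-encoded file is):

[source0]
enable=1
type=2
uri=file:///<path-to>/sample_720p_new.mp4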

The ffmpeg transcoding command does not generate any B-frames.
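
This can be verified with ffprobe, e.g.:

ffprobe -v error -select_streams v:0 -show_entries frame=pict_type -of csv sample_720p_new.mp4 | sort | uniq -c

If no frame,B lines show up, the stream contains no B-frames.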

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.