Hello @Fiona.Chen,
thanks for your reply. I read through the FAQ and checked whether my nvstreammux is configured correctly; everything looked fine with my configuration.
Regarding reproducibility, I have recreated my scenario inside the deepstream reference application. There I observed the same phenomenon, though far less pronounced: the batched version was only slightly slower.
The steps to reproduce are:
- Re-encode the sample video to have no b-frames:
cp /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 .
ffmpeg -i sample_720p.mp4 -c:v libx264 -profile:v main -bf 0 -an sample_720p_new.mp4
- Stream the new video:
#!/bin/bash
# Start the rtsp-simple-server in the background, the script I use is:
./rtsp-simple-server rtsp-simple-server.yml &
# Give the server a few seconds to start up
sleep 5
ffmpeg -re -stream_loop -1 -i sample_720p_new.mp4 -r 30 -c copy -f rtsp rtsp://localhost:8554/teststream1 &
# Wait for all background processes to complete
wait
- I ran the deepstream reference application once with batching and once without batching (by setting batch-size=1 in the streammux section):
cd /opt/nvidia/deepstream/deepstream-6.3/sources/apps/sample_apps/deepstream-app
sudo NVDS_ENABLE_LATENCY_MEASUREMENT=1 NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 ./deepstream-app -c /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/batching_test.txt > ~/performance_with_batching_rtsp.txt
-> changed the streammux batch-size in batching_test.txt, then ran again:
sudo NVDS_ENABLE_LATENCY_MEASUREMENT=1 NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 ./deepstream-app -c /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/batching_test.txt > ~/performance_without_batching_rtsp.txt
- Used a python script to collect the timings:
$ python performance_calculator.py performance_with_batching_rtsp.txt
Average time difference: 8.693900 ms
Median time difference: 8.223145 ms
Quantiles time difference: [6.7068359375, 6.72021484375, 6.7470703125, 7.22158203125, 8.22314453125, 9.25849609375, 10.255859375, 11.316259765625, 11.43271484375] ms
$ python performance_calculator.py performance_without_batching_rtsp.txt
Average time difference: 7.884207 ms
Median time difference: 7.291992 ms
Quantiles time difference: [6.01806640625, 6.028076171875, 6.0439453125, 6.27353515625, 7.2919921875, 8.215673828125, 9.295703125, 10.253515625, 10.760009765625] ms
This script is heavily biased in favor of the batched version, since it computes the per-frame time as follows:
time of frame x = max(out_time_pgie_source_0, out_time_pgie_source_1) - min(in_time_pgie_source_0, in_time_pgie_source_1)
With this calculation, the per-frame time of the non-batched version across both input sources is overestimated.
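The calculation above can be sketched as follows (a sketch with assumed names and hypothetical timings; the actual performance_calculator.py is attached below):

```python
import statistics

def frame_latency(in_times_ms, out_times_ms):
    # Per-frame latency across all sources: latest PGIE out_time
    # minus earliest PGIE in_time (the formula above).
    return max(out_times_ms) - min(in_times_ms)

def summarize(latencies_ms):
    # Average, median, and deciles, matching the printed statistics.
    return {
        "average": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "quantiles": statistics.quantiles(latencies_ms, n=10),
    }

# Hypothetical PGIE in/out timestamps for one frame from two sources:
print(frame_latency([100.0, 101.5], [108.2, 109.0]))  # 9.0
```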
All the used files are the following:
performance_with_batching_rtsp.txt (3.2 MB)
performance_without_batching_rtsp.txt (3.2 MB)
performance_without_batching_rtsp_err.txt (5.2 KB)
performance_with_batching_rtsp_err.txt (5.2 KB)
performance_calculator_py.txt (2.6 KB)
batching_test.txt (4.3 KB)
I suspect that the error lies in my RTSP stream: the original sample stream works, while the no-b-frame stream produces the error
NVDEC_COMMON: NvDecGetSurfPinHandle : Surface not registered
as can be seen in the performance_without_batching_rtsp_err.txt and performance_with_batching_rtsp_err.txt files. However, this is just a guess.
Do you have an idea how I could fix this error, and why the non-batched version runs faster overall than the batched version?
Thank you in advance.