We have set up a DeepStream pipeline that takes an H.264-encoded RTSP stream, decodes it, and performs inference on it using a YOLO detector.
The pipeline looks a bit like the following:
rtspsrc → rtph264depay → queue → nvv4l2decoder → queue → nvstreammux → nvinfer → nvstreamdemux → nvvideoconvert → nvdsosd → queue → fakesink
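For reference, here is a minimal single-camera sketch of that pipeline using the GStreamer Python bindings. The RTSP URL, resolution, batch size and nvinfer config path are placeholders, not our real settings, and element names are only there so later snippets can refer to them:

```python
#!/usr/bin/env python3
# Minimal single-camera sketch of the pipeline described above.
# URL, resolution, batch size and config path are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

pipeline = Gst.parse_launch(
    # The muxer/inference branch is declared first so it can be referenced by name.
    "nvstreammux name=mux batch-size=1 width=1920 height=1080 batched-push-timeout=40000 ! "
    "nvinfer name=pgie config-file-path=config_infer_yolo.txt ! "
    "nvstreamdemux name=demux "
    "demux.src_0 ! nvvideoconvert ! nvdsosd ! queue ! fakesink sync=false "
    # Per-camera decode branch, ending in the leaky drop queue that feeds the muxer.
    "rtspsrc location=rtsp://camera.local/stream latency=200 ! "
    "rtph264depay ! queue name=pre_decode_queue ! nvv4l2decoder name=decoder ! "
    "queue name=dropqueue max-size-buffers=20 max-size-time=0 max-size-bytes=0 leaky=downstream ! "
    "mux.sink_0"
)

pipeline.set_state(Gst.State.PLAYING)
loop = GLib.MainLoop()
try:
    loop.run()
finally:
    pipeline.set_state(Gst.State.NULL)
```

In the real setup there is one such decode branch per camera (up to 6), each linked to its own mux.sink_N pad, and batch-size matches the number of connected cameras.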
We perform inference at 15 FPS and the camera streams are configured at 15 FPS as well. We allow up to 6 cameras to be connected at the same time, all of which are passed to nvstreammux. The inference interval is set depending on the number of cameras: with two cameras the inference interval is set to 1, so that each camera gets inference at 7.5 FPS and the available 15 FPS is used optimally.
In case the system slows down, the queue after the nvv4l2decoder (the second queue in the pipeline) can store up to 20 frames. As soon as this limit is reached, frames are dropped to ensure that no further delay builds up.
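A rough sketch of how the drop queue and the inference interval are configured, assuming the element names from the pipeline sketch above; the mapping from camera count to interval is illustrative (we only described the two-camera case):

```python
# Assumes a pipeline built as in the sketch above, with elements named
# "dropqueue" and "pgie". The interval formula below is illustrative.
drop_queue = pipeline.get_by_name("dropqueue")
drop_queue.set_property("max-size-buffers", 20)  # start dropping after 20 decoded frames
drop_queue.set_property("max-size-time", 0)
drop_queue.set_property("max-size-bytes", 0)
drop_queue.set_property("leaky", 2)              # 2 = downstream: drop old buffers when full

num_cameras = 2
pgie = pipeline.get_by_name("pgie")
# nvinfer "interval" is the number of batches skipped between inferences; with
# two cameras and interval=1 each camera is inferred at roughly 7.5 FPS.
pgie.set_property("interval", num_cameras - 1)
```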
For most of the cameras we use this system works fine; for Dahua cameras, however, we notice that frames are never dropped in the second queue but instead pile up over time in the first queue, indicating that the nvv4l2decoder cannot decode frames fast enough. This is especially the case when multiple Dahua cameras are connected and more frames need to be decoded.
A specific thing about the Dahua cameras is that frames seem to arrive at rtspsrc in bursts rather than as a steady stream.
Normally we expect the nvinfer module to be the bottleneck, which means we can drop decoded frames if nvinfer is too slow. In this case, however, we would have to drop encoded frames because the decoder is too slow, potentially introducing artefacts.
Is there a way to resolve this bottleneck, so we don’t have to drop encoded frames?
Things we have already tried:
set enable-max-performance to true
set disable-dpb to true
set enable-full-frame to true
different h264 encoding schemes (Main, Baseline, …)
modifying the i-frame interval to be smaller or bigger
using h265 instead of h264
None of these have resolved the issue of the frames piling up before the decoder.
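For completeness, this is roughly how the decoder flags listed above were set; which of these properties exist depends on the DeepStream/Jetson release, so check gst-inspect-1.0 nvv4l2decoder first:

```python
# Sketch of the decoder settings we tried; assumes the nvv4l2decoder element
# is named "decoder" as in the pipeline sketch above.
decoder = pipeline.get_by_name("decoder")
decoder.set_property("enable-max-performance", True)
decoder.set_property("disable-dpb", True)
decoder.set_property("enable-full-frame", True)
```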
We have also verified using jtop that the NVDEC hardware engine is actually working when the pipeline is running.
The frames from the rtspsrc do not actually arrive in bursts: the RTP packets do, but once they are depayloaded it is a steady stream.
We noticed that when we set the i-frame interval to a larger value like 75 (one i-frame every 5 seconds at 15 FPS), the decoder bottleneck seems to disappear. Our current i-frame interval is 15 (one i-frame per second) and we want to keep it that way.
An extra observation:
When we reduce the size of the second queue (the queue that drops frames after the decoder) to 2 instead of 20, we only get about 2-3 FPS inference instead of the expected 15 FPS. I assume this happens because with a small queue nvinfer is waiting on the output of the decoder, whereas with a larger queue nvinfer can keep taking frames, ensuring 15 FPS inference.
Thanks, with this pipeline we have the following observations:
The pipeline you suggested runs at 15 FPS without issues; there is no delay or reduced FPS.
When I introduce a queue that drops frames (“queue max-size-buffers=2 max-size-time=0 max-size-bytes=0 leaky=2”) after the decoder, the framerate drops to 1 FPS. When the i-frame interval is set to 75 instead of 15, it stays at 15 FPS instead of dropping to 1 FPS.
When I set the queue size to a maximum of 30 buffers instead of 2, we reach 15 FPS again.
However, when the inference components are reintroduced (nvstreammux, nvinfer, nvstreamdemux, nvdsosd), a delay slowly starts accumulating in the rtspsrc and frames are never dropped in the drop queue.
My questions are then:
How can the queue with a maximum size of 2 buffers, placed after the decoder, drop so many decoded frames that the framerate is reduced to 1 FPS? It does not do this for other camera streams, and it does not do this for longer i-frame intervals.
Increasing the queue size to 30 seemingly makes the problem go away, but actually makes frames slowly pile up before the decoder (about 2 minutes of delay over 5 days’ time). If the decoder is not the bottleneck (we expect the inference to be the bottleneck), why are these frames not processed by the decoder and then dropped in the queue when the inference is too slow?
The problem does seem to be uniquely linked to the Dahua camera stream.
In the end, I circumvented the issue by introducing a probe after the decoder that checks whether frames are piling up in the queue before the decoder; as soon as a certain number of frames is reached, frames are dropped in the probe (after the decoder).
This way only decoded frames are dropped and no delay is introduced.
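A sketch of that workaround, assuming the pre-decoder queue and the decoder are named as in the earlier pipeline sketch; the threshold of 20 pending frames is illustrative:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

MAX_PENDING_ENCODED_FRAMES = 20  # illustrative threshold

def drop_when_backlogged(pad, info, pre_decode_queue):
    # Number of encoded frames currently waiting in the queue before the decoder.
    backlog = pre_decode_queue.get_property("current-level-buffers")
    if backlog > MAX_PENDING_ENCODED_FRAMES:
        # Drop this decoded frame so the backlog before the decoder can drain
        # without throwing away encoded data (which would cause artefacts).
        return Gst.PadProbeReturn.DROP
    return Gst.PadProbeReturn.OK

pre_decode_queue = pipeline.get_by_name("pre_decode_queue")
decoder = pipeline.get_by_name("decoder")
decoder.get_static_pad("src").add_probe(
    Gst.PadProbeType.BUFFER, drop_when_backlogged, pre_decode_queue
)
```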
I tested this solution and it seems to work fine.
Thanks for the help.