Nvv4l2decoder Deepstream bottleneck with Dahua IPC-HDW3441T-ZAS camera

We have set up a Deepstream pipeline that takes an h264-encoded RTSP stream, decodes it, and performs inference on it using a yolo detector.
The pipeline looks a bit like the following:
rtspsrc → rtph264depay → queue → nvv4l2decoder → queue → nvstreammux → nvinfer → nvstreamdemux → nvvideoconvert → nvdsosd → queue → fakesink

We perform inference at 15FPS and the camera stream is configured at 15FPS as well. We allow up to 6 cameras to be connected at the same time, all of which we pass to nvstreammux. The inference interval is set depending on the number of cameras: with two cameras the inference interval is set to 1, so that each camera gets inference at 7.5FPS and the available 15FPS is used optimally.
If the system slows down, a queue after the nvv4l2decoder (the second queue in the pipeline) stores up to 20 frames. As soon as this limit is reached, frames are dropped to ensure that no further delay builds up.
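The interval scheduling described above can be sketched as a small helper. This is a sketch under our own assumptions, not part of DeepStream: `interval_for` is a hypothetical name, and it assumes nvinfer's `interval` property counts the frames skipped between inferences, with a shared 15FPS inference budget across all cameras.

```python
def interval_for(num_cameras: int, stream_fps: int = 15, infer_budget_fps: int = 15) -> int:
    """Choose nvinfer's 'interval' (frames skipped between inferences) so the
    combined inference rate over all cameras fits the shared budget.

    Inferring every (interval + 1)-th of num_cameras * stream_fps total frames
    per second yields total / (interval + 1) inferences per second.
    """
    total = num_cameras * stream_fps
    # Smallest interval with total / (interval + 1) <= infer_budget_fps;
    # -(-a // b) is ceiling division.
    return max(0, -(-total // infer_budget_fps) - 1)
```

With two cameras this yields interval=1, i.e. 7.5FPS of inference per camera, matching the behaviour described above.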

For most cameras we use, this system works fine. For Dahua cameras, however, we notice that frames are never dropped in the second queue but instead pile up over time in the first queue, indicating that the nvv4l2decoder cannot decode frames fast enough. This is especially the case when multiple Dahua cameras are connected to the system and more frames need to be decoded.
One peculiarity of the Dahua camera is that frames seem to arrive at the rtspsrc in bursts rather than as a steady stream.

Normally, we expect the nvinfer element to be the bottleneck, in which case we can drop decoded frames when nvinfer is too slow. Here, however, we would have to drop encoded frames because the decoder itself is too slow, potentially introducing artefacts.
Is there a way to resolve this bottleneck, so we don’t have to drop encoded frames?

Things we have already tried:

  • set enable-max-performance to true
  • set disable-dpb to true
  • set enable-full-frame to true
  • different h264 encoding profiles (Main, Baseline, …)
  • making the i-frame interval smaller or larger
  • using h265 instead of h264

None of these resolved the issue of frames piling up before the decoder.
We have also verified using jtop that the NVDEC hardware engine is actually working when the pipeline is running.

Thank you very much for your time.

Platform info:
• Jetson Nano
• Deepstream 5.0.1
• Jetpack 4.4
• TensorRT Version 7.1.3
• Issue Type: question/bug

Have you checked the streams from Dahua camera? Can you provide the caps of the stream with “export GST_DEBUG=rtspsrc:5”? Is the FPS in caps correct?

We have checked the streams from the Dahua camera and they run at the correct framerate and do not build up a delay in the camera itself.

The caps for the stream are the following:
stream 0x7f2001b380, pt 96, caps application/x-rtp, media=(string)video, payload=(int)96, clock-rate=(int)90000, encoding-name=(string)H264, packetization-mode=(string)1, profile-level-id=(string)4D6033, sprop-parameter-sets=(string)"J01gM4mNUCgC3QgAAAMACAAAAwDwIAA\=\,KO48gAA\=", a-packetization-supported=(string)DH, a-rtppayload-supported=(string)DH, a-framerate=(string)15.000000, a-recvonly=(string)"", ssrc=(uint)160300835, clock-base=(uint)272732435, seqnum-base=(uint)50014, npt-start=(guint64)0, play-speed=(double)1, play-scale=(double)1

There are two corrections to my initial post:

  • The frames from the rtspsrc do not actually arrive in bursts. The RTP packets do, but as soon as they are depayloaded it’s a steady stream.
  • We noticed that when we set the i-frame interval to a larger value like 75 (5 seconds at 15FPS), the decoder bottleneck seems to disappear. Our current i-frame setting is 15 (one i-frame per second) and we want to keep it this way.

An extra observation:
When we reduce the size of the second queue (the queue that drops frames after the decoder) to 2 instead of 20, we only get about 2-3FPS of inference instead of the expected 15FPS. I assume this happens because, with a small queue, nvinfer is waiting on the output of the decoder, whereas with a larger queue nvinfer can keep taking frames, ensuring 15FPS inference.
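This hypothesis can be illustrated with a small standalone simulation (plain Python, no GStreamer): a producer delivering each second's worth of frames in a single burst feeds a leaky queue that is drained at a steady rate. The burst pattern and the assumption that a full leaky queue discards its oldest buffer (leaky=downstream behaviour) are illustrative assumptions, not measurements:

```python
from collections import deque

def simulate(capacity: int, seconds: int = 10, fps: int = 15):
    """Simulate a leaky queue fed in once-per-second bursts, drained steadily.

    Returns (delivered frames per second, total dropped frames).
    """
    q = deque()
    dropped = 0
    delivered = 0
    for tick in range(seconds * fps):  # one tick per frame period (1/fps s)
        if tick % fps == 0:
            # Burst: a whole second's worth of frames arrives at once.
            for _ in range(fps):
                if len(q) >= capacity:
                    q.popleft()  # queue full: oldest buffer is leaked away
                    dropped += 1
                q.append(tick)
        if q:
            q.popleft()  # downstream consumes one frame per tick
            delivered += 1
    return delivered / seconds, dropped
```

With a capacity of 2 the simulation delivers only about 2FPS and drops most of each burst, while a capacity of 30 sustains the full 15FPS with no drops, mirroring the behaviour observed above.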

If you think the nvv4l2decoder is the bottleneck, you may test the decoding performance with a simple pipeline consisting of the decoder and fpsdisplaysink.

rtspsrc → rtph264depay → nvv4l2decoder-> fpsdisplaysink

Thanks, with this pipeline we have the following observations:

  • The pipeline you suggested runs at 15FPS without issues; there is no delay or reduced FPS.
  • When I introduce a queue that drops frames (“queue max-size-buffers=2 max-size-time=0 max-size-bytes=0 leaky=2”) after the decoder, the FPS drops to 1FPS. When the i-frame interval is set to 75 instead of 15, the FPS stays at 15 instead of dropping to 1FPS.
  • When I set the queue size to max 30 buffers instead of 2, we reach 15FPS again.
  • However, when the inference components are reintroduced (nvstreammux, nvinfer, nvstreamdemux, nvdsosd), a delay slowly starts accumulating in the rtspsrc and frames are never dropped in the drop queue.

My questions are then:

  • How can the queue with max buffer size 2, introduced after the decoder, drop so many decoded frames that the framerate is reduced to 1FPS? It does not do this for other camera streams, and it does not do this for longer i-frame intervals.
  • Increasing the queue size to 30 seemingly makes the problem go away, but actually makes frames slowly pile up before the decoder (about 2 minutes of delay over 5 days’ time). If the decoder is not the bottleneck (we expect the inference to be the bottleneck), why are these frames not processed by the decoder and then dropped in the queue when the inference is too slow?

I cannot reproduce the issue you met with our own RTSP source.

gst-launch-1.0 rtspsrc location=rtsp://xxxxxxx ! rtph264depay ! nvv4l2decoder ! queue max-size-buffers=2 max-size-time=0 max-size-bytes=0 leaky=2 ! fpsdisplaysink video-sink=fakesink fps-update-interval=1000 signal-fps-measurements=TRUE
Setting pipeline to PAUSED …
0:00:00.036562923 914 0x56127ca4cd60 DEBUG fpsdisplaysink fpsdisplaysink.c:440:fps_display_sink_start: Use text-overlay? 1
Pipeline is live and does not need PREROLL …
Progress: (open) Opening Stream
Progress: (connect) Connecting to rtsp://xxxxxxxx
Progress: (open) Retrieving server options
Progress: (open) Retrieving media info
Progress: (request) SETUP stream 0
Progress: (open) Opened Stream
Setting pipeline to PLAYING …
New clock: GstSystemClock
Progress: (request) Sending PLAY request
Progress: (request) Sending PLAY request
Progress: (request) Sent PLAY request
0:00:07.184726133 914 0x56127ca74300 DEBUG fpsdisplaysink fpsdisplaysink.c:372:display_current_fps: Updated max-fps to 1.166273
0:00:07.184788314 914 0x56127ca74300 DEBUG fpsdisplaysink fpsdisplaysink.c:376:display_current_fps: Updated min-fps to 1.166273
0:00:07.184814345 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:1.166273 droprate:0.000000 avg-fps:1.166273
0:00:08.217022979 914 0x56127ca74300 DEBUG fpsdisplaysink fpsdisplaysink.c:372:display_current_fps: Updated max-fps to 6.780963
0:00:08.217074111 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:6.780963 droprate:0.000000 avg-fps:3.276103
0:00:09.249183073 914 0x56127ca74300 DEBUG fpsdisplaysink fpsdisplaysink.c:372:display_current_fps: Updated max-fps to 30.034325
0:00:09.249239508 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:30.034325 droprate:0.000000 avg-fps:10.583919
0:00:10.250133448 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.971667 droprate:0.000000 avg-fps:14.643543
0:00:11.251001398 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:28.974606 droprate:0.000000 avg-fps:17.124649
0:00:12.266281890 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:18.714105 droprate:0.000000 avg-fps:17.362089
0:00:13.286279881 914 0x56127ca74300 DEBUG fpsdisplaysink fpsdisplaysink.c:372:display_current_fps: Updated max-fps to 30.392076
0:00:13.286325717 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:30.392076 droprate:0.000000 avg-fps:19.062435
0:00:14.287257036 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:19.980651 droprate:0.000000 avg-fps:19.166673
0:00:15.288276452 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.969280 droprate:0.000000 avg-fps:20.268040
0:00:16.289236155 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.971276 droprate:0.000000 avg-fps:21.165739
0:00:17.290234023 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:27.972457 droprate:0.000000 avg-fps:21.742153
0:00:18.291276174 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.968438 droprate:0.000000 avg-fps:22.384434
0:00:19.292289693 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.969867 droprate:0.000000 avg-fps:22.933764
0:00:20.293252645 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.970810 droprate:0.000000 avg-fps:23.408952
0:00:21.294278404 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.969287 droprate:0.000000 avg-fps:23.823947
0:00:22.295259586 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.970328 droprate:0.000000 avg-fps:24.189612
0:00:23.296291059 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.969286 droprate:0.000000 avg-fps:24.514164
0:00:24.297225619 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.971938 droprate:0.000000 avg-fps:24.804321
0:00:25.298217610 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.970166 droprate:0.000000 avg-fps:25.065108
0:00:26.299230117 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.969662 droprate:0.000000 avg-fps:25.300810
0:00:27.300229733 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.970229 droprate:0.000000 avg-fps:25.514918
0:00:28.301291272 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.968430 droprate:0.000000 avg-fps:25.710184
0:00:29.302346853 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.968303 droprate:0.000000 avg-fps:25.889042
0:00:30.303343389 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.969707 droprate:0.000000 avg-fps:26.053529
0:00:31.304389540 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.968931 droprate:0.000000 avg-fps:26.205243
0:00:32.305374348 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.970248 droprate:0.000000 avg-fps:26.345682
0:00:33.306383275 914 0x56127ca74300 LOG fpsdisplaysink fpsdisplaysink.c:380:display_current_fps: Signaling measurements: fps:29.970228 droprate:0.000000 avg-fps:26.476019
^Chandling interrupt.
Interrupt: Stopping pipeline …
Execution ended after 0:00:28.405835175
Setting pipeline to NULL …
Freeing pipeline …

So the problem may be related to the stream generated by the Dahua camera. Please check the timestamps of the frames generated by the Dahua camera.

The problem does seem to be uniquely linked to the Dahua camera stream.
In the end, I worked around the issue by introducing a probe after the decoder that checks whether frames are piling up in the queue before the decoder; as soon as a certain number of frames is reached, frames are dropped in the probe (after the decoder).
This way only decoded frames are dropped and no delay is introduced.
I tested this solution and it seems to work fine.
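For reference, the drop decision can be sketched as plain Python. The threshold value and the pad-probe wiring in the trailing comment are illustrative assumptions rather than our exact production code; `current-level-buffers` is the standard GStreamer queue property for the number of buffered frames, and `Gst.PadProbeReturn.DROP` is from the GStreamer Python bindings.

```python
DROP_THRESHOLD = 20  # illustrative: max frames tolerated in the pre-decoder queue

def should_drop(frames_in_pre_decoder_queue: int, threshold: int = DROP_THRESHOLD) -> bool:
    """Decide, per decoded buffer, whether to discard it.

    Intended to be called from a pad probe on the decoder's src pad, with the
    queue level read from the pre-decoder queue's 'current-level-buffers'
    property.
    """
    return frames_in_pre_decoder_queue >= threshold

# Inside a GStreamer pad probe this would look roughly like:
#
# def on_decoded_buffer(pad, info, pre_queue):
#     level = pre_queue.get_property("current-level-buffers")
#     if should_drop(level):
#         return Gst.PadProbeReturn.DROP  # discard the decoded frame
#     return Gst.PadProbeReturn.OK
```

Because the probe only drops frames after decoding, the decoder keeps draining the pre-decoder queue and no artefacts from dropped encoded frames are introduced.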
Thanks for the help.