DeepStream pipeline blocks when queueing video buffers

Hello,

I am trying to develop a custom DeepStream plugin that can keep several video buffers inside it to do some asynchronous network communication. It works like the GStreamer queue element with the min-threshold-buffers property set. I think Gst-nvinferserver also has a similar mechanism.

I have tested it on a GStreamer machine without the DeepStream SDK, and it works well. But a similar pipeline blocks when I move to the DeepStream platform.

Running the command below reproduces the problem; it happens only when I put the queue after the nvv4l2decoder plugin.

gst-launch-1.0 filesrc location=[THE LOCAL H264 MP4 FILE PATH] ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! nvv4l2decoder ! queue min-threshold-buffers=10 ! nvegltransform ! nveglglessink

Thank you in advance.

• Hardware Platform (Jetson)
• DeepStream Version 5.0
• JetPack Version 4.4
• TensorRT Version 7.1.3-1

  1. You may misunderstand Gst-nvinferserver; it does not buffer any extra GstBuffer inside it. It does not work the way you think.
  2. nvv4l2decoder is a hardware-based video decoder, and it has a bufferpool of only 4 buffers. When you set queue min-threshold-buffers=10, the hardware decoder does not have enough buffers to fill the queue, so the pipeline blocks. You can limit the queue size to let the pipeline work.

gst-launch-1.0 --gst-debug=v4l2videodec:5 filesrc location=/opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_720p.mp4 ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! nvv4l2decoder ! queue min-threshold-buffers=10 max-size-buffers=2 ! nvegltransform ! nveglglessink

The “max-size-buffers” value can be anywhere from 1 to 4.

Another way is to increase the number of decoder buffers, but this will also increase system memory consumption. You need to judge according to your requirements and the overall situation of your system.
gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_720p.mp4 ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! nvv4l2decoder num-extra-surfaces=7 ! queue min-threshold-buffers=10 ! nvegltransform ! nveglglessink

The value of num-extra-surfaces is calculated as 10 - 4 + 1 = 7.
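Written out as shell arithmetic, my reading of the calculation above is (the pool size of 4 and the extra +1 surface are taken from this reply, not from official documentation):

```shell
# Sketch of the sizing rule: extra decoder surfaces needed so the
# downstream queue can hold min-threshold-buffers frames while the
# decoder keeps working with its default bufferpool of 4.
MIN_THRESHOLD=10   # queue min-threshold-buffers
POOL_SIZE=4        # default nvv4l2decoder bufferpool size
NUM_EXTRA_SURFACES=$((MIN_THRESHOLD - POOL_SIZE + 1))
echo "num-extra-surfaces=$NUM_EXTRA_SURFACES"   # prints num-extra-surfaces=7
```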

Thank you Ms. Chen, your answer saved my day. I tested the pipeline according to your suggestion and it works fine.

Actually we do have some Triton servers and are planning to use the Gst-nvinferserver plugin instead of our homemade plugin later, so I would like to get more information about its mechanism.

Now I know nvinferserver will not keep any buffers, but its documentation page (https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinferserver.html) says:

The Gst-nvinferserver plugin passes the input batched buffers to the low-level library and waits for the results to be available. Meanwhile, it keeps queuing input buffers to the low-level library as they are received.

…How does this work? It seems nvinferserver still needs to keep (a reference to) each buffer until it gets the result response from the server. In other words, it keeps an internal queue to store the buffers waiting for remote inference.
Or does it only work serially, meaning it accepts a new buffer only after it pushes out an old one?

The buffers mentioned in the description are not GstBuffers but frame buffers. "The input batched buffers" is only one GstBuffer. Gst-nvinferserver handles the input batch by batch, not frame by frame. The batch is an NVIDIA-defined data structure, and this is why nvstreammux is needed for DeepStream.
Gst-nvstreammux — DeepStream 5.0 documentation (nvidia.com)
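For reference, a minimal single-source pipeline where nvstreammux produces the batched buffer that Gst-nvinferserver consumes might look like the sketch below. This is illustrative only: the config file path is a placeholder you must supply, and the sample stream path assumes the DeepStream 5.0 layout used earlier in this thread.

```shell
# Illustrative sketch: nvstreammux assembles decoded frames into one
# batched GstBuffer (batch-size frames per buffer) before inference.
# [YOUR NVINFERSERVER CONFIG] is a placeholder, not a real file.
gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_720p.mp4 ! \
  qtdemux ! h264parse ! nvv4l2decoder ! mux.sink_0 \
  nvstreammux name=mux batch-size=1 width=1280 height=720 ! \
  nvinferserver config-file-path=[YOUR NVINFERSERVER CONFIG] ! \
  nvegltransform ! nveglglessink
```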