Deadlock in gst_element_release_request_pad with nvstreammux

I've found a deadlock when trying to dynamically remove a stream after it has completed. In the deepstream-nvof-test app from DeepStream SDK 4.0, I modified bus_call's GST_MESSAGE_ELEMENT handling as follows:

case GST_MESSAGE_ELEMENT:
    {
      if (gst_nvmessage_is_stream_eos (msg)) {
        guint stream_id;
        if (gst_nvmessage_parse_stream_eos (msg, &stream_id)) {
          g_print ("Got EOS from stream %d\n", stream_id);
          /* Stop the source bin for the finished stream. */
          GstElement *uri_decode_bin = gst_bin_get_by_name(GST_BIN(pipeline), "uri-decode-bin");
          gst_element_set_state (uri_decode_bin, GST_STATE_NULL);
          /* Look up the corresponding nvstreammux request pad so it can be released. */
          GstElement *streammux = gst_bin_get_by_name(GST_BIN(pipeline), "stream-muxer");
          gchar pad_name[16] = { };
          g_snprintf (pad_name, 15, "sink_%u", stream_id);
          GstPad *sinkpad;
          sinkpad = gst_element_get_static_pad (streammux, pad_name);
          if (!sinkpad) {
            g_printerr ("Streammux request sink pad failed. Exiting.\n");
            return -1;
          }
          g_print ("Before release\n");
          gst_element_release_request_pad (streammux, sinkpad);  /* never returns */
          g_print ("After release\n");
        }
      }
      break;
    }

The pipeline element pointer will need to be moved to the global scope at the top of the file to get this to compile. I’m running the program with one input. “Before release” is printed, but “After release” is not. When I do a stack trace I get:

#0 0x00007f5db2cf3839 in syscall () at /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f5db35fd77f in g_cond_wait () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 0x00007f5db1d24ed0 in gst_nvstreammux_release_pad () at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_multistream.so
#3 0x000056198d2cfbba in bus_call ()
#4 0x00007f5db3b1acbd in () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#5 0x00007f5db35b7285 in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#6 0x00007f5db35b7650 in () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#7 0x00007f5db35b7962 in g_main_loop_run () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#8 0x000056198d2d0652 in main ()

Clearly, this thread is stuck waiting on a condition variable inside gst_nvstreammux_release_pad. Any ideas on this? Is there any way to work around it?

I tried pausing both the streammux and the whole pipeline before releasing the pad, but that didn't help either.

If you install the debug symbols for glib, you can at least see what it's blocking on, whether a condition variable or a mutex.

Anyway, I wonder if unlinking is needed before releasing the request pad. I don't see anything about that in the docs.

See the GstElement documentation.

There’s also something called pad blocking, which I’m guessing might be accomplished by installing a probe on an upstream pad:

See the GstPad documentation.

Anyway, I got that clue from a rather unhelpful “design document” on Dynamic Pipelines:

https://gstreamer.freedesktop.org/documentation/additional/design/dynamic.html?gi-language=c
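
In case it helps, here's a minimal, untested sketch of what I have in mind. block_cb and block_upstream_pad are just placeholder names, and srcpad would be whatever upstream pad feeds the nvstreammux sink pad you want to release:

#include <gst/gst.h>

/* Untested sketch of pad blocking. Once the probe fires, dataflow on this
 * pad is blocked, and it should then be safe to unlink/release downstream
 * pads. */
static GstPadProbeReturn
block_cb (GstPad * pad, GstPadProbeInfo * info, gpointer user_data)
{
  /* Returning GST_PAD_PROBE_OK keeps the blocking probe installed, so the
   * pad stays blocked; do the unlink/release work here or from another
   * application thread. */
  return GST_PAD_PROBE_OK;
}

static void
block_upstream_pad (GstPad * srcpad)
{
  gst_pad_add_probe (srcpad, GST_PAD_PROBE_TYPE_BLOCK_DOWNSTREAM,
      block_cb, NULL, NULL);
}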

BTW, regarding “streaming vs. application threads”, your bus message handling thread is an application thread. So, it’s not an obvious deadlock, like trying to re-acquire a non-recursive mutex. In fact, the whole point of the bus is to decouple message processing from the streaming threads.
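
For reference, I'm assuming the usual setup from the sample apps, where the bus watch dispatches messages on the GLib main loop:

/* Typical setup (as in the sample apps, if I'm reading them right):
 * bus_call is dispatched from the GMainLoop, i.e. an application thread,
 * not a streaming thread. */
GstBus *bus = gst_pipeline_get_bus (GST_PIPELINE (pipeline));
guint bus_watch_id = gst_bus_add_watch (bus, bus_call, loop);
gst_object_unref (bus);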

Hi,
Please try DS4.0.1.
For more information: are you using a Jetson platform or a PC with an NVIDIA GPU?

I’ve now tried it with DS4.0.1 and it has the same issue. I’m using a GCP VM with a T4 right now.

I'm looking into blocking pads, and I don't think that is exactly my problem. I've been reading through the info about dynamically adding and removing streams in the GStreamer documentation:
See the "Pipeline manipulation" chapter of the GStreamer application development manual.

Here is the backtrace with debug symbols for glib:

#0  0x00007f99b5770839 in syscall () at /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f99b607a77f in g_cond_wait (cond=0x5635f8bcec50, mutex=0x5635f8bcec48) at ../../../../glib/gthread-posix.c:1402
#2  0x00007f99b47a1f72 in gst_nvstreammux_release_pad () at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_multistream.so
#3  0x00005635f6ef2c57 in bus_call ()
#4  0x00007f99b6597d6d in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#5  0x00007f99b6034285 in g_main_dispatch (context=0x5635f88f88a0) at ../../../../glib/gmain.c:3176
#6  0x00007f99b6034285 in g_main_context_dispatch (context=context@entry=0x5635f88f88a0) at ../../../../glib/gmain.c:3829
#7  0x00007f99b6034650 in g_main_context_iterate (context=0x5635f88f88a0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../../glib/gmain.c:3902
#8  0x00007f99b6034962 in g_main_loop_run (loop=0x5635f88f8b50) at ../../../../glib/gmain.c:4098
#9  0x00005635f6ef3700 in main ()

The only other thread that isn't waiting on anything is the stream-muxer:sr thread. It still seems to be trying to process buffers even though none are coming in. I see it doing various things like unreffing GstBuffers, collecting buffers (gst_nvstreammux_src_collect_buffers), and creating batch metadata.

Waiting on a condition variable is annoying, especially when we don’t have the source to gst_nvstreammux_release_pad(), because we can’t tell which condition variable it is or who is supposed to signal it. If it were a mutex, then you could actually inspect its state to see which thread currently owns it.

So, I think you've taken this about as far as you can. Maybe provide some more details of your pipeline and how you triggered the problem, so that NVIDIA can hopefully reproduce and fix it.

I think that I've included everything needed to reproduce this in my previous posts. The only other change from the example code is that I changed the final sink to a testsink because my GPU isn't attached to a screen. This can also be reproduced with deepstream-test3, which is a little simpler than deepstream-nvof-test.

Here are some other things that I've tried (a rough sketch of the first attempt is shown after the list):

  1. unlinking the pads before trying to release (this has to be done before setting the NULL state on the source, because going to NULL releases its pad).
  2. using gst_pad_set_active on nvstreammux's sink pad for this stream.
  3. not setting NULL on the source before trying to release the pad.
  4. unreffing nvstreammux’s sink pad before trying to release.
  5. pausing nvstreammux before releasing the pad.

None of them have worked.
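
For reference, attempt 1 looked roughly like this (simplified; error handling and some unrefs are omitted, and the "sink_0" / "stream-muxer" names follow the test app):

/* Rough sketch of attempt 1: unlink the peer pad first, then release the
 * request pad, and only afterwards set the source to NULL (since going to
 * NULL releases its pad). The release call still never returns. */
GstElement *streammux =
    gst_bin_get_by_name (GST_BIN (pipeline), "stream-muxer");
GstPad *sinkpad = gst_element_get_static_pad (streammux, "sink_0");
GstPad *srcpad = gst_pad_get_peer (sinkpad);

if (srcpad) {
  gst_pad_unlink (srcpad, sinkpad);
  gst_object_unref (srcpad);
}

gst_element_release_request_pad (streammux, sinkpad);  /* still hangs here */
gst_object_unref (sinkpad);
gst_object_unref (streammux);

GstElement *uri_decode_bin =
    gst_bin_get_by_name (GST_BIN (pipeline), "uri-decode-bin");
gst_element_set_state (uri_decode_bin, GST_STATE_NULL);
gst_object_unref (uri_decode_bin);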

I also tried debugging using GST_DEBUG at various levels. It looks like even when the source has been removed, the streammux is stuck in a loop of creating and removing buffers; I’m not sure what is happening with the buffers:

0:00:31.183856522  6846 0x7f752000a0a0 LOG          GST_BUFFER_LIST gstbufferlist.c:129:gst_buffer_list_init: init 0x7f752800fd60
0:00:31.183869027  6846 0x7f752000a0a0 LOG              nvstreammux gstnvstreammux.c:1464:gst_nvstreammux_src_collect_buffers:<stream-muxer> SETTING CUDA DEVICE = 0 in nvstreammux  func=gst_nvstreammux_src_collect_buffers

0:00:31.183883289  6846 0x7f752000a0a0 TRACE        GST_REFCOUNTING gstminiobject.c:441:gst_mini_object_unref: 0x7f752800fd60 unref 1->0
0:00:31.183894286  6846 0x7f752000a0a0 LOG          GST_BUFFER_LIST gstbufferlist.c:104:_gst_buffer_list_free: free 0x7f752800fd60
0:00:31.183902628  6846 0x7f752000a0a0 LOG          GST_BUFFER_LIST gstbufferlist.c:161:gst_buffer_list_new_sized: new 0x7f752800fe40
0:00:31.183912836  6846 0x7f752000a0a0 LOG          GST_BUFFER_LIST gstbufferlist.c:129:gst_buffer_list_init: init 0x7f752800fe40
0:00:31.183924519  6846 0x7f752000a0a0 TRACE            GST_LOCKING gstminiobject.c:239:gst_mini_object_unlock: unlock 0x7f7528009600: state 00010101, access_mode 1
0:00:31.183933353  6846 0x7f752000a0a0 TRACE        GST_REFCOUNTING gstminiobject.c:441:gst_mini_object_unref: 0x7f7528009600 unref 78->77
0:00:31.183944745  6846 0x7f752000a0a0 TRACE        GST_REFCOUNTING gstminiobject.c:441:gst_mini_object_unref: 0x7f752033a3d0 unref 1->0
0:00:31.183955296  6846 0x7f752000a0a0 TRACE        GST_REFCOUNTING gstminiobject.c:355:gst_mini_object_ref: 0x7f752033a3d0 ref 0->1
0:00:31.183966682  6846 0x7f752000a0a0 LOG               GST_BUFFER gstbuffer.c:710:_gst_buffer_dispose: release 0x7f752033a3d0 to pool 0x7f7528023990
0:00:31.183978608  6846 0x7f752000a0a0 DEBUG             GST_BUFFER gstbuffer.c:2384:gst_buffer_foreach_meta: remove metadata 0x7f7508420c68 (NvDsMeta)
0:00:31.183995909  6846 0x7f752000a0a0 LOG               bufferpool gstbufferpool.c:1284:default_release_buffer:<nvstreammuxbufferpool0> released buffer 0x7f752033a3d0 0
0:00:31.184007178  6846 0x7f752000a0a0 DEBUG             GST_BUFFER gstbuffer.c:1375:gst_buffer_is_memory_range_writable: idx 0, length -1
0:00:31.184018485  6846 0x7f752000a0a0 TRACE        GST_REFCOUNTING gstobject.c:264:gst_object_unref:<nvstreammuxbufferpool0> 0x7f7528023990 unref 2->1
0:00:31.184030174  6846 0x7f752000a0a0 TRACE        GST_REFCOUNTING gstminiobject.c:441:gst_mini_object_unref: 0x7f752800fe40 unref 1->0
0:00:31.184038791  6846 0x7f752000a0a0 LOG          GST_BUFFER_LIST gstbufferlist.c:104:_gst_buffer_list_free: free 0x7f752800fe40
0:00:31.184050953  6846 0x7f752000a0a0 LOG          GST_BUFFER_LIST gstbufferlist.c:161:gst_buffer_list_new_sized: new 0x7f752800fe40
0:00:31.184060595  6846 0x7f752000a0a0 LOG          GST_BUFFER_LIST gstbufferlist.c:129:gst_buffer_list_init: init 0x7f752800fe40
0:00:31.184071943  6846 0x7f752000a0a0 LOG               bufferpool gstbufferpool.c:1128:default_acquire_buffer:<nvstreammuxbufferpool0> acquired buffer 0x7f752033a2c0
0:00:31.184083862  6846 0x7f752000a0a0 TRACE        GST_REFCOUNTING gstobject.c:237:gst_object_ref:<nvstreammuxbufferpool0> 0x7f7528023990 ref 1->2
0:00:31.184108687  6846 0x7f752000a0a0 DEBUG             GST_BUFFER gstbuffer.c:2202:gst_buffer_add_meta: alloc metadata 0x7f7508012308 (NvDsMeta) of size 72
0:00:31.184125022  6846 0x7f752000a0a0 LOG               GST_BUFFER gstbuffer.c:1721:gst_buffer_map_range: buffer 0x7f752033a2c0, idx 0, length -1, flags 0001
0:00:31.184136736  6846 0x7f752000a0a0 LOG               GST_BUFFER gstbuffer.c:213:_get_merged_memory: buffer 0x7f752033a2c0, idx 0, length 1
0:00:31.184147577  6846 0x7f752000a0a0 TRACE        GST_REFCOUNTING gstminiobject.c:355:gst_mini_object_ref: 0x7f7528009540 ref 78->79
0:00:31.184159457  6846 0x7f752000a0a0 TRACE            GST_LOCKING gstminiobject.c:179:gst_mini_object_lock: lock 0x7f7528009540: state 00010000, access_mode 1
0:00:31.184171525  6846 0x7f752000a0a0 LOG              nvstreammux gstnvstreammux.c:1464:gst_nvstreammux_src_collect_buffers:<stream-muxer> SETTING CUDA DEVICE = 0 in nvstreammux  func=gst_nvstreammux_src_collect_buffers

0:00:31.184187731  6846 0x7f752000a0a0 TRACE        GST_REFCOUNTING gstminiobject.c:441:gst_mini_object_unref: 0x7f752800fe40 unref 1->0
0:00:31.184196376  6846 0x7f752000a0a0 LOG          GST_BUFFER_LIST gstbufferlist.c:104:_gst_buffer_list_free: free 0x7f752800fe40
0:00:31.184248810  6846 0x7f752000a0a0 LOG          GST_BUFFER_LIST gstbufferlist.c:161:gst_buffer_list_new_sized: new 0x7f752800fd60

Is it possible to open source at least the nvstreammux code? I can probably fix this if I can look at the code. If not, and this can't be fixed soon, I may end up writing my own version of nvstreammux, but I'd like to avoid that if possible.

Hi,
The nvstreammux plugin is not open source. Please share a patch against deepstream-test3 so that we can reproduce the deadlock.

Yes, I’m aware that nvstreammux is not currently open source.

Here is a patch for deepstream-test3:

69a70,71
> GstElement *pipeline = NULL;
> 
179c181,195
<         }
---
>           GstElement *uri_decode_bin = gst_bin_get_by_name(GST_BIN(pipeline), "uri-decode-bin");
>           gst_element_set_state (uri_decode_bin, GST_STATE_NULL);
>           GstElement *streammux = gst_bin_get_by_name(GST_BIN(pipeline), "stream-muxer");
>           gchar pad_name[16] = { };
>           g_snprintf (pad_name, 15, "sink_%u", 0);
>           GstPad *sinkpad;
>           sinkpad = gst_element_get_static_pad (streammux, pad_name);
>           if (!sinkpad) {
>             g_printerr ("Streammux request sink pad failed. Exiting.\n");
>             return -1;
> 	  }
>           g_print ("Before release\n");
>           gst_element_release_request_pad (streammux, sinkpad);
>           g_print ("After release\n");
> 	}
286c302
<   GstElement *pipeline = NULL, *streammux = NULL, *sink = NULL, *pgie = NULL,
---
>   GstElement *streammux = NULL, *sink = NULL, *pgie = NULL,
373c389
<   sink = gst_element_factory_make ("nveglglessink", "nvvideo-renderer");
---
>   sink = gst_element_factory_make ("testsink", "nvvideo-renderer");

Also, I'd always been running this with only one stream (file). I tried running the patched version with two different files as the input URIs. The shorter stream finished first, and there wasn't a deadlock when releasing the pad. However, the other file stopped being processed right as the release happened for the shorter stream.

If I run the same two files on the original deepstream-test3, the streams keep going.

Do I need to block the source pad before shutting it down even though the stream has already sent an EOS downstream?

I tried adding a blocking probe to various pads (uridecodebin's src_0 pad, nvstreammux's sink_0, and the peer of nvstreammux's sink_0) when the per-stream EOS message is detected, but the probe callback is never invoked. I'm guessing that this is because the stream has already ended, so that isn't the way to go. The blocking-probe method appears to be more for swapping out the middle elements of a stream that hasn't ended.

Hi,
Please share a patch in a format that we can apply with '$ patch -p1 < test.patch'; we are not familiar with the format you posted. Sorry about that.
Or you may simply zip deepstream_test3_app.c and attach it.

Also, we have to launch 2+ sources to reproduce it, right?

Sorry, I forgot to add -u to my diff command to create the patch. Here’s the patch:

--- deepstream_test3_app.c	2019-09-11 09:49:43.000000000 +0000
+++ /code/deepstream_test3_app.c	2019-10-03 16:25:09.258070943 +0000
@@ -67,6 +67,8 @@
 /* tiler_sink_pad_buffer_probe  will extract metadata received on OSD sink pad
  * and update params for drawing rectangle, object information etc. */
 
+GstElement *pipeline = NULL;
+
 static GstPadProbeReturn
 tiler_src_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info,
     gpointer u_data)
@@ -134,14 +136,37 @@
     return GST_PAD_PROBE_OK;
 }
 
+GstPadProbeReturn
+pad_probe_cb (GstPad * pad, GstPadProbeInfo * info, gpointer user_data) {
+  g_print("in pad probe callback");
+  gst_pad_remove_probe (pad, GST_PAD_PROBE_INFO_ID (info));
+  
+  GstElement *uri_decode_bin = gst_bin_get_by_name(GST_BIN(pipeline), "uri-decode-bin");
+  gst_element_set_state (uri_decode_bin, GST_STATE_NULL);
+
+  GstElement *streammux = gst_bin_get_by_name(GST_BIN(pipeline), "stream-muxer");
+  gchar pad_name[16] = { };
+  g_snprintf (pad_name, 15, "sink_%u", 0);
+  GstPad *sinkpad;
+  sinkpad = gst_element_get_static_pad (streammux, pad_name);
+  if (!sinkpad) {
+    g_printerr ("Streammux request sink pad failed. Exiting.\n");
+    return -1;
+  }
+  g_print ("Before release\n");
+  gst_element_release_request_pad (streammux, sinkpad);
+  g_print ("After release\n");
+  gst_object_unref(sinkpad);
+}
+
 static gboolean
 bus_call (GstBus * bus, GstMessage * msg, gpointer data)
 {
   GMainLoop *loop = (GMainLoop *) data;
   switch (GST_MESSAGE_TYPE (msg)) {
     case GST_MESSAGE_EOS:
-      g_print ("End of stream\n");
-      g_main_loop_quit (loop);
+      //g_print ("End of stream\n");
+      //g_main_loop_quit (loop);
       break;
     case GST_MESSAGE_WARNING:
     {
@@ -176,7 +201,26 @@
         guint stream_id;
         if (gst_nvmessage_parse_stream_eos (msg, &stream_id)) {
           g_print ("Got EOS from stream %d\n", stream_id);
-        }
+          GstElement *streammux = gst_bin_get_by_name(GST_BIN(pipeline), "stream-muxer");
+          gchar pad_name[16] = { };
+          g_snprintf (pad_name, 15, "sink_%u", 0);
+          GstPad *sinkpad;
+          sinkpad = gst_element_get_static_pad (streammux, pad_name);
+          if (!sinkpad) {
+            g_printerr ("Streammux request sink pad failed. Exiting.\n");
+            return -1;
+	  }
+	  GstPad *srcpad = gst_pad_get_peer(sinkpad);
+	  gst_object_unref(sinkpad);
+	  if (!srcpad) {
+            g_printerr ("Peer request source pad failed. Exiting.\n");
+            return -1;
+	  }
+            
+	  gst_pad_add_probe (srcpad, GST_PAD_PROBE_TYPE_BLOCK_DOWNSTREAM,
+      	      pad_probe_cb, NULL, NULL);
+	  gst_object_unref(srcpad);
+	}
       }
       break;
     }
@@ -283,7 +327,7 @@
 main (int argc, char *argv[])
 {
   GMainLoop *loop = NULL;
-  GstElement *pipeline = NULL, *streammux = NULL, *sink = NULL, *pgie = NULL,
+  GstElement *streammux = NULL, *sink = NULL, *pgie = NULL,
       *nvvidconv = NULL, *nvosd = NULL, *tiler = NULL;
 #ifdef PLATFORM_TEGRA
   GstElement *transform = NULL;
@@ -370,7 +414,7 @@
 #ifdef PLATFORM_TEGRA
   transform = gst_element_factory_make ("nvegltransform", "nvegl-transform");
 #endif
-  sink = gst_element_factory_make ("nveglglessink", "nvvideo-renderer");
+  sink = gst_element_factory_make ("testsink", "nvvideo-renderer");
 
   if (!pgie || !tiler || !nvvidconv || !nvosd || !sink) {
     g_printerr ("One element could not be created. Exiting.\n");

Also, the deadlock is seen when playing a single file. Playing two files causes the longer file to stop being processed after the first one finishes and its pad is released.

Hi,
It doesn't seem right that you do not call g_main_loop_quit (loop); when EOS is received. Maybe that is the reason for the deadlock?

It isn't the EOS message change causing it. Here is a patch with even fewer changes; the previous one included my attempt at adding a blocking pad. This one is the minimal set of changes needed to reproduce the deadlock.

--- deepstream_sdk_v4.0.1_x86_64/sources/apps/sample_apps/deepstream-test3/deepstream_test3_app.c	2019-09-11 03:49:43.000000000 -0600
+++ deepstream_test3_app.c	2019-10-07 10:27:00.000000000 -0600
@@ -63,6 +63,7 @@
 //static struct timeval start_time = { };

 //static guint probe_counter = 0;
+GstElement *pipeline = NULL;

 /* tiler_sink_pad_buffer_probe  will extract metadata received on OSD sink pad
  * and update params for drawing rectangle, object information etc. */
@@ -176,6 +177,18 @@
         guint stream_id;
         if (gst_nvmessage_parse_stream_eos (msg, &stream_id)) {
           g_print ("Got EOS from stream %d\n", stream_id);
+          GstElement *streammux = gst_bin_get_by_name(GST_BIN(pipeline), "stream-muxer");
+          gchar pad_name[16] = { };
+          g_snprintf (pad_name, 15, "sink_%u", 0);
+          GstPad *sinkpad;
+          sinkpad = gst_element_get_static_pad (streammux, pad_name);
+          if (!sinkpad) {
+            g_printerr ("Streammux request sink pad failed. Exiting.\n");
+            return -1;
+	  }
+          g_print ("Before release\n");
+	  gst_element_release_request_pad (streammux, sinkpad);
+          g_print ("After release\n");
         }
       }
       break;
@@ -283,7 +296,7 @@
 main (int argc, char *argv[])
 {
   GMainLoop *loop = NULL;
-  GstElement *pipeline = NULL, *streammux = NULL, *sink = NULL, *pgie = NULL,
+  GstElement *streammux = NULL, *sink = NULL, *pgie = NULL,
       *nvvidconv = NULL, *nvosd = NULL, *tiler = NULL;
 #ifdef PLATFORM_TEGRA
   GstElement *transform = NULL;
@@ -370,7 +383,7 @@
 #ifdef PLATFORM_TEGRA
   transform = gst_element_factory_make ("nvegltransform", "nvegl-transform");
 #endif
-  sink = gst_element_factory_make ("nveglglessink", "nvvideo-renderer");
+  sink = gst_element_factory_make ("testsink", "nvvideo-renderer");

   if (!pgie || !tiler || !nvvidconv || !nvosd || !sink) {
     g_printerr ("One element could not be created. Exiting.\n");

My whole goal with this is to be able to keep the pipeline running forever while continually adding and removing sources. I'm trying to save myself from having to reload the model. I want to build a DeepStream pipeline that I can pass various video clips into on demand and from which I can get inference results. Most of the DeepStream examples right now are geared toward a fixed number of streams that are always up, like surveillance cameras at a grocery store. However, that is not my use case. I have lots of video clips that I want to run through an inference engine. These clips are uploaded frequently, and I want to pass them through the pipeline as they are received. DeepStream looks like a great candidate for this because it has elements for demuxing, parsing, and decoding as well as inference. If this isn't considered a valid use case for DeepStream, I urge NVIDIA to add support for it.

Hi,
This is a similar request to
https://devtalk.nvidia.com/default/topic/1064141/deepstream-sdk/adding-and-removing-streams-during-runtime/post/5390153/#5390153

We are checking and will update.

I thought I'd post another finding: nvstreamdemux needs to be in the NULL state to add request pads, while nvstreammux doesn't have this requirement. This is separate from the above issue, but it is a problem if a user wants to split streams back out after inference, for instance.
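
Roughly what I mean is below. The "src_%u" pad template name is my assumption based on how the samples use the demuxer; check `gst-inspect-1.0 nvstreamdemux` for the actual template:

#include <gst/gst.h>

/* Request an nvstreamdemux pad while the element is still in the NULL state,
 * i.e. before the pipeline goes to PLAYING; requesting after the pipeline is
 * running did not work for me. The "src_%u" template name is an assumption
 * on my part -- verify it with `gst-inspect-1.0 nvstreamdemux`. */
static GstPad *
request_demux_pad (GstElement * streamdemux, guint stream_id)
{
  gchar pad_name[16] = { };

  g_snprintf (pad_name, 15, "src_%u", stream_id);
  return gst_element_get_request_pad (streamdemux, pad_name);
}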

Here are some more findings:

I updated the reference anomaly app to get it working with DS 4.0.1. It also deadlocks when it tries to release the pad. I haven't tried going back to DS 3. Does anyone know if it deadlocks during the release of the nvstreammux pad back in DS 3?

I tried a different tack that would work for my use case: setting up N streams and then tearing down everything except nvinfer. I've tried pausing nvinfer, but it seems to go to the NULL state anyway. My whole goal was to keep nvinfer from loading the model again, but that didn't work out. Also, the pipeline seems to get stuck when I try to send the second set of streams through. Then again, if nvinfer is reloading the model anyway, I may as well destroy that element and recreate it.

Has anyone found a way to work around these issues? All of my video input streams are the same length, so I'm fine with waiting until a batch of streams is done before moving on to the next batch, but I really want to avoid reloading the model. I'll keep trying different things to work around this.

I've made some progress with a workaround for dynamically adding and removing streams. My particular use case involves an nvstreammux → nvinfer → nvstreamdemux middle section of the pipeline. Here are some findings (a rough sketch of the teardown path follows the list):

  1. All of the request pads need to be set up at the beginning.
  2. The request pads must never be released as part of adding and removing streams; instead, they are only linked and unlinked.
  3. When a stream finishes, which is detected on the message bus by checking GST_MESSAGE_ELEMENT messages with gst_nvmessage_is_stream_eos and gst_nvmessage_parse_stream_eos, start the teardown.
  4. Tearing down involves pausing everything that is running, setting the source and sink being removed to the NULL state, and then unlinking them and removing them from the pipeline.
  5. For adding streams, it seems to work better to first pause anything in the pipeline that is playing.
  6. Be sure to use a lock around pipeline manipulation so that only one thread modifies it at a time. I hold the lock for all of the add code and all of the remove code.
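
Here's a rough, heavily simplified sketch of the teardown path (point 4). source_bin, mux_sinkpad, and pipeline_lock are names from my app, not from the DeepStream samples; error handling is omitted, and the per-stream sink on the demux side is handled the same way:

#include <gst/gst.h>

static GMutex pipeline_lock;    /* guards all pipeline manipulation */

/* Simplified teardown for one finished stream. source_bin is the per-stream
 * source bin and mux_sinkpad is the nvstreammux request pad it feeds. The
 * request pad is only unlinked, never released, which is what avoids the
 * deadlock. */
static void
remove_stream (GstElement * pipeline, GstElement * source_bin,
    GstPad * mux_sinkpad)
{
  GstPad *srcpad;

  g_mutex_lock (&pipeline_lock);

  /* Pause whatever is running before touching the topology. */
  gst_element_set_state (pipeline, GST_STATE_PAUSED);

  /* Stop the source for the finished stream. */
  gst_element_set_state (source_bin, GST_STATE_NULL);

  /* Unlink it from the streammux request pad, but keep the pad around so it
   * can be re-linked when the next stream is added. */
  srcpad = gst_pad_get_peer (mux_sinkpad);
  if (srcpad) {
    gst_pad_unlink (srcpad, mux_sinkpad);
    gst_object_unref (srcpad);
  }

  /* Remove the source bin from the pipeline and resume. */
  gst_bin_remove (GST_BIN (pipeline), source_bin);
  gst_element_set_state (pipeline, GST_STATE_PLAYING);

  g_mutex_unlock (&pipeline_lock);
}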

I'm still having some issues, though. With two streams being added/removed asynchronously, it works for around 12-15 add/remove cycles and then errors out with:

0:00:21.177111588  2961 0x55b338a39e30 WARN                 nvinfer gstnvinfer.cpp:1830:gst_nvinfer_output_loop:<pgie> error: Internal data stream error.
0:00:21.177621422  2961 0x55b338a39e30 WARN                 nvinfer gstnvinfer.cpp:1830:gst_nvinfer_output_loop:<pgie> error: streaming stopped, reason error (-5)

I haven't found anything in the logs that gives more insight into this error. Can someone from NVIDIA let us know what that error means? I have also occasionally seen NPP_NOT_SUFFICIENT_COMPUTE_CAPABILITY. The GPU isn't running out of memory.

However, my code works fine if I limit it to only one stream at a time. For now, that looks like what I'll have to do. At least this way I don't have to reload the model. I will try queuing up other sources into the playing state, but unlinked, and then linking up the new source as soon as possible after the previous source is torn down. I don't yet know how much this will slow down throughput, but it seems like my best option for now.

One other thought that has occurred to me is that there might be a problem with having a request pad that isn’t linked when the middle elements are in the playing state. I’ll investigate this if I find time to do so.

This should be resolved:
https://devtalk.nvidia.com/default/topic/1064141/deepstream-sdk/adding-and-removing-streams-during-runtime/post/5400986/#5400986

Hi @notmuchtotell,

We have reproduced the deadlock issue in the streammux release-pad function, and it will be resolved in a future release. Thank you for sharing your analysis of the problem.
