Releasing nvstreammux request pad results in a deadlock

• Hardware Platform (Jetson / GPU)
Jetson
• DeepStream Version
5.0
• JetPack Version (valid for Jetson only)
4.4
• TensorRT Version
7.1

• Issue Type( questions, new requirements, bugs)
Questions, possible bug

• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

This issue seems to be identical to "Deadlock in gst_element_release_request_pad with nvstreammux", which has been fixed according to that thread.

I’m running into the same problem with dynamic source removal, using the code from deepstream_reference_apps/runtime_source_add_delete (NVIDIA-AI-IOT/deepstream_reference_apps on GitHub).

The difference in my application is that I’m using RTSP sources instead of URI sources.

First, the solution works properly as long as the first source added successfully connects. I can then add and remove additional sources without any problems.

However, if the first source fails to connect due to a socket timeout, unlinking the source and releasing the requested sink pad results in a deadlock.

I’m using the following code snippet from your example in my application:

static void
stop_release_source (gint source_id)
{
  GstStateChangeReturn state_return;
  gchar pad_name[16];
  GstPad *sinkpad = NULL;

  /* Stop the source bin before detaching it from the muxer. */
  state_return =
      gst_element_set_state (g_source_bin_list[source_id], GST_STATE_NULL);
  switch (state_return) {
    case GST_STATE_CHANGE_SUCCESS:
      g_print ("STATE CHANGE SUCCESS\n\n");
      g_snprintf (pad_name, sizeof (pad_name), "sink_%u", (guint) source_id);
      sinkpad = gst_element_get_static_pad (streammux, pad_name);
      if (sinkpad != NULL) {
        gst_pad_send_event (sinkpad, gst_event_new_flush_stop (FALSE));
        /* This is the call that never returns. */
        gst_element_release_request_pad (streammux, sinkpad);
        g_print ("STATE CHANGE SUCCESS %p\n\n", sinkpad);
        gst_object_unref (sinkpad);
      }
      gst_bin_remove (GST_BIN (pipeline), g_source_bin_list[source_id]);
      source_id--;              /* local copy only; no effect on the caller */
      g_num_sources--;
      break;
    default:
      /* Other state-change results are omitted from this excerpt. */
      break;
  }
}

with deadlock occurring at

gst_element_release_request_pad (streammux, sinkpad);

Is something additional required to make the above work on a failed connection?

Thanks
Robert


Hey, does the issue persist when you try local input files?

@bcao, I’ve been unable to reproduce the issue using local files, as I’m unable to get a URI source to fail in the same way.

With the RTSP failure, I can see that the stream selection callback gets called… with the socket timeout occurring shortly after.

The deadlock occurs when I try to release the pad after the socket timeout.

OK, just to confirm: you didn’t modify any code in deepstream_reference_apps/runtime_source_add_delete (NVIDIA-AI-IOT/deepstream_reference_apps on GitHub) and only replaced the sources with RTSP ones, right?

@bcao give me a day and I will try to get you a simple repro script based on the example.

Hi rjhowell44,

Any update? Is this still an issue to support?

Have you solved this problem?

I’m having a similar issue: the pipeline blocks on teardown. The following gst-launch string reproduces the problem when trying to exit with CTRL+C:

gst-launch-1.0 \
uridecodebin name=source_0 uri=<some_rtsp_source_uri_0> ! queue name=source_queue_0 \
uridecodebin name=source_1 uri=<some_rtsp_source_uri_1> ! queue name=source_queue_1 \
uridecodebin name=source_2 uri=<some_rtsp_source_uri_2> ! queue name=source_queue_2 \
uridecodebin name=source_3 uri=<some_rtsp_source_uri_3> ! queue name=source_queue_3 \
uridecodebin name=source_4 uri=<some_rtsp_source_uri_4> ! queue name=source_queue_4 \
uridecodebin name=source_5 uri=<some_rtsp_source_uri_5> ! queue name=source_queue_5 \
uridecodebin name=source_6 uri=<some_rtsp_source_uri_6> ! queue name=source_queue_6 \
uridecodebin name=source_7 uri=<some_rtsp_source_uri_7> ! queue name=source_queue_7 \
uridecodebin name=source_8 uri=<some_rtsp_source_uri_8> ! queue name=source_queue_8 \
uridecodebin name=source_9 uri=<some_rtsp_source_uri_9> ! queue name=source_queue_9 \
nvstreammux name=batcher live-source=1 width=1920 height=1080 batch-size=10 batched-push-timeout=4000000 \
source_queue_0. ! batcher.sink_0 \
source_queue_1. ! batcher.sink_1 \
source_queue_2. ! batcher.sink_2 \
source_queue_3. ! batcher.sink_3 \
source_queue_4. ! batcher.sink_4 \
source_queue_5. ! batcher.sink_5 \
source_queue_6. ! batcher.sink_6 \
source_queue_7. ! batcher.sink_7 \
source_queue_8. ! batcher.sink_8 \
source_queue_9. ! batcher.sink_9 \
batcher. \
! nvmultistreamtiler rows=2 columns=5 width=1920 height=1080 \
! nvvideoconvert \
! nvdsosd \
! nvvideoconvert interpolation-method=1 \
! capsfilter caps="video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080" \
! nvv4l2h264enc bitrate=10000000 \
! h264parse \
! mpegtsmux \
! filesink location=test.mp4

with the following output:

Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

And the gdb backtrace:

(gdb) bt
#0  0x00007fd19d17d839 in syscall () at /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fd19d70475f in g_cond_wait () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007fd19aa77c28 in gst_nvstreammux_release_pad ()
    at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_multistream.so
#3  0x00007fd19dc38090 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#4  0x00007fd19d99dfa3 in g_object_unref () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#5  0x00007fd19dc16c17 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#6  0x00007fd19dc15c88 in gst_bin_remove () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#7  0x00007fd19dc15f13 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#8  0x00007fd19d99dfa3 in g_object_unref () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#9  0x0000564dcce7aae7 in  ()
#10 0x00007fd19d083b97 in __libc_start_main () at /lib/x86_64-linux-gnu/libc.so.6
#11 0x0000564dcce7b0da in  ()

My apologies to all for not being able to respond to this thread earlier…

@bcao, @kayccc, as pointed out by @529683504 and @nkyriazis, this is still an open issue.

My issue is similar to yours. If it still bothers you, you can refer to: Releasing nvstreammux request pad blocked when input an invalid rtsp url - #40 by Fiona.Chen

It looks like streammux cannot release a pad if the batch is empty.

I purposefully devised a gst-launch snippet as the minimal reproducing example, which also leaves very little room for programmer error beyond the gst-launch string itself. The streams are all working, and the pipeline was running before it was stopped. Regardless of whether the blocking is a feature or a bug, I feel the pipeline should not hang in this case.

nvstreammux has frequently been the source of issues in our pipeline, especially because its behavior is opaque. If the batching algorithm and the blocking cases were documented, it might be easier to use robustly. This does not change the initial point.

Any update? I have the same issue.

We are still working on this issue.

We are encountering the same issue and it is causing us significant headaches. From reading the linked threads in the original post, it appears that this issue has existed for almost 2 years without resolution, perhaps isolated to remote URIs (e.g. RTSP)? Can someone from NVIDIA please give an update on the internal status of this issue?

@icetana as the original poster I can attest to the frustrating wait… one kludgy workaround is to test the URL with OpenCV first.

cv::VideoCapture capture(url);

if (!capture.isOpened()) {
    // Error: the URL could not be opened, so do not add it as a source
}
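
If pulling in OpenCV just for this check is too heavy, a plain TCP reachability test is a lighter variant of the same idea. The sketch below is only that: tcp_endpoint_reachable is a hypothetical helper name, the host and port are assumed to have been extracted from the RTSP URL by the caller (554 is the default RTSP port), and it only confirms that something accepts a TCP connection within the timeout. It does not speak RTSP, so a server that accepts the connection but never delivers a stream would still pass.

```c
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of DeepStream or GStreamer): returns true if
 * a TCP connection to host:port can be established within timeout_ms. */
static bool
tcp_endpoint_reachable (const char *host, const char *port, int timeout_ms)
{
  struct addrinfo hints, *res = NULL, *ai;
  bool ok = false;

  memset (&hints, 0, sizeof (hints));
  hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
  hints.ai_socktype = SOCK_STREAM;  /* TCP */
  if (getaddrinfo (host, port, &hints, &res) != 0)
    return false;

  for (ai = res; ai != NULL && !ok; ai = ai->ai_next) {
    int flags;
    int fd = socket (ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (fd < 0)
      continue;

    /* Non-blocking connect so the wait can be bounded with poll(). */
    flags = fcntl (fd, F_GETFL, 0);
    if (flags < 0 || fcntl (fd, F_SETFL, flags | O_NONBLOCK) < 0) {
      close (fd);
      continue;
    }

    if (connect (fd, ai->ai_addr, ai->ai_addrlen) == 0) {
      ok = true;                    /* connected immediately */
    } else if (errno == EINPROGRESS) {
      struct pollfd pfd;
      pfd.fd = fd;
      pfd.events = POLLOUT;
      pfd.revents = 0;
      if (poll (&pfd, 1, timeout_ms) == 1) {
        /* Writable: check whether the connect succeeded or failed. */
        int err = 0;
        socklen_t len = sizeof (err);
        if (getsockopt (fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0)
          ok = true;
      }
    }
    close (fd);
  }

  freeaddrinfo (res);
  return ok;
}
```

A caller would run something like tcp_endpoint_reachable ("camera-host", "554", 2000) before creating the source bin and skip the source if it returns false. Like the OpenCV check, this only narrows the window; the camera can still drop out between the check and the actual connection.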

Regards,
Robert.

I managed to fix this error by not performing the delete operation inside any probe function. Instead, I invoke the stream delete function after handling EOS in the bus call. It works fine with RTSP streams; I tested it a couple of times. I also tried invoking the delete function from a separate thread that checks the health of the RTSP streams and unlinks them from the pipeline if a stream goes down. That works fine as well.
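
For anyone landing here later, the hand-off described above (the probe or streaming thread only requests the removal, and a different thread performs it) can be sketched in plain C roughly as follows. This is a sketch under assumptions, not DeepStream code: removal_queue, removal_queue_request, removal_worker and fake_remove_source are hypothetical names, and in a real application the callback would wrap something like stop_release_source() from the first post. In a GLib application the same hand-off is often done by scheduling the work on the main loop with g_idle_add() instead of using a dedicated thread.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_PENDING 16

typedef void (*remove_fn) (int source_id);

/* Hypothetical queue of source ids waiting to be removed. */
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t cond;
  int pending[MAX_PENDING];
  int count;
  bool shutdown;
  remove_fn remove_source;  /* would wrap the real pad-release/unlink code */
} removal_queue;

static int g_removed_count = 0;
static int g_last_removed = -1;

/* Stand-in for the real removal routine; it only records the call. */
static void
fake_remove_source (int source_id)
{
  printf ("removing source %d outside the streaming thread\n", source_id);
  g_removed_count++;
  g_last_removed = source_id;
}

static void
removal_queue_init (removal_queue *q, remove_fn fn)
{
  pthread_mutex_init (&q->lock, NULL);
  pthread_cond_init (&q->cond, NULL);
  q->count = 0;
  q->shutdown = false;
  q->remove_source = fn;
}

/* Cheap enough to call from a probe: it only records the request. */
static bool
removal_queue_request (removal_queue *q, int source_id)
{
  bool accepted = false;

  pthread_mutex_lock (&q->lock);
  if (!q->shutdown && q->count < MAX_PENDING) {
    q->pending[q->count++] = source_id;
    accepted = true;
    pthread_cond_signal (&q->cond);
  }
  pthread_mutex_unlock (&q->lock);
  return accepted;
}

/* Ask the worker to exit once all pending requests have been handled. */
static void
removal_queue_shutdown (removal_queue *q)
{
  pthread_mutex_lock (&q->lock);
  q->shutdown = true;
  pthread_cond_broadcast (&q->cond);
  pthread_mutex_unlock (&q->lock);
}

/* Worker thread: performs the removals in FIFO order. */
static void *
removal_worker (void *arg)
{
  removal_queue *q = arg;
  int id, i;

  pthread_mutex_lock (&q->lock);
  for (;;) {
    while (q->count == 0 && !q->shutdown)
      pthread_cond_wait (&q->cond, &q->lock);
    if (q->count == 0)
      break;                      /* shutdown requested, nothing left */

    id = q->pending[0];
    for (i = 1; i < q->count; i++)
      q->pending[i - 1] = q->pending[i];
    q->count--;

    /* Run the removal without holding the queue lock. */
    pthread_mutex_unlock (&q->lock);
    q->remove_source (id);
    pthread_mutex_lock (&q->lock);
  }
  pthread_mutex_unlock (&q->lock);
  return NULL;
}
```

The worker is started once with pthread_create (&thread, NULL, removal_worker, &queue); a probe or health check then calls removal_queue_request (&queue, source_id) and returns immediately. Whether this avoids the nvstreammux hang in every case is not something the sketch can guarantee; it only moves the release out of the streaming thread, as described above.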
