DS5.0 multistream: pipeline terminated yet the process didn't stop

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 5.0
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)

I’m running a program based on the DeepStream 5.0 multi-stream sample (sample 5); it processes multiple RTSP streams.

A problem I’m encountering is that sometimes the pipeline runs into a problem and terminates (the log shows “Exiting app”), yet the process keeps running (ps still shows the PID).

def main(args):
    ...
    # start playback and listen to events
    pipeline.set_state(Gst.State.PLAYING)
    try:
        loop.run()
    except:
        pass
    # cleanup
    print("Exiting app\n")
    pipeline.set_state(Gst.State.NULL)

if __name__ == '__main__':
    sys.exit(main(sys.argv))

According to the structure above, once the log shows “Exiting app”, the next step is to set the pipeline state to NULL, then main() returns and the whole process exits. But from what I observed, “Exiting app” appeared in the log while the process kept running. I then attached gdb to the process to see what was going on. Here is the result.

# gdb -p 24613
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 24613
[New LWP 24779]
[New LWP 24781]
[New LWP 24782]
[New LWP 24783]
[New LWP 24784]
[New LWP 24785]
[New LWP 24786]
[New LWP 24795]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f6a76349839 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f6a76349839 in syscall () at /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f6a7498075f in g_cond_wait () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007f6a6cc27b62 in gst_nvstreammux_release_pad ()
    at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_multistream.so
#3  0x00007f6a72aba090 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#4  0x00007f6a74c19fa3 in g_object_unref () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#5  0x00007f6a750a54f9 in  () at /usr/lib/python3/dist-packages/gi/_gi.cpython-36m-x86_64-linux-gnu.so
#6  0x000000000054f226 in  ()
#7  0x0000000000509ac5 in  ()
#8  0x000000000050a64d in  ()
#9  0x000000000050c1f4 in _PyEval_EvalFrameDefault ()
#10 0x0000000000507f24 in  ()
#11 0x000000000050b053 in PyEval_EvalCode ()
#12 0x0000000000634dd2 in  ()
#13 0x0000000000634e87 in PyRun_FileExFlags ()
#14 0x000000000063863f in PyRun_SimpleFileExFlags ()
#15 0x00000000006391e1 in Py_Main ()
#16 0x00000000004b0dc0 in main ()
(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0x7f6a7681b740 (LWP 24613) "python3" 0x00007f6a76349839 in syscall ()
   from /lib/x86_64-linux-gnu/libc.so.6
  2    Thread 0x7f6a53ae4700 (LWP 24779) "python3" 0x00007f6a76351237 in accept4 ()
   from /lib/x86_64-linux-gnu/libc.so.6
  3    Thread 0x7f6a41ae1700 (LWP 24781) "gmain" 0x00007f6a76342bf9 in poll ()
   from /lib/x86_64-linux-gnu/libc.so.6
  4    Thread 0x7f6a3bae0700 (LWP 24782) "python3" 0x00007f6a76342bf9 in poll ()
   from /lib/x86_64-linux-gnu/libc.so.6
  5    Thread 0x7f6a23adc700 (LWP 24783) "python3" 0x00007f6a7601c9f3 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
  6    Thread 0x7f69ffad6700 (LWP 24784) "python3" 0x00007f6a76342bf9 in poll ()
   from /lib/x86_64-linux-gnu/libc.so.6
  7    Thread 0x7f69f9ad5700 (LWP 24785) "python3" 0x00007f6a7601c9f3 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
  8    Thread 0x7f69dbad0700 (LWP 24786) "python3" 0x00007f6a7601f6d6 in do_futex_wait.constprop ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
  9    Thread 0x7f699fac6700 (LWP 24795) "dconf worker" 0x00007f6a76342bf9 in poll ()
   from /lib/x86_64-linux-gnu/libc.so.6
(gdb) 

It looks like the main thread is blocked in g_cond_wait() inside gst_nvstreammux_release_pad().
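
(In case it helps narrow down where the Python side is blocked the next time this hangs: the standard faulthandler module can dump the Python-level stack of every thread on a signal. A minimal sketch; the choice of SIGUSR1 is arbitrary.)

import faulthandler
import signal

# Install near the top of main(): dump the Python stack of every thread
# to stderr when the process receives SIGUSR1, e.g. `kill -USR1 <pid>`
# while it is hung.
faulthandler.register(signal.SIGUSR1, all_threads=True)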

The DeepStream application log showed the information below. Can anybody shed some light on this issue? What can I do to solve it, or at least handle it gracefully?

0:00:38.979729030 24613 0x1da2e30 WARN rtspsrc gstrtspsrc.c:3155:on_timeout: source 9bc6803f, stream 9bc6803f in session 0 timed out
0:00:38.980248332 24613 0x1da8370 WARN decodebin gstdecodebin2.c:4692:gst_decode_bin_expose: All streams finished without buffers. Last group: 1
0:00:38.980279485 24613 0x1da8370 WARN decodebin gstdecodebin2.c:4695:gst_decode_bin_expose: error: all streams without buffers
2020-09-08 03:06:19,068 Exiting app

0:00:38.997386789 24613 0x1da5230 WARN v4l2videodec gstv4l2videodec.c:1618:gst_v4l2_video_dec_decide_allocation: Duration invalid, not setting latency
0:00:38.997420762 24613 0x1da5230 WARN v4l2bufferpool gstv4l2bufferpool.c:1057:gst_v4l2_buffer_pool_start:nvv4l2decoder2:pool:src Uncertain or not enough buffers, enabling copy threshold
0:00:39.002451730 24613 0x7f66bc031720 WARN v4l2bufferpool gstv4l2bufferpool.c:1503:gst_v4l2_buffer_pool_dqbuf:nvv4l2decoder2:pool:src Driver should never set v4l2_buffer.field to ANY
0:00:39.030751195 24613 0x7f66c43a5940 WARN videodecoder gstvideodecoder.c:2780:gst_video_decoder_prepare_finish_frame: decreasing timestamp (0:00:00.979895884 < 0:00:01.110269363)
0:00:39.031009097 24613 0x7f66bc031720 WARN videodecoder gstvideodecoder.c:2780:gst_video_decoder_prepare_finish_frame: decreasing timestamp (0:00:01.024411837 < 0:00:01.155458044)
0:00:39.031130660 24613 0x7f66b40052d0 WARN videodecoder gstvideodecoder.c:2780:gst_video_decoder_prepare_finish_frame: decreasing timestamp (0:00:01.685981909 < 0:00:01.817943416)
0:00:39.064027918 24613 0x1d7d770 WARN rtspsrc gstrtspsrc.c:5999:gst_rtspsrc_try_send: send interrupted
0:00:39.064056799 24613 0x1d7d770 WARN rtspsrc gstrtspsrc.c:8246:gst_rtspsrc_pause: PAUSE interrupted
0:00:39.076437619 24613 0x1d7d5e0 WARN rtspsrc gstrtspsrc.c:5999:gst_rtspsrc_try_send: send interrupted
0:00:39.076466578 24613 0x1d7d5e0 WARN rtspsrc gstrtspsrc.c:8246:gst_rtspsrc_pause: PAUSE interrupted
0:00:39.077824218 24613 0x1d7d800 WARN rtspsrc gstrtspsrc.c:5999:gst_rtspsrc_try_send: send interrupted
0:00:39.077838633 24613 0x1d7d800 WARN rtspsrc gstrtspsrc.c:8246:gst_rtspsrc_pause: PAUSE interrupted
0:00:39.092703131 24613 0x1d7d720 WARN rtspsrc gstrtspsrc.c:5999:gst_rtspsrc_try_send: send interrupted
0:00:39.092721880 24613 0x1d7d720 WARN rtspsrc gstrtspsrc.c:8246:gst_rtspsrc_pause: PAUSE interrupted
0:00:39.094305180 24613 0x1d7d850 WARN rtspsrc gstrtspsrc.c:5999:gst_rtspsrc_try_send: send interrupted
0:00:39.094321468 24613 0x1d7d850 WARN rtspsrc gstrtspsrc.c:8246:gst_rtspsrc_pause: PAUSE interrupted

It may take some time for the pipeline to change the state from “PLAYING” to “NULL”.

Do you mean the process never terminates after the pipeline has finished? Does this only happen with RTSP sources?

Thanks @Fiona.Chen.

Yes, the process never exited until I killed it manually. I didn’t see this when processing video files; processing a video file is much more stable than processing an RTSP source.

What Python app is used for this test case? If it is your own app, can you send us your app or tell us how to reproduce the problem? How does the app finish: with “EOS”, or just “Ctrl+C” to stop the app?

It’s my own app, and the issue is not stably reproducible: sometimes it happens, sometimes it doesn’t. The code is the one shown above. When the issue happened there was no “Ctrl+C”; the app was supposed to exit naturally. The log contains “Exiting app”, yet the process did not exit, so apparently it is the last statement, “pipeline.set_state(Gst.State.NULL)”, that gets stuck.

I know the loop finished and then the app set the pipeline to NULL. My question is: how did the loop finish? RTSP is a live stream. The RTSP client only stops when either the stream reaches its end (e.g. the server provides a stream of limited length, such as a 1.5-hour movie; when the movie finishes, the client gets EOS and the loop finishes) or the client requests to stop (e.g. Ctrl+C to finish the client app).

The loop is supposed to be running all the time. The RTSP stream server is running 24x7. We may use kill to terminate the application.

The issue I encountered is that the pipeline ended while the process was still shown as running. It was not caused by EOS or Ctrl+C; I think the issue has nothing to do with them.

The key question is why the loop finishes. As you said, if nothing happens, the loop should run forever. But in your case, the “Exiting app\n” printout means the loop finished. It is important to know why the loop finished.

Well, there are 2 things here.

  1. Why did the pipeline end unexpectedly? This is unclear; my best guess is that the RTSP source is not stable, so at some point the pipeline cannot get data, or only gets partial data. This is something beyond our control.

  2. How do we handle the unexpected pipeline termination properly? This is part of error handling. I would expect the process to exit instead of continuing to run without a working pipeline, which is misleading. Is this something you can give any ideas on?

There is a bus message callback, provided by bus_call in bus_call.py:
https://gstreamer.freedesktop.org/documentation/gstreamer/gstbus.html
Only “EOS” or “ERROR” will cause the loop to quit.
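
The callback follows roughly this pattern, quitting the GLib main loop only on EOS or ERROR bus messages (a sketch; the bundled bus_call.py may differ in detail):

import sys
from gi.repository import Gst

def bus_call(bus, message, loop):
    t = message.type
    if t == Gst.MessageType.EOS:
        sys.stdout.write("End-of-stream\n")
        loop.quit()                      # EOS ends the main loop
    elif t == Gst.MessageType.WARNING:
        err, debug = message.parse_warning()
        sys.stderr.write("Warning: %s: %s\n" % (err, debug))
    elif t == Gst.MessageType.ERROR:
        err, debug = message.parse_error()
        sys.stderr.write("Error: %s: %s\n" % (err, debug))
        loop.quit()                      # any ERROR on the bus also ends it
    return True                          # keep the bus watch installed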

Back to RTSP: it is a client/server protocol. The client sends requests to the server, and the server responds with what the client needs (RFC 7826: Real-Time Streaming Protocol Version 2.0). So what do you mean by “the RTSP source is not stable”? Even if the network is broken, the client will keep waiting for packets from the server unless something else happens (such as Ctrl+C); that alone will not cause the loop to quit. So we want to know what kind of “unstable” it is. We cannot reproduce the problem on our side, so more information and details are needed.

As for how to exit: “set_state(Gst.State.NULL)” is what finishes the pipeline. You can add a “pipeline.get_state()” call to check whether the pipeline state actually changed. The usage is described in the following links:
https://lazka.github.io/pgi-docs/#Gst-1.0/classes/Element.html#Gst.Element.get_state
https://gstreamer.freedesktop.org/documentation/additional/design/states.html
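
A minimal sketch of that check, assuming the standard PyGObject bindings (where Gst.Element.get_state() returns a tuple of the return value, the current state and the pending state):

pipeline.set_state(Gst.State.NULL)
# Block for at most 5 seconds waiting for the state change to complete.
ret, state, pending = pipeline.get_state(5 * Gst.SECOND)
if ret == Gst.StateChangeReturn.SUCCESS and state == Gst.State.NULL:
    print("pipeline reached NULL")
elif ret == Gst.StateChangeReturn.ASYNC:
    print("state change still pending")
else:
    print("state change failed")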

Thanks @Fiona.Chen for the detailed instructions.

I added some code and did more tests; here are the results.

  1. Sometimes starting the application produced the error below from bus_call.py. In this scenario the application process just exited, which is good: I knew something was wrong and could do error handling.

===== error acquiring buffer flushing===
Error: gst-stream-error-quark: GStreamer encountered a general stream error. (1): gstdecodebin2.c(4695): gst_decode_bin_expose (): /GstPipeline:pipeline0/GstBin:source-bin-00/GstURIDecodeBin:uri-decode-bin/GstDecodeBin:decodebin0:
all streams without buffers

  2. Sometimes I saw “End-of-stream”; the application also exited as expected, similar to scenario 1.

  3. I modified the code as below. It turned out that get_state returned Gst.StateChangeReturn.FAILURE: I saw “state change failed” in the log, and “Exiting app” as well. But the process was still running.

logger.info("Starting pipeline \n")
pipeline.set_state(Gst.State.PLAYING)
try:
    loop.run()
except:
    pass

pipeline.set_state(Gst.State.NULL)
state = pipeline.get_state(10)
if state == Gst.StateChangeReturn.SUCCESS:
    logger.info("state change succeeded")
elif state == Gst.StateChangeReturn.ASYNC:
    logger.info("pending for a state change\n")
else:
    logger.info("state change failed")
logger.info("Exiting app\n")

The code above is inside the main() function. It seems sys.exit() didn’t actually exit the process. Is there a way to forcefully terminate the process from within the code?

if __name__ == '__main__':
  sys.exit(main(sys.argv))
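
For reference, one option is os._exit(); a minimal sketch (it assumes skipping normal interpreter shutdown and GStreamer cleanup is acceptable):

import os

# Unlike sys.exit(), os._exit() ends the process immediately, without
# running interpreter shutdown, garbage collection or atexit handlers,
# so a stuck GStreamer thread cannot keep the process alive.
if ret == Gst.StateChangeReturn.FAILURE:  # ret from the get_state() call above
    os._exit(1)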

In this scenario, the pipeline never got a chance to process the streams. I saw logs like the ones below.

2020-09-22 06:03:11,633 - INFO - Decodebin child added: decodebin2
2020-09-22 06:03:11,637 - INFO - Decodebin child added: rtph264depay2
2020-09-22 06:03:11,639 - INFO - Decodebin child added: h264parse1
2020-09-22 06:03:11,640 - INFO - Decodebin child added: capsfilter1 
2020-09-22 06:03:11,645 - INFO - Decodebin child added: nvv4l2decoder1
2020-09-22 06:03:11,760 - INFO - In cb_newpad

2020-09-22 06:03:34,312 - INFO - state change failed
2020-09-22 06:03:34,312 - INFO - Exiting app

We cannot reproduce the problem.