DeepStream Python gets stuck with RTSP stream

Hey,
We are working on an automated pipeline with the DeepStream Python bindings.
The pipeline process is set to restart automatically under supervisorctl.

Source - RTSP stream from a camera
Inference models - 2
sink - a udp sink to further access the stream via rtsp

But surprisingly, the stream gets stuck after running for a few hours, with no errors and no log output. Currently, I am completely clueless about what’s happening.

Can somebody help in the resolution or debugging of this issue?

Also, due to Indian ambient temperatures, the Jetson is running at quite extreme temperatures. Here is a screenshot from jtop. Can the temperature be a culprit?

Thanks!

Hi,
Please share the information:
• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• Issue Type( questions, new requirements, bugs)

And do you run on a developer kit or your own custom board? It should not go to 100C if the thermal solution is working as expected. If you use the Jetson Nano developer kit, please try adding a USB fan and see if this improves the stability.
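To check whether temperature correlates with the freezes, it may help to log the thermal zones alongside the pipeline. The following is a minimal sketch, assuming the board exposes the standard Linux thermal sysfs layout (`thermal_zone*/type` and `thermal_zone*/temp`, with `temp` in millidegrees Celsius); the function name and the default path are illustrative choices.

```python
import glob
import os


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_type: temperature_in_celsius} for every readable thermal zone."""
    readings = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as type_file:
                name = type_file.read().strip()
            with open(os.path.join(zone, "temp")) as temp_file:
                # Assumption: the kernel reports millidegrees Celsius here.
                millidegrees = int(temp_file.read().strip())
        except (OSError, ValueError):
            continue  # Some zones cannot be read; skip them.
        readings[name] = millidegrees / 1000.0
    return readings


if __name__ == "__main__":
    for zone_name, celsius in read_thermal_zones().items():
        print(f"{zone_name}: {celsius:.1f} C")
```

Calling this periodically and writing the result next to a timestamp would show whether the freezes line up with temperature peaks.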

• Hardware Platform (Jetson / GPU) - Jetson Nano
• DeepStream Version - 5.1
• JetPack Version (valid for Jetson only) - 4.5.1
• Issue Type( questions, new requirements, bugs) - Bugs

It's a production module installed on a 2GB developer kit, with the fan installed.

Hi,
So it is a Nano 4GB eMMC module + Nano 2GB developer kit? We have not seen this combination and would like to confirm it. Is there a reason for this hardware combination?

Hey,
We tried the combination and it seemed to work fine. Can this be the issue? Or anything else that you would like to recommend?

Hi,
The developer kits of Jetson Nano 2GB and 4GB are different. Although the deviation is minor, they are not 100% pin-to-pin compatible. Certain functions may not work if you insert a Jetson Nano 4GB module into the 2GB developer kit. Are you able to try a Jetson Nano 4GB module + 4GB developer kit?

Please check documents about the developer kits:
Jetson Nano 2GB Developer Kit User Guide | NVIDIA Developer
https://developer.nvidia.com/embedded/dlc/Jetson_Nano_Developer_Kit_User_Guide
https://developer.nvidia.com/jetson-nano-developer-kit-carrier-board-p3449-b01-specification

Hey,

I am facing the same issue again. Let me try to rephrase it.

  1. I am using an IP camera stream through RTSP. Using uridecodebin for playing the stream.
  2. I have set the live-source to 1 in streammux property.
  3. There are two models connected, primary and secondary.
  4. The stream works fine some of the time, usually when there is little or no inferencing. This is a production environment, so that is typically from night until the afternoon. We have run and observed it for up to 11 hours.
  5. But under load, when there is a lot of activity and inferencing, the DeepStream code gets stuck or freezes indefinitely.
  6. The process is not killed and no errors are reported on the GStreamer bus; it just gets stuck every 30-40 minutes.
  7. We checked with htop: all the threads of the process are alive, but they are not using any CPU. In jtop everything else looks normal (RAM, temperature, etc.).
  8. The code has been taken from python deepstream examples - GitHub - NVIDIA-AI-IOT/deepstream_python_apps: A project demonstrating use of Python for DeepStream sa
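
Because the process stays alive and nothing is posted on the bus, supervisorctl has no reason to restart it. One possible workaround is a watchdog that notices when buffers stop arriving at the sink. The following is a minimal sketch; the `StallWatchdog` class, its method names, and the 30-second timeout are hypothetical and not part of deepstream_python_apps. The GStreamer wiring is shown only in comments and assumes the usual `Gst`/`GLib` imports from the sample apps.

```python
import threading
import time


class StallWatchdog:
    """Remembers when the last buffer was seen and reports a stall after a timeout."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()
        self._lock = threading.Lock()

    def touch(self):
        """Record that a buffer just passed (called from a streaming thread)."""
        with self._lock:
            self._last_seen = self._clock()

    def seconds_since_last_buffer(self):
        with self._lock:
            return self._clock() - self._last_seen

    def is_stalled(self):
        return self.seconds_since_last_buffer() > self.timeout_s


# Hypothetical wiring into a DeepStream Python app (not executed here):
#
#   watchdog = StallWatchdog(timeout_s=30.0)
#
#   def on_buffer(pad, info, user_data):
#       watchdog.touch()
#       return Gst.PadProbeReturn.OK
#
#   sink.get_static_pad("sink").add_probe(Gst.PadProbeType.BUFFER, on_buffer, None)
#
#   def check_stall():
#       if watchdog.is_stalled():
#           loop.quit()   # then exit non-zero so the supervisor restarts the process
#           return False  # stop this periodic check
#       return True       # keep the periodic check running
#
#   GLib.timeout_add_seconds(5, check_stall)
```

This only makes the freeze visible and recoverable; it does not explain why the pipeline stops.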

I doubt this is a hardware issue, as we have tried this on as many as 15 devices and all of them show similar behaviour.

I have seen some similar threads, but none with a satisfactory response.

It would be great if @DaneLLL can help us with the same?

Hi,
Do you set sync=0 on the sink? If not, please set the property and give it a try. Not sure, but the synchronization mechanism in the GStreamer framework may be triggering the issue.

Yes, I have already set sync property to 0
sink.set_property("sync", 0)

Hey, are there any other options that I can try?

Recently I tried playing two video sources with the same code, and I was able to reproduce the issue of getting stuck.

Although it has never happened with one video stream.

Thanks

Hi,
We have default model in

/opt/nvidia/deepstream/deepstream-5.1/samples/models/Primary_Detector_Nano

Please try this model and check whether you still observe the issue. We would like to know if it is specific to a certain model. It is possible that with a heavy model, Jetson Nano is not able to achieve the target performance.

Hey, we tried running the model with JUST PGIE, and it ran properly.

My current pipeline is - n(streams) * uridecodebin -> streammux -> pgie -> sgie -> fakesink

Removing SGIE helped, but we need SGIE. Can you suggest a link to Jetson nano-specific standard configs for SGIE, PGIE, and Streammux?

Additionally how do we debug this issue?

@DaneLLL
Hey, printed some logs, let me know if you find them helpful.

These are the last lines printed; the pipeline gets stuck after this.

Kindly look into this, really urgent.

0:01:23.410544079 13229   0x7ef8024540 DEBUG         GST_SCHEDULING gstpad.c:4326:gst_pad_chain_data_unchecked:<primary-inference:sink> called chainfunction &gst_base_transform_chain with buffer 0x7f0800c4e0, returned ok
0:01:23.410641267 13229   0x7ef8024540 DEBUG             bufferpool gstbufferpool.c:304:do_alloc_buffer:<nvstreammuxbufferpool0> max buffers reached
0:01:23.410717569 13229   0x7ef8024540 DEBUG               GST_POLL gstpoll.c:1317:gst_poll_wait: 0x7ef8024630: timeout :99:99:99.999999999
0:01:23.411975231 13229   0x7efc02f930 DEBUG             GST_BUFFER gstbuffer.c:1375:gst_buffer_is_memory_range_writable: idx 0, length -1
0:01:23.412453306 13229     0x1708f0f0 DEBUG             bufferpool gstbufferpool.c:304:do_alloc_buffer:<bufferpool0> max buffers reached
0:01:23.412499244 13229     0x1708f0f0 DEBUG               GST_POLL gstpoll.c:1317:gst_poll_wait: 0x1708e370: timeout :99:99:99.999999999
0:01:23.426789883 13229   0x7efc02f930 DEBUG             GST_BUFFER gstbuffer.c:1375:gst_buffer_is_memory_range_writable: idx 0, length -1
0:01:23.427197229 13229     0x1708f0f0 DEBUG             bufferpool gstbufferpool.c:304:do_alloc_buffer:<bufferpool0> max buffers reached
0:01:23.427253844 13229     0x1708f0f0 DEBUG               GST_POLL gstpoll.c:1317:gst_poll_wait: 0x1708e370: timeout :99:99:99.999999999
0:01:23.441130991 13229   0x7efc02f930 DEBUG             GST_BUFFER gstbuffer.c:1375:gst_buffer_is_memory_range_writable: idx 0, length -1
0:01:23.482430246 13229   0x7efc02f930 DEBUG             GST_BUFFER gstbuffer.c:1375:gst_buffer_is_memory_range_writable: idx 0, length -1
0:01:23.483014675 13229     0x1708f0f0 DEBUG             bufferpool gstbufferpool.c:304:do_alloc_buffer:<bufferpool0> max buffers reached
0:01:23.483083738 13229     0x1708f0f0 DEBUG               GST_POLL gstpoll.c:1317:gst_poll_wait: 0x1708e370: timeout :99:99:99.999999999
0:01:23.494594052 13229   0x7efc02f930 DEBUG             GST_BUFFER gstbuffer.c:1375:gst_buffer_is_memory_range_writable: idx 0, length -1
0:01:23.495120409 13229     0x1708f0f0 DEBUG             bufferpool gstbufferpool.c:304:do_alloc_buffer:<bufferpool0> max buffers reached
0:01:23.495205201 13229     0x1708f0f0 DEBUG               GST_POLL gstpoll.c:1317:gst_poll_wait: 0x1708e370: timeout :99:99:99.999999999
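
The excerpt above repeats "max buffers reached" followed by a `gst_poll_wait` with an effectively infinite timeout. One possible reading, which is an assumption and not something confirmed in this thread, is that threads are blocked waiting for buffers to be returned to an exhausted pool. To see which pools are affected in a longer log, a small script can count these lines per pool. This is a minimal sketch; the function name is illustrative and the regular expression is derived only from the log format shown above.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... gstbufferpool.c:304:do_alloc_buffer:<bufferpool0> max buffers reached
POOL_EXHAUSTED = re.compile(r"do_alloc_buffer:<(?P<pool>[^>]+)> max buffers reached")


def count_pool_exhaustion(lines):
    """Return a Counter mapping buffer pool name to its 'max buffers reached' count."""
    counts = Counter()
    for line in lines:
        match = POOL_EXHAUSTED.search(line)
        if match:
            counts[match.group("pool")] += 1
    return counts


# Three lines copied from the log excerpt in this thread.
sample = [
    "0:01:23.410641267 13229   0x7ef8024540 DEBUG             bufferpool "
    "gstbufferpool.c:304:do_alloc_buffer:<nvstreammuxbufferpool0> max buffers reached",
    "0:01:23.412453306 13229     0x1708f0f0 DEBUG             bufferpool "
    "gstbufferpool.c:304:do_alloc_buffer:<bufferpool0> max buffers reached",
    "0:01:23.412499244 13229     0x1708f0f0 DEBUG               GST_POLL "
    "gstpoll.c:1317:gst_poll_wait: 0x1708e370: timeout :99:99:99.999999999",
]

for pool, hits in count_pool_exhaustion(sample).most_common():
    print(f"{pool}: {hits}")
```

Passing an open log file instead of `sample` works the same way, since the function only iterates over lines.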

Hi,
For Jetson Nano we demonstrate only PGIE because GPU capability is limited. If you require SGIE, please check whether you can reduce the load of the model and set a larger batch-size. Generally SGIE is lighter than PGIE.
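
As a rough illustration of the batch-size suggestion, the SGIE config file can be adjusted programmatically. This is a sketch under assumptions: it relies on the `batch-size`, `input-object-min-width`, and `input-object-min-height` keys in the `[property]` group of an nvinfer config file (please verify them against the Gst-nvinfer documentation for DeepStream 5.1), and the default values 16 and 64 are placeholders, not recommended settings. The function name is hypothetical.

```python
import configparser
import io


def tune_sgie_config(config_text, batch_size=16, min_object_size=64):
    """Return nvinfer config text with a larger batch-size and a minimum object size."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # Keep keys exactly as written in the file.
    parser.read_string(config_text)
    if not parser.has_section("property"):
        parser.add_section("property")
    prop = parser["property"]
    # A larger batch lets the SGIE process more detected objects per inference call.
    prop["batch-size"] = str(batch_size)
    # Assumption: skipping small objects reduces the number of SGIE inferences.
    prop["input-object-min-width"] = str(min_object_size)
    prop["input-object-min-height"] = str(min_object_size)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    original = "[property]\ngie-unique-id=2\nprocess-mode=2\nbatch-size=1\n"
    print(tune_sgie_config(original))
```

Note that `configparser` drops comment lines when it rewrites the file, and a previously built TensorRT engine may need to be regenerated after the batch size changes.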

The nvinfer plugin is open source. You may add debug prints and rebuild the plugin to get more information.

To investigate further, we would need to reproduce the issue. Please check if you can share a gst-launch-1.0 command so that we can set up a Jetson Nano 4GB and try it.
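
For the pipeline described earlier in the thread (n uridecodebin sources -> streammux -> pgie -> sgie -> fakesink), an equivalent gst-launch-1.0 command could be generated with a small helper. This is a sketch, not a verified reproduction: the RTSP URIs and config file names are placeholders, the 1280x720 muxer resolution is an assumption, and the function name is hypothetical.

```python
import shlex


def build_gst_launch(uris, pgie_config, sgie_config, width=1280, height=720):
    """Build a gst-launch-1.0 command for N uridecodebin sources feeding nvstreammux."""
    # The muxer is named so that each source can link to one of its request pads.
    mux = (
        f"nvstreammux name=mux batch-size={len(uris)} "
        f"width={width} height={height} live-source=1"
    )
    chain = (
        f"{mux} ! nvinfer config-file-path={shlex.quote(pgie_config)} "
        f"! nvinfer config-file-path={shlex.quote(sgie_config)} "
        "! fakesink sync=0"
    )
    sources = " ".join(
        f"uridecodebin uri={shlex.quote(uri)} ! mux.sink_{index}"
        for index, uri in enumerate(uris)
    )
    return f"gst-launch-1.0 {chain} {sources}"


if __name__ == "__main__":
    # Placeholder camera URIs and config file names.
    print(build_gst_launch(
        ["rtsp://camera-1/stream", "rtsp://camera-2/stream"],
        "pgie_config.txt",
        "sgie_config.txt",
    ))
```

If the printed command freezes in the same way as the Python app, that would suggest the problem is not in the Python bindings.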