Reconnection Issue

• Hardware Platform (Jetson / GPU) NVIDIA A2
• DeepStream Version 6.3
• TensorRT Version 8.4.0
• NVIDIA GPU Driver Version (valid for GPU only) 535.129.03

Hello,

I am running a DeepStream pipeline in Python with inference through nvinferserver, and I have an issue that I reproduced on deepstream_test3.

My pipeline serves 20+ RTSP streams, but some streams stop suddenly and never reconnect. However, it works fine when I do the same thing with fewer cameras!

I’ll attach the code, screenshots, the commands I used to build RTSP server, and maybe the configs on PeopleNet I use.

In this screenshot, you can see that stream0 stopped suddenly, dropping to 0 fps, and it never reconnected even though I am using rtsp-reconnect-interval.

Here’s my code:
deepstream_test_3.txt (18.6 KB)

About the RTSP server, I use the simple server example as mentioned here.

docker run --rm -it --network=host aler9/rtsp-simple-server

After that, I run ffmpeg commands in different terminals, e.g.:
ffmpeg -re -stream_loop -1 -i cam5.mp4 -c:v copy -an -f rtsp -rtsp_transport tcp rtsp://10.1.118.105:8554/stream5
ffmpeg -re -stream_loop -1 -i cam6.mp4 -c:v copy -an -f rtsp -rtsp_transport tcp rtsp://10.1.118.105:8554/stream6
ffmpeg -re -stream_loop -1 -i cam7.mp4 -c:v copy -an -f rtsp -rtsp_transport tcp rtsp://10.1.118.105:8554/stream7
ffmpeg -re -stream_loop -1 -i cam1.mp4 -c:v copy -an -f rtsp -rtsp_transport tcp rtsp://10.1.118.105:8554/stream8
ffmpeg -re -stream_loop -1 -i cam2.mp4 -c:v copy -an -f rtsp -rtsp_transport tcp rtsp://10.1.118.105:8554/stream9
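Since every publisher follows the same pattern, the streams could also be started from one script instead of many terminals. The following is only a sketch: `build_publish_cmd` and `publish_all` are hypothetical helper names, the host and port are the ones from the post, and the ffmpeg flags are copied from the commands above.

```python
import subprocess

RTSP_HOST = "10.1.118.105"  # server address from the post; adjust as needed
RTSP_PORT = 8554


def build_publish_cmd(video_path, stream_name, host=RTSP_HOST, port=RTSP_PORT):
    """Return the ffmpeg argv that loops one file to one RTSP path."""
    return [
        "ffmpeg",
        "-re",                 # read the input at its native frame rate
        "-stream_loop", "-1",  # loop the file forever
        "-i", video_path,
        "-c:v", "copy",        # pass the video through without re-encoding
        "-an",                 # drop audio
        "-f", "rtsp",
        "-rtsp_transport", "tcp",
        f"rtsp://{host}:{port}/{stream_name}",
    ]


def publish_all(videos):
    """Start one ffmpeg process per (video_path, stream_name) pair."""
    return [subprocess.Popen(build_publish_cmd(v, s)) for v, s in videos]
```

Calling `publish_all([("cam5.mp4", "stream5"), ("cam6.mp4", "stream6")])` would start two publishers and return their process handles, which makes it easy to terminate a single one later to imitate a dropped camera.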

PeopleNet config:
config.txt (1.3 KB)
config_triton_infer_primary_peoplenet.txt (1.2 KB)

The videos I am using come from the WILDTRACK dataset.
Note: everything works fine when I feed them in as mp4 files.

I appreciate your help, thanks.

Please refer to this topic. You can tune batched-push-timeout to improve this.
Please also refer to this FAQ on how to set parameters reasonably to improve the efficiency of nvstreammux in live mode.
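As an illustration of what tuning batched-push-timeout can look like: a rule of thumb often quoted for live sources, and assumed here rather than taken from the linked FAQ, is to set the timeout to the frame interval of the fastest stream, expressed in microseconds. The helper name `batched_push_timeout_us` is hypothetical.

```python
def batched_push_timeout_us(max_fps):
    """Frame interval of the fastest source, in microseconds.

    Assumption: one frame interval is a reasonable starting value for
    nvstreammux's batched-push-timeout with live sources; tune from there.
    """
    if max_fps <= 0:
        raise ValueError("max_fps must be positive")
    return int(round(1_000_000 / max_fps))
```

In a pipeline such as deepstream_test3 the result would be passed to the muxer with something like `streammux.set_property("batched-push-timeout", batched_push_timeout_us(25))`, which gives 40000 for 25 fps sources.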

I encountered a similar RTSP stream re-connection issue when using nvurisrcbin with deepstream python test3.

Despite setting the suggested parameters, when processing multiple RTSP streams, one of the streams disconnects at random and never reconnects. This forces a restart of the whole pipeline, which disturbs the processing of the other streams.

Due to this reconnection unreliability with nvurisrcbin, I’m currently trying a different implementation using uridecodebin with deepstream_rt_src_add_del to reconnect a disrupted stream.
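The detection half of such an approach can be sketched without any GStreamer code. This is not the implementation from this thread, only an assumed design: record when each source last delivered a frame, and periodically ask which sources have been silent for too long so they can be removed and re-added. `StallDetector` and its methods are hypothetical names.

```python
import time


class StallDetector:
    """Track the last frame time per source and report silent ones."""

    def __init__(self, timeout_s=10.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock      # injectable so the logic can be tested
        self._last_seen = {}     # source_id -> time of the last frame

    def register(self, source_id):
        """Start watching a source (call when its bin is added)."""
        self._last_seen[source_id] = self._clock()

    def mark_frame(self, source_id):
        """Record that a frame from this source was just seen."""
        self._last_seen[source_id] = self._clock()

    def forget(self, source_id):
        """Stop watching a source (call when its bin is removed)."""
        self._last_seen.pop(source_id, None)

    def stalled(self):
        """Return the ids that have been silent for longer than timeout_s."""
        now = self._clock()
        return sorted(
            sid for sid, seen in self._last_seen.items()
            if now - seen > self.timeout_s
        )
```

In a DeepStream Python app, `mark_frame` could be called from the buffer probe that already iterates the frame metadata, and `stalled()` from a periodic GLib timer whose callback removes and re-creates the affected source bin in the style of deepstream_rt_src_add_del. The timeout should be longer than any pause the streams produce in normal operation.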

Hi @user87838

So, do you think it’s better to use uridecodebin with the add/delete source feature? And did you manage to do it on deepstream_test3.py?

I’m still in the process of implementing and testing it with deepstream_test3.py. So that’s why at this moment in time, I can’t say with 100% certainty if this solution is better.

However, what I can say based on my testing/observations is that currently I can’t rely on using nvurisrcbin stream reconnection feature when processing many RTSP streams.

@user87838
Could you keep me updated?

I’ll try to keep investigating and update this topic with the results and share codes.

Hello @fanzh

I read both the topic and FAQ before, and I read them one more time when you provided them. But I really can’t find a solution to my issue.

Would you please try to reproduce this case and help us with a solution? This issue breaks production and causes a lot of problems.

Thank you for your cooperation.

In this topic, I tested two RTSP sources and the reconnecting functionality worked fine. Please refer to the log.
@JoeShz To narrow down this issue, could you use the following method to get some logs?

1.  set env variable. export GST_DEBUG=3,nvurisrcbin:6
2.  ./app ...... >1.log 2>1.log

Yes, it would work fine with two RTSP sources.

That’s why I mentioned that I am using 20+ rtsp sources.

@fanzh

I am sorry but I don’t understand this line: ./app ...... >1.log 2>1.log where should I use it?

Sorry, I mean: run your application and redirect its log output to the file 1.log.
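One caveat when capturing these logs: GStreamer writes its debug output to stderr, and the form `>1.log 2>1.log` opens the same file twice, so the two streams can overwrite each other. The usual shell form for sending both to one file is `>1.log 2>&1`. The same thing can be done from Python; the sketch below is an illustration only, and `run_with_gst_debug` is a hypothetical helper name.

```python
import os
import subprocess


def run_with_gst_debug(cmd, log_path, gst_debug="3,nvurisrcbin:6"):
    """Run cmd with GST_DEBUG set, writing stdout and stderr to one file."""
    env = dict(os.environ, GST_DEBUG=gst_debug)
    with open(log_path, "w") as log:
        # stderr=STDOUT makes both streams share a single file handle,
        # which is what `>1.log 2>&1` does in a shell.
        result = subprocess.run(
            cmd, env=env, stdout=log, stderr=subprocess.STDOUT
        )
    return result.returncode
```

Here `cmd` is the argument list normally typed on the command line, starting with the interpreter or binary of the application.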

@JoeShz Sure I will keep you updated.

You can see in the same topic that the reconnecting functionality failed for 6 public RTSP streams (each of 15 framerate) and also for 2 local RTSP streams (of 20 and 25 framerate). I have shared the logs.

Have you tried testing with more than 2 RTSP streams? And with streams having different framerate? And also left the streams processing in the pipeline for at least 24 hours?

Thanks for sharing! Let’s focus on the “also for 2 local RTSP streams (of 20 and 25 framerate)” issue first.
I tested two RTSP sources, one at 15 fps and the other at 30 fps. The reconnection functionality worked well; please refer to the log 30fps-15fps.txt (120.2 KB).
In your log two-local-rtsp-streams-reconnect.txt, the fps is still 0 after "Resetting source". We can’t rule out a network issue because there are no related logs. To narrow down this issue, could you first provide more logs by setting the env variable with export GST_DEBUG=3,rtpjitterbuffer:6?

I redirected the logs into a file as you asked. I used 32 cameras but unfortunately, not everything was in the file. I’ll attach a screenshot too.

logs:
1.txt (65.3 KB)

These errors occurred after using export GST_DEBUG=3,nvurisrcbin:6,rtpjitterbuffer:6

I stopped the rtsp streams when I found a stream stopped suddenly. I don’t think I have to export the whole logs.

  1. Are these sources physical cameras or a virtual RTSP server? How did you simulate disconnecting a source? By turning off the camera or unplugging the network cable?
  2. In the log 1.txt, the max fps is 3.8. Is that the actual fps?
  3. In the log 1.txt, there are no rtpjitterbuffer logs, which show the receiving status. Please redirect the log with "command-line >1.log 2>1.log". A line from that log looks like this, for example:
0:00:07.608818879  3301 0x7fd7080038c0 DEBUG     rtpjitterbuffer gstrtpjitterbuffer.c:2983:gst_rtp_jitter_buffer_chain:<rtpjitterbuffer0> Received packet #31258 at time 0:00:03.482960395, discont 0, rtx 0
  4. I noticed that the test with two RTSP sources also had the reconnection issue. To simplify the issue, could you test with only two sources? Thanks!
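With many sources, lines like the rtpjitterbuffer example above become hard to read by eye. A small script can summarize them by reporting the last packet each jitterbuffer received, which hints at which stream stopped receiving data and when. This is a sketch that assumes the line format shown above and GStreamer's default element names (rtpjitterbuffer0, rtpjitterbuffer1, ...); `last_packet_per_buffer` is a hypothetical helper name.

```python
import re

# Terminal color codes that GStreamer adds when the log is colored.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")
PACKET_RE = re.compile(
    r"<(rtpjitterbuffer\d+)>.*?"
    r"Received packet #(\d+) at time (\d+):(\d+):(\d+\.\d+)"
)


def last_packet_per_buffer(lines):
    """Map each jitterbuffer name to (last packet number, time in seconds)."""
    last = {}
    for line in lines:
        match = PACKET_RE.search(ANSI_RE.sub("", line))
        if match:
            name, seq, hours, minutes, seconds = match.groups()
            stamp = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
            last[name] = (int(seq), stamp)
    return last
```

Passing an open log file to `last_packet_per_buffer` reads it line by line, so it also works on very large logs. A jitterbuffer whose last timestamp lags far behind the others is a candidate for the stalled stream.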

could you provide more logs by setting env variable. export GST_DEBUG=3,rtpjitterbuffer:6 first?

Sure let me do that and share logs with you. Thank you for your response.

we can’t rule out the network issue because there is no related logs.

During the pipeline’s reconnection attempts, I checked the streams in VLC media player and they were playing properly at that time, which would not have been possible if there were a network issue.

  1. Virtual RTSP server. Just stopped the streamer server.
  2. This is true, the cameras have a lot of objects. The detector takes too much time to detect these objects I guess. Although I am using PeopleNet.
  3. Ok. I’ll do that and share logs one more time.
  4. This issue will never appear with only two sources. That’s what I’ve been saying for a while.

I tried to redirect rtpjitterbuffer logs but the file is huge. I can’t upload it.
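When the log is too large to upload, one option is to keep only the lines from the elements under investigation and compress the result. The sketch below is an assumption about what is useful to keep, not a requirement from the thread; `compress_filtered_log` is a hypothetical helper name.

```python
import gzip


def compress_filtered_log(src_path, dst_path,
                          keep=("rtpjitterbuffer", "nvurisrcbin")):
    """Copy lines mentioning any `keep` token into a gzip file.

    Returns the number of lines kept. The input is streamed line by line,
    so the whole log never has to fit in memory.
    """
    kept = 0
    with open(src_path, "r", errors="replace") as src, \
            gzip.open(dst_path, "wt") as dst:
        for line in src:
            if any(token in line for token in keep):
                dst.write(line)
                kept += 1
    return kept
```

For example, `compress_filtered_log("1.log", "1.log.gz")` writes a compressed file that contains only the rtpjitterbuffer and nvurisrcbin lines.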

Okay, long story short.

It seems like @user87838 is right. Whenever I use rtsp-reconnect-interval with nvurisrcbin, some streams suddenly disconnect. The reconnection doesn’t seem to be functional.

And of course, when I disable this property, and I disconnect some sources, the whole pipeline hangs and this is expected.

So, do we need a workaround that avoids nvurisrcbin, or is this fixed in DS 6.4 or a later release?

@fanzh

This is really urgent. I appreciate your help.