GStreamer lockup with H.264 encoder from nvarguscamerasrc

Sorry in advance if you waste some time because of me. I’m just sharing my own understanding (which is not huge!), and since I don’t have your HW this is pure speculation. These are simply the things I would try in such a case, so if you’re out of other ideas you may give them a try.

You may first try fakesink, then h264parse + udpsink to localhost, either within the same gst-launch process or in separate ones.

About omxh264enc vs nvv4l2h264enc, I sometimes get a similar impression, but it depends on the case. The nvv4l2 plugins may also achieve better performance in the cases where they work, so you would have to benchmark your own case… until the next L4T update ;-)

Hi,
Please share the steps so that we can try to reproduce the issue. We would also like some more information:
Does it happen only with multiple Argus camera sources, or also with a single camera source?
Do you run a single camera source in each process, or multiple sources in a single process?

Do you separate the sources with different UDP ports?

As of today (based on the experiments Honey_Patouceul recommended in post #6) I have found that all I need to do is run six (6) instances of gst-launch-1.0 as follows:

export GST_DEBUG="*:6"
export GST_DEBUG_NO_COLOR=1
gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=1 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=2 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=3 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=4 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=5 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink

The change from my original post is that udpsink has been replaced with fakesink.
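
Since the six commands above differ only in sensor-id, they could be generated from one template and started together. This is a minimal sketch, not the script used for the runs in this thread: the function names camera_pipeline and launch_all_cameras and the camera-N.log file names are made up for illustration, while the pipeline string itself is copied from the commands above.

```shell
#!/bin/sh
# Print the gst-launch-1.0 command line for one camera.
# $1 = sensor-id (0..5). All element names and properties are copied
# from the commands above; only the sensor-id varies.
camera_pipeline() {
    printf '%s' \
        "gst-launch-1.0 -e nvarguscamerasrc sensor-id=$1 sensor-mode=0 ! " \
        "'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! " \
        "nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! " \
        "rtph264pay mtu=1400 ! fakesink"
    printf '\n'
}

# Start all six pipelines in the background and wait for them.
# camera-N.log is a hypothetical log file name chosen for this sketch.
launch_all_cameras() {
    export GST_DEBUG="*:6" GST_DEBUG_NO_COLOR=1
    for id in 0 1 2 3 4 5; do
        # GStreamer debug output goes to stderr, so capture both streams.
        eval "$(camera_pipeline "$id")" > "camera-$id.log" 2>&1 &
    done
    wait
}
```

Calling launch_all_cameras on the target would start the six instances at once, with one log per camera, so the last timestamp in each log gives the lockup time for that camera.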

With nvv4l2h264enc and fakesink my latest run resulted in:

  • Camera 0 locked up in 13sec
  • Camera 3 locked up in 7m30s
  • Camera 2 locked up in 25m17s
  • Camera 1 locked up in 1h39m
  • Camera 4 locked up in 1h58m
  • Camera 5 is still running after 3h25m

I am not sure. I think it will lock up with just one gst-launch-1.0, especially if I use the nvv4l2h264enc encoder. Note that Camera 0 above locked up after only 13 seconds; at that point I was either just starting the gst-launch-1.0 command for Camera 1 or had already started it. If this is an important data point, I’m sure I can run the experiment to try to get a lockup with only a single instance running at a time. It could take much longer to reproduce given the randomness of the issue and having only 1/6th the chance of hitting the race condition that is causing the lockup.

Update: Yes, it locks up with a single GStreamer camera source. After 10 hours and 55 minutes, I was able to get a lockup while only using a single camera source. See post #14 below for details.

It appears that using the udpsink was not necessary to reproduce the lockup. When I did run with udpsink I used the following port mapping:

Camera  Port
0       5000
1       5002
2       5004
3       5006
4       5008
5       5010
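
The mapping above is simply port = 5000 + 2 × camera index. As a small sketch (the helper name camera_port is hypothetical and not part of the original setup), it could be computed rather than typed by hand:

```shell
# Map a camera index (0..5) to its UDP port: 5000, 5002, ..., 5010.
camera_port() {
    echo $((5000 + 2 * $1))
}

# Example use inside a pipeline string (host is whatever receiver you use):
#   ... ! udpsink host=<receiver> port=$(camera_port "$camera") sync=false async=false
```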

Hi,
Please share how you check that a source is locked up. Do you add additional prints, or use some other method?

With nvv4l2h264enc I add GST_DEBUG="*:6" and with omxh264enc I add GST_DEBUG="omx:5". These will cause GStreamer to print out debug information with a timestamp as long as it is running. So when it stops executing you can look at the last operations it was performing and the timestamp since it began execution.

You can also see the lockup in the CPU% utilization in htop. Under normal execution it will be >10%. When it locks up it will drop to 0-1% utilization.
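
Because the debug output is continuous while a pipeline is alive, a log file that stops growing is a usable lockup signal. The following is a minimal sketch under two assumptions: each pipeline's output has been redirected to a file (camera-0.log is a hypothetical name), and GNU stat is available, as on the Ubuntu-based L4T root filesystem. The function name log_is_stalled is made up for this example.

```shell
# Succeed (exit 0) when FILE has not been modified for at least LIMIT seconds.
# $1 = FILE, $2 = LIMIT in seconds. Returns 2 if FILE cannot be read.
# Assumes GNU stat ("stat -c %Y" prints the modification time in epoch seconds).
log_is_stalled() {
    file="$1"
    limit="$2"
    now=$(date +%s)
    mtime=$(stat -c %Y "$file") || return 2
    [ $((now - mtime)) -ge "$limit" ]
}

# Example: poll every 10 s and report once the log has been quiet for 30 s.
#   while ! log_is_stalled camera-0.log 30; do sleep 10; done
#   echo "camera 0 appears locked up at $(date)"
```

The 30 second threshold is arbitrary; anything comfortably longer than the gap between normal debug lines should do.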

It also happens with a single GStreamer camera source. After 10 hours and 55 minutes, I was able to get a lockup while only using a single camera source. I am not surprised it took longer to reproduce, given that I had only 1/6th the number of cameras being processed, encoded, and sent.

Here is the last frame captured on the Linux host desktop. It shows the type of low-light video being fed to the hardware encoder, which seems to accelerate reaching the lockup condition:


You can see there is lots of noise and the gain is high causing a lot of variation from one frame to the next, which I suspect is causing the hardware encoder to work harder than a static image under good lighting.

nvargus-daemon logged no errors and the last line of output (from 18 hours ago) is:

Sep 23 16:20:49 nvidia22 nvargus-daemon[6086]: === gst-launch-1.0[30790]: CameraProvider initialized (0x7f84a18510)CAM: serial no file already exists, skips storing againLSC: LSC surface is not based on full res!

You can see the parent pid 30790 and the child processes have all basically halted in htop:

Here is the only GStreamer command I used to cause the lockup:

GST_DEBUG="*:6" GST_DEBUG_NO_COLOR=1 gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=192.168.55.100 port=6000 sync=false async=false

Here are the last 99,287 lines from the GStreamer output in the terminal:
gstreamer-camera0-lockup-issue.txt.gz

My low-level image sensor driver indicates that the image sensor is still streaming valid frames to the Jetson TX2 SoM CSI-2 port. Also, the VI tracing logs from debugfs indicate that the video frames are still being received properly by the NVIDIA VI pipeline. Here is a log of the VI tracing (for 10 seconds duration) long after the GStreamer pipeline has locked up:
camera-channel0-streaming-trace.txt

Finally, when GStreamer locks up the NVENC goes from the normal 1164MHz while running to the “OFF” state.

It seems it is now narrowed down to nvarguscamerasrc and nvv4l2h264enc.
Is it the same without specifying sensor-mode=0?
Does increasing the bitrate help?

Just some more thoughts if you want to investigate; I understand it takes hours.

It looks like a less probable cause now that it happens with only one camera, but did you check the VI clocks as mentioned in my post #5?

Hi,
We will try to reproduce the issue and investigate.

Yes, I have reproduced the issue with both fakesink (see post #10) and with udpsink to localhost as follows:

gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=localhost port=5000 sync=false async=false
gst-launch-1.0 -e nvarguscamerasrc sensor-id=1 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=localhost port=5002 sync=false async=false
gst-launch-1.0 -e nvarguscamerasrc sensor-id=2 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=localhost port=5004 sync=false async=false
gst-launch-1.0 -e nvarguscamerasrc sensor-id=3 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=localhost port=5006 sync=false async=false
gst-launch-1.0 -e nvarguscamerasrc sensor-id=4 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=localhost port=5008 sync=false async=false
gst-launch-1.0 -e nvarguscamerasrc sensor-id=5 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=localhost port=5010 sync=false async=false

With nvv4l2h264enc and udpsink to the localhost:

  • Camera 5 locked up in 7m34s
  • Camera 1 locked up in 17m10s
  • Camera 3 locked up in 19m5s
  • Camera 4 locked up in 1h32m
  • Camera 0 locked up in 4h59m
  • Camera 2 locked up in 9h50m

No, h264parse does not help. I combined h264parse and fakesink as such:

gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=1 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=2 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=3 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=4 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=5 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink

With h264parse added between nvv4l2h264enc and rtph264pay, and with fakesink:

  • Camera 5 locked up in 31s
  • Camera 1 locked up in 2m40s
  • Camera 4 locked up in 3m3s
  • Camera 2 locked up in 1h2m
  • Camera 3 locked up in 2h1m
  • Camera 0 locked up in 2h18m

Yes, when I launch the 6 pipelines from the same gst-launch-1.0 process the results are the same. I ran the following combined gst-launch-1.0 command:

GST_DEBUG="*:6" GST_DEBUG_NO_COLOR=1 gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink nvarguscamerasrc sensor-id=1 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink nvarguscamerasrc sensor-id=2 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink nvarguscamerasrc sensor-id=3 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink nvarguscamerasrc sensor-id=4 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink nvarguscamerasrc sensor-id=5 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink

With this setup it was more difficult to determine exactly when each GStreamer pipeline had locked up. Looking at htop you can see the top six threads of gst-launch-1.0 with significant accumulated execution time are all at 0% CPU usage:


At this point the NVENC clock frequency has gone from 1164MHz to OFF.

This is the gst-launch-1.0 debug output from when GStreamer was locked up:
gstreamer-all-camera-stream-lockup.txt

Here is the camera VI tracing log from debugfs with ~10 seconds duration when GStreamer was locked up:
camera-all-channels-streaming-trace.txt

Finally, here is my little bash command to pull out the interesting information from the trace log:

$ cat camera-all-channels-streaming-trace.txt | cut -d' ' -f 13-14 | sort | uniq
tag:ATOMP_FE channel:0x00
tag:ATOMP_FE channel:0x01
tag:ATOMP_FE channel:0x02
tag:ATOMP_FE channel:0x03
tag:ATOMP_FE channel:0x04
tag:ATOMP_FE channel:0x05
tag:ATOMP_FS channel:0x00
tag:ATOMP_FS channel:0x01
tag:ATOMP_FS channel:0x02
tag:ATOMP_FS channel:0x03
tag:ATOMP_FS channel:0x04
tag:ATOMP_FS channel:0x05
tag:CHANSEL_LOAD_FRAMED channel:0x01
tag:CHANSEL_LOAD_FRAMED channel:0x04
tag:CHANSEL_LOAD_FRAMED channel:0x10
tag:CHANSEL_LOAD_FRAMED channel:0x41
tag:CHANSEL_LOAD_FRAMED channel:0x44
tag:CHANSEL_LOAD_FRAMED channel:0x50
tag:CHANSEL_PXL_EOF channel:0x00
tag:CHANSEL_PXL_EOF channel:0x01
tag:CHANSEL_PXL_EOF channel:0x02
tag:CHANSEL_PXL_EOF channel:0x03
tag:CHANSEL_PXL_EOF channel:0x04
tag:CHANSEL_PXL_EOF channel:0x05
tag:CHANSEL_PXL_SOF channel:0x00
tag:CHANSEL_PXL_SOF channel:0x01
tag:CHANSEL_PXL_SOF channel:0x02
tag:CHANSEL_PXL_SOF channel:0x03
tag:CHANSEL_PXL_SOF channel:0x04
tag:CHANSEL_PXL_SOF channel:0x05

So it looks like all six(6) channels are still receiving valid CSI frame data. My low-level hardware driver also indicates that CSI frames are still being sent to the NVIDIA CSI ports, even though GStreamer has locked up.
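
A small variation of the command above counts the events per tag/channel pair instead of only listing them, which would make it easier to spot a channel that delivers noticeably fewer frames than the others. This is only a sketch: the function name count_trace_events is hypothetical, and it relies on the same assumption as the original command, namely that the tag and channel are fields 13 and 14 of each space-separated trace line.

```shell
# Count trace events per "tag channel" pair.
# $1 = trace file; fields 13-14 are assumed to hold tag:... and channel:...,
# exactly as in the cut command above.
count_trace_events() {
    cut -d' ' -f 13-14 "$1" | sort | uniq -c
}

# Example:
#   count_trace_events camera-all-channels-streaming-trace.txt
```

With six healthy 30 FPS streams over the same time window, the ATOMP_FS and ATOMP_FE counts for each channel would be expected to be roughly equal.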

I do not have the exact lockup times of each of those individual GStreamer pipelines, but based on the final output from gstreamer-all-camera-stream-lockup.txt you can see the final lockup occurred at 16h43m. I remember seeing, as it was running, those high-utilization gst-launch-1.0 threads start to drop to <1.0% CPU utilization. I interpret that to mean that the GStreamer pipeline was gradually locking up for each video stream one by one, just like before. At the end, the NVENC clock frequency was set to OFF once all the GStreamer debug output had halted and things were locked up.

I did two things. First, I ran the following commands to boost the camera VI/NVCSI/ISP clocks during the test:

echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/vi/rate
cat /sys/kernel/debug/bpmp/debug/clk/isp/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/isp/rate
cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate
export GST_DEBUG="*:6"
export GST_DEBUG_NO_COLOR=1
gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=1 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=2 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=3 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=4 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=5 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink

With boosted camera VI/NVCSI/ISP clocks:

  • Camera 5 locked up in 2m48s
  • Camera 3 locked up in 13m55s
  • Camera 4 locked up in 26m24s
  • Camera 1 locked up in 1h23m
  • Camera 2 locked up in 4h49m
  • Camera 0 locked up in 6h45m

Second, I also followed the Xavier link to change the VIC to the userspace governor with the maximum frequency:

echo on > /sys/devices/13e10000.host1x/15340000.vic/power/control
echo userspace > /sys/devices/13e10000.host1x/15340000.vic/devfreq/15340000.vic/governor
echo 1024000000 > /sys/devices/13e10000.host1x/15340000.vic/devfreq/15340000.vic/max_freq
echo 1024000000 > /sys/devices/13e10000.host1x/15340000.vic/devfreq/15340000.vic/userspace/set_freq
export GST_DEBUG="*:6"
export GST_DEBUG_NO_COLOR=1
gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=1 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=2 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=3 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=4 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=5 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink

With VIC userspace governor and maximum clock frequency of 1024000000 (1024 MHz):

  • Camera 4 locked up in 48s
  • Camera 5 locked up in 14m59s
  • Camera 0 locked up in 47m5s
  • Camera 2 locked up in 3h16m
  • Camera 3 locked up in 5h11m
  • Camera 1 locked up in 7h31m

My sensor mode 1 is also a 1080p 30FPS, but has PWL WDR enabled. I used the following command to exercise sensor mode 1 in the test:

export GST_DEBUG="*:6"
export GST_DEBUG_NO_COLOR=1
gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=1 sensor-mode=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=2 sensor-mode=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=3 sensor-mode=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=4 sensor-mode=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=5 sensor-mode=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink

With sensor mode 1:

  • Camera 2 locked up in 56s
  • Camera 4 locked up in 17m43s
  • Camera 3 locked up in 44m4s
  • Camera 5 locked up in 1h29m
  • Camera 1 locked up in 10h3m
  • Camera 0 locked up in 13h35m

No, I have now tested several bit rates including: 1000000, 4000000, and 8000000. All of them lock up.
I performed a single camera video stream test with 8000000 on two separate devices using the following command:

GST_DEBUG="*:6" GST_DEBUG_NO_COLOR=1 gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=8000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink

With a bit rate of 8000000:

  • Device 2 locked up in 2h44m
  • Device 5 locked up in 19h30m

I’m afraid you’re wasting some time trying all my suggestions, sorry.
I have no TX2 running a recent release to try this on, and with my NX I failed to reproduce it with only one cam teed into 14 h264 encoders + decoders + display sinks with sync=false.
I’d rather suspect your camera driver or Argus, but this is pure speculation.
It may help if you tell us what your sensors are and what format mode 1 uses.

So my only advice would just be to narrow it down to only:
nvarguscamerasrc ! fakesink or nvarguscamerasrc ! nvv4l2h264enc ! fakesink.

Also note that an excessive GStreamer log level may result in very different timings than in the normal case.

Probably @DaneLLL will provide much better help if able to reproduce.

In hindsight that can be a conclusion. If one of those ideas had yielded a stable and successful long-running system, it would have been Eureka!

I hope with the additional tests and lockup information it can show that there is a real fundamental race condition in the NVIDIA provided GStreamer plugins or the hardware pieces that GStreamer relies on. The testing lock up times seem to indicate a race condition or improper usage of a shared resource between the GStreamer pipelines. I am hypothesizing that based on the trend of early failures in the GStreamer pipeline when all six video streams are running. Then as the GStreamer pipelines begin to deadlock and stop processing the data properly the remaining streams tend to get longer and longer run times. So the failures seem to be correlated to more usage of the underlying resources. However, even with a single GStreamer pipeline the lock up issue can still be reproduced. This indicates that the problem is still there and exposed with only one running at a time, just much less frequently.

A couple of questions before we conclude you have failed to reproduce the issue:

  1. How many camera sources did you test with?
  2. How long did you run your test?
  3. Which version of L4T are you using?
  4. Which power profile are you running on and have you enabled the Denver cores?
  5. What is the exact GStreamer command you are executing?
  6. Did you try with a dark room to ensure the image sensor was generating random static data needing to be encoded in every frame?

I can certainly understand that. However, I have seen it lock up even with no debugging output. I didn’t even know there was debug logging in GStreamer until I started trying to investigate the lockup. The debug output is a mechanism to easily see when one of the GStreamer pipelines is locked up. It also gives a precise time/duration for how long it runs before locking up. Without the logging it would be difficult to know that they are locked up, especially with the fakesink.

Well, consider that I’m just a hobbyist, curious and trying to help.
As said before, I don’t have a 6 camera setup. My only TX2 can just run R28.2, so I can’t help with TX2-specific issues.
I’ve just tried to reproduce it with a Xavier NX and one RPi cam, teeing into 14 pipelines each doing h264 encoding + decoding + display. I ran:

gst-launch-1.0 -e nvarguscamerasrc sensor-mode=2 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! tee name=video ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false    video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false     video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false   video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false     video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false     video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false     video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false       video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false     video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false      video. ! 
nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false    video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false     video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false    video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false    video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false

for almost one hour and saw nothing wrong. I ran it in a dark room, though below a display, so there was some light. I tried adding a lot of motion, but this didn’t change anything.

Did you try adding sync=false to your sinks?

I don’t have much time to try much more, sorry. You should expect more from NVIDIA support.
Again, knowing about your sensor and its modes would help in giving advice.

Hi,
We can observe the issue. We will update once there is a new finding.

Wow! That GStreamer pipeline is amazing. It launched 163 threads to run that pipeline on my system. However, I’m guessing there is only a single copy of the nvarguscamerasrc data going to each of those tee branches that do the encode. So while this generates a lot of system activity, it may not be generating the right type of activity to cause the lockup. I will give this giant GStreamer pipeline a test also, but I would expect it could take 10+ hours to lock up, just like my single instance testing.

Unfortunately, I don’t think a test on the old R28.2 release for only one hour with a single camera tells much. With a single camera you would need to run at least 48 hours to know if you have a race condition that results in a lockup. If you have more nvarguscamerasrc cameras that can be run simultaneously, you can shorten that time a bit.

Yes, I added sync=false when I did the localhost test as such:
gst-launch-1.0 -e nvarguscamerasrc sensor-id=$camera sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=localhost port=$port sync=false async=false
Even so, all six of the GStreamer instances still locked up.

Yes, since I was unable to reproduce it with a single camera replicated in SW, this is probably related to something else. @DaneLLL said this was reproduced on the NVIDIA side, so that is your main hope now.
For clarity, I haven’t tried with my old TX2, only with a Xavier NX.

I know this is a tough problem and takes a bit of time to reproduce and test changes, but I’m wondering if you have any update on the progress or estimated time for a solution?

Is there any additional information you need about my test setup or hardware?

Are there any experiments that you need me to run to collect more data or try to narrow down the problem?

Hi,
We are still checking it. Will update once there is further progress. Thanks.

Hi,
We should have it fixed in r32.5. Please give it a try.

Can you tell me the Bug Id/Issue # so I can review the source change and see the commit?