GStreamer lockup with H.264 encoder from nvarguscamerasrc

I am testing the L4T R32.4.3 production release with GStreamer and I am encountering a lockup where it looks like it is stuck in the h264 encoder part of the pipeline.

Here is the GStreamer command being used:

GST_DEBUG="omx:5" gst-launch-1.0 nvarguscamerasrc sensor-id=$camera sensor-mode=0 ! 'video/x-raw(memory:NVMM),format=(string)NV12,width=1920,height=1080,framerate=30/1' ! omxh264enc bitrate=1000000 control-rate=2 insert-sps-pps=true iframeinterval=30 profile=1 ! rtph264pay ! udpsink host=$ip port=$port

Note I run six(6) instances of the above command where $camera=[0-5] and the $port is different for each camera udpsink. The cameras are all identical 1080p 30FPS image sensors.

I have done a bit of debugging already, trying to find where GStreamer is stuck, what the exact cause is, and how to work around or fix the issue. I have tried different bit rates and even changed to nvv4l2h264enc, and I still get lockups. So I am not sure whether the issue is in nvarguscamerasrc or in the NV hardware encoder. One time when it was locked up, I saw that NVENC was stuck at 1062MHz and was not changing. Once I killed the stuck instance of GStreamer, the NVENC clock rate started updating as normal.

I have thoroughly tested using all six(6) cameras at once using argus_camera in Multi Session mode and never had an issue. Just to make sure my cameras were still streaming, I checked the VI tracing log from debugfs, and each of my six camera channels was still generating events (rtcpu_vinotify_event ATOMP_FE, CHANSEL_PXL_SOF, and CHANSEL_PXL_EOF) at 30FPS in the trace log.

Additionally, when GStreamer locks up and I have the GST_DEBUG="omx:5" debugging level enabled, I always see this line:

1:21:21.136081856  7768   0x7f88002a30 DEBUG                    omx gstomx.c:1375:gst_omx_port_acquire_buffer:<omxh264enc-omxh264enc0> Queue of encoder port 1 is empty

as the final line output from the instance of GStreamer that has locked up. If I hit CTRL-C to kill the GStreamer execution, I get additional omx DEBUG and INFO messages (see attached log camera0-failure.txt).

Another interesting observation is that when one of the GStreamer gst-launch-1.0 instances locks up, it does not affect the others. Normally, the gst-launch-1.0 process will consume 8-10% of the CPU. However, the locked up gst-launch-1.0 will vary between 0-1.0% CPU usage implying it is deadlocked on some sort of resource waiting for an event.
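One way to watch for this drop in CPU usage without keeping htop open is to sample the process's accumulated CPU time from /proc. The following is only a minimal sketch: the function names `cpu_ticks` and `cpu_percent` are hypothetical, it assumes a Linux /proc filesystem, and the interval must be a whole number of seconds.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; only /proc/<pid>/stat and getconf are standard.

# Print total CPU ticks (utime + stime) consumed so far by a process.
cpu_ticks() {
    local pid=$1 stat
    stat=$(cat "/proc/$pid/stat") || return 1
    # Drop "pid (comm) " first; comm may itself contain spaces or parentheses.
    stat=${stat##*) }
    # The remaining fields start at field 3 (state), so utime and stime
    # (fields 14 and 15 of the full line) are now positions 12 and 13.
    set -- $stat
    echo $(( ${12} + ${13} ))
}

# Print average CPU usage (percent of one core) over an interval in seconds.
cpu_percent() {
    local pid=$1 interval=${2:-5} hz before after
    hz=$(getconf CLK_TCK)
    before=$(cpu_ticks "$pid") || return 1
    sleep "$interval"
    after=$(cpu_ticks "$pid") || return 1
    echo $(( (after - before) * 100 / (hz * interval) ))
}

# Example (assumed usage): flag a pipeline whose usage falls to 1% or less.
# if [ "$(cpu_percent "$GST_PID" 10)" -le 1 ]; then echo "pid $GST_PID looks stuck"; fi
```

A healthy instance in the scenario above would be expected to report roughly 8-10, and a locked-up one 0 or 1, but the threshold is a judgment call.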

This problem is difficult to reproduce. I have seen it lock up anywhere from 30 minutes to 13 hours into execution. However, there are times where I have seen all six camera streams run without any issue for >24 hours. Those long runs appear to be chance, because after power cycling and running exactly the same GStreamer pipeline as before, another GStreamer instance locked up within a few hours.

Finally, I started seeing this issue with low lighting conditions (i.e., overnight when the lights are off or very low). This seems to have the effect of amplifying the noise in the image sensor video with lots of static in the image changing rapidly from frame to frame. This makes the System Monitor Network send throughput max out at 100 Mb/s. Under normal lighting conditions I see low 1.5-20 Mb/s send bandwidths. This too made me think that the NV Hardware Encoder was working hard when the image sensor video was noisy and might be the reason the lock ups are occurring.

Hi,
We are deprecating omx plugins. Please try v4l2 plugin nvv4l2h264enc.

The deprecation is listed in release notes:
5.12 GStreamer Plugin gst-omx Deprecated

It locks up with nvv4l2h264enc also. With the following GStreamer command:

gst-launch-1.0 -e nvarguscamerasrc sensor-id=$camera sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=$ip port=$port sync=false async=false

each of the six cameras locks up even faster than with omxh264enc.

During my last test run with nvv4l2h264enc:

  • Camera 5 locked up in 4m16s
  • Camera 0 locked up in 5m30s
  • Camera 1 locked up in 10m49s
  • Camera 4 locked up in 1h36m
  • Camera 3 locked up in 2h53m
  • Camera 2 locked up in 3h25m

So regardless of using omxh264enc or nvv4l2h264enc GStreamer locks up while trying to do H.264 encode in the pipeline.

I am also having reliability issues with nvv4l2h264enc. Transitioning from omxh264enc to nvv4l2h264enc is resulting in less stability. Please help debug this issue.

This is not an answer, and I may not be able to help further, but for helping to figure it out:

  • Is it the same if you replace udpsink with fakesink? Or set the udpsink host to localhost?
  • Does adding h264parse between the H.264 encoder and rtph264pay improve things?
  • Is it the same whether you launch the 6 pipelines from a single gst-launch process or from 6 gst-launch processes, each running its own camera pipeline?

You may also check VIC clocks. This link was for a Xavier; some paths may be different on TX2.

I will try each of these suggestions and provide the results. I am hoping that the additional information will help to narrow the problem. It will take me a while to provide the results since the issue takes a bit of time to reproduce. Would you suggest I start with checking these items with omxh264enc or nvv4l2h264enc first?

NVIDIA seems to be pushing to move away from the omx plugins. However, the alternative nvv4l2h264enc encoder seems to not be ready for production use as it is especially prone to crashes and is unreliable as others have pointed out on the forums and which is true in my experience. I would be willing to give up a little performance and use nvv4l2h264enc. However, I cannot give up reliability/robustness to move to that encoder plugin. Do you know if there is a way to make the nvv4l2h264enc encoder be as reliable as the omxh264enc encoder with a change in the GStreamer pipeline?

Sorry in advance if you waste some time because of me. I'm just sharing my own understanding (which is not huge!), but I don't have your HW, so this is pure speculation. I'm just sharing what I would try in such a case... so if you're out of further ideas you may try these.

You may first try fakesink, then h264parse + udpsink localhost. Within same gst-launch process or not.

About omxh264enc vs nvv4l2h264enc, I sometimes feel something similar, but it depends on each case... nvv4l2 plugins may also achieve better performance in working cases... you would have to benchmark your case... until next L4T update ;-)

Hi,
Please share the steps so that we can try to reproduce the issue. We would also like some more information:
Does it happen only with multiple Argus camera sources, or also with a single camera source?
Do you run a single camera source in each process, or multiple sources in a single process?

Do you separate the sources with different UDP ports?

As of today (based on Honey_Patouceul's recommended experiments in post #6) I have found that all I need to do is run six(6) instances of gst-launch-1.0 as follows:

export GST_DEBUG="*:6"
export GST_DEBUG_NO_COLOR=1
gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=1 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=2 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=3 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=4 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=5 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink

The change from my original post is that udpsink has been replaced with fakesink.

With nvv4l2h264enc and fakesink my latest run resulted in:

  • Camera 0 locked up in 13sec
  • Camera 3 locked up in 7m30s
  • Camera 2 locked up in 25m17s
  • Camera 1 locked up in 1h39m
  • Camera 4 locked up in 1h58m
  • Camera 5 is still running after 3h25m

I am not sure. I think it will lock up with just one gst-launch-1.0, especially if I use the nvv4l2h264enc encoder. Note that the time above for Camera 0 was only 13 seconds; I was either just starting the gst-launch-1.0 command for Camera 1 or had already started it. If this is an important data point, I'm sure I can run the experiment to try to get a lockup with only a single instance running at a time. It could take much longer to reproduce given the randomness of the issue and having only 1/6th the chance of hitting the race condition that is causing the lockup.

Update: Yes, it locks up with a single GStreamer camera source. After 10 hours and 55 minutes, I was able to get a lockup while only using a single camera source. See post #14 below for details.

It appears that using the udpsink was not necessary to reproduce the lockup. When I did run with udpsink I used the following port mapping:

Camera  Port
0       5000
1       5002
2       5004
3       5006
4       5008
5       5010
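
The mapping above is simply 5000 plus twice the camera index, so the six udpsink instances can be started from one loop. This is a sketch rather than the script actually used for the tests: the function names and the `GST_LAUNCH` override (handy for a dry run with `echo`) are assumptions, while the pipeline itself is copied from the nvv4l2h264enc command earlier in the thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper names; the pipeline elements come from the posts above.

# UDP port for a camera index: 0 -> 5000, 1 -> 5002, ... 5 -> 5010.
port_for_camera() {
    echo $(( 5000 + 2 * $1 ))
}

# Run one camera pipeline. Set GST_LAUNCH=echo to print the arguments
# instead of starting gst-launch-1.0.
run_camera() {
    local camera=$1 host=$2
    "${GST_LAUNCH:-gst-launch-1.0}" -e \
        nvarguscamerasrc sensor-id="$camera" sensor-mode=0 ! \
        'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! \
        nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! \
        rtph264pay mtu=1400 ! \
        udpsink host="$host" port="$(port_for_camera "$camera")" sync=false async=false
}

# Example (assumed usage): start all six cameras in the background,
# one log file per camera, then wait for them.
# for camera in 0 1 2 3 4 5; do
#     GST_DEBUG="*:6" GST_DEBUG_NO_COLOR=1 run_camera "$camera" localhost \
#         > "camera${camera}.log" 2>&1 &
# done
# wait
```

Keeping one log file per camera makes it easier to see afterwards which instance stopped first.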

Hi,
Please share how you check that a source is locked up. Do you add additional prints, or something else?

With nvv4l2h264enc I add GST_DEBUG="*:6" and with omxh264enc I add GST_DEBUG="omx:5". These will cause GStreamer to print out debug information with a timestamp as long as it is running. So when it stops executing you can look at the last operations it was performing and the timestamp since it began execution.
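
Because the debug output stops when the pipeline stops, the age of the log file can serve as a simple stall detector. The sketch below assumes the debug output of each instance is redirected to its own file (for example with `2> camera0.log`, or through the GST_DEBUG_FILE environment variable) and that GNU `stat` is available; the function names and the 10 second default are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for spotting a pipeline whose debug log stopped growing.

# Print the number of seconds since the file was last modified (GNU stat).
log_age_seconds() {
    local now mtime
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 1
    echo $(( now - mtime ))
}

# Succeed (exit status 0) when the log has not changed for at least
# $2 seconds (default 10); exit status 2 means the file could not be read.
log_is_stalled() {
    local age
    age=$(log_age_seconds "$1") || return 2
    [ "$age" -ge "${2:-10}" ]
}

# Example (assumed usage): poll once a minute and report the stall time.
# while ! log_is_stalled camera0.log 30; do sleep 60; done
# echo "camera0.log stopped updating around $(date)"
```

At the `*:6` log level a running pipeline writes continuously, so even a short threshold should be safe; at lower log levels a longer threshold would be needed.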

Additionally, you can also see the lockup in the CPU% utilization in htop. Under normal execution it will be >10%. When it locks up it will be 0-1% utilization.

It also happens with a single GStreamer camera source. After 10 hours and 55 minutes, I was able to get a lockup while only using a single camera source. I am not surprised it took longer to reproduce, given that I only had 1/6th the number of cameras being processed, encoded, and sent.

Here is the last frame captured on the Linux host desktop. It shows the type of low-light video being fed to the hardware encoder, which seems to accelerate the lockup condition:


You can see there is lots of noise and the gain is high causing a lot of variation from one frame to the next, which I suspect is causing the hardware encoder to work harder than a static image under good lighting.

nvargus-daemon logged no errors and the last line of output (from 18 hours ago) is:

Sep 23 16:20:49 nvidia22 nvargus-daemon[6086]: === gst-launch-1.0[30790]: CameraProvider initialized (0x7f84a18510)CAM: serial no file already exists, skips storing againLSC: LSC surface is not based on full res!

You can see the parent pid 30790 and the child processes have all basically halted in htop:

Here is the only GStreamer command I used to cause the lockup:

GST_DEBUG="*:6" GST_DEBUG_NO_COLOR=1 gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=192.168.55.100 port=6000 sync=false async=false

Here are the last 99,287 lines from the GStreamer output in the terminal:
gstreamer-camera0-lockup-issue.txt.gz

My low-level image sensor driver indicates that the image sensor is still streaming valid frames to the Jetson TX2 SoM CSI-2 port. Also, the VI tracing logs from debugfs indicate that the video frames are still being received properly by the NVIDIA VI pipeline. Here is a log of the VI tracing (for 10 seconds duration) long after the GStreamer pipeline has locked up:
camera-channel0-streaming-trace.txt

Finally, when GStreamer locks up, the NVENC goes from the normal 1164MHz while running to the "OFF" state.

It seems it is now narrowed down to nvarguscamerasrc and nvv4l2h264enc.
Is it the same without specifying sensor-mode=0?
Does increasing the bitrate help?

Just some more thoughts if you want to investigate; I understand it takes hours.

It looks like a less probable cause now that it happens with only one camera, but did you check the VI clocks as mentioned in my post #5?

Hi,
We will try to reproduce the issue and do investigation.

Yes, I have reproduced the issue with both fakesink (see post #10) and with udpsink to the localhost, as follows:

gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=localhost port=5000 sync=false async=false
gst-launch-1.0 -e nvarguscamerasrc sensor-id=1 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=localhost port=5002 sync=false async=false
gst-launch-1.0 -e nvarguscamerasrc sensor-id=2 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=localhost port=5004 sync=false async=false
gst-launch-1.0 -e nvarguscamerasrc sensor-id=3 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=localhost port=5006 sync=false async=false
gst-launch-1.0 -e nvarguscamerasrc sensor-id=4 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=localhost port=5008 sync=false async=false
gst-launch-1.0 -e nvarguscamerasrc sensor-id=5 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! udpsink host=localhost port=5010 sync=false async=false

With nvv4l2h264enc and udpsink to the localhost:

  • Camera 5 locked up in 7m34s
  • Camera 1 locked up in 17m10s
  • Camera 3 locked up in 19m5s
  • Camera 4 locked up in 1h32m
  • Camera 0 locked up in 4h59m
  • Camera 2 locked up in 9h50m

No, h264parse does not help. I combined h264parse and fakesink as such:

gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=1 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=2 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=3 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=4 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=5 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink

With h264parse added between nvv4l2h264enc and rtph264pay, and with fakesink:

  • Camera 5 locked up in 31s
  • Camera 1 locked up in 2m40s
  • Camera 4 locked up in 3m3s
  • Camera 2 locked up in 1h2m
  • Camera 3 locked up in 2h1m
  • Camera 0 locked up in 2h18m

Yes, when I launch the 6 pipelines from the same gst-launch-1.0 process the results are the same. I ran the following combined gst-launch-1.0 command:

GST_DEBUG="*:6" GST_DEBUG_NO_COLOR=1 gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink nvarguscamerasrc sensor-id=1 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink nvarguscamerasrc sensor-id=2 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink nvarguscamerasrc sensor-id=3 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink nvarguscamerasrc sensor-id=4 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink nvarguscamerasrc sensor-id=5 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink

With this setup it was more difficult to determine exactly when each GStreamer pipeline had locked up. Looking at htop, you can see that the top six threads of gst-launch-1.0 with significant accumulated execution time are all at 0% CPU usage:


At this point the NVENC clock frequency has gone from 1164MHz to OFF.

This is the output from the gst-launch-1.0 debug output when GStreamer was locked up:
gstreamer-all-camera-stream-lockup.txt

Here is the camera VI tracing log from debugfs with ~10 seconds duration when GStreamer was locked up:
camera-all-channels-streaming-trace.txt

Finally, here is my little bash command to pull out the interesting information from the trace log:

$ cat camera-all-channels-streaming-trace.txt | cut -d' ' -f 13-14 | sort | uniq
tag:ATOMP_FE channel:0x00
tag:ATOMP_FE channel:0x01
tag:ATOMP_FE channel:0x02
tag:ATOMP_FE channel:0x03
tag:ATOMP_FE channel:0x04
tag:ATOMP_FE channel:0x05
tag:ATOMP_FS channel:0x00
tag:ATOMP_FS channel:0x01
tag:ATOMP_FS channel:0x02
tag:ATOMP_FS channel:0x03
tag:ATOMP_FS channel:0x04
tag:ATOMP_FS channel:0x05
tag:CHANSEL_LOAD_FRAMED channel:0x01
tag:CHANSEL_LOAD_FRAMED channel:0x04
tag:CHANSEL_LOAD_FRAMED channel:0x10
tag:CHANSEL_LOAD_FRAMED channel:0x41
tag:CHANSEL_LOAD_FRAMED channel:0x44
tag:CHANSEL_LOAD_FRAMED channel:0x50
tag:CHANSEL_PXL_EOF channel:0x00
tag:CHANSEL_PXL_EOF channel:0x01
tag:CHANSEL_PXL_EOF channel:0x02
tag:CHANSEL_PXL_EOF channel:0x03
tag:CHANSEL_PXL_EOF channel:0x04
tag:CHANSEL_PXL_EOF channel:0x05
tag:CHANSEL_PXL_SOF channel:0x00
tag:CHANSEL_PXL_SOF channel:0x01
tag:CHANSEL_PXL_SOF channel:0x02
tag:CHANSEL_PXL_SOF channel:0x03
tag:CHANSEL_PXL_SOF channel:0x04
tag:CHANSEL_PXL_SOF channel:0x05

So it looks like all six(6) channels are still receiving valid CSI frame data. My low-level hardware driver also indicates that CSI frames are still being sent to the NVIDIA CSI ports, even though GStreamer has locked up.
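
Beyond listing which tag and channel pairs appear, counting them gives a rough frame-rate check: a 10 second trace at 30FPS should contain about 300 CHANSEL_PXL_SOF and CHANSEL_PXL_EOF events per channel. The sketch below assumes each trace line carries adjacent `tag:NAME channel:0xNN` tokens, as the cut output above suggests; `count_vi_events` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count VI trace events per (tag, channel) pair.
# Reads the named files, or standard input when no file is given.
count_vi_events() {
    awk '
        # Pick out the "tag:... channel:0x.." pair wherever it sits in the line.
        match($0, /tag:[A-Z_]+ channel:0x[0-9a-fA-F]+/) {
            count[substr($0, RSTART, RLENGTH)]++
        }
        END {
            for (key in count) print key, count[key]
        }
    ' "$@" | sort
}

# Example (assumed usage):
# count_vi_events camera-all-channels-streaming-trace.txt
```

Each output line has the form `tag:NAME channel:0xNN COUNT`, so a channel that has stopped delivering frames would show up as a missing or much smaller count.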

I do not have the exact lockup times of each of those individual GStreamer pipelines, but based on the final output in gstreamer-all-camera-stream-lockup.txt you can see the final lockup occurred at 16h43m. I remember seeing, as it was running, those high-utilization gst-launch-1.0 threads start to drop to <1.0% CPU utilization. I interpret that to mean that the GStreamer pipeline was gradually locking up for each video stream one by one, just like before. At the end, the NVENC clock frequency was set to OFF when all the GStreamer debug output had halted and things were locked up.

I did two things. First, I ran the following commands to boost the camera VI/NVCSI/ISP clocks during the test:

echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/vi/rate
cat /sys/kernel/debug/bpmp/debug/clk/isp/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/isp/rate
cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate
export GST_DEBUG="*:6"
export GST_DEBUG_NO_COLOR=1
gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=1 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=2 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=3 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=4 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=5 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink

With boosted camera VI/NVCSI/ISP clocks:

  • Camera 5 locked up in 2m48s
  • Camera 3 locked up in 13m55s
  • Camera 4 locked up in 26m24s
  • Camera 1 locked up in 1h23m
  • Camera 2 locked up in 4h49m
  • Camera 0 locked up in 6h45m

Second, I also followed the Xavier link to change the VIC to the userspace governor with the maximum frequency:

echo on > /sys/devices/13e10000.host1x/15340000.vic/power/control
echo userspace > /sys/devices/13e10000.host1x/15340000.vic/devfreq/15340000.vic/governor
echo 1024000000 > /sys/devices/13e10000.host1x/15340000.vic/devfreq/15340000.vic/max_freq
echo 1024000000 > /sys/devices/13e10000.host1x/15340000.vic/devfreq/15340000.vic/userspace/set_freq
export GST_DEBUG="*:6"
export GST_DEBUG_NO_COLOR=1
gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=1 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=2 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=3 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=4 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=5 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink

With VIC userspace governor and maximum clock frequency of 1024000000 (1024 MHz):

  • Camera 4 locked up in 48s
  • Camera 5 locked up in 14m59s
  • Camera 0 locked up in 47m5s
  • Camera 2 locked up in 3h16m
  • Camera 3 locked up in 5h11m
  • Camera 1 locked up in 7h31m

My sensor mode 1 is also 1080p at 30FPS, but has PWL WDR enabled. I used the following commands to exercise sensor mode 1 in the test:

export GST_DEBUG="*:6"
export GST_DEBUG_NO_COLOR=1
gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=1 sensor-mode=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=2 sensor-mode=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=3 sensor-mode=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=4 sensor-mode=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink
gst-launch-1.0 -e nvarguscamerasrc sensor-id=5 sensor-mode=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! fakesink

With sensor mode 1:

  • Camera 2 locked up in 56s
  • Camera 4 locked up in 17m43s
  • Camera 3 locked up in 44m4s
  • Camera 5 locked up in 1h29m
  • Camera 1 locked up in 10h3m
  • Camera 0 locked up in 13h35m

No, I have now tested several bit rates including: 1000000, 4000000, and 8000000. All of them lock up.
I performed a single camera video stream test with 8000000 on two separate devices using the following command:

GST_DEBUG="*:6" GST_DEBUG_NO_COLOR=1 gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=8000000 insert-sps-pps=true ! h264parse ! rtph264pay mtu=1400 ! fakesink

With a bit rate of 8000000:

  • Device 2 locked up in 2h44m
  • Device 5 locked up in 19h30m

I'm afraid you're wasting some time trying all my suggestions, sorry.
I have no TX2 running a recent release to try with, and on my NX I failed to reproduce it with only one camera, teeing into 14 H.264 encoders + decoders + display sinks with sync=false.
I'd rather suspect your camera driver or Argus, but this is pure speculation.
It would help if you could tell us what your sensors are and what format mode 1 uses.

So my only advice would just be to narrow it down to only:
nvarguscamerasrc ! fakesink or nvarguscamerasrc ! nvv4l2h264enc ! fakesink.

Also note that an excessive GStreamer log level may result in very different timings from the normal case.

Probably @DaneLLL will provide much better help if able to reproduce.

In hindsight that may be the conclusion. If one of those ideas had yielded a stable, long-running system, it would have been Eureka!

I hope with the additional tests and lockup information it can show that there is a real fundamental race condition in the NVIDIA provided GStreamer plugins or the hardware pieces that GStreamer relies on. The testing lock up times seem to indicate a race condition or improper usage of a shared resource between the GStreamer pipelines. I am hypothesizing that based on the trend of early failures in the GStreamer pipeline when all six video streams are running. Then as the GStreamer pipelines begin to deadlock and stop processing the data properly the remaining streams tend to get longer and longer run times. So the failures seem to be correlated to more usage of the underlying resources. However, even with a single GStreamer pipeline the lock up issue can still be reproduced. This indicates that the problem is still there and exposed with only one running at a time, just much less frequently.
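
To check this trend against the recorded runs, the lockup times listed in the earlier posts can be converted to seconds and the gap between successive failures printed. This is a minimal sketch: `to_seconds` and `failure_gaps` are hypothetical names, and the parser only handles the formats used in this thread (such as 13sec, 4m16s, 1h36m).

```bash
#!/usr/bin/env bash
# Hypothetical helpers for comparing the lockup times reported above.

# Convert durations such as "13sec", "4m16s" or "1h36m" to seconds.
to_seconds() {
    local d=$1
    local re='^(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s(ec)?)?$'
    [ -n "$d" ] || return 1
    if [[ $d =~ $re ]]; then
        # 10# forces base 10 so values like "08" are not parsed as octal.
        echo $(( 10#${BASH_REMATCH[2]:-0} * 3600
               + 10#${BASH_REMATCH[4]:-0} * 60
               + 10#${BASH_REMATCH[6]:-0} ))
    else
        return 1
    fi
}

# Print the gap in seconds between successive lockups.
# Arguments may be given in any order; the first gap is measured from 0.
failure_gaps() {
    local t prev=0
    for t in "$@"; do to_seconds "$t"; done | sort -n | while read -r t; do
        echo $(( t - prev ))
        prev=$t
    done
}

# Example (assumed usage) with the first nvv4l2h264enc run:
# failure_gaps 4m16s 5m30s 10m49s 1h36m 2h53m 3h25m
```

If the hypothesis holds, the gaps should tend to grow as fewer pipelines remain running, although six samples per run are too few to prove anything on their own.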

A couple of questions before we conclude you have failed to reproduce the issue:

  1. How many camera sources did you test with?
  2. How long did you run your test?
  3. Which version of L4T are you using?
  4. Which power profile are you running on and have you enabled the Denver cores?
  5. What is the exact GStreamer command you are executing?
  6. Did you try with a dark room to ensure the image sensor was generating random static data needing to be encoded in every frame?

I can certainly understand that. However, I have seen it lock up even with no debugging output. I didn't even know there was debug logging in GStreamer till I started trying to investigate the lockup. The debug output is a mechanism to easily see when one of the GStreamer pipelines is locked up. It also gives a precise time/duration for how long it runs before locking up. Without the logging it will be difficult to know that they are locked up, especially with the fakesink.

Well, consider that I'm just a hobbyist, curious and trying to help.
As said before, I don't have a 6 camera setup. My only TX2 can just run R28.2, so I can't help with a TX2-specific issue.
I've just tried to reproduce with a Xavier NX and one RPi cam, teeing into 14 pipelines each doing H.264 encoding + decoding + display. I have run:

gst-launch-1.0 -e nvarguscamerasrc sensor-mode=2 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! tee name=video ! \
  nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false \
  video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false \
  video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false \
  video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false \
  video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false \
  video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false \
  video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false \
  video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false \
  video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false \
  video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false \
  video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false \
  video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false \
  video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false \
  video. ! nvv4l2h264enc maxperf-enable=1 bitrate=4000000 insert-sps-pps=true ! rtph264pay mtu=1400 ! rtph264depay ! nvv4l2decoder ! nvvidconv ! video/x-raw, width=640, height=480 ! xvimagesink sync=false

for almost one hour and saw nothing wrong. I ran it in a dark room, but below a display, so there was some light. I tried adding a lot of motion, but this didn't change anything.

Did you try adding sync=false to your sinks?

I don't have much time to try much more, sorry. You should expect more from NVIDIA support.
Again, knowing your sensor and its modes would help in giving advice.

Hi,
We can observe the issue. We will update once there is a new finding.
