TX2-4G R32.3.1: nvargus-daemon does not restart 100% of the time

My GStreamer app sometimes fails to set its state to NULL and clean up correctly, so we run systemctl restart nvargus-daemon before killing the GStreamer app.

In testing, I find that my GStreamer app does not start 100% of the time, and nvargus-daemon does not restart 100% of the time either.

I still get a ton of syslog messages, but I need to find out how to make nvargus-daemon restart 100% of the time.

Testing gives me these messages in syslog:

Apr 13 13:24:42 BaseSystem_0_5 systemd[1]: nvargus-daemon.service: Start request repeated too quickly.
Apr 13 13:24:42 BaseSystem_0_5 systemd[1]: nvargus-daemon.service: Failed with result ‘start-limit-hit’.
Apr 13 13:24:42 BaseSystem_0_5 systemd[1]: Failed to start Argus daemon.

How do I fix this, and can I reconfigure it?

Terry, I have been having problems for over a YEAR with this.


I concur with Terry. I have seen the same issues with GStreamer and nvargus-daemon not restarting properly, on both R32.4.3 and the latest R32.5.0, so the problem is not isolated to the nvargus-daemon in older L4T releases.

Here is what I have logged/captured on L4T R32.5.0 (JP 4.5)…

With only a single camera streaming, nvargus-daemon either recovers by cleaning up the connection when the stream is interrupted (e.g., by a CSI error), or it segfaults with a core dump and systemd automatically restarts it. Usually, when nvargus-daemon cleans up the connection with a single camera streaming, GStreamer also shuts down properly on its own and can be restarted. When nvargus-daemon segfaults, GStreamer usually has to be killed manually with Ctrl-C.

However, with multiple cameras streaming (i.e., more than one), nvargus-daemon locks up and does not clean up properly. It doesn't segfault either, so systemd never restarts it.

Here is a simple GStreamer command with one camera; nvargus-daemon recovers properly from a streaming error with this pipeline:

gst-launch-1.0 -v nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12' ! fakesink

Two example logs from nvargus-daemon where things recover properly:
Connection cleanup (internal restart): nvargus-daemon-connection-cleanup-1-camera.txt (9.1 KB)
Segmentation fault (systemd restart): nvargus-daemon-segmentation-fault-1-camera.txt (6.2 KB)
Note: the segmentation fault is by far the most common case and should be looked into, too.

Here is a simple GStreamer command with two cameras streaming; nvargus-daemon never recovers properly from a streaming error with this pipeline:

gst-launch-1.0 -v nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12' ! fakesink  nvarguscamerasrc sensor-id=1 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12' ! fakesink

Example logs from nvargus-daemon where things fail to recover properly:
nvargus-daemon-deadlock-example-1.txt (9.5 KB)
nvargus-daemon-deadlock-example-2.txt (8.9 KB)

Note that this nvargus-daemon deadlock is easy to reproduce with any type of CSI or stream error while streaming 2-6 cameras simultaneously from GStreamer.

Here is a link to the previous topic I started on this issue:

I’m posting to Terry’s new topic here since there seems to be a preference for not posting to old topics (even if they were never really solved). I’m fairly confident that Terry’s issue is the same one I have been battling since August 2020: the robustness and reliability of the NVIDIA camera system. In particular, the design and error handling within nvargus-daemon handle faults poorly or not at all, so the camera streams cannot be restarted until nvargus-daemon is manually force-restarted.

Hi,
On JP4.5, please extend the timeout value to 5 (or 10) seconds in the nvarguscamerasrc plugin and give it a try. Please refer to
JetPack4.5 using gst to open raw camera "nvarguscamerasrc0 reported: TIMEOUT " - #7 by DaneLLL

Which gzip file contains nvarguscamerasrc? My TX2-4G does not have enough disk space to build the complete public_sources!

Will this work with R32.3.1?

Why recompile? Why is this not a parameter I can control from the command line?

Thanks,
Terry

I found gst-nvarguscamera, but it needs jetson_multimedia_api, and I know that will not fit on my TX2-4G; I only have 4 GB remaining.

Really need help.

g++ -c gstnvarguscamerasrc.cpp -fPIC `pkg-config --cflags gstreamer-1.0 gstreamer-base-1.0 gstreamer-video-1.0 gstreamer-allocators-1.0 glib-2.0` -I./ -I…/ -I/usr/src/jetson_multimedia_api/include/ -I/usr/src/jetson_multimedia_api/argus/samples/utils/ -o gstnvarguscamerasrc.o
gstnvarguscamerasrc.cpp:47:10: fatal error: Argus/Argus.h: No such file or directory
#include <Argus/Argus.h>

Looks like I can’t make the new nvarguscamerasrc

Terry

I found jetson_multimedia_api, but make fails. Need help.
Terry

make
g++ -c gstnvarguscamerasrc.cpp -fPIC `pkg-config --cflags gstreamer-1.0 gstreamer-base-1.0 gstreamer-video-1.0 gstreamer-allocators-1.0 glib-2.0` -I./ -I…/ -I/usr/src/jetson_multimedia_api/include/ -I/usr/src/jetson_multimedia_api/argus/samples/utils/ -o gstnvarguscamerasrc.o
gstnvarguscamerasrc.cpp: In member function ‘virtual bool ArgusCamera::StreamConsumer::threadExecute(GstNvArgusCameraSrc*)’:
gstnvarguscamerasrc.cpp:302:5: error: reference to ‘Status’ is ambiguous
Status frame_status;
^~~~~~
In file included from /usr/src/jetson_multimedia_api/include/Argus/Argus.h:116:0,
from gstnvarguscamerasrc.cpp:47:
/usr/src/jetson_multimedia_api/include/Argus/Types.h:52:13: note: candidates are: typedef int Status
typedef int Status;
^~~~~~
/usr/src/jetson_multimedia_api/include/Argus/Types.h:93:6: note: enum Argus::Status
enum Status
^~~~~~
gstnvarguscamerasrc.cpp:333:7: error: reference to ‘Status’ is ambiguous
Status argusStatus = iEventError->getStatus();

That does not fix or address the problem in any way. The problem is in nvargus-daemon and the improper handling of error events/conditions. Please look through the log file to see the error and where nvargus-daemon locks up.

Here is the log from nvargus-daemon with libgstnvarguscamerasrc.so rebuilt with the 5 second timeout:
nvargus-daemon-deadlock-w-extended-timeout-in-nvarguscamerasrc-plugin.txt (9.4 KB)

Timeouts and errors are inevitable in a real production environment. nvargus-daemon needs to be robust enough to handle errors and properly be able to restart itself OR exit and let systemd restart it. Neither is happening right now when more than one camera is being streamed and an error occurs.

The goal is not to fix or eliminate the timeout or other error conditions. The goal is to fix nvargus-daemon to handle the error conditions in a more graceful manner.

This problem is also affecting my product using L4T R32.5.0. There needs to be a way for nvargus-daemon to restart properly in order to have a viable consumer product. As JDSchroeder describes, when multiple cameras are streaming, nvargus-daemon locks up and does not clean up properly.

Please help to solve this problem as it is affecting our ability to release a product based on TX2.

The timeout solution might help in my case, but in typical NVIDIA fashion, the solution is terse, and we the users need cookbook-type solutions.

I downloaded the public_sources they pointed to, searched for the gzip file that contained the nvarguscamerasrc source, and downloaded it to my TX2. I read the README and found I needed jetson_multimedia_api. The make failed looking for <Argus/Argus.h>, and after I found a multimedia_api to install, I am now getting compile errors in gstnvarguscamerasrc.cpp:

gstnvarguscamerasrc.cpp: In member function ‘virtual bool ArgusCamera::StreamConsumer::threadExecute(GstNvArgusCameraSrc*)’:
gstnvarguscamerasrc.cpp:302:5: error: reference to ‘Status’ is ambiguous
Status frame_status;
^~~~~~
In file included from /usr/src/jetson_multimedia_api/include/Argus/Argus.h:116:0,
from gstnvarguscamerasrc.cpp:47:
/usr/src/jetson_multimedia_api/include/Argus/Types.h:52:13: note: candidates are: typedef int Status
typedef int Status;
^~~~~~
/usr/src/jetson_multimedia_api/include/Argus/Types.h:93:6: note: enum Argus::Status
enum Status
^~~~~~
gstnvarguscamerasrc.cpp:333:7: error: reference to ‘Status’ is ambiguous
Status argusStatus = iEventError->getStatus();

Looks like the API and this version of the code are not compatible, so I am DEAD IN THE WATER, waiting on NVIDIA to provide more info.

Terry - Still waiting for a solution and it has been over a year of asking for help!

Hi,
Please share information about your camera modules. If it is from our camera partners, we can work with partners to do further investigation.

Lumenera,

Lumenera tested my GStreamer app on an NVIDIA camera and got the same problems as they did with their own driver!

Why is there no information about my make problems?

Terry

You may try this patch (it’s for R32.5, but should be easy to try on R32.3).

Thanks to Honey_Patouceul for sharing the patch. It should fix the ambiguous Argus::Status error.

Thanks @DaneLLL @Honey_Patouceul, gstnvarguscamerasrc.cpp now compiles.

How are customers expected to be able to find this patch? Or am I just lucky that Honey has a great memory of something that was posted in another topic?

Customers like myself do not have time to read every topic.

NVIDIA needs a better database or other way for customers to find these facts.

Thanks,
Terry

Now it compiles, but I get:
/usr/bin/ld: cannot find -lnvdsbufferpool

I have installed jetson_multimedia_api, and it still does not build. Remember, I am on R32.3.1, not R32.5.

Terry

I see:

locate nvdsbufferpool
/opt/nvidia/deepstream/deepstream-5.0/sources/includes/gstnvdsbufferpool.h
/usr/lib/aarch64-linux-gnu/tegra/libnvdsbufferpool.so
/usr/lib/aarch64-linux-gnu/tegra/libnvdsbufferpool.so.1.0.0

Hi,
We collect known issues in
Jetson/L4T/r32.5.x patches - eLinux.org
Please take a look.

For the error about nvdsbufferpool, please install the DeepStream SDK and try again.

None of those patches address the issue in nvargus-daemon. nvargus-daemon locks up whenever more than one camera is streaming and one of them has an error (e.g., a fence timeout or CSI framing error). nvargus-daemon needs to restart itself automatically, or die and let systemd restart it, when a camera has an error. The cleanup/error-handling code in nvargus-daemon needs to be reviewed and fixed to eliminate the deadlock.

If you can’t eliminate the deadlocks in nvargus-daemon when errors occur, then you need to modify nvargus-daemon to tie it in with the built-in systemd watchdog support. That way if/when nvargus-daemon deadlocks and stops kicking the watchdog, systemd will automatically restart it after the specified timeout. You can read more about systemd including the watchdog support here: systemd.service

Hi @JDSchroeder
Could you share information about the camera (vendor and model ID)? From the log, it looks like CSI cannot catch the correct FE (frame end) packet:

A FS packet was found for a virtual channel that was already in frame. An errored FE packet was injected before FS was allowed through.
captureErrorCallback Stream 0.0 capture 10869 failed: ts 9212873796096 frame 278 error 2 data 0x000000a0

SCF: Error Timeout: ISP port 0 timed out! (in src/services/capture/NvIspHw.cpp, function waitIspFrameEnd(), line 478)

If the vendor is one of our camera partners, we can work with them to debug why the frame end signal cannot be captured correctly.