I hope this message finds you well. We are currently facing a critical issue with some of our production devices utilizing the NVIDIA Xavier NX (EMMC 16GB) in conjunction with the Arducam PTZ Camera (IMX477 sensor).
The problem manifests as an unexpected interruption in the camera stream, resulting in service disruption. Upon reviewing the daemon status, we identified the following logs:
nvidia@nvidia:~$ sudo service nvargus-daemon status
● nvargus-daemon.service - Argus daemon
Loaded: loaded (/etc/systemd/system/nvargus-daemon.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2024-01-19 09:23:40 UTC; 1min 31s ago
Main PID: 353 (nvargus-daemon)
Tasks: 41 (limit: 4915)
CGroup: /system.slice/nvargus-daemon.service
└─353 /usr/sbin/nvargus-daemon
Jan 19 09:23:52 nvidia nvargus-daemon[353]: E/ libnvphs: Error: NvPHSSendThroughputHints[usecase=camera, hint=MinCPU, value=4294967295, timeout_ms=1000]: queue_or_send() failed
Jan 19 09:24:31 nvidia nvargus-daemon[353]: SCF: Error Timeout: (propagating from src/components/CaptureContainerImpl.cpp, function assignAllBuffersFromStream(), line 232)
Jan 19 09:24:31 nvidia nvargus-daemon[353]: SCF: Error Timeout: (propagating from src/components/stages/CCDataSetupStage.cpp, function doHandleRequest(), line 68)
Jan 19 09:24:31 nvidia nvargus-daemon[353]: SCF: Error Timeout: (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
Jan 19 09:24:31 nvidia nvargus-daemon[353]: SCF: Error Timeout: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 997)
Jan 19 09:24:39 nvidia nvargus-daemon[353]: waitForIdleLocked remaining request 835
Jan 19 09:24:39 nvidia nvargus-daemon[353]: waitForIdleLocked remaining request 834
Jan 19 09:24:39 nvidia nvargus-daemon[353]: waitForIdleLocked remaining request 833
Jan 19 09:24:39 nvidia nvargus-daemon[353]: SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 927)
Jan 19 09:24:39 nvidia nvargus-daemon[353]: SCF: Error Timeout: (propagating from src/api/Session.cpp, function abortCaptures(), line 893)
nvidia@nvidia:~$
To mitigate this issue, we have implemented a temporary solution by restarting the nvargus-daemon service (systemctl restart nvargus-daemon) and subsequently restarting our application process. While this workaround helps restore the stream temporarily, it is imperative for us to establish a more permanent resolution to ensure a stable camera stream on our production devices.
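For illustration, a restart workaround of this kind could be automated roughly as below. This is a hypothetical sketch, not the code running on the devices: the function names, the `stop_app`/`start_app` callbacks, the 4-second settle time, and the choice of which log messages count as fatal are all assumptions. Only the `systemctl restart nvargus-daemon` command and the log strings come from this thread.

```python
import re
import subprocess
import time

# Patterns copied from the log excerpts in this thread. Treating exactly
# these messages as "fatal" is an assumption, not an official list.
FATAL_PATTERNS = [
    re.compile(r"SCF: Error Timeout"),
    re.compile(r"\(Argus\) Error EndOfFile"),
    re.compile(r"\(Argus\) Error InvalidState"),
]


def is_fatal_argus_line(line: str) -> bool:
    """Return True if a log line matches one of the patterns above."""
    return any(p.search(line) for p in FATAL_PATTERNS)


def restart_argus_daemon(run=subprocess.run) -> None:
    """Restart the daemon with the same command used as the workaround."""
    run(["sudo", "systemctl", "restart", "nvargus-daemon"], check=True)


def recover(stop_app, start_app, run=subprocess.run,
            settle_s=4.0, sleep=time.sleep) -> None:
    """Stop the application, restart the daemon, wait, then start the app.

    stop_app and start_app are hypothetical callbacks supplied by the
    caller; settle_s is an arbitrary pause before reconnecting.
    """
    stop_app()
    restart_argus_daemon(run)
    sleep(settle_s)
    start_app()
```

The application would feed its own stderr lines (or the daemon's journal output) through `is_fatal_argus_line` and call `recover` on the first match. This only automates the workaround; it does not address the underlying capture timeout.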
Device Details:
Device: Xavier NX (EMMC 16GB)
Jetpack version: 4.6.1
Camera Sensor: Arducam PTZ Camera with IMX477 sensor
Were there intermittent corrupt frames? It looks like a timeout failure is being reported.
Please use the standard V4L2 controls to check the stream stability, for instance:
$ v4l2-ctl -d /dev/video0 --set-fmt-video=width=1920,height=1080,pixelformat=RG10 --set-ctrl bypass_mode=0 --stream-mmap
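To run the V4L2 check above repeatedly or unattended, it could be wrapped in a small script. The sketch below is only an assumption about how one might do that: the function names, the frame count of 300, and the 30-second timeout are illustrative. It appends `--stream-count` so that `v4l2-ctl` exits after a fixed number of frames, and treats a timeout or a non-zero exit code as an unstable stream.

```python
import subprocess


def build_v4l2_cmd(device="/dev/video0", width=1920, height=1080,
                   pixelformat="RG10", frames=300):
    """Build the v4l2-ctl command from the post, limited to `frames` buffers."""
    return [
        "v4l2-ctl", "-d", device,
        f"--set-fmt-video=width={width},height={height},pixelformat={pixelformat}",
        "--set-ctrl", "bypass_mode=0",
        "--stream-mmap",
        f"--stream-count={frames}",  # stop after a fixed number of frames
    ]


def stream_is_stable(cmd, timeout_s=30.0, run=subprocess.run) -> bool:
    """Return True if the capture finishes in time with exit code 0.

    A stalled sensor never delivers the requested frames, so the
    command hits the timeout and the stream is reported as unstable.
    """
    try:
        result = run(cmd, timeout=timeout_s,
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Calling `stream_is_stable(build_v4l2_cmd())` in a loop over several hours would show whether the stall also occurs with Argus bypassed, which helps separate sensor or driver problems from issues in the Argus stack.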
Yes, it's intermittent. The camera streaming process crashes intermittently with logs like the following:
(Argus) Error EndOfFile: Receive worker failure, notifying 2 waiting threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 340)
(Argus) Error InvalidState: Argus client is exiting with 2 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 357)
(Argus) Error EndOfFile: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 368)
(b'', None)
sleeping 4 sec
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState: (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
for more info we are using this pipeline to fetch the frames
Could you please also test with an Argus sample app, for example userAutoExposure?
This sample application includes error handling; please try your use case with it instead.
There is additional implementation in Argus to recover from error conditions.
For example, when there is a capture failure, Argus reports it via the EVENT_TYPE_ERROR flag, and the application has to shut down.
However, the argus_camera app currently does not have a mechanism to stop automatically on detecting errors.
For testing, you can use argus_userautoexposure, or gst-launch with the nvarguscamerasrc plugin, to verify the automatic closure of the app.
You may install the MMAPI package, for instance:
$ sudo apt install nvidia-l4t-jetson-multimedia-api
The sample path looks like the following:
/usr/src/jetson_multimedia_api/argus/samples/userAutoExposure/
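The gst-launch check suggested above (verifying that the app closes by itself after a capture error) could be scripted along these lines. This is a minimal sketch under stated assumptions: the pipeline `nvarguscamerasrc ! fakesink`, the function name, and the observation window are illustrative choices, not something prescribed in this thread.

```python
import subprocess

# Assumed minimal test pipeline: capture through Argus and discard frames.
GST_CMD = ["gst-launch-1.0", "nvarguscamerasrc", "!", "fakesink"]


def exited_on_its_own(cmd, window_s, popen=subprocess.Popen) -> bool:
    """Run cmd and report whether it terminates within window_s seconds.

    True means the process closed by itself (for example after Argus
    reported an error). False means it was still running when the
    window ended and was terminated by this function.
    """
    proc = popen(cmd)
    try:
        proc.wait(timeout=window_s)
        return True
    except subprocess.TimeoutExpired:
        proc.terminate()
        proc.wait()
        return False
```

With a long window, for example `exited_on_its_own(GST_CMD, 3600)`, a `True` result after a capture failure indicates the pipeline shut down on the error instead of hanging, at which point a supervisor can restart it cleanly.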