nvargus-daemon crashing after camera has been active for a short time

Hi,

I’m experiencing an issue where Argus crashes after the camera has been running for a short time: sometimes within 10 seconds, other times after a few minutes (under 5). It happens reliably on my setup; only the time to crash varies.

First, my configuration:
Xavier NX devkit
Jetpack 5.0.2, L4T 35.1
MIPI ports configured by Jetson IO for IMX219 on one, IMX477 on the other. The IMX477 unit is the Pi HQ camera with the resistor rework, and the IMX219 unit is the Pi Camera V2.1.
System was flashed by L4T SDK’s flash.sh utility.

Running a minimal pipeline gives the following output. Sensor 1 is the IMX219.

gst-launch-1.0 nvarguscamerasrc sensor-id=1 ! fakesink
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3280 x 2464 FR = 21.000000 fps Duration = 47619048 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 3280 x 1848 FR = 28.000001 fps Duration = 35714284 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1920 x 1080 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1640 x 1232 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1280 x 720 FR = 59.999999 fps Duration = 16666667 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: Running with following settings:
   Camera index = 1 
   Camera mode  = 2 
   Output Stream W = 1920 H = 1080 
   seconds to Run    = 0 
   Frame Rate = 29.999999 
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
CONSUMER: ERROR OCCURRED
ERROR: from element /GstPipeline:pipeline0/GstNvArgusCameraSrc:nvarguscamerasrc0: CANCELLED
Additional debug info:
Argus Error Status
Execution ended after 0:01:32.044832334
Setting pipeline to NULL ...
GST_ARGUS: Cleaning up
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
Freeing pipeline ...
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Argus client is exiting with 4 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 355)

The journal shows the following, with the crash occurring on the line timestamped 20:23:04

-- Logs begin at Wed 2022-09-07 23:58:16 HST, end at Tue 2022-12-13 20:37:25 HST. --
Dec 13 20:18:39 nx systemd[1]: Started Argus daemon.
Dec 13 20:22:00 nx nvargus-daemon[1102]: === NVIDIA Libargus Camera Service (0.98.3)=== Listening for connections...=== gst-launch->
Dec 13 20:22:00 nx nvargus-daemon[1102]: OFParserListModules: module list: /proc/device-tree/tegra-camera-platform/modules/module1
Dec 13 20:22:00 nx nvargus-daemon[1102]: NvPclHwGetModuleList: No module data found
Dec 13 20:22:01 nx nvargus-daemon[1102]: OFParserGetVirtualDevice: NVIDIA Camera virtual enumerator not found in proc device-tree
Dec 13 20:22:01 nx nvargus-daemon[1102]: ---- imager: No override file found. ----
Dec 13 20:22:01 nx nvargus-daemon[1102]: ---- imager: No override file found. ----
Dec 13 20:22:01 nx nvargus-daemon[1102]: E/ libnvphs:socket: Error[2]: socket connection /var/lib/nvphs/nvphsd.ctl to PHS failed: N>
Dec 13 20:22:01 nx nvargus-daemon[1102]: D/ libnvphs:socket: Warning: connecting to Power Hinting Service failed. Is PHS running?
Dec 13 20:22:01 nx nvargus-daemon[1102]: === gst-launch-1.0[2789]: CameraProvider initialized (0xffff785a0990)SCF: Error BadValue: >
Dec 13 20:22:01 nx nvargus-daemon[1102]: E/ libnvphs:socket: Error[2]: socket connection /var/lib/nvphs/nvphsd.ctl to PHS failed: N>
Dec 13 20:22:01 nx nvargus-daemon[1102]: D/ libnvphs:socket: Warning: connecting to Power Hinting Service failed. Is PHS running?
Dec 13 20:22:01 nx nvargus-daemon[1102]: E/ libnvphs: Error: NvPHSSendThroughputHints[usecase=camera, hint=MinCPU, value=4294967295>
Dec 13 20:22:20 nx nvargus-daemon[1102]: === gst-launch-1.0[2789]: CameraProvider destroyed (0xffff785a0990)=== gst-launch-1.0[2789>
Dec 13 20:22:20 nx nvargus-daemon[1102]: OFParserListModules: module list: /proc/device-tree/tegra-camera-platform/modules/module1
Dec 13 20:22:20 nx nvargus-daemon[1102]: NvPclHwGetModuleList: No module data found
Dec 13 20:22:20 nx nvargus-daemon[1102]: OFParserGetVirtualDevice: NVIDIA Camera virtual enumerator not found in proc device-tree
Dec 13 20:22:20 nx nvargus-daemon[1102]: ---- imager: No override file found. ----
Dec 13 20:22:20 nx nvargus-daemon[1102]: ---- imager: No override file found. ----
Dec 13 20:23:04 nx nvargus-daemon[1102]: Module_id 30 Severity 2 : (fusa) Error: InvalidState Status syncpoint signaled but status >
Dec 13 20:23:04 nx nvargus-daemon[1102]: Module_id 30 Severity 2 : (fusa) Error: InvalidState  propagating from:/capture/src/fusaIs>
Dec 13 20:23:04 nx nvargus-daemon[1102]: === gst-launch-1.0[2870]: CameraProvider initialized (0xffff78986a00)SCF: Error Timeout: F>
Dec 13 20:23:04 nx nvargus-daemon[1102]: Module_id 30 Severity 2 : (fusa) Error: InvalidState Status syncpoint signaled but status >
Dec 13 20:23:04 nx nvargus-daemon[1102]: Module_id 30 Severity 2 : (fusa) Error: InvalidState  propagating from:/capture/src/fusaVi>
Dec 13 20:23:04 nx nvargus-daemon[1102]: SCF: Error InvalidState:  (propagating from src/services/capture/FusaCaptureViCsiHw.cpp, f>
Dec 13 20:23:04 nx nvargus-daemon[1102]: SCF: Error InvalidState:  (propagating from src/common/Utils.cpp, function workerThread(),>
Dec 13 20:23:04 nx nvargus-daemon[1102]: SCF: Error InvalidState: Worker thread ViCsiHw frameComplete failed (in src/common/Utils.c>
Dec 13 20:23:04 nx nvargus-daemon[1102]: Module_id 30 Severity 2 : (fusa) Error: ResourceAlreadyInUse All captures are already pend>
Dec 13 20:23:04 nx nvargus-daemon[1102]: Module_id 30 Severity 2 : (fusa) Error: ResourceAlreadyInUse  propagating from:/capture/sr>
Dec 13 20:23:04 nx nvargus-daemon[1102]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/FusaCaptureViCsiH>
Dec 13 20:23:04 nx nvargus-daemon[1102]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureRecord.cpp>
Dec 13 20:23:04 nx nvargus-daemon[1102]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureRecord.cpp>
Dec 13 20:23:04 nx nvargus-daemon[1102]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureServiceDev>
Dec 13 20:23:04 nx nvargus-daemon[1102]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureServiceDev>
Dec 13 20:23:04 nx nvargus-daemon[1102]: SCF: Error Timeout:  (propagating from src/api/Buffer.cpp, function waitForUnlock(), line >
Dec 13 20:23:04 nx nvargus-daemon[1102]: SCF: Error Timeout:  (propagating from src/components/CaptureContainerImpl.cpp, function r>
Dec 13 20:23:04 nx nvargus-daemon[1102]: SCF: Error ResourceAlreadyInUse:  (propagating from src/common/Utils.cpp, function workerT>
Dec 13 20:23:04 nx nvargus-daemon[1102]: SCF: Error ResourceAlreadyInUse: Worker thread CaptureScheduler frameStart failed (in src/>
Dec 13 20:23:06 nx nvargus-daemon[1102]: SCF: Error Timeout:  (propagating from src/components/amr/Snapshot.cpp, function waitForNe>
Dec 13 20:23:06 nx nvargus-daemon[1102]: SCF_AutocontrolACSync failed to wait for an earlier frame to complete.
Dec 13 20:23:06 nx nvargus-daemon[1102]: SCF: Error Timeout:  (propagating from src/components/ac_stages/ACSynchronizeStage.cpp, fu>
Dec 13 20:23:06 nx nvargus-daemon[1102]: SCF: Error Timeout:  (propagating from src/components/stages/OrderedStage.cpp, function do>
Dec 13 20:23:06 nx nvargus-daemon[1102]: SCF: Error Timeout: Sending critical error event (in src/api/Session.cpp, function sendErr>
Dec 13 20:23:06 nx nvargus-daemon[1102]: SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureSer>
Dec 13 20:23:06 nx nvargus-daemon[1102]: SCF: Error InvalidState:  (propagating from src/services/capture/CaptureService.cpp, funct>
Dec 13 20:23:06 nx nvargus-daemon[1102]: SCF: Error InvalidState:  (propagating from src/components/stages/SensorCaptureStage.cpp, >
Dec 13 20:23:06 nx nvargus-daemon[1102]: SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, functi>
Dec 13 20:23:13 nx nvargus-daemon[1102]: SCF: Error Timeout:  (propagating from src/services/capture/CaptureServiceEvent.cpp, funct>
Dec 13 20:23:13 nx nvargus-daemon[1102]: Error: Camera HwEvents wait, this may indicate a hardware timeout occured,abort current/in>
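
The journal lines above are cut off at the `>` marker, which is how the journalctl pager displays lines wider than the terminal. A minimal sketch for capturing the full text follows. It assumes the systemd unit is named `nvargus-daemon` (inferred from the process name in the log, not confirmed in this thread), and the helper name `argus_errors` is made up for illustration.

```shell
# Hypothetical helper: read journal text on stdin and keep only lines that
# report an error, dropping the libnvphs (Power Hinting Service) lines,
# which in the log above already appear at startup, before the failure.
argus_errors() {
    grep -v 'libnvphs' | grep -E 'Error|ERROR'
}

# On the Jetson (the unit name is an assumption):
#   journalctl -u nvargus-daemon -b --no-pager | argus_errors
```

With `--no-pager`, journalctl writes full-width lines to stdout, so the complete file paths and line numbers in the `SCF:` and `(fusa)` messages would be preserved when posting the log.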

Here are a few troubleshooting steps I took that still produced the same error:

  • ran jetson_clocks
  • ran the same test headless (systemctl set-default multi-user.target)
  • tried a similar pipeline on both the IMX477 and IMX219

Up until it crashes, the camera output works as expected. I’ve used other pipelines that direct it to the display or a socket instead and they are fine.

Sometimes I’d also encounter this message being repeatedly printed instead…

Dec 13 14:20:26 nx nvargus-daemon[1061]: (Argus) Error OverFlow: Too many pending events, ignoring new events (in src/api/EventProviderImpl.cpp, function addEvent(), line 158)

Finally, here are a couple of other recent threads showing a similar error.

Are there any other troubleshooting or workaround suggestions for this? Thanks.

Hi! I haven’t recorded very long video with my imx219 but the errors I used to have in the thread were probably due to hw issues (I had i2c timeouts in the syslog). I used a different module (imx219) and I didn’t have i2c timeouts, with the same system…
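
One way to check for the i2c timeouts mentioned in the reply above is to search the kernel log. This is only a sketch: the helper name `i2c_timeouts` is hypothetical, and the exact wording of the kernel messages on a given L4T release is an assumption, so the pattern is deliberately loose.

```shell
# Hypothetical helper: read kernel log text on stdin and keep lines that
# mention i2c together with "timeout", "time out", or "timed out"
# (case-insensitive).
i2c_timeouts() {
    grep -iE 'i2c.*timed? ?out'
}

# On the Jetson (reading the kernel log may require sudo):
#   dmesg | i2c_timeouts
```

No output would only mean that no line matched this pattern; it does not rule out a hardware problem.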

hello kyle.flores,

we’ve tested on r35.1 and did not see camera stack stability issues.
it looks more like a hardware issue.
is it possible to do some cross-validation? do you have another Xavier NX, or other camera modules, to test with and narrow down the issue?

Gorgo90, thanks for sharing your findings, I’ll check that on my system too just in case!

JerryChang, I believe we have some NX production modules on order so I can try that when they arrive. We also have an alternate carrier board on hand that is very similar to the devkit carrier (it’s the seeed a206) so I should be able to move the devkit module that has this problem onto that carrier to validate. Thanks for your suggestion.
