AGX Orin: nvargus-daemon: Error InvalidState

AGX Orin, JP 5.0.2, 6 cameras.

I’ve seen similar problems reported on other platforms and with other JetPack versions, but have not found any actual resolution or any official documentation of the nvargus-daemon.

I use this GStreamer application:

N=0,...,7
gst-launch-1.0 nvarguscamerasrc sensor-id=${N} ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1200, framerate=10/1'  ! nvvidconv flip-method=0 ! 'video/x-raw, format=(string)I420' ! queue ! shmsink wait-for-connection=0 socket-path=/tmp/shmsink${N}

It may take a few minutes or several hours to fail. The time to failure is not deterministic, but it seems that the more streams are running, the sooner the failure occurs. In this case I was running 6 streams and the failure occurred after about 4 hours. The nvargus-daemon gets into InvalidState and all streams fail in the same way. The daemon must be restarted before the applications can be restarted.
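A minimal sketch of starting all streams from one script, so that a long run is easier to repeat. The pipeline is the one shown above; the names run_pipeline, launch_all, SENSOR_COUNT, LOG_DIR and GST_LAUNCH are made up for this illustration and are not part of any NVIDIA tool.

```shell
#!/bin/sh
# Sketch only: one gst-launch-1.0 pipeline per sensor, each logging to its own file.
# SENSOR_COUNT, LOG_DIR and GST_LAUNCH are illustrative variables (assumptions).
SENSOR_COUNT="${SENSOR_COUNT:-6}"
LOG_DIR="${LOG_DIR:-/tmp}"

# Run the pipeline from this post for sensor $1.
# GST_LAUNCH can be overridden (for example with "echo") for a dry run.
run_pipeline() {
    "${GST_LAUNCH:-gst-launch-1.0}" nvarguscamerasrc sensor-id="$1" ! \
        'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1200, framerate=10/1' ! \
        nvvidconv flip-method=0 ! 'video/x-raw, format=(string)I420' ! queue ! \
        shmsink wait-for-connection=0 socket-path="/tmp/shmsink$1"
}

# Start SENSOR_COUNT pipelines in the background and wait for all of them.
launch_all() {
    n=0
    while [ "$n" -lt "$SENSOR_COUNT" ]; do
        run_pipeline "$n" > "$LOG_DIR/gst-sensor$n.log" 2>&1 &
        n=$((n + 1))
    done
    wait    # returns once every pipeline has exited
}

# Nothing is started unless the script is called with --run.
if [ "${1:-}" = "--run" ]; then
    launch_all
fi
```

Saved under a name of your choice, it would be started with the --run argument; the per-sensor log files then show which stream reported the error first.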

Here is the application (6 instances):

root@forge:~# gst-launch-1.0 nvarguscamerasrc sensor-id=0 ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1200, framerate=10/1'  ! nvvidconv flip-method=0 ! 'video/x-raw, format=(string)I420' ! queue ! shmsink wait-for-connection=0 socket-path=/tmp/shmsink0
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 1920 x 1200 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 16.000000; Exposure Range min 28000, max 22000000;

GST_ARGUS: 960 x 600 FR = 120.000005 fps Duration = 8333333 ; Analog Gain range min 1.000000, max 16.000000; Exposure Range min 28000, max 22000000;

GST_ARGUS: Running with following settings:
   Camera index = 0
   Camera mode  = 0
   Output Stream W = 1920 H = 1200
   seconds to Run    = 0
   Frame Rate = 29.999999
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
CONSUMER: ERROR OCCURRED
ERROR: from element /GstPipeline:pipeline0/GstNvArgusCameraSrc:nvarguscamerasrc0: UNAVAILABLE
Additional debug info:
Argus Error Status
Execution ended after 4:03:28.847821477
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
GST_ARGUS: Cleaning up
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
Setting pipeline to NULL ...
Freeing pipeline ...
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Argus client is exiting with 4 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 355)

And here is the nvargus-daemon log:

SCF: Error InvalidState:  (propagating from src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 593)
SCF: Error InvalidState:  (propagating from src/common/Utils.cpp, function workerThread(), line 114)
SCF: Error InvalidState: Worker thread ViCsiHw frameComplete failed (in src/common/Utils.cpp, function workerThread(), line 133)
SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/FusaCaptureViCsiHw.cpp, function startCaptureInternal(), line 809)
SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureRecord.cpp, function doCSItoMemCapture(), line 530)
SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureRecord.cpp, function issueCapture(), line 477)
SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureServiceDevice.cpp, function issueCaptures(), line 1291)
SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureServiceDevice.cpp, function issueCaptures(), line 1122)
SCF: Error Timeout:  (propagating from src/api/Buffer.cpp, function waitForUnlock(), line 643)
SCF: Error Timeout:  (propagating from src/components/CaptureContainerImpl.cpp, function returnBuffer(), line 373)
SCF: Error ResourceAlreadyInUse:  (propagating from src/common/Utils.cpp, function workerThread(), line 114)
SCF: Error ResourceAlreadyInUse: Worker thread CaptureScheduler frameStart failed (in src/common/Utils.cpp, function workerThread(), line 133)
SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureServiceDevice.cpp, function addNewItemToSchedule(), line 906)
SCF: Error InvalidState:  (propagating from src/services/capture/CaptureService.cpp, function addRequest(), line 395)
SCF: Error InvalidState:  (propagating from src/components/stages/MemoryToISPCaptureStage.cpp, function doHandleRequest(), line 137)
SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
SCF: Error InvalidState: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 979)
SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureServiceDevice.cpp, function addNewItemToSchedule(), line 906)
SCF: Error InvalidState:  (propagating from src/services/capture/CaptureService.cpp, function addRequest(), line 395)
SCF: Error InvalidState:  (propagating from src/components/stages/MemoryToISPCaptureStage.cpp, function doHandleRequest(), line 137)
SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
SCF: Error InvalidState: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 979)
SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureServiceDevice.cpp, function addNewItemToSchedule(), line 906)
SCF: Error InvalidState:  (propagating from src/services/capture/CaptureService.cpp, function addRequest(), line 395)
SCF: Error InvalidState:  (propagating from src/components/stages/MemoryToISPCaptureStage.cpp, function doHandleRequest(), line 137)
SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
SCF: Error InvalidState: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 979)
SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureServiceDevice.cpp, function addNewItemToSchedule(), line 906)
SCF: Error InvalidState:  (propagating from src/services/capture/CaptureService.cpp, function addRequest(), line 395)
SCF: Error InvalidState:  (propagating from src/components/stages/MemoryToISPCaptureStage.cpp, function doHandleRequest(), line 137)
SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
SCF: Error InvalidState: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 979)
SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureServiceDevice.cpp, function addNewItemToSchedule(), line 906)
SCF: Error InvalidState:  (propagating from src/services/capture/CaptureService.cpp, function addRequest(), line 395)
SCF: Error InvalidState:  (propagating from src/components/stages/MemoryToISPCaptureStage.cpp, function doHandleRequest(), line 137)
SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
SCF: Error InvalidState: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 979)
SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureServiceDevice.cpp, function addNewItemToSchedule(), line 906)
SCF: Error InvalidState:  (propagating from src/services/capture/CaptureService.cpp, function addRequest(), line 395)
SCF: Error InvalidState:  (propagating from src/components/stages/SensorCaptureStage.cpp, function doHandleRequest(), line 86)
SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureServiceDevice.cpp, function addNewItemToSchedule(), line 906)
SCF: Error InvalidState:  (propagating from src/services/capture/CaptureService.cpp, function addRequest(), line 395)
SCF: Error InvalidState:  (propagating from src/components/stages/SensorCaptureStage.cpp, function doHandleRequest(), line 86)
SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureServiceDevice.cpp, function addNewItemToSchedule(), line 906)
SCF: Error InvalidState:  (propagating from src/services/capture/CaptureService.cpp, function addRequest(), line 395)
SCF: Error InvalidState:  (propagating from src/components/stages/SensorCaptureStage.cpp, function doHandleRequest(), line 86)
SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
SCF: Error InvalidState: 4 buffers still pending during EGLStreamProducer destruction (propagating from src/services/gl/EGLStreamProducer.cpp, function freeBuffers(), line 302)
SCF: Error InvalidState:  (propagating from src/services/gl/EGLStreamProducer.cpp, function ~EGLStreamProducer(), line 50)
SCF: Error InvalidState: 4 buffers still pending during EGLStreamProducer destruction (propagating from src/services/gl/EGLStreamProducer.cpp, function freeBuffers(), line 302)
SCF: Error InvalidState:  (propagating from src/services/gl/EGLStreamProducer.cpp, function ~EGLStreamProducer(), line 50)
SCF: Error InvalidState: 3 buffers still pending during EGLStreamProducer destruction (propagating from src/services/gl/EGLStreamProducer.cpp, function freeBuffers(), line 302)
SCF: Error InvalidState:  (propagating from src/services/gl/EGLStreamProducer.cpp, function ~EGLStreamProducer(), line 50)
SCF: Error InvalidState: 4 buffers still pending during EGLStreamProducer destruction (propagating from src/services/gl/EGLStreamProducer.cpp, function freeBuffers(), line 302)
SCF: Error InvalidState:  (propagating from src/services/gl/EGLStreamProducer.cpp, function ~EGLStreamProducer(), line 50)
SCF: Error InvalidState: 4 buffers still pending during EGLStreamProducer destruction (propagating from src/services/gl/EGLStreamProducer.cpp, function freeBuffers(), line 302)
SCF: Error InvalidState:  (propagating from src/services/gl/EGLStreamProducer.cpp, function ~EGLStreamProducer(), line 50)
SCF: Error InvalidState: 7 buffers still pending during EGLStreamProducer destruction (propagating from src/services/gl/EGLStreamProducer.cpp, function freeBuffers(), line 302)
SCF: Error InvalidState:  (propagating from src/services/gl/EGLStreamProducer.cpp, function ~EGLStreamProducer(), line 50)

hello Agtonomy,

did you mean that 6 cameras for 4 hours is the quickest way to reproduce the issue?

may I know your system configuration? is it running in performance mode, MAXN?
do you have other processes running in the background?

how about simplifying the pipeline to narrow down the issue?
could you please try streaming all of them without displaying preview frames, only showing the frame rate in the terminal?
for example,
$ gst-launch-1.0 nvarguscamerasrc sensor-id=0 ! 'video/x-raw(memory:NVMM),width=1920, height=1080, framerate=30/1, format= NV12' ! nvvidconv ! 'video/x-raw(memory:NVMM),format=I420' ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v

To answer the questions:

  • running MAXN
  • the system was completely idle besides running the video streams
  • the failure is not deterministic - the same test may fail within minutes or within hours

Next, as suggested, I ran the streams with fakesink. The only change I made was width=1920, height=1200, which is the native sensor resolution:

gst-launch-1.0 nvarguscamerasrc sensor-id=0 ! 'video/x-raw(memory:NVMM),width=1920, height=1200, framerate=30/1, format= NV12' ! nvvidconv ! 'video/x-raw(memory:NVMM),format=I420' ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v
  • The streams ran successfully until I logged in and started doing something on the system; then I got a failure. I couldn’t tell what triggered it
  • Restarted the tests
  • Ran for over 48 hrs without touching the system, then I decided to terminate stream 7:
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 9341180, dropped: 0, current: 48.37, average: 47.76
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 9341205, dropped: 0, current: 48.05, average: 47.76
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 9341229, dropped: 0, current: 47.31, average: 47.76
^Chandling interrupt.
Interrupt: Stopping pipeline ...
Execution ended after 54:19:42.970236917
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
GST_ARGUS: Cleaning up
CONSUMER: Done Success
GST_ARGUS: Done Success
Setting pipeline to NULL ...
Freeing pipeline ...
root@forge:~#

At the moment I stopped stream 7, stream 3 failed, generating the familiar InvalidState:

/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 9344304, dropped: 0, current: 47.81, average: 47.77
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 9344327, dropped: 0, current: 36.60, average: 47.77
CONSUMER: ERROR OCCURRED
ERROR: from element /GstPipeline:pipeline0/GstNvArgusCameraSrc:nvarguscamerasrc0: TIMEOUT
Additional debug info:
Argus Error Status
Execution ended after 54:20:11.567348020
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
GST_ARGUS: Cleaning up
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
Setting pipeline to NULL ...
Freeing pipeline ...
(Argus) Error InvalidState: Argus client is exiting with 2 outstanding client th/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 355)
root@forge:~#
Jan 15 20:12:52 forge nvargus-daemon[399785]: === gst-launch-1.0[410143]: CameraProvider destroyed (0xfffe94001120)=== gst-launch-1.0[410143]: Connection closed (FFFE697F6900)=== gst-launch-1.0[410143]: Connection cleaned up (FFFE697F6900)SCF: Error Timeout:  (propagating from src/components/amr/Snapshot.cpp, function waitForNewerSample(), line 91)
Jan 15 20:12:52 forge nvargus-daemon[399785]: SCF_AutocontrolACSync failed to wait for an earlier frame to complete.
Jan 15 20:12:52 forge nvargus-daemon[399785]: SCF: Error Timeout:  (propagating from src/components/ac_stages/ACSynchronizeStage.cpp, function doHandleRequest(), line 126)
Jan 15 20:12:52 forge nvargus-daemon[399785]: SCF: Error Timeout:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 137)
Jan 15 20:12:52 forge nvargus-daemon[399785]: SCF: Error Timeout: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 979)
Jan 15 20:12:55 forge kernel: bwmgr API not supported
Jan 15 20:12:55 forge kernel: ar0234 31-0010: ar0234_write_table: channel 13,
Jan 15 20:12:55 forge kernel: bwmgr API not supported
Jan 15 20:12:55 forge kernel: ar0234 31-0010: ar0234_power_off:
Jan 15 20:12:58 forge nvargus-daemon[399785]: SCF: Error Timeout:  (propagating from src/components/CaptureContainerImpl.cpp, function assignAllBuffersFromStream(), line 241)
Jan 15 20:12:58 forge nvargus-daemon[399785]: SCF: Error Timeout:  (propagating from src/components/stages/CCDataSetupStage.cpp, function doHandleRequest(), line 68)
Jan 15 20:12:58 forge nvargus-daemon[399785]: SCF: Error Timeout:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
Jan 15 20:13:00 forge nvargus-daemon[399785]: waitForIdleLocked remaining request 14881926
Jan 15 20:13:00 forge nvargus-daemon[399785]: SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 909)
Jan 15 20:13:00 forge nvargus-daemon[399785]: SCF: Error Timeout:  (propagating from src/api/Session.cpp, function abortCaptures(), line 875)
Jan 15 20:13:55 forge nvargus-daemon[399785]: SCF: Error InvalidState: 1 buffers still pending during EGLStreamProducer destruction (propagating from src/services/gl/EGLStreamProducer.cpp, function freeBuffers(), line 302)
Jan 15 20:13:55 forge nvargus-daemon[399785]: SCF: Error InvalidState:  (propagating from src/services/gl/EGLStreamProducer.cpp, function ~EGLStreamProducer(), line 50)
Jan 15 20:14:00 forge nvargus-daemon[399785]: waitForIdleLocked remaining request 14881926
Jan 15 20:14:00 forge nvargus-daemon[399785]: SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 909)
Jan 15 20:14:00 forge nvargus-daemon[399785]: (Argus) Error Timeout:  (propagating from src/api/CaptureSessionImpl.cpp, function destroy(), line 169)
Jan 15 20:14:05 forge nvargus-daemon[399785]: waitForIdleLocked remaining request 14881926
Jan 15 20:14:05 forge nvargus-daemon[399785]: SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 909)
Jan 15 20:14:05 forge nvargus-daemon[399785]: SCF: Error Timeout:  (propagating from src/api/Session.cpp, function abortCaptures(), line 875)
Jan 15 20:14:05 forge nvargus-daemon[399785]: SCF: Error Timeout:  (propagating from src/api/Session.cpp, function shutdown(), line 405)
Jan 15 20:14:05 forge nvargus-daemon[399785]: SCF: Error Timeout:  (propagating from src/api/Session.cpp, function shutdown(), line 505)
Jan 15 20:14:05 forge nvargus-daemon[399785]: SCF: Error Timeout:  (propagating from src/api/CameraDriver.cpp, function deleteSession(), line 641)
Jan 15 20:14:05 forge nvargus-daemon[399785]: (Argus) Error Timeout:  (propagating from src/api/CaptureSessionImpl.cpp, function destroy(), line 194)

After the failure, stream 3 cannot be restarted (in fact, none of the streams can be restarted):

root@forge:~# gst-launch-1.0 nvarguscamerasrc sensor-id=3 ! 'video/x-raw(memory:NVMM),width=1920, height=1200, framerate=30/1, format=NV12' ! nvvidconv ! 'video/x-raw(memory:NVMM),format=I420' ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function openSocketConnection(), line 219)
(Argus) Error Timeout: Cannot create camera provider (in src/rpc/socket/client/SocketClientDispatch.cpp, function createCameraProvider(), line 106)
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstFakeSink:fakesink0: sync = false
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Error generated. /dvs/git/dirty/git-master_linux/multimedia/nvgstreamer/gst-nvarguscamera/gstnvarguscamerasrc.cpp, execute:746 Failed to create CameraProvider
/GstPipeline:pipeline0/GstNvArgusCameraSrc:nvarguscamerasrc0.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)1920, height=(int)1200, format=(string)NV12, framerate=(fraction)30/1
/GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)1920, height=(int)1200, format=(string)NV12, framerate=(fraction)30/1
/GstPipeline:pipeline0/Gstnvvconv:nvvconv0.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)1920, height=(int)1200, framerate=(fraction)30/1, format=(string)I420
/GstPipeline:pipeline0/GstCapsFilter:capsfilter1.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)1920, height=(int)1200, framerate=(fraction)30/1, format=(string)I420
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0.GstGhostPad:sink.GstProxyPad:proxypad0: caps = video/x-raw(memory:NVMM), width=(int)1920, height=(int)1200, framerate=(fraction)30/1, format=(string)I420
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstFakeSink:fakesink0.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)1920, height=(int)1200, framerate=(fraction)30/1, format=(string)I420
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0.GstGhostPad:sink: caps = video/x-raw(memory:NVMM), width=(int)1920, height=(int)1200, framerate=(fraction)30/1, format=(string)I420
/GstPipeline:pipeline0/GstCapsFilter:capsfilter1.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)1920, height=(int)1200, framerate=(fraction)30/1, format=(string)I420
/GstPipeline:pipeline0/Gstnvvconv:nvvconv0.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)1920, height=(int)1200, format=(string)NV12, framerate=(fraction)30/1
/GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)1920, height=(int)1200, format=(string)NV12, framerate=(fraction)30/1
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstFakeSink:fakesink0: sync = false
Got EOS from element "pipeline0".
Execution ended after 0:00:00.007572596
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...
root@forge:~#

hello Agtonomy,

am I understanding correctly that everything works normally as long as no stream is terminated?
may I know what the camera connections are, and how many cameras you are using?

BTW,
when you say stream-7, do you mean /dev/video7?
please also share the device tree settings of stream-7 and stream-3, especially the position property settings in the tegra-camera-platform {} fields.
thanks

I am using the Connect Tech Forge Carrier and 4 Leopard Imaging Hawk cameras:

https://www.leopardimaging.com/li-ar0234cs-stereo-gmsl2-hawk/

root@forge:~# for f in /sys/firmware/devicetree/base/tegra-camera-platform/modules/module*/position; do cat $f; echo; done
bottomleft
bottomright
topleft
topright
left
right
backleft
backright
root@forge:~#
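A variant of the loop above that prints each module directory next to its badge and position, which makes it easier to match a position to a module. This is a sketch: list_modules and read_prop are illustrative names, and it assumes each module node carries badge and position properties (a missing property is printed as "?").

```shell
#!/bin/sh
# Sketch only: list tegra-camera-platform modules with their badge and position.

# Print a device tree string property, dropping the trailing NUL byte.
read_prop() {
    if [ -r "$1" ]; then
        tr -d '\000' < "$1"
    else
        printf '?'
    fi
}

# $1 optionally overrides the modules directory (useful for testing).
list_modules() {
    base="${1:-/sys/firmware/devicetree/base/tegra-camera-platform/modules}"
    for m in "$base"/module*; do
        [ -d "$m" ] || continue
        printf '%s %s %s\n' "${m##*/}" "$(read_prop "$m/badge")" "$(read_prop "$m/position")"
    done
}

# Nothing is printed unless the script is called with --list.
if [ "${1:-}" = "--list" ]; then
    list_modules
fi
```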

Yes, by stream-7 I mean /dev/video7. I don’t think 7 has anything to do with the failure; it was randomly chosen to be terminated first.

I also ran into this problem earlier on the AGX Orin dev kit with fewer cameras, and on the Xavier dev kit with JP 4.6. The symptoms were the same, but these were isolated cases.

It does seem that the likelihood increases with the number of sensors. I can run any single stream or two streams without failure. I haven’t done much testing with 3-5 streams, but at 6 or more the failures are frequent.

The biggest problem I see with nvargus-daemon is that it locks itself in InvalidState. If it simply failed and exited, systemd would restart it.
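Until the daemon exits on its own, an external watchdog is one possible workaround. The sketch below follows the daemon's journal and restarts the service when a line that looks fatal appears. The function names, the COOLDOWN variable and the choice of patterns are assumptions for illustration; the patterns are taken from the logs in this thread and may not cover every failure. It also assumes the service unit is named nvargus-daemon and that the script runs as root. The streams still have to be relaunched after the restart.

```shell
#!/bin/sh
# Sketch only: restart nvargus-daemon when its log shows the stuck state.
COOLDOWN="${COOLDOWN:-60}"   # minimum seconds between two restarts (illustrative)

# Return 0 if the log line matches a message seen when the daemon was stuck.
is_fatal_line() {
    case "$1" in
        *"Capture Scheduler not running"*) return 0 ;;
        *"Sending critical error event"*)  return 0 ;;
        *"waitForIdle() timed out"*)       return 0 ;;
        *) return 1 ;;
    esac
}

watch_daemon() {
    last=0
    # -f follows new entries, -n 0 skips history, -o cat prints only the message
    journalctl -u nvargus-daemon -f -n 0 -o cat | while IFS= read -r line; do
        if is_fatal_line "$line"; then
            now=$(date +%s)
            # a burst of error lines should trigger only one restart
            if [ $((now - last)) -ge "$COOLDOWN" ]; then
                systemctl restart nvargus-daemon
                last=$now
            fi
        fi
    done
}

# Nothing is watched unless the script is called with --watch.
if [ "${1:-}" = "--watch" ]; then
    watch_daemon
fi
```

This only hides the symptom; it does not address whatever puts the capture scheduler into InvalidState in the first place.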

@Agtonomy

Could you try the command below to see if the failure still occurs? We added “tnr-mode=0” to turn off de-noise.

N=0,...,7
gst-launch-1.0 nvarguscamerasrc sensor-id=${N} tnr-mode=0 ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1200, framerate=10/1' ! nvvidconv flip-method=0 ! 'video/x-raw, format=(string)I420' ! queue ! shmsink wait-for-connection=0 socket-path=/tmp/shmsink${N}

Please also make sure there is no overheating issue on Orin SOM.
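One way to check for overheating during a long run is to log the thermal zones that the Linux kernel exposes under /sys/class/thermal. This is a sketch: print_temps is an illustrative name, and the zone names that appear depend on the board and JetPack release.

```shell
#!/bin/sh
# Sketch only: print every readable thermal zone as "<type> <degrees> C".

# $1 optionally overrides the sysfs directory (useful for testing).
print_temps() {
    base="${1:-/sys/class/thermal}"
    for z in "$base"/thermal_zone*; do
        [ -r "$z/temp" ] || continue
        # some zones return a read error while their sensor is powered off
        milli=$(cat "$z/temp" 2>/dev/null) || continue
        [ -n "$milli" ] || continue
        name=$(cat "$z/type" 2>/dev/null) || name="?"
        # sysfs reports millidegrees Celsius; keep whole degrees only
        printf '%s %d C\n' "$name" "$((milli / 1000))"
    done
}

# Nothing is printed unless the script is called with --print.
if [ "${1:-}" = "--print" ]; then
    print_temps
fi
```

Calling it periodically while the streams run, and keeping the output with a timestamp, shows whether temperatures were rising before a failure.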

We haven’t tested the driver for the CTI carrier, but per my records, our team previously provided a Hawk camera driver to Agtonomy for the AGX Orin Developer Kit. That driver supports 1 camera using both sensors and 2 cameras using 1 sensor each. Did you experience the same issue with that driver?

We recently deployed the 2+1+1 configuration on the Orin Dev.Kit. We had one failure that was likely due to overheating and cpu throttling. The testing in the field is in progress.

The failures with the CTI carrier are definitely not due to overheating - the board runs in the lab at 25-30 C. Here the failures with 6-8 cameras are fairly frequent.
I will test the suggested pipelines.

@SimonZhu I ran 8 pipelines with tnr-mode=0.
The first attempt ran for about 50 min and all pipelines failed at the same time in the familiar way.
The 2nd attempt ran for 7.5 hrs and failed the same way.

@Agtonomy
If the time to failure is the same as before (not even shorter), the issue should not be caused by the denoise.
I see that our SW team is working with you by email, since you can reproduce the issue on the NVIDIA devkit (with the driver we provided). Our SW team will test it here and try to fix the issue.

It appears that the failure is preceded by a kernel ERROR:

Feb 06 12:23:01 g050-0503 kernel: [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=4226656600128 sof_ts=4226677106848 gerror_code=2 gerror_data=400060 notify_bits=20000"
...
Feb 06 12:23:07 g050-0503 nvargus-daemon[1782]: SCF: Error InvalidState: Timeout waiting on frame start sensor guid 0, capture sequence ID = 122758 (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameStart(), line 507)
Feb 06 12:23:07 g050-0503 nvargus-daemon[1782]: SCF: Error InvalidState:  (propagating from src/common/Utils.cpp, function workerThread(), line 114)
Feb 06 12:23:07 g050-0503 nvargus-daemon[1782]: SCF: Error InvalidState: Worker thread ViCsiHw frameStart failed (in src/common/Utils.cpp, function workerThread(), line 133)
Feb 06 12:23:07 g050-0503 nvargus-daemon[1782]: SCF: Error Timeout:  (propagating from src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 593)
Feb 06 12:23:07 g050-0503 nvargus-daemon[1782]: SCF: Error Timeout:  (propagating from src/common/Utils.cpp, function workerThread(), line 114)
Feb 06 12:23:07 g050-0503 nvargus-daemon[1782]: SCF: Error Timeout: Worker thread ViCsiHw frameComplete failed (in src/common/Utils.cpp, function workerThread(), line 133)
Feb 06 12:23:07 g050-0503 nvargus-daemon[1782]: Module_id 30 Severity 2 : (fusa) Error: Timeout  propagating from:/capture/src/fusaViHandler.cpp 776
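To see whether the kernel [RCE] error precedes the daemon errors in other occurrences as well, the journal can be filtered down to both kinds of lines in time order. This is a sketch: filter_capture_errors is an illustrative name, and the patterns are based only on the log lines quoted in this thread.

```shell
#!/bin/sh
# Sketch only: keep kernel [RCE] errors and nvargus-daemon InvalidState/Timeout
# errors from a journal stream read on stdin.
filter_capture_errors() {
    grep -E '\[RCE\] ERROR|nvargus-daemon\[[0-9]+\]: .*Error (InvalidState|Timeout)'
}

# With --scan, filter the journal of the last hour.
if [ "${1:-}" = "--scan" ]; then
    journalctl --since "1 hour ago" -o short | filter_capture_errors
fi
```

If each block of daemon errors in the output starts a few seconds after an [RCE] line, that supports the ordering seen above.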

hello Agtonomy,

FYI, we’ve set up an environment to test the multi-cam long-run test case.

it’s a 6-cam camera board, and we ran a long-run camera preview test to confirm. we enabled a gst pipeline for 6-cam preview, and it has been running for 17 hours without issues.

note,
this was checked with AGX Orin / JP-5.1 / E3333 for the multi-cam long-run use case; we confirmed that we cannot reproduce the failure.
thanks

The problem with multiple cameras and the InvalidState error from nvargus-daemon has been on the forum for years. It has been seen on different platforms, with different JetPack releases and different cameras. None of the cases was given a real solution (I do not consider “it works for me” a solution).

Regardless of the root cause, I see a fundamental problem with the nvargus-daemon software. If a problem occurs, it should restart its processes, or at least exit, so that systemd can take care of the restart. Instead, it goes into a zombie state, although from the system’s point of view it is up and running.

Are the sources of nvargus-daemon and libargus available?

no, they’re not publicly available.
