Capture instability

Hi,

I’m experiencing stability issues with a three-camera setup in an application using LibArgus on a Jetson AGX Orin. The application always starts fine and we get a steady stream of captures from all three cameras. However, after a period of time (anywhere from a few minutes to several hours) the frame rate drops and eventually no images are captured at all.

The nvargus-daemon logs the following:

Dec 10 14:05:28 jetson nvargus-daemon[4351]: Module_id 30 Severity 2 : (fusa) Error: InvalidState  propagating from:/capture/src/fusaViHandler.cpp 759
Dec 10 14:05:28 jetson nvargus-daemon[4351]: CAM: serial no file already exists, skips storing againCAM: serial no file already exists, skips storing againCAM: serial no file already exists, skips storing againSCF: Error InvalidState:  Corr Error Received for sensor 2 .. Continuing!
Dec 10 14:05:28 jetson nvargus-daemon[4351]:  (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 643)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: Module_id 30 Severity 2 : (fusa) Error: ResourceAlreadyInUse All captures are already pending, no idle captures available in:/capture/src/fusaViHandler.cpp 633
Dec 10 14:05:28 jetson nvargus-daemon[4351]: Module_id 30 Severity 2 : (fusa) Error: ResourceAlreadyInUse  propagating from:/capture/src/fusaViHandler.cpp 475
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/FusaCaptureViCsiHw.cpp, function startCaptureInternal(), line 866)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureRecord.cpp, function doCSItoMemCapture(), line 536)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureRecord.cpp, function issueCapture(), line 483)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureServiceDevice.cpp, function issueCaptures(), line 1530)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureServiceDevice.cpp, function issueBubbleFillCapturesIfNeeded(), line 721)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureServiceDevice.cpp, function issueCaptures(), line 1371)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error ResourceAlreadyInUse:  (propagating from src/common/Utils.cpp, function workerThread(), line 114)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error ResourceAlreadyInUse: Worker thread CaptureScheduler frameStart failed (in src/common/Utils.cpp, function workerThread(), line 133)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureServiceDevice.cpp, function addNewItemToSchedule(), line 1004)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error InvalidState:  (propagating from src/services/capture/CaptureService.cpp, function addRequest(), line 411)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error InvalidState:  (propagating from src/components/stages/MemoryToISPCaptureStage.cpp, function doHandleRequest(), line 144)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureServiceDevice.cpp, function addNewItemToSchedule(), line 1004)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error InvalidState:  (propagating from src/services/capture/CaptureService.cpp, function addRequest(), line 411)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error InvalidState: Sending critical error event for Session 4
Dec 10 14:05:28 jetson nvargus-daemon[4351]:  (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error InvalidState:  (propagating from src/components/stages/MemoryToISPCaptureStage.cpp, function doHandleRequest(), line 144)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
Dec 10 14:05:28 jetson nvargus-daemon[4351]: SCF: Error InvalidState: Sending critical error event for Session 0
Dec 10 14:05:28 jetson nvargus-daemon[4351]:  (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
Dec 10 14:05:33 jetson nvargus-daemon[4351]: (Argus) Error BadParameter: Vector index out of bounds (in /dvs/git/dirty/git-master_linux/camera/utils/nvcamerautils/inc/Vector.h, function operator[](), line 365)
Dec 10 14:05:33 jetson nvargus-daemon[4351]: (Argus) Error BadParameter: Vector index out of bounds (in /dvs/git/dirty/git-master_linux/camera/utils/nvcamerautils/inc/Vector.h, function operator[](), line 365)
Dec 10 14:05:33 jetson nvargus-daemon[4351]: waitForIdleLocked remaining request 43949
Dec 10 14:05:33 jetson nvargus-daemon[4351]: waitForIdleLocked remaining request 43950
Dec 10 14:05:33 jetson nvargus-daemon[4351]: SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 969)
Dec 10 14:05:33 jetson nvargus-daemon[4351]: waitForIdleLocked remaining request 43949
Dec 10 14:05:33 jetson nvargus-daemon[4351]: waitForIdleLocked remaining request 43950
Dec 10 14:05:33 jetson nvargus-daemon[4351]: SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 969)
Dec 10 14:05:37 jetson nvargus-daemon[4351]: === kcc[4357]: Connection closed (FFFF9521A900)=== kcc[4357]: WARNING: CameraProvider was not destroyed before client connection terminated.=== kcc[4357]:          The client may have abnormally terminated. Destroying CameraProvider...=== kcc[4357]: CameraProvider destroyed (0xffff90873aa0)=== kcc[4357]: WARNING: Cleaning up 3 outstanding requests...=== kcc[4357]: WARNING: Cleaning up 3 outstanding stream settings...=== kcc[4357]: WARNING: Cleaning up 3 outstanding queues...=== kcc[4357]: WARNING: Cleaning up 3 outstanding sessions...SCF: Error Timeout:  (propagating from src/services/capture/CaptureServiceEvent.cpp, function wait(), line 59)
Dec 10 14:05:37 jetson nvargus-daemon[4351]: Error: Camera HwEvents wait, this may indicate a hardware timeout occured,abort current/incoming cc for sensor guid 2 count -1739005184
Dec 10 14:05:38 jetson nvargus-daemon[4351]: waitForIdleLocked remaining request 43949
Dec 10 14:05:38 jetson nvargus-daemon[4351]: waitForIdleLocked remaining request 43950
Dec 10 14:05:38 jetson nvargus-daemon[4351]: SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 969)
Dec 10 14:05:38 jetson nvargus-daemon[4351]: SCF: Error InvalidState: 1 buffers still pending during EGLStreamProducer destruction (in src/services/gl/EGLStreamProducer.cpp, function freeBuffers(), line 300)
Dec 10 14:05:43 jetson nvargus-daemon[4351]: waitForIdleLocked remaining request 43949
Dec 10 14:05:43 jetson nvargus-daemon[4351]: waitForIdleLocked remaining request 43950
Dec 10 14:05:43 jetson nvargus-daemon[4351]: SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 969)
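
To see which error kinds dominate a log like the one above, something along these lines could help. This is only a rough sketch: the function name summarize_argus_errors is made up for illustration, and it assumes the "Error <Kind>" / "Error: <Kind>" wording seen in this log.

```shell
# Count how often each error kind appears in nvargus-daemon log text on stdin.
# Free-text messages such as "Error: Camera HwEvents wait" are counted under
# their first word ("Camera"), so treat the result as a rough overview only.
summarize_argus_errors() {
    grep -o 'Error:\{0,1\} [A-Z][A-Za-z]*' |
        sed 's/^Error:\{0,1\} //' |
        sort | uniq -c | sort -rn
}

# Possible usage (assumes the systemd unit is called nvargus-daemon):
#   journalctl -u nvargus-daemon --no-pager | summarize_argus_errors
```

The most frequent kind is printed first, which makes it easier to tell the first failure apart from the errors that merely propagate from it.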

The only indication from the LibArgus API that something is wrong is that FrameConsumer->acquireFrame() returns STATUS_TIMEOUT.

The application works again after restarting the nvargus-daemon, but as before only for a while.
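
As a stopgap while the root cause is unknown, the restart could be automated. The sketch below is a workaround under assumptions, not a fix: the function name argus_capture_stalled is hypothetical, the two signature strings are simply taken from the log above, and the unit name nvargus-daemon is assumed.

```shell
# Return 0 when log text on stdin contains one of the failure signatures
# seen in the log above, non-zero otherwise.
argus_capture_stalled() {
    grep -q -e 'Capture Scheduler not running' -e 'waitForIdle() timed out'
}

# Possible usage, run as root from a timer or a loop:
#   if journalctl -u nvargus-daemon --since "-1min" --no-pager | argus_capture_stalled; then
#       systemctl restart nvargus-daemon
#   fi
```

The client application would still have to reconnect to the daemon after the restart.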

NOTE:
The application did work before, but after changing the camera expansion board that connects the cameras to the carrier board we started experiencing this issue. I’ve been talking to both the camera and carrier board manufacturers without any headway.

I’ve tried to boost the clocks:

echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/vi/rate
cat /sys/kernel/debug/bpmp/debug/clk/isp/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/isp/rate
cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate
cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/emc/rate
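
The same eight commands can be written as a loop, which makes it harder to miss one clock. A minimal sketch, assuming the debugfs layout used above; the function name boost_camera_clocks and the optional root-directory argument are introduced here for illustration only.

```shell
# Lock each camera-related clock and pin it to its maximum rate.
# $1: clock debugfs root (defaults to the path used in the commands above).
boost_camera_clocks() {
    clk_root="${1:-/sys/kernel/debug/bpmp/debug/clk}"
    for clk in vi isp nvcsi emc; do
        dir="$clk_root/$clk"
        # Skip a missing clock instead of failing half-way through.
        if [ ! -e "$dir/max_rate" ]; then
            echo "skip $clk: $dir/max_rate not found" >&2
            continue
        fi
        echo 1 > "$dir/mrq_rate_locked"
        cat "$dir/max_rate" > "$dir/rate"
        # Read the rate back so the result can be checked by eye.
        echo "$clk: $(cat "$dir/rate")"
    done
}

# Possible usage (as root): boost_camera_clocks
```

The read-back line shows whether each rate was actually accepted.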

Other info:

  • We’re using JetPack 5.1.2.
  • We’re using a third-party connector board.
  • Two cameras are set as slave and one as master.

Any idea as to how I can resolve this issue?
Thanks.

Hi,

For basic camera functionality, first check the device and driver configuration.
You can refer to the programming guide below for detailed information on the device tree and driver implementation.
https://docs.nvidia.com/jetson/archives/r36.3/DeveloperGuide/SD/CameraDevelopment/SensorSoftwareDriverProgramming.html?highlight=programing#sensor-software-driver-programming

Please refer to "Applications Using V4L2 IOCTL Directly" and use V4L2 IOCTLs to verify basic camera functionality.
https://docs.nvidia.com/jetson/archives/r36.3/DeveloperGuide/SD/CameraDevelopment/SensorSoftwareDriverProgramming.html?highlight=programing#to-run-a-v4l2-ctl-test

Once you have confirmed the configuration, if it still fails, the link below explains how to get logs and other information, and gives some debugging tips.
https://elinux.org/Jetson/l4t/Camera_BringUp#Steps_to_enable_more_debug_messages

Thanks!

Hi @carolyuu

I tried the v4l2-ctl command from the link above. For example, I ran:
v4l2-ctl --set-fmt-video=width=3856,height=2180,pixelformat=RG12 --stream-mmap --set-ctrl=sensor_mode=0 --stream-count=100 -d /dev/video2 --verbose

I also added the --verbose option, and I got:

VIDIOC_QUERYCAP: ok
VIDIOC_S_EXT_CTRLS: ok
VIDIOC_G_FMT: ok
VIDIOC_S_FMT: ok
Format Video Capture:
        Width/Height      : 3856/2180
        Pixel Format      : 'RG12' (12-bit Bayer RGRG/GBGB)
        Field             : None
        Bytes per Line    : 7936
        Size Image        : 17300480
        Colorspace        : sRGB
        Transfer Function : Default (maps to sRGB)
        YCbCr/HSV Encoding: Default (maps to ITU-R 601)
        Quantization      : Default (maps to Full Range)
        Flags             :
                VIDIOC_REQBUFS returned 0 (Success)
                VIDIOC_QUERYBUF returned 0 (Success)
                VIDIOC_QUERYBUF returned 0 (Success)
                VIDIOC_QUERYBUF returned 0 (Success)
                VIDIOC_QUERYBUF returned 0 (Success)
                VIDIOC_QBUF returned 0 (Success)
                VIDIOC_QBUF returned 0 (Success)
                VIDIOC_QBUF returned 0 (Success)
                VIDIOC_QBUF returned 0 (Success)
                VIDIOC_STREAMON returned 0 (Success)

After that it just hangs indefinitely. I’m expecting cap dqbuf log messages to appear, as they do for another camera setup I run.
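
To make a hang like this show up as a clear result instead of a stuck terminal, the command can be run under a deadline. A minimal sketch using the coreutils timeout tool; the wrapper name run_with_deadline is hypothetical.

```shell
# Run a command with a deadline and report when it had to be stopped.
# $1: deadline in seconds, remaining arguments: the command to run.
run_with_deadline() {
    secs="$1"
    shift
    rc=0
    timeout "$secs" "$@" || rc=$?
    # timeout exits with status 124 when the deadline was hit.
    if [ "$rc" -eq 124 ]; then
        echo "HANG: no completion within ${secs}s" >&2
    fi
    return "$rc"
}

# Possible usage with the command above:
#   run_with_deadline 30 v4l2-ctl --set-fmt-video=width=3856,height=2180,pixelformat=RG12 \
#       --stream-mmap --set-ctrl=sensor_mode=0 --stream-count=100 -d /dev/video2 --verbose
```

A return status of 124 means the deadline was hit, and that is a good moment to save the kernel log for the report.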

It seems odd, though, that we’re able to capture images with the Argus API but not via v4l2-ctl.

Apply below solution.

I’ll give it a go and report back.

@ShaneCCC

I’ve reflashed the unit with the files you supplied.
Sadly, I’m still experiencing instability with the application using LibArgus, and the v4l2-ctl commands still just hang.

Any other suggestions?

What do you mean by "v4l2-ctl commands still just hang"?
Is v4l2-ctl capture unstable too? If yes, a CID function such as xxx_set_exposure/xxx_set_frame_rate could have a problem.

@ShaneCCC

As I mentioned in a previous comment (Capture instability - #4 by christian.johansen):

When I try to run:
v4l2-ctl --set-fmt-video=width=3856,height=2180,pixelformat=RG12 --stream-mmap --set-ctrl=sensor_mode=0 --stream-count=100 -d /dev/video2 --verbose

I get:

VIDIOC_QUERYCAP: ok
VIDIOC_S_EXT_CTRLS: ok
VIDIOC_G_FMT: ok
VIDIOC_S_FMT: ok
Format Video Capture:
        Width/Height      : 3856/2180
        Pixel Format      : 'RG12' (12-bit Bayer RGRG/GBGB)
        Field             : None
        Bytes per Line    : 7936
        Size Image        : 17300480
        Colorspace        : sRGB
        Transfer Function : Default (maps to sRGB)
        YCbCr/HSV Encoding: Default (maps to ITU-R 601)
        Quantization      : Default (maps to Full Range)
        Flags             :
                VIDIOC_REQBUFS returned 0 (Success)
                VIDIOC_QUERYBUF returned 0 (Success)
                VIDIOC_QUERYBUF returned 0 (Success)
                VIDIOC_QUERYBUF returned 0 (Success)
                VIDIOC_QUERYBUF returned 0 (Success)
                VIDIOC_QBUF returned 0 (Success)
                VIDIOC_QBUF returned 0 (Success)
                VIDIOC_QBUF returned 0 (Success)
                VIDIOC_QBUF returned 0 (Success)
                VIDIOC_STREAMON returned 0 (Success)

It just “hangs” there indefinitely and no images are captured. This is still the case after I reflashed the unit with files you supplied.

Indeed, I see a WARNING in the kernel log related to setting the operation_mode of the camera sensor. I.e. setting the camera in Master/Slave mode.

Could this be the cause of the v4l2-ctl command stalling?

@ShaneCCC

I’ve found that the v4l2-ctl command mentioned previously works if I do not run my application beforehand after a boot.

So something happens when I run my application (which uses LibArgus) that causes problems for V4L2 afterwards.

That still does not explain the instability, though; I just thought I should mention the finding.

I suspect the frame sync design in the sensor driver is causing the problem.
Can the sensors run without frame sync mode?

We experience the same issue when running all three sensors in master mode (i.e. no synchronization).

Running only one camera seems to work, though I haven’t been able to run that test for long yet, so the error might still occur.

@ShaneCCC

I did an overnight test with V4L2 yesterday/today. I.e. I ran:

Master
v4l2-ctl -d2 -c operation_mode=0 -c synchronizing_function=1 -c frame_rate=25000000 --stream-mmap --stream-skip=10 -c bypass_mode=0 --verbose

Slave
v4l2-ctl -d0 -c operation_mode=1 -c synchronizing_function=2 -c frame_rate=25000000 --stream-mmap --stream-skip=10 -c bypass_mode=0 --verbose

v4l2-ctl -d1 -c operation_mode=1 -c synchronizing_function=2 -c frame_rate=25000000 --stream-mmap --stream-skip=10 -c bypass_mode=0 --verbose

And that ran successfully for 18 hours before I stopped it this morning. This leads me to believe the issue lies with our application and/or with the Nvidia Argus library.
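
For repeating this soak test, the three commands could be started from one script, each logging to its own file. A minimal sketch: the helper names stream_cmd and start_sync_test and the log file names are made up for illustration, and the start order is an arbitrary choice, not something the test above depended on.

```shell
# Print the v4l2-ctl command line for one sensor.
# $1: video device index, $2: operation_mode, $3: synchronizing_function
stream_cmd() {
    echo "v4l2-ctl -d$1 -c operation_mode=$2 -c synchronizing_function=$3" \
         "-c frame_rate=25000000 --stream-mmap --stream-skip=10" \
         "-c bypass_mode=0 --verbose"
}

# Start all three streams in the background and wait until they exit.
# Each spec is "device operation_mode synchronizing_function".
start_sync_test() {
    for spec in "0 1 2" "1 1 2" "2 0 1"; do
        set -- $spec
        sh -c "$(stream_cmd "$1" "$2" "$3")" > "stream-video$1.log" 2>&1 &
    done
    wait
}

# Possible usage: start_sync_test   (stop it with Ctrl-C)
```

Keeping one log per device makes it easier to see which sensor stops delivering frames first, if any does.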

You can turn the CID functions in the sensor driver, like xxx_set_exposure…,
into dummy functions to make sure Argus runs with the same configuration as v4l2-ctl.

Can I achieve the same by using the setAeLock(bool lock) function for IAutoControlSettings?

It would be safer to modify the sensor driver.

I will give it a go and report back.

I changed the set_exposure() function in the sensor driver to just return 0;

However, the issue persists.

I also see that nvargus-daemon logs a lot of these messages:
Jan 03 11:55:27 jetson nvargus-daemon[1229]: (Argus) Error OverFlow: Too many pending events, ignoring new events (in src/api/EventProviderImpl.cpp, function addEvent(), line 158)

Although, they seem to occur from the start.

Replace the libs below and verify again.

libnvargus.so.35.4.1 (1.2 MB)

Should I try with only libnvargus.so.35.4.1, or also with the libraries you shared in this post: