Multi camera stability issue with Argus

I have written a custom application that captures images from 6 cameras using libArgus, sends the images to CUDA for post-processing, and then encodes them using GStreamer. Most of the time, the application runs without issues. However, when I run it for an extended period of time, I occasionally run into a number of different errors, some of which are shown below:

Example error #1

NvViErrorDecode Stream 1.0 failed: ts 1579507581856 frame 181 error 7 data 0x00000002
NvViErrorDecode CaptureError: ChanselShortFrame (7)
NvViErrorDecode See https://wiki.nvidia.com/wmpwiki/index.php/Camera_Debugging/CaptureError_debugging for more information and links to documents.
ChanselShortFrame : 0x00000002
    Channels with PIXEL_INCOMPLETE [11: 0]:
        Channels 1
        This can happen for three reasons: PIXEL_SHORT_FRAME: FE packet arrives before last expected pixel of the uncropped image; EMPTY_FRAME: FE packet arrives before cropped pixels other embedded data been received; PIXEL_OPEN_LINE: A pixel line has been opened with line start but FE packet arrives before line end ever arrives.
SCF: Error Timeout:  (propagating from src/services/capture/CaptureServiceDevice.cpp, function issueCaptures(), line 1130)
SCF: Error Timeout:  (propagating from src/common/Utils.cpp, function workerThread(), line 116)
SCF: Error Timeout: Worker thread CaptureScheduler frameStart failed (in src/common/Utils.cpp, function workerThread(), line 133)
SCF: Error BadValue: timestamp cannot be 0 (in src/services/capture/NvViCsiHw.cpp, function waitCsiFrameEnd(), line 711)
SCF: Error BadValue:  (propagating from src/common/Utils.cpp, function workerThread(), line 116)
captureErrorCallback Stream 1.0 capture 39369 failed: ts 1579507581856 frame 181 error 7 data 0x00000002

SCF: Error BadValue: Worker thread ViCsiHw frameComplete failed (in src/common/Utils.cpp, function workerThread(), line 133)
SCF: Error Timeout:  (propagating from src/api/Buffer.cpp, function waitForUnlock(), line 637)
SCF: Error Timeout:  (propagating from src/components/CaptureContainerImpl.cpp, function returnBuffer(), line 358)
SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureServiceDevice.cpp, function addNewItemToSchedule(), line 908)
SCF: Error InvalidState:  (propagating from src/services/capture/CaptureService.cpp, function addRequest(), line 395)
SCF: Error InvalidState:  (propagating from src/components/stages/MemoryToISPCaptureStage.cpp, function doHandleRequest(), line 137)
SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
SCF: Error InvalidState: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 990)
SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureServiceDevice.cpp, function addNewItemToSchedule(), line 908)

Example error #2

NvViErrorDecode Stream 1.0 failed: ts 523183839616 frame 8 error 4 data 0x04b10200
NvViErrorDecode CaptureError: ChanselFault (4)
NvViErrorDecode See https://wiki.nvidia.com/wmpwiki/index.php/Camera_Debugging/CaptureError_debugging for more information and links to documents.
ChanselFault : 0x04b10200
    PIXEL_SHORT_LINE            [ 9]: 1
        A line ends with fewer pixels than expected.
    Current line in frame    [31:16]: 1201

SCF: Error Timeout:  (propagating from src/services/capture/CaptureServiceDevice.cpp, function issueCaptures(), line 1130)
SCF: Error InvalidState: cannot find Fiber (in src/components/FiberScheduler.cpp, function asyncCaptureResult(), line 425)
SCF: Error BadParameter: CC has already been disposed (in src/components/CaptureContainerManager.cpp, function dispose(), line 161)
SCF: Error Timeout:  (propagating from src/common/Utils.cpp, function workerThread(), line 116)
SCF: Error Timeout: Worker thread CaptureScheduler frameStart failed (in src/common/Utils.cpp, function workerThread(), line 133)
SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureServiceDevice.cpp, function addNewItemToSchedule(), line 908)
SCF: Error InvalidState:  (propagating from src/services/capture/CaptureService.cpp, function addRequest(), line 395)
SCF: Error InvalidState:  (propagating from src/components/stages/MemoryToISPCaptureStage.cpp, function doHandleRequest(), line 137)
SCF: Error BadValue: timestamp cannot be 0 (in src/services/capture/NvViCsiHw.cpp, function waitCsiFrameEnd(), line 711)
SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
SCF: Error BadValue:  (propagating from src/common/Utils.cpp, function workerThread(), line 116)
SCF: Error BadValue: Worker thread ViCsiHw frameComplete failed (in src/common/Utils.cpp, function workerThread(), line 133)
SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureServiceDevice.cpp, function addNewItemToSchedule(), line 908)
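
As a reading aid for the two NvViErrorDecode reports above, here is a minimal Python sketch that unpacks the `data` words by hand. The bit positions are copied from the log text itself (bit 9 and bits [31:16] for ChanselFault, bits [11:0] for ChanselShortFrame). The function names are made up for illustration and are not part of any NVIDIA tool, and treating bit n of the mask as channel n is an assumption that happens to match the "Channels 1" line in the log.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Return the inclusive bit field value[hi:lo] as an integer."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def decode_chansel_fault(data: int) -> dict:
    # Field positions taken from the ChanselFault log lines above.
    return {
        "pixel_short_line": bits(data, 9, 9),   # PIXEL_SHORT_LINE [9]
        "current_line": bits(data, 31, 16),     # Current line in frame [31:16]
    }


def decode_chansel_short_frame(data: int) -> list:
    # Bits [11:0] form a per-channel PIXEL_INCOMPLETE mask in the log above.
    # Assumption: bit n corresponds to channel n.
    mask = bits(data, 11, 0)
    return [ch for ch in range(12) if mask & (1 << ch)]


print(decode_chansel_fault(0x04B10200))
# {'pixel_short_line': 1, 'current_line': 1201}
print(decode_chansel_short_frame(0x00000002))
# [1]
```

Both results agree with what the driver printed: a short line reported at line 1201 in example #2, and an incomplete frame on channel 1 in example #1.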

Example error #3

SCF_AutocontrolACSync failed to wait for an earlier frame to complete.

SCF: Error Timeout:  (propagating from src/components/ac_stages/ACSynchronizeStage.cpp, function doHandleRequest(), line 126)
SCF: Error Timeout:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 137)
SCF: Error Timeout: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 990)
SCF: Error InvalidState: Session has suffered a critical failure (in src/api/Session.cpp, function capture(), line 667)
(Argus) Error InvalidState:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)
SCF: Error InvalidState: Session has suffered a critical failure (in src/api/Session.cpp, function capture(), line 667)
(Argus) Error InvalidState:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)

What exactly do these errors mean, what are some possible reasons behind why they’re occurring, and how can I fix them? Additionally, when these errors occur, they are usually followed by a repeated spamming of two or three error messages at a very high frequency. Is there a way to prevent the program from spamming these error messages? Or to recover from this situation if it ever occurs?
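
On the log spam specifically, one generic option is to rate-limit repeated messages before they reach the console or log file. The Python sketch below is only an illustration of that idea: `RateLimitedLog` and its parameters are hypothetical names, it does not call Argus at all, and it only helps if the messages pass through a stream that the application or a wrapper script controls, which is an assumption here. It does not address the underlying capture failure or recovery.

```python
import re
import time


class RateLimitedLog:
    """Print each distinct message at most once per interval."""

    def __init__(self, interval_s=5.0, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock      # injectable so the behavior can be tested
        self.last_emit = {}     # normalized message -> time it was last printed
        self.suppressed = {}    # normalized message -> repeats hidden since then

    def filter(self, line):
        """Return the text to print for this line, or None to suppress it."""
        # Assumption: digits (line numbers, counters) are the only parts that
        # differ between repeats, so they are masked before comparing.
        key = re.sub(r"\d+", "N", line.strip())
        now = self.clock()
        last = self.last_emit.get(key)
        if last is not None and now - last < self.interval_s:
            self.suppressed[key] = self.suppressed.get(key, 0) + 1
            return None
        skipped = self.suppressed.pop(key, 0)
        self.last_emit[key] = now
        suffix = f" [{skipped} similar suppressed]" if skipped else ""
        return line.rstrip("\n") + suffix
```

A wrapper would call `filter()` on every incoming line and print only the non-`None` results, so two or three alternating messages each show up once per interval instead of continuously.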

I have already tried running jetson_clocks, as well as boosting the clocks for debugging purposes, as described in https://elinux.org/Jetson_TX2_Camera_BringUp. For reference, I am running on a Jetson TX2 board with JetPack 4.2, connected to 6 cameras.

Thanks in advance for any insights you might provide.

hello izhou,

may I know how long the application runs (in minutes or hours) before you observe the failures?

FYI,
according to the JetPack Archive, JetPack-4.2 is based on l4t-r32.1. you should also refer to the L4T R32.1 Release Notes, which list several known issues for multiple-camera use cases.
please upgrade to the latest JetPack release, i.e. JetPack-4.3, since we have included some camera software updates for multiple-camera functionality.
thanks

Hi JerryChang,

The time it takes before these errors occur varies quite a bit: I’ve seen them show up after as little as 10 minutes, after about an hour, and occasionally only after a few hours.

As for updating to JetPack-4.3, what specific changes related to multi-camera use were made between JetPack-4.2 and JetPack-4.3? I looked at the known issues in the release notes, and the only one I noticed was a memory leak that is listed in both the L4T R32.1 release notes and the L4T 32.3.1 release notes.

I’ll try updating to Jetpack-4.3 to see if it helps, but do you have any idea what these errors mean and why they’re occurring?

hello izhou,

there are multi-cam changes that were checked in for JetPack-4.2.1.
we increased the internal buffer storage to address a race condition in multi-cam streaming under high processor load.
we also updated the EGL stream architecture to optimize it for multiple-camera use cases.
I suggest you also refer to Topic 1065378 for more details.
you may upgrade to JetPack-4.2.1 or even JetPack-4.3, which should include those fixes.
thanks