Argus library segmentation fault when resetting deserializer

The cameras and deserializers we use have previously been described here: TX2 VI mode and ISP mode bugs when using CSI2 virtual channels

I’m using a Jetson TX2 SoM with L4T 32.4.3 (JetPack 4.4) and two OV9782 cameras connected via
two DS90UB953-Q1 serializers and a DS90UB954-Q1 FPD-Link III deserializer to the CSI-A/CSI-B ports, using
four data lanes and one clock lane. This requires CSI2 virtual channels, so I’m using a configuration
with virtual channels VC0 and VC1.

The OV9782 camera driver was developed by NVIDIA partner D3 Engineering.

L4T version: 32.4.3
Kernel version: 4.9.140-l4t-r32.4+g1582a8a5405d

We want to be able to gracefully handle ESD pulses, without crashing the capturing process. To simulate ESD pulses, we use the following command, which resets the deserializer:

i2cset -f -y 0x06 0x32 0x01 0x1
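
For automated testing it can be handy to issue the same register write from inside a test harness instead of from the shell. The following is a minimal sketch using the Linux i2c-dev interface. The function name `writeI2cRegister` is made up for illustration, and the sketch assumes that a plain two-byte write (register, value) is equivalent on this bus to the SMBus byte-data write that i2cset performs by default.

```cpp
#include <fcntl.h>
#include <linux/i2c-dev.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <cstdint>

// Hypothetical helper: writes one byte to an 8-bit register of an I2C device.
// Returns true on success, false if opening, addressing or writing fails.
bool writeI2cRegister(const char* busPath, std::uint8_t chipAddr,
                      std::uint8_t reg, std::uint8_t value) {
    const int fd = open(busPath, O_RDWR);
    if (fd < 0) {
        return false;
    }

    // I2C_SLAVE_FORCE selects the address even if a kernel driver is bound
    // to it, which corresponds to the -f flag of i2cset.
    bool ok = ioctl(fd, I2C_SLAVE_FORCE, static_cast<unsigned long>(chipAddr)) >= 0;
    if (ok) {
        const std::uint8_t msg[2] = {reg, value};
        ok = write(fd, msg, sizeof(msg)) == static_cast<ssize_t>(sizeof(msg));
    }
    close(fd);
    return ok;
}
```

With the values from the command above, the call would be `writeI2cRegister("/dev/i2c-6", 0x32, 0x01, 0x01)`, assuming bus 0x06 is exposed as `/dev/i2c-6`.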

Resetting the deserializer results in the following libargus output:

NvViErrorDecode Stream 2.1 failed: ts 9269118829728 frame 117 error 2 data 0x000000a2
NvViErrorDecode CaptureError: CsimuxFrameError (2)
NvViErrorDecode See https://wiki.nvidia.com/wmpwiki/index.php/Camera_Debugging/CaptureError_debugging for more information and links to documents.
CsimuxFrameError_Regular : 0x000000a2
    Stream ID                [ 2: 0]: 2
        
    VPR state from fuse block    [ 3]: 0
        
    Frame end (FE)              [ 5]: 1
        A frame end has been found on a regular mode stream.
    FS_FAULT                    [ 7]: 1
        A FS packet was found for a virtual channel that was already in frame. An errored FE packet was injected before FS was allowed through.
Seconds until reset: 24 (Buffers received: 112)
(Argus) Error Timeout:  (propagating from src/api/BufferOutputStreamImpl.cpp, function acquireBuffer(), line 265)
ConsumerThread: TIMEOUT
SCF: Error BadValue: timestamp cannot be 0 (in src/services/capture/NvViCsiHw.cpp, function waitCsiFrameStart(), line 630)
SCF: Error BadValue:  (propagating from src/common/Utils.cpp, function workerThread(), line 116)
SCF: Error BadValue: Worker thread ViCsiHw frameStart failed (in src/common/Utils.cpp, function workerThread(), line 133)
captureErrorCallback Stream 2.1 capture 4202 failed: ts 9269118829728 frame 117 error 2 data 0x000000a2

Seconds until reset: 23 (Buffers received: 112)
Seconds until reset: 22 (Buffers received: 112)
SCF: Error Timeout: ISP port 0 timed out! (in src/services/capture/NvIspHw.cpp, function waitIspFrameEnd(), line 478)
Seconds until reset: 21 (Buffers received: 112)
Seconds until reset: 20 (Buffers received: 112)
Seconds until reset: 19 (Buffers received: 112)
Seconds until reset: 18 (Buffers received: 112)
Seconds until reset: 17 (Buffers received: 112)
Seconds until reset: 16 (Buffers received: 112)
Seconds until reset: 15 (Buffers received: 112)
Seconds until reset: 14 (Buffers received: 112)
SCF: Error Timeout: ISP Stats timed out! (in src/services/capture/NvIspHw.cpp, function waitIspStatsFinished(), line 561)
SCF: Error Timeout: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 991)
SCF: Error InvalidState:  (propagating from src/services/capture/NvViCsiHw.cpp, function startCapture(), line 508)
SCF: Error InvalidState:  (propagating from src/services/capture/DeviceRecordNv.cpp, function doCSItoISPCapture(), line 110)
SCF: Error InvalidState: Session has suffered a critical failure (in src/api/Session.cpp, function capture(), line 667)
(Argus) Error InvalidState:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)

After a few seconds, a segmentation fault follows. It is preceded by the last two lines above, repeating rapidly:

SCF: Error InvalidState: Session has suffered a critical failure (in src/api/Session.cpp, function capture(), line 667)
(Argus) Error InvalidState: (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)
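
The data word 0x000000a2 reported by NvViErrorDecode in the first log can be checked by hand against the bit positions printed there. The sketch below decodes only the four fields that appear in that log output; the struct and function names are made up for illustration and are not part of any NVIDIA API.

```cpp
#include <cstdint>

// Fields of the CsimuxFrameError data word, with bit positions copied from
// the libargus log output. Other bits are not decoded here.
struct CsimuxFrameError {
    unsigned streamId;  // bits [2:0]
    bool vprState;      // bit 3
    bool frameEnd;      // bit 5
    bool fsFault;       // bit 7
};

// Hypothetical helper: splits the raw data word into the fields above.
CsimuxFrameError decodeCsimuxFrameError(std::uint32_t data) {
    CsimuxFrameError e;
    e.streamId = data & 0x7u;
    e.vprState = ((data >> 3) & 1u) != 0;
    e.frameEnd = ((data >> 5) & 1u) != 0;
    e.fsFault  = ((data >> 7) & 1u) != 0;
    return e;
}
```

For 0xa2 (binary 1010 0010) this yields stream ID 2, VPR state 0, frame end 1 and FS_FAULT 1, which matches the decoded log: a frame start arrived on a virtual channel that was still inside a frame.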

Attempting to close the capture session properly after the ESD pulse blocks and produces the following output before the segmentation fault (the function calls are inserted for reference and prefixed with ‘//’):

// iCaptureSession->stopRepeat();
SCF: Error InvalidState: Session has suffered a critical failure (in src/api/Session.cpp, function capture(), line 667)
// iCaptureSession->waitForIdle();
(Argus) Error InvalidState:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)
(Argus) Error Timeout:  (propagating from src/api/CaptureSessionImpl.cpp, function waitForIdle(), line 567)
// iBufferOutputStream->endOfStream();
// consumerThread.join();
// outputStream.reset();
// managedBuffers.clear();
// captureSession.reset();
waitForIdleLocked remaining request 632
waitForIdleLocked remaining request 631
waitForIdleLocked remaining request 630
waitForIdleLocked remaining request 629
waitForIdleLocked remaining request 628
SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 921)
(Argus) Error Timeout:  (propagating from src/api/CaptureSessionImpl.cpp, function destroy(), line 166)
SCF: Error Timeout: ISP Stats timed out! (in src/services/capture/NvIspHw.cpp, function waitIspStatsFinished(), line 561)
SCF: Error InvalidState:  (propagating from src/services/capture/NvViCsiHw.cpp, function startCapture(), line 508)
SCF: Error InvalidState:  (propagating from src/services/capture/DeviceRecordNv.cpp, function doCSItoISPCapture(), line 110)
SCF: Error InvalidState:  (propagating from src/services/capture/CaptureRecord.cpp, function doCSItoISPCapture(), line 547)
SCF: Error InvalidState:  (propagating from src/services/capture/CaptureRecord.cpp, function issueCapture(), line 460)
SCF: Error InvalidState:  (propagating from src/services/capture/CaptureServiceDevice.cpp, function issueCaptures(), line 1293)
SCF: Error InvalidState:  (propagating from src/services/capture/CaptureServiceDevice.cpp, function issueCaptures(), line 1124)
SCF: Error BadParameter: Fiber not present (in src/components/CaptureContainerImpl.cpp, function detachFiber(), line 597)
SCF: Error Timeout:  (propagating from src/api/Buffer.cpp, function waitForUnlock(), line 637)
SCF: Error Timeout:  (propagating from src/components/CaptureContainerImpl.cpp, function returnBuffer(), line 358)
SCF: Error InvalidState:  (propagating from src/common/Utils.cpp, function workerThread(), line 116)
SCF: Error InvalidState: Worker thread CaptureScheduler frameStart failed (in src/common/Utils.cpp, function workerThread(), line 133)

Note: In both cases the Argus application terminates with a segmentation fault.

The backtrace of the offending thread is as follows (identical in both cases):

#0  0x0000007fb715d028 in nvcamerautils::Mutex::lock(char const*, unsigned int) const ()
   from /usr/lib/libnvcamerautils.so
#1  0x0000007fb71f6b74 in ?? () from /usr/lib/libnvscf.so
#2  0x0000007fb720bea4 in ?? () from /usr/lib/libnvscf.so
#3  0x0000007fb720c690 in ?? () from /usr/lib/libnvscf.so
#4  0x0000007fb721e580 in ?? () from /usr/lib/libnvscf.so
#5  0x0000007fb721ef38 in ?? () from /usr/lib/libnvscf.so
#6  0x0000007fb725a5d4 in ?? () from /usr/lib/libnvscf.so
#7  0x0000007fb72bc2d8 in ?? () from /usr/lib/libnvscf.so
#8  0x0000007fb728e7c8 in ?? () from /usr/lib/libnvscf.so
#9  0x0000007fb711a628 in ?? () from /usr/lib/libnvos.so
#10 0x0000007fb7e7d394 in ?? () from /lib/libpthread.so.0
#11 0x0000007fb7c2917c in ?? () from /lib/libc.so.6

Why do you need to reset the deserializer during streaming?
An error is to be expected when streaming fails unexpectedly.

We have a requirement to be able to handle those kinds of resets.

An error is certainly expected (even desired), but it must be possible to handle the error by restarting the capture session. An application crash by a segmentation fault is unacceptable.

Please refer to the userAutoExposure Argus sample for how to handle EVENT_TYPE_ERROR.

I am able to catch the error through the event queue mechanism as you suggest. The problem is (as pointed out above), that there is no way that I know of to react to the error, and avoid an application crash.

For now, the application just needs to terminate itself when it catches the error.

Hmm, not sure what you mean. Our application is already automatically restarted after it crashes.

We have to avoid crashing our application, because a crash resets any TCP connections that the application maintains, which is a big issue for us.

I would suggest upgrading to the latest release, which includes some error-recovery improvements.
