JP5.1 nvarguscamerasrc doesn't recover from a single NVCSI failure

Hello there!

As part of our migration from L4T 32.4.4 to 35.1 on the Xavier AGX, we're seeing changes in the error handling of the camera input.

I’m running a very simple pipeline:
gst-launch-1.0 nvarguscamerasrc ee-mode=0 tnr-mode=0 aeantibanding=0 silent=false ! fakesink

In L4T 32.4.4, whenever we got a broken frame, we saw the error below occur in the nvargus-daemon; however, the pipeline continued to run.

Feb 16 09:42:38 camera nvargus-daemon[6204]: NvCaptureStatusErrorDecode Stream 0.0 failed: sof_ts 68330504004640 eof_ts 68330637136320 frame 0 error 2 data 0x000000a0
Feb 16 09:42:38 camera nvargus-daemon[6204]: NvCaptureStatusErrorDecode Capture-Error: CSIMUX_FRAME (0x00000002)
Feb 16 09:42:38 camera nvargus-daemon[6204]: CsimuxFrameError_Regular : 0x000000a0
Feb 16 09:42:38 camera nvargus-daemon[6204]:     Stream ID                [ 2: 0]: 0
Feb 16 09:42:38 camera nvargus-daemon[6204]:         
Feb 16 09:42:38 camera nvargus-daemon[6204]:     VPR state from fuse block    [ 3]: 0
Feb 16 09:42:38 camera nvargus-daemon[6204]:         
Feb 16 09:42:38 camera nvargus-daemon[6204]:     Frame end (FE)              [ 5]: 1
Feb 16 09:42:38 camera nvargus-daemon[6204]:         A frame end has been found on a regular mode stream.
Feb 16 09:42:38 camera nvargus-daemon[6204]:     FS_FAULT                    [ 7]: 1
Feb 16 09:42:38 camera nvargus-daemon[6204]:         A FS packet was found for a virtual channel that was already in frame.An errored FE packet was injected before FS was allowed through.
Feb 16 09:42:38 camera nvargus-daemon[6204]:     Binary VC number [3:2]   [27:26]: 0
Feb 16 09:42:38 camera nvargus-daemon[6204]:         To get full binary VC number, user need to concatenate VC[3:2] and VC[1:0] together.
Feb 16 09:42:38 camera nvargus-daemon[6204]: SCF: Error InvalidState: Capture error with status 2 (channel 0) (in src/services/capture/NvCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 880)

In L4T 35.1 we see the error below instead, and the pipeline stops completely. That is a problem for us: some sensors hit this issue too often to tear down and restart the pipeline every time.

Feb 16 09:20:50 camera nvargus-daemon[2164]: SCF: Error InvalidState: Timeout waiting on frame start sensor guid 0, capture sequence ID = 612 (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameStart(), line 507)
Feb 16 09:20:50 camera nvargus-daemon[2164]: SCF: Error InvalidState:  (propagating from src/common/Utils.cpp, function workerThread(), line 114)
Feb 16 09:20:50 camera nvargus-daemon[2164]: SCF: Error InvalidState: Worker thread ViCsiHw frameStart failed (in src/common/Utils.cpp, function workerThread(), line 133)
Feb 16 09:20:50 camera nvargus-daemon[2164]: SCF: Error Timeout:  (propagating from src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 593)
Feb 16 09:20:50 camera nvargus-daemon[2164]: SCF: Error Timeout:  (propagating from src/common/Utils.cpp, function workerThread(), line 114)
Feb 16 09:20:50 camera nvargus-daemon[2164]: SCF: Error Timeout: Worker thread ViCsiHw frameComplete failed (in src/common/Utils.cpp, function workerThread(), line 133)
Feb 16 09:20:50 camera nvargus-daemon[2164]: Module_id 30 Severity 2 : (fusa) Error: Timeout  propagating from:/capture/src/fusaViHandler.cpp 776

Is there a way to enable the error recovery on 35.1 so the pipeline will continue to run, even when these types of errors occur?


Hi @pepijn.vanheiningen

I haven't seen this error before. Could you try using the infinite timeout on nvargus before capturing, to see whether it changes this behavior without any drawbacks?

sudo service nvargus-daemon stop
sudo enableCamInfiniteTimeout=1 nvargus-daemon
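
If you want the setting to persist across reboots, a minimal sketch using a standard systemd drop-in should also work (the override path follows the usual systemd convention; adjust to your setup):

sudo mkdir -p /etc/systemd/system/nvargus-daemon.service.d
sudo tee /etc/systemd/system/nvargus-daemon.service.d/override.conf >/dev/null <<'EOF'
[Service]
Environment=enableCamInfiniteTimeout=1
EOF
sudo systemctl daemon-reload
sudo systemctl restart nvargus-daemon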

Regards,
Roberto Gutierrez,
Embedded SW Engineer at RidgeRun
Contact us: support@ridgerun.com
Developers wiki: https://developer.ridgerun.com/

Hi Roberto, thanks for your help! I tried the infinite timeout, but unfortunately it doesn't help; I'm still getting the same error.


hello pepijn.vanheiningen,

may I know what the exact failure is whenever you get a broken frame?
is it due to an unstable MIPI signal, or something else?

Hi JerryChang, thank you for your response. We're still investigating the root cause of the issue. Unfortunately we already have many devices in the field that have this problem, so we need to be able to handle the error without the entire pipeline shutting down.

hello pepijn.vanheiningen,

could you please also test with an Argus sample app, for example userAutoExposure.
this is a sample application which includes error handling; please try your use-case with this instead.
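in case it's useful, a rough sketch of how the samples are usually built and run (the source location and binary name here are assumptions and may differ per JetPack release; please check the README shipped with the samples):

cd /usr/src/jetson_multimedia_api/argus   # location assumed
mkdir -p build && cd build
cmake .. && make
./samples/userAutoExposure/argus_userautoexposure   # binary name assumed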
thanks

Hey Jerry!

I’m getting the same error with the userAutoExposure sample. It freezes when the error happens, and I get this in the nvargus-daemon log:

Feb 17 10:29:54 camera nvargus-daemon[23913]: SCF: Error InvalidState: Timeout waiting on frame start sensor guid 0, capture sequence ID = 2150 (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameStart(), line 507)
Feb 17 10:29:54 camera nvargus-daemon[23913]: SCF: Error InvalidState:  (propagating from src/common/Utils.cpp, function workerThread(), line 114)
Feb 17 10:29:54 camera nvargus-daemon[23913]: SCF: Error InvalidState: Worker thread ViCsiHw frameStart failed (in src/common/Utils.cpp, function workerThread(), line 133)
Feb 17 10:29:54 camera nvargus-daemon[23913]: SCF: Error Timeout:  (propagating from src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 593)
Feb 17 10:29:54 camera nvargus-daemon[23913]: SCF: Error Timeout:  (propagating from src/common/Utils.cpp, function workerThread(), line 114)
Feb 17 10:29:54 camera nvargus-daemon[23913]: SCF: Error Timeout: Worker thread ViCsiHw frameComplete failed (in src/common/Utils.cpp, function workerThread(), line 133)
Feb 17 10:29:54 camera nvargus-daemon[23913]: Module_id 30 Severity 2 : (fusa) Error: Timeout  propagating from:/capture/src/fusaViHandler.cpp 776

hello pepijn.vanheiningen,

it shows a different error on r35.1: the capture engine waits for frames until it times out.
a timeout is more critical, and it is sometimes caused by sensor configuration issues.

may I know what the exact failure is? is it due to an unstable MIPI signal?

I can simulate it with a small script we created: essentially, the MIPI signal stops for a short period of time (a few frames) and then starts again. Running the same script on 32.4.4 and 35.1 shows the difference in the error.

So it isn't really a difference in the stability of the MIPI signal, but something inside the error handling. I hope there is a way to get the original error handling back, where it doesn't throw this 'Timeout waiting on frame start' error but only logs messages about issues with the MIPI signal.

hello pepijn.vanheiningen,

may I know how you interrupt the stream for testing?
actually, you could toggle the debug node /sys/kernel/debug/camera-video0/streaming to alter the camera stream.
with the camera application running, please use the command below to terminate the video stream:
# echo 0 > /sys/kernel/debug/camera-video0/streaming
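and write 1 back to the same node to resume the stream:
# echo 1 > /sys/kernel/debug/camera-video0/streaming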

We have some specific hardware that processes the MIPI signal; we reset that chip.

I will try to toggle the debug node to see if I can get the same results with that!

The directory /sys/kernel/debug/camera-video0 does not exist.

/sys/kernel/debug/camera-video0/streaming: No such file or directory

hello pepijn.vanheiningen,

please examine the release tag: $ cat /etc/nv_tegra_release
I've confirmed this debug node is created on JP-5.1,
for example,

/sys/kernel/debug# ll camera-video*
camera-video0:
total 0
drwxr-xr-x  2 root root 0 Feb 15 07:45 ./
drwx------ 96 root root 0 Feb 15 07:46 ../
-rw-r--r--  1 root root 0 Feb 15 07:45 streaming

camera-video1:
total 0
drwxr-xr-x  2 root root 0 Feb 15 07:45 ./
drwx------ 96 root root 0 Feb 15 07:46 ../
-rw-r--r--  1 root root 0 Feb 15 07:45 streaming

or…
may I know what camera type you're using? is it a bayer sensor using the CSI interface?

We are actually running 35.1/JP5.1:

head -n 1 /etc/nv_tegra_release
# R35 (release), REVISION: 1.0, GCID: 31250864, BOARD: t186ref, EABI: aarch64, DATE: Thu Aug 11 03:37:46 UTC 2022

Yes, we're using a bayer sensor over the CSI interface, but we're still not seeing the camera-video folders:

/sys/kernel/debug# ll camera-video*
ls: cannot access 'camera-video*': No such file or directory

Looks like that debugfs node is not implemented in the driver for our sensor.

All right, so I implemented the camera-video* endpoint in our sensor driver. I briefly stop and restart the sensor as follows, while running the gstreamer pipeline in another terminal.

#!/bin/bash
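# Briefly interrupt the camera stream via the debugfs node, then resume it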
echo 0 > /sys/kernel/debug/camera-video0/streaming
sleep 0.1
echo 1 > /sys/kernel/debug/camera-video0/streaming

This gives me the same 'Timeout waiting on frame start sensor guid 0' error as before. Do you see the same results on your camera?

P.S. I’m also getting an error in dmesg: [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=1760342946848 sof_ts=1760343144128 gerror_code=2 gerror_data=400 notify_bits=0"

hello pepijn.vanheiningen,

FYI, I can reproduce the same issue on the reference camera board.
for example,

Feb 21 13:38:34 nvidia-desktop nvargus-daemon[1789]: === gst-launch-1.0[18536]: CameraProvider initialized (0xffffa8684d70)SCF: Error InvalidState: Timeout waiting on frame start sensor guid 0, capture sequence ID = 942 (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameStart(), line 524)
Feb 21 13:38:34 nvidia-desktop nvargus-daemon[1789]: SCF: Error InvalidState:  (propagating from src/common/Utils.cpp, function workerThread(), line 114)
Feb 21 13:38:34 nvidia-desktop nvargus-daemon[1789]: SCF: Error InvalidState: Worker thread ViCsiHw frameStart failed (in src/common/Utils.cpp, function workerThread(), line 133)
Feb 21 13:38:35 nvidia-desktop nvargus-daemon: Module_id 30 Severity 2 : (fusa) Error: Timeout  propagating from:/capture/src/fusaViHandler.cpp 776
Feb 21 13:38:35 nvidia-desktop nvargus-daemon[1789]: SCF: Error Timeout:  (propagating from src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 610)
Feb 21 13:38:35 nvidia-desktop nvargus-daemon[1789]: SCF: Error Timeout:  (propagating from src/common/Utils.cpp, function workerThread(), line 114)
Feb 21 13:38:35 nvidia-desktop nvargus-daemon[1789]: SCF: Error Timeout: Worker thread ViCsiHw frameComplete failed (in src/common/Utils.cpp, function workerThread(), line 133)
Feb 21 13:38:37 nvidia-desktop nvargus-daemon[1789]: SCF: Error Timeout:  (propagating from src/services/capture/CaptureServiceDeviceViCsi.cpp, function waitCompletion(), line 368)
Feb 21 13:38:37 nvidia-desktop nvargus-daemon[1789]: SCF: Error Timeout:  (propagating from src/services/capture/CaptureServiceDevice.cpp, function pause(), line 936)
Feb 21 13:38:37 nvidia-desktop nvargus-daemon[1789]: SCF: Error Timeout: During capture abort, syncpoint wait timeout waiting for current frame to finish (in src/services/capture/CaptureServiceDevice.cpp, function handleCancelSourceRequests(), line 1029)

this is a regression; let me arrange resources for checking this.
in the meanwhile,
please use the commands below as a temporary solution: kill and restart the nvargus-daemon service to restore camera functionality.
$ sudo pkill nvargus-daemon
$ sudo systemctl start nvargus-daemon
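
if you need to automate that in the meantime, a minimal watchdog sketch along these lines may help; it only automates the restart workaround above (the match string is taken from your logs, and the script is an assumption-based example, to be run as root):

#!/bin/bash
# Hypothetical watchdog: restart nvargus-daemon whenever the timeout error
# from this thread appears in the journal. The capture application must be
# restarted separately after the daemon comes back.
journalctl -fu nvargus-daemon --no-pager |
while read -r line; do
  case "$line" in
    *"Timeout waiting on frame start"*)
      systemctl restart nvargus-daemon
      ;;
  esac
done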

Thank you for reproducing it! Unfortunately the temporary solution doesn't work for us, since this can happen many times per hour. Restarting the nvargus-daemon and our pipelines takes a while, during which we lose video.

Do you have any idea when a fix might be available? This is currently blocking the production of our new cameras.

hello pepijn.vanheiningen,

we haven't root-caused the issue yet, and it may take some time to figure out a solution.
let me arrange resources for the investigation; you should also expect that this won't be fixed soon.

Thanks for your continued effort on this.

One additional question: do you have any insight into whether we will be able to apply the fix to our Xaviers over-the-air later, or will it need to be applied when flashing the device?

Please keep me up-to-date if you learn more about the root cause of the problem!