Intermittent camera failure on Orin

Hi,
This is a follow-up to

I reproduced the camera error with off-the-shelf LI-JETSON-IMX274-DUAL sensors attached to the Orin, as well as with our own OmniVision cameras.
For the IMX274 sensors I am using the driver and DTB provided by NVIDIA.

With the IMX274 the error happens approximately once per 50 hours, always on the second camera.
With our OmniVision cameras it happens more often: approximately once per 10 hours on one Orin/camera set and once per hour on another.
The bottom line is that the camera eventually always fails and cannot be restarted without restarting nvargus-daemon.

I tried the clock-boosting suggestion from the link above, but it does not help.

I also tried the Tips for Debugging from the eLinux.org page Jetson/l4t/Camera BringUp.
I enabled tracing:
echo 1 > /sys/kernel/debug/tracing/tracing_on
echo 30720 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/tegra_rtcpu/enable
echo 1 > /sys/kernel/debug/tracing/events/freertos/enable
echo 2 > /sys/kernel/debug/camrtc/log-level
echo 1 > /sys/kernel/debug/tracing/events/camera_common/enable
echo > /sys/kernel/debug/tracing/trace
cat /sys/kernel/debug/tracing/trace

and restarted nvargus-daemon with debug logging enabled:
killall nvargus-daemon
export enableCamPclLogs=5
export enableCamScfLogs=5
/usr/sbin/nvargus-daemon
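Because the failure appears only after many hours and ftrace keeps events in a fixed-size ring buffer (sized by buffer_size_kb above), it may help to save a timestamped copy of the trace as soon as the failure is noticed, before newer events push out the ones around it. A minimal sketch, assuming the default debugfs path used above; snapshot_trace is a hypothetical helper name:

```shell
#!/bin/sh
# snapshot_trace: hypothetical helper that copies the current ftrace buffer
# into a timestamped file and prints the new file's path.
# $1: trace file to read (defaults to the debugfs path used above)
# $2: output directory (defaults to the current directory)
snapshot_trace() {
    src=${1:-/sys/kernel/debug/tracing/trace}
    outdir=${2:-.}
    out="$outdir/trace-$(date +%Y%m%d-%H%M%S).txt"
    # Reading "trace" (unlike "trace_pipe") does not consume the events,
    # so the buffer can still be read again afterwards.
    cat "$src" > "$out" || return 1
    printf '%s\n' "$out"
}
```

Run it as root right after the camera stops, e.g. snapshot_trace /sys/kernel/debug/tracing/trace /var/log.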

I saw different errors in the trace, such as:
rtcpu_vinotify_error: tstamp:171639231834 cch:2 vi:1 tag:CHANSEL_NOMATCH channel:0x04 frame:0 vi_tstamp:5492455394816 data:0x0000000000000249
and
rtcpu_vinotify_error: tstamp:99974825835 cch:2 vi:1 tag:CSIMUX_FRAME channel:0xac frame:56570 vi_tstamp:3199194215744 data:0x0000000000000402
and
rtcpu_vinotify_error: tstamp:65471898117 cch:2 vi:1 tag:CSIMUX_FRAME channel:0x00 frame:55908 vi_tstamp:2095100601248 data:0x0000da6700000222
and
rtcpu_vinotify_error: tstamp:65471954857 cch:-1 vi:1 tag:CSIMUX_STREAM channel:0x00 frame:0 vi_tstamp:2095102433600 data:0x0000000000000100
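In a long run the trace can contain many of these lines, and a per-tag count shows which error types dominate. A sketch, assuming the whitespace-separated tag: field format shown above; summarize_vinotify is a hypothetical helper name:

```shell
#!/bin/sh
# summarize_vinotify: hypothetical helper that counts rtcpu_vinotify_error
# events per tag in a saved trace and prints "<count> <tag>", most frequent first.
# $1: trace file (e.g. a saved copy of /sys/kernel/debug/tracing/trace)
summarize_vinotify() {
    awk '/rtcpu_vinotify_error:/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^tag:/) { n[substr($i, 5)]++ }   # strip "tag:"
         }
         END { for (t in n) print n[t], t }' "$1" |
        LC_ALL=C sort -k1,1nr -k2,2   # by count desc, then tag name
}
```

For the four lines above it would report CSIMUX_FRAME twice and CHANSEL_NOMATCH and CSIMUX_STREAM once each.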

I also saw various errors from nvargus-daemon:
SCF: Error BadValue: NvPHSSendThroughputHints (in src/common/CameraPowerHint.cpp, function sendCameraPowerHint(), line 56)
SCF: Error Timeout: (propagating from src/components/amr/Snapshot.cpp, function waitForNewerSample(), line 91)
SCF: Error Timeout: (propagating from src/components/ac_stages/ACSynchronizeStage.cpp, function doHandleRequest(), line 126)
SCF: Error Timeout: (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 137)
SCF: Error Timeout: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 979)
SCF: Error Timeout: (propagating from src/components/CaptureContainerImpl.cpp, function assignAllBuffersFromStream(), line 241)
SCF: Error Timeout: (propagating from src/components/stages/CCDataSetupStage.cpp, function doHandleRequest(), line 68)
SCF: Error Timeout: (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
(Argus) Error OverFlow: Too many pending events, ignoring new events (in src/api/EventProviderImpl.cpp, function addEvent(), line 158)
and various others, including
nvargus-daemon[1016]: Module_id 30 Severity 2 : (fusa) Error: InvalidState Status syncpoint signaled but status value not updated in:/capture/src/fusaViHandler.cpp 817
which I mentioned in my previous post.
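Most of the SCF lines above are "propagating from" messages that relay an error raised elsewhere. Filtering them out leaves the lines where an error originated, in log order, which may make it easier to see what failed first. A sketch, assuming the log line formats quoted above; origin_errors is a hypothetical helper name:

```shell
#!/bin/sh
# origin_errors: hypothetical helper that prints SCF, Argus and fusa error
# lines from an nvargus-daemon log, skipping "propagating from" lines that
# only relay an error raised earlier in the call chain.
# $1: log file (e.g. captured daemon output)
origin_errors() {
    grep -E 'SCF: Error|\(Argus\) Error|\(fusa\) Error' "$1" |
        grep -v 'propagating from'
}
```

Because the original order is preserved, the first line printed is the earliest originating error in the log.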

It looks like some kind of corruption in nvargus-daemon or in the camera real-time CPU (RTCPU) firmware, which causes random errors to pop up.
How should I troubleshoot this further?

cat /etc/nv_tegra_release

R35 (release), REVISION: 1.0, GCID: 31346300, BOARD: t186ref, EABI: aarch64, DATE: Thu Aug 25 18:41:45 UTC 2022

Thank you

hello jhnlmn,

since you’re running with LI-JETSON-IMX274-DUAL, we have the same reference camera board.
could you please share the detailed repro steps? we would like to reproduce the issue on a developer kit.

We have an app based on the 13_multi_camera sample, with a few additions for image processing using OpenCV and CUDA. We run at 60 fps.

Usually I run a 2-hour test: boot, start the app, collect logs after 2 hours, and reboot.
But once I simply left the app running for a few days, and it failed after 62 hours with the fusaViHandler.cpp 817 message. This was with the IMX274, which fails rather rarely, maybe once every few days, so it will take you a long time to reproduce it.
Our own camera fails more often: I have one Orin/camera setup that fails once every 1 to 2 hours. So if you have some debug code or settings, could you share them with me?
Thank you

hello jhnlmn,

this is bad news…

let’s keep it simple: could you please narrow down the issue on your setup?
does it happen when running without the OpenCV/CUDA processing?

Yes, I have seen errors with the extra processing disabled.
I will try to reproduce the error with the original 13_multi_camera sample, but it may take a long time.
Meanwhile, could you give me some extra debugging code or settings, such as the debug version of the RTCPU firmware that you have given to other people in the past?

Also, we are worried that camera errors are always fatal and the camera cannot be resumed without restarting nvargus-daemon.
We need a more robust system, one that can recover from intermittent camera glitches such as an occasional corrupted frame.
Can Argus be configured to ignore intermittent errors?

hello jhnlmn,

I cannot share extra debugging code since it’s not clear which part causes the failure.
however, you may see also Topic 226574 for the JetPack-5.0.2/l4t-r35.1 camera firmware with the debug flag enabled. you could do a partition update to re-flash only the rce-fw binary.

let me double-check the issue:

  1. you’ve kept dual camera streams running (i.e. without restarting the camera app repeatedly), and you see the failure after a long while.
  2. after the issue happens, you can only restore camera functionality by restarting the nvargus-daemon service.

FYI,
please see also the Argus sample code userAutoExposure.
it has an error-handling mechanism: it uses the EVENT_TYPE_ERROR event to handle error conditions.
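Besides handling EVENT_TYPE_ERROR inside the application, a system-level fallback is to watch the daemon log for the critical error and restart the service, since that is the only recovery reported in this thread. A minimal sketch, assuming the "Sending critical error event" line quoted earlier marks the fatal state and that the daemon runs as the nvargus-daemon systemd service; argus_failed is a hypothetical name, and the application would still have to reconnect to Argus after the restart:

```shell
#!/bin/sh
# argus_failed: hypothetical check that succeeds (exit 0) if an nvargus-daemon
# log contains the critical error event that preceded the unrecoverable state.
# Assumption: this message reliably marks the fatal condition.
# $1: log file to inspect
argus_failed() {
    grep -q 'Sending critical error event' "$1"
}
```

For example: argus_failed argus.log && sudo systemctl restart nvargus-daemon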

Hi,
I installed the debug FW from Topic 226574 (538016 bytes, dated Aug 22 2022) a week ago, and since then I have not seen any of the errors
that used to happen with the stock FW from Jetson_Linux_R35.1.0_aarch64.tbz2 (529760 bytes, dated Aug 10 2022).
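File size alone only shows that two firmware files differ; a checksum makes it easier to tell the stock and debug images apart reliably, for example before re-flashing. A sketch using GNU coreutils stat and sha256sum; fw_info is a hypothetical helper, and which firmware file you pass it is up to you:

```shell
#!/bin/sh
# fw_info: hypothetical helper that prints "<size in bytes> <sha256>" for a
# firmware file, so two images can be compared by content, not just size.
# $1: firmware file
fw_info() {
    size=$(stat -c %s "$1") || return 1
    sum=$(sha256sum "$1" | cut -d ' ' -f 1)
    printf '%s %s\n' "$size" "$sum"
}
```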

My question is whether the debug FW from Topic 226574 includes any bug fixes besides having the debug flag enabled?
If so, will these fixes be included in the upcoming JetPack release? Is there a release date?

Thank you

hello jhnlmn,

that debug firmware is identical to the native camera firmware file apart from the debug flag being enabled.
since the failure rate with the reference camera module is so low, it might be a sensor timing issue; the debug binary outputs lots of logs to the Linux kernel, which may change the timing.
