AGX Orin camera issues

jetson-developer · May 21, 2024, 12:53pm

Hi,

I’m debugging camera issues on AGX Orin 32GB with JetPack 5.1.3 on a custom mainboard. On a few Orin modules out of dozens, the camera regularly stops working and cannot recover. Depending on the module in use, the kernel log usually shows one of the following traces:

[ 4332.591788] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 1, flags: 0, err_data 131072
[ 4332.616797] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 2, flags: 0, err_data 131072
[ 4332.641986] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 1, flags: 0, err_data 131072
[ 4332.666860] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 2, flags: 0, err_data 131072
with ftrace log showing
kworker/2:2-1458 [002] … 2419.364993: rtcpu_nvcsi_intr: tstamp:76386848248 class:GLOBAL type:STREAM_NOVC phy:0 cil:0 st:2 vc:0 status:0x00000001
kworker/2:2-1458 [002] … 2419.364993: rtcpu_nvcsi_intr: tstamp:76386848248 class:GLOBAL type:STREAM_VC phy:0 cil:0 st:2 vc:0 status:0x00000006
kworker/2:2-1458 [002] … 2419.364994: rtcpu_nvcsi_intr: tstamp:76386848248 class:CORRECTABLE_ERR type:STREAM_NOVC phy:0 cil:0 st:2 vc:0 status:0x00000001
kworker/2:2-1458 [002] … 2419.364994: rtcpu_nvcsi_intr: tstamp:76386848248 class:CORRECTABLE_ERR type:STREAM_VC phy:0 cil:0 st:2 vc:0 status:0x00000006

or

[ 290.338933] [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=314796051360 sof_ts=314796213952 gerror_code=2 gerror_data=600064 notify_bits=30000"
[ 292.956415] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
[ 292.957735] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[ 292.962227] (NULL device *): vi_capture_control_message: NULL VI channel received
[ 292.962451] t194-nvcsi 13e40000.host1x:nvcsi@15a00000: csi5_stream_close: Error in closing stream_id=4, csi_port=4
[ 292.962757] (NULL device *): vi_capture_control_message: NULL VI channel received
[ 292.963268] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: successfully reset the capture channel

or

[ 344.371620] [RCE] VM0 deactivating.VM0 activating.VM0 deactivating.VM0 activating.BUG: core/watchdog/heartbeat-task.c:162 [heartbeat_halt_execution] "*** RCE WATCHDOG FAILURE: HALTING ***"
[ 344.381599] tegra186-cam-rtcpu bc00000.rtcpu: Alert: Camera RTCPU gone bad! restoring it immediately!!
[ 346.949656] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
[ 346.950744] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[ 346.955222] (NULL device *): vi_capture_control_message: NULL VI channel received
[ 346.955445] t194-nvcsi 13e40000.host1x:nvcsi@15a00000: csi5_stream_close: Error in closing stream_id=2, csi_port=2
[ 346.955737] (NULL device *): vi_capture_control_message: NULL VI channel received

or

[ 350.339953] tegra194-vi5 13e40000.host1x:vi1@14c00000: capture control message timed out
[ 351.363445] tegra194-vi5 13e40000.host1x:vi1@14c00000: capture control message timed out
[ 351.363712] tegra194-vi5 13e40000.host1x:vi1@14c00000: csi_stream_release: failed to disable nvcsi tpg on stream 2 virtual channel 0
[ 352.386938] tegra194-vi5 13e40000.host1x:vi1@14c00000: capture control message timed out
[ 352.387198] tegra194-vi5 13e40000.host1x:vi1@14c00000: vi_capture_release: release channel IVC failed

We have tried some of the tricks mentioned in other related threads, e.g. boosting the Jetson and locking the VI and NVCSI clocks (we are not using Jetson’s ISP), increasing pix_clk (our sensor setup doesn’t have a serializer-deserializer), and changing timings and delays in our sensor driver. The issues seem to be tied to specific Orin modules: replacing a module on the mainboard with a known working one fixes them, which suggests that the mainboard is OK. The faults can be reproduced systematically.

Any ideas or suggestions on how to proceed would be appreciated.

JerryChang · May 22, 2024, 5:49am

hello jetson-developer,

according to below…

may I know the difference between, or the SKUs of, the working and non-working modules?

jetson-developer · May 22, 2024, 8:58am

Hello JerryChang,

At least two of the faulty modules report product part number 699-13701-0004-500 P.0 in EEPROM; the order sheet shows part number 900-13701-0040-000. We also have working units with the same SKU and revision, plus working units from revisions G.0 and A.0.

JerryChang · May 23, 2024, 1:59am

hello jetson-developer,

may I have more details about the camera module you’re using?
is it a YUV camera sensor, since you’re using neither Jetson’s ISP nor a SerDes chip?

jetson-developer · May 23, 2024, 8:07am

We are using a custom camera board outputting 10-bit RAW from a ~12MP grayscale sensor at ~40 fps over D-PHY 1.1, with multiple cameras connected to a single module over a custom FPC. The camera boards have also been cross-tested with known working Orins, and the cameras work as expected with those.

JerryChang · May 23, 2024, 8:56am

hello jetson-developer,

so, it is always the same specific Orin modules that cannot enable camera streaming?
is the issue related to multi-cam as well?

jetson-developer · May 23, 2024, 9:49am

It doesn’t seem to be related to having multiple cameras on our system. Our application regularly uses two simultaneously streaming cameras, and I cannot recall these kinds of issues affecting more than one camera per module. Moreover, the errors don’t follow a problematic camera when it is swapped with the device’s other cameras.

Typically the cameras can be initialized and enabled as usual, and streaming works fine for some time. The problems start a bit later; the NULL VI channel received errors in particular seem to appear more or less randomly after a while, somewhere in our application’s streaming sequence. Depending on the module, NULL VI channel errors may end in a kernel panic that crashes the device, although this cannot be reproduced consistently on all of the modules. The tegra-camrtc-capture-vi tegra-capture-vi: corr_err log can be reproduced reliably on one of the modules, and at least on that module the issue has been isolated to a failure in starting a stream on one of the cameras, from which the device cannot recover; stopping the stream and retrying, or even reinitializing the sensor, doesn’t seem to help.

jetson-developer · May 29, 2024, 9:21am

Any ideas on how to proceed?

JerryChang · May 30, 2024, 1:37am

hello jetson-developer,

is it a DPHY or CPHY sensor?
may I know what’s the data-rate it’s running with? please also evaluate whether it’s approaching the ISP throughput.

jetson-developer · May 30, 2024, 9:53am

Hi JerryChang,

Our sensor boards output D-PHY over four lanes on separate bricks with a data rate of ~1.3Gbps per lane (well below the 2.5Gbps maximum, or even 1.5Gbps per lane, so it shouldn’t require deskew calibration). Maximum concurrent throughput directly via VI (we are not using the ISP) would be around 10Gbps for the two-camera use case, and won’t exceed 20Gbps at any point. Failures have been recorded while using only one camera streaming at ~5Gbps total throughput.
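
The figures above can be cross-checked with a little arithmetic. This is only a rough sketch using the approximate numbers quoted in the post (four lanes, ~1.3Gbps per lane, ~12MP, 10-bit, ~40 fps); the variable names are illustrative, and blanking and CSI protocol overhead are ignored.

```shell
# All inputs are approximations taken from the post; units are Mbps.
lanes=4
lane_rate_mbps=1300
cameras=2

# CSI link rate for one camera and for two cameras streaming together.
link_mbps=$(( lanes * lane_rate_mbps ))
total_mbps=$(( link_mbps * cameras ))

# Pixel payload estimate: ~12 Mpixel x 10 bit x ~40 fps
# (assumption: no blanking or packet overhead counted).
payload_mbps=$(( 12 * 10 * 40 ))

echo "link per camera: ${link_mbps} Mbps"
echo "two cameras:     ${total_mbps} Mbps"
echo "sensor payload:  ${payload_mbps} Mbps"
```

Under these assumptions the link runs at about 5.2Gbps per camera and 10.4Gbps for two, with roughly 4.8Gbps of pixel payload per camera, which matches the ~5Gbps and ~10Gbps quoted.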

JerryChang · May 31, 2024, 3:12am

please try applying the pre-built update from Topic 284939 to enable the infinite timeout property. let’s check whether it helps with your issues.
you may also see the developer guide on enabling Infinite Timeout Support.

jetson-developer · June 2, 2024, 7:23pm

Our application uses VI directly via V4L2. We have tried increasing CAPTURE_TIMEOUT_MS in vi5_fops and also setting the timeout to infinite, but that does not seem to help.

JerryChang · June 3, 2024, 4:22am

hello jetson-developer,

let’s dig into low-level driver for more details.
please follow below steps to enable VI tracing logs.

echo 1 > /sys/kernel/debug/tracing/tracing_on
echo 30720 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/tegra_rtcpu/enable
echo 1 > /sys/kernel/debug/tracing/events/freertos/enable
echo 2 > /sys/kernel/debug/camrtc/log-level
echo > /sys/kernel/debug/tracing/trace
cat /sys/kernel/debug/tracing/trace

jetson-developer · June 3, 2024, 8:47am

Hi,

Traces from the module showing RCE WATCHDOG FAILURE are attached. I also tested setting the timeout to infinite, since dmesg shows a request timeout with this module; in that case the device was left hanging, as expected from our earlier tests.

rce_trace.log (1.0 MB)
rce_trace_full.log (9.4 MB)

JerryChang · June 3, 2024, 9:08am

hello jetson-developer,

please give it a try with the kernel patch from Topic 258971 for adding semaphore.

jetson-developer · June 3, 2024, 9:55am

The semaphore patch doesn’t seem to work for our problem, as I’m still getting

[ 298.849263] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 2, flags: 0, err_data 131072
[ 298.999499] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 1, flags: 0, err_data 131072
[ 299.049497] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 1, flags: 0, err_data 131072
[ 299.124488] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 2, flags: 0, err_data 131072
[ 299.149592] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 2, flags: 0, err_data 131072
[ 299.249743] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 2, flags: 0, err_data 131072
[ 299.274764] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 1, flags: 0, err_data 131072
[ 299.299883] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 2, flags: 0, err_data 131072
[ 299.324793] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 1, flags: 0, err_data 131072

constantly after streaming for a while
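
When comparing modules, it can help to count how often each err_data value occurs rather than reading the log by eye. A minimal sketch, assuming the log lines have the same "err_data <number>" format as above; tally_err_data is a hypothetical helper name, not part of any NVIDIA tool.

```shell
# Count occurrences of each distinct err_data value in kernel log text
# read from stdin, most frequent first.
tally_err_data() {
    grep -o 'err_data [0-9]*' | sort | uniq -c | sort -rn
}

# Typical use on the device (may need root):
#   dmesg | tally_err_data
```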

jetson-developer · June 3, 2024, 4:48pm

Tested the semaphore patch on another faulty device, no luck with that either.

[  345.421109] tegra186-cam-rtcpu bc00000.rtcpu: Alert: Camera RTCPU gone bad! restoring it immediately!!
[  345.423320] [RCE] VM0 deactivating.VM0 activating.VM0 deactivating.VM0 activating.BUG: core/watchdog/heartbeat-task.c:162 [heartbeat_halt_execution] "*** RCE WATCHDOG FAILURE: HALTING ***"

JerryChang · June 4, 2024, 2:12am

may I also confirm which CSI brick you’re using?

jetson-developer · June 4, 2024, 3:00pm

We have seen issues on SCILs 0 to 2 (CSI 0 to 5 / AB, CD and EF). The SCIL 1 problem can be easily replicated on one of the modules; the issues do not seem to affect one brick more than the others.

I tested the semaphore fix on a third faulty module and got

[ 1883.968405] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 1, flags: 0, err_data 4194404
[ 1883.993429] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 2, flags: 0, err_data 4194404
[ 1884.018468] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 1, flags: 0, err_data 4194404
[ 1884.043486] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 2, flags: 0, err_data 4194404

ViNotifyErrorTag in camrtc-capture.h would give

CAPTURE_STATUS_NOTIFY_BIT_CSIMUX_STREAM_FIFO_OVERFLOW
CAPTURE_STATUS_NOTIFY_BIT_CSIMUX_FRAME_RESERVED_1
CAPTURE_STATUS_NOTIFY_BIT_CSIMUX_FRAME_PXL_ENABLE_FAULT
CAPTURE_STATUS_NOTIFY_BIT_CSIMUX_FRAME_FS_FAULT

instead of err_data 131072

CAPTURE_STATUS_NOTIFY_BIT_CSIMUX_FRAME_CSI_FAULT_PD_CRC_ERR

on two other modules.

Am I reading the mask correctly?

JerryChang · June 5, 2024, 6:08am

hello jetson-developer,

the err_data content depends on the value of CaptureStatusCodes.
please also see $public_sources/kernel_src/kernel/nvidia/include/soc/tegra/camrtc-capture.h for capture_status.
so, err_data=4194404 represents the following:
CAPTURE_STATUS_NOTIFY_BIT_CSIMUX_FRAME_PXL_ENABLE_FAULT
CAPTURE_STATUS_NOTIFY_BIT_CSIMUX_FRAME_RESERVED_1
CAPTURE_STATUS_NOTIFY_BIT_CSIMUX_STREAM_FIFO_OVERFLOW
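
The bit positions themselves can be checked independently of the header. A minimal sketch, assuming err_data is a plain bitmask as the thread treats it; set_bits is a hypothetical helper, and mapping each position to a CAPTURE_STATUS_NOTIFY_BIT_* name still has to be done against camrtc-capture.h.

```shell
# Print the positions of the bits set in a decimal or 0x-prefixed value.
# Note: uses global shell variables (no 'local') to stay POSIX-compatible.
set_bits() {
    value=$(( $1 ))
    bit=0
    out=""
    while [ "$value" -gt 0 ]; do
        if [ $(( value & 1 )) -eq 1 ]; then
            out="${out:+$out }$bit"
        fi
        value=$(( value >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "$out"
}

printf '0x%x\n' 4194404   # 0x400064
set_bits 131072           # 17
set_bits 4194404          # 2 5 6 22
```

So 131072 has only bit 17 set, while 4194404 (0x400064) has four bits set: 2, 5, 6 and 22.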
