Streaming gets stuck when using a GMSL2 SerDes setup with four AR0521 cameras on Xavier NX

Hi Team,
Currently we have four AR0521 cameras connected to a Jetson Xavier NX on our custom board, with MAX96714 (deserializer) and MAX96717 (serializer) using the GMSL2 protocol.

Our observations:
a. The SerDes setup works fine when one or two cameras are connected.
b. All four cameras work fine when they are connected directly to our board.
c. We tried applying the patch "Capture-ivc: fix multi-cam race condition". With more than two cameras attached, the cameras hang (get stuck) after a certain time, with the error below in dmesg.

IVC capture failed (similar to this):

[ 1464.525706] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 5000 ms
[ 1464.534948] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel

d. We also replaced camera-rtcpu-t194-rce.img with the one we received through "Nvbuf_utils: dmabuf_fd -1 mapped entry NOT found in Jetson Xavier NX - #15 by spavan".

Versions:
L4T 35.4.1 (JetPack 5.1.2)

Could you guide us on any other areas we should explore from our side? @ShaneCCC

Can you share a camera-rtcpu-t194-rce.img similar to the t234 version in this ticket?

Have you applied the changes below?

[RCE] General error queue is out of sync with frame queue. Nvbuf_utils: dmabuf_fd -1 mapped entry NOT found in Jetson Xavier NX - #14 by ShaneCCC

[vi5] kernel panic while v4l2-ctl capture timeout. Capture-ivc: fix multi-cam race condition

Thank you for the reply, @ShaneCCC.
Yes, I applied these patches.
But I am not sure about the firmware being discussed in this link.

Can you share the camera-rtcpu-t194-rce.img for version 35.4.1 that corresponds to the t234 image camera-rtcpu-t234-rce.img.r35.5.dbg.idac (519.4 KB)?

Use the one below for Xavier (t194).

I am already using this firmware in the current build.

Does it work well without GMSL2?
Was there no problem before applying the patch?

Yes, it worked well without GMSL2 after applying the patch.

Currently the cameras are running at 24 fps, so will reducing that to 20 fps help resolve this error? @ShaneCCC

Or do you have any other thoughts?

Have you tried boosting the clocks?

sudo su
# Lock the VI, ISP, NVCSI and EMC clock rates
echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
# Set each clock to its maximum rate
cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/vi/rate
cat /sys/kernel/debug/bpmp/debug/clk/isp/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/isp/rate
cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate
cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/emc/rate
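
The commands above repeat the same two steps for each of the four clocks: write 1 to mrq_rate_locked, then copy max_rate into rate. As a rough sketch only, the same sequence could be scripted in Python so it can be rerun after each reboot. The function name `boost_clocks` and its `root` parameter are hypothetical helpers introduced here for illustration; the paths and clock names are the ones from the commands above, and writing to them needs root on the target.

```python
from pathlib import Path

# Clock names and debugfs root taken from the shell commands in the thread.
CLOCKS = ("vi", "isp", "nvcsi", "emc")
BPMP_CLK_ROOT = Path("/sys/kernel/debug/bpmp/debug/clk")


def boost_clocks(root=BPMP_CLK_ROOT, clocks=CLOCKS):
    """Lock every clock, then set each one to its max_rate.

    Returns a dict mapping clock name to the rate (in Hz, as reported
    by max_rate) that was written. `root` is a parameter only so the
    function can be tried against a scratch directory first.
    """
    root = Path(root)

    # Step 1, same as: echo 1 > <root>/<clock>/mrq_rate_locked
    for name in clocks:
        (root / name / "mrq_rate_locked").write_text("1\n")

    # Step 2, same as: cat <root>/<clock>/max_rate | tee <root>/<clock>/rate
    applied = {}
    for name in clocks:
        max_rate = (root / name / "max_rate").read_text().strip()
        (root / name / "rate").write_text(max_rate + "\n")
        applied[name] = int(max_rate)
    return applied
```

On the target, calling `boost_clocks()` as root would be expected to have the same effect as the shell commands; the returned dict can be logged to record which rates were applied for a given test run.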

Okay,

  1. What about reducing the fps? I am thinking of testing at 20 fps instead of 24 fps. Will this improve stability?
  2. What does boosting the clocks actually do?

I boosted the clocks and reduced the fps from 24 to 20. The error below still appears in the dmesg logs:

[ 1427.950565] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 5000 ms
[ 1427.950803] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[ 1427.952217] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: successfully reset the capture channel
[ 1433.069384] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 5000 ms
[ 1433.069633] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
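
To make the pattern in this log easier to compare across runs, here is a small Python sketch that counts the timeouts and successful resets and measures the gap between consecutive timeouts. The function name `summarize` and the regular expression are illustrative assumptions; the sample text is copied from the dmesg output above.

```python
import re

# Excerpt copied from the dmesg output in the thread.
DMESG_SAMPLE = """\
[ 1427.950565] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 5000 ms
[ 1427.950803] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[ 1427.952217] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: successfully reset the capture channel
[ 1433.069384] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 5000 ms
[ 1433.069633] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
"""

# "[ seconds.micros] message" as printed by dmesg.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def summarize(dmesg_text):
    """Count capture timeouts and resets, and compute gaps between timeouts."""
    timeouts = []
    successful_resets = 0
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        timestamp, message = float(match.group(1)), match.group(2)
        if "uncorr_err: request timed out" in message:
            timeouts.append(timestamp)
        elif "err_rec: successfully reset" in message:
            successful_resets += 1
    # Seconds between each timeout and the next one.
    gaps = [round(b - a, 6) for a, b in zip(timeouts, timeouts[1:])]
    return {
        "timeouts": len(timeouts),
        "successful_resets": successful_resets,
        "gaps_s": gaps,
    }


if __name__ == "__main__":
    print(summarize(DMESG_SAMPLE))
```

For the excerpt above the gap between the two timeouts is about 5.12 s, only slightly longer than the 5000 ms timeout itself. That would be consistent with no frame arriving at all after the channel reset, though the log alone does not prove it.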

Boosting the clocks is to confirm whether NVCSI/VI bandwidth causes the problem.
It looks like bandwidth is not the cause. How about 3 cameras?
Maybe get the trace log to see if it gives more clues.

Three cameras worked fine for 42 hours over the last two days.

Frame loss is occurring. I observed it in the GST_DEBUG output below:

0:00:00.512188811  3934 0xaaab07a056a0 WARN                 v4l2src gstv4l2src.c:914:gst_v4l2src_create:<v4l2src0> Timestamp does not correlate with any clock, ignoring driver timestamps
0:12:06.946709029  3934 0xaaab07a056a0 WARN                 v4l2src gstv4l2src.c:978:gst_v4l2src_create:<v4l2src0> lost frames detected: count = 18446744073709540186 - ts: 0:12:06.760006073
0:12:12.071167384  3934 0xaaab07a056a0 WARN                 v4l2src gstv4l2src.c:978:gst_v4l2src_create:<v4l2src0> lost frames detected: count = 18446744073709551613 - ts: 0:12:11.884256893
0:12:17.186743301  3934 0xaaab07a056a0 WARN                 v4l2src gstv4l2src.c:978:gst_v4l2src_create:<v4l2src0> lost frames detected: count = 18446744073709551613 - ts: 0:12:17.000101928
0:12:22.310791899  3934 0xaaab07a056a0 WARN                 v4l2src gstv4l2src.c:978:gst_v4l2src_create:<v4l2src0> lost frames detected: count = 18446744073709551613 - ts: 0:12:22.124147869
0:12:27.430831922  3934 0xaaab07a056a0 WARN                 v4l2src gstv4l2src.c:978:gst_v4l2src_create:<v4l2src0> lost frames detected: count = 18446744073709551613 - ts: 0:12:27.244190196
0:12:32.546990087  3934 0xaaab07a056a0 WARN                 v4l2src gstv4l2src.c:978:gst_v4l2src_create:<v4l2src0> lost frames detected: count = 18446744073709551613 - ts: 0:12:32.360277657
0:12:37.667045282  3934 0xaaab07a056a0 WARN                 v4l2src gstv4l2src.c:978:gst_v4l2src_create:<v4l2src0> lost frames detected: count = 18446744073709551613 - ts: 0:12:37.480401572
0:12:42.786917811  3934 0xaaab07a056a0 WARN                 v4l2src gstv4l2src.c:978:gst_v4l2src_create:<v4l2src0> lost frames detected: count = 18446744073709551613 - ts: 0:12:42.600274005
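
The "lost frames" counts in these warnings are all just below 2^64, which is not plausible as a real number of dropped frames. One possible reading, offered here as an assumption rather than something confirmed in this thread, is that a small negative value is being printed as an unsigned 64-bit integer. A short Python sketch of that reinterpretation (the helper name `as_signed64` is made up for illustration):

```python
def as_signed64(value):
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    return value - 2**64 if value >= 2**63 else value


if __name__ == "__main__":
    # Counts copied from the GST_DEBUG warnings above.
    for count in (18446744073709540186, 18446744073709551613):
        print(count, "->", as_signed64(count))
    # 18446744073709540186 -> -11430
    # 18446744073709551613 -> -3
```

Under that assumption the warnings would reflect a sequence number going backwards (by 11430 once, then by 3 repeatedly) rather than an enormous number of lost frames.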

These were observed in the trace log:

11514.570643: rtcpu_vinotify_error: tstamp:360714586021 cch:3 vi:0 tag:CSIMUX_FRAME channel:0x00 frame:0 vi_tstamp:11542866315840 data:0x00000000000000a0

11318.038631: rtcpu_vinotify_error: tstamp:354573653481 cch:-1 vi:0 tag:CSIMUX_STREAM channel:0x10 frame:0 vi_tstamp:11346356757248 data:0x0000000000000100

Bits 5 and 7 of the data field (0xa0) indicate that the FE (frame end) packet was lost, which causes the FS_FAULT.

11514.570643: rtcpu_vinotify_error: tstamp:360714586021 cch:3 vi:0 tag:CSIMUX_FRAME channel:0x00 frame:0 vi_tstamp:11542866315840 data:0x00000000000000a0
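
To see which bits are set in the data field of rtcpu_vinotify_error lines like the ones above, a small Python sketch can extract the tag and list the set bit positions. The names `decode` and `set_bits` and the regular expression are illustrative assumptions; the sketch only reports bit positions and does not attach a meaning to any bit beyond what is stated in this thread.

```python
import re

# Trace lines copied from the thread.
TRACE_LINES = [
    "11514.570643: rtcpu_vinotify_error: tstamp:360714586021 cch:3 vi:0 "
    "tag:CSIMUX_FRAME channel:0x00 frame:0 vi_tstamp:11542866315840 "
    "data:0x00000000000000a0",
    "11318.038631: rtcpu_vinotify_error: tstamp:354573653481 cch:-1 vi:0 "
    "tag:CSIMUX_STREAM channel:0x10 frame:0 vi_tstamp:11346356757248 "
    "data:0x0000000000000100",
]

# Capture the tag name and the hexadecimal data value.
FIELD_RE = re.compile(r"tag:(\S+).*?data:(0x[0-9a-fA-F]+)")


def set_bits(value):
    """Return the positions (0 = least significant) of all set bits."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


def decode(line):
    """Return (tag, data, set bit positions), or None if the line does not match."""
    match = FIELD_RE.search(line)
    if match is None:
        return None
    tag = match.group(1)
    data = int(match.group(2), 16)
    return tag, data, set_bits(data)


if __name__ == "__main__":
    for line in TRACE_LINES:
        tag, data, bits = decode(line)
        print(f"{tag}: data={data:#x} bits={bits}")
    # CSIMUX_FRAME: data=0xa0 bits=[5, 7]
    # CSIMUX_STREAM: data=0x100 bits=[8]
```

The CSIMUX_FRAME line decodes to bits 5 and 7, matching the reply above. The CSIMUX_STREAM line has only bit 8 set; its meaning is not discussed in this thread.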

Okay, can you suggest any changes to overcome this error?
Or any ideas to explore?

four_cam1.zip (19.7 KB)
Got the error below with four cameras running at 20 fps, after running for 3 hours.

[15004.761299] [RCE] BUG: core/watchdog/heartbeat-task.c:162 [heartbeat_halt_execution] "*** RCE WATCHDOG FAILURE: HALTING ***"
[15004.763946] tegra186-cam-rtcpu bc00000.rtcpu: Alert: Camera RTCPU gone bad! restoring it immediately!!
[15009.761334] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 5000 ms
[15009.761578] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[15009.762020] tegra194-vi5 15c10000.vi: vi_capture_release: control failed, errno 1
[15009.762346] video4linux video2: vi capture release failed
[15009.762471] tegra-camrtc-capture-vi tegra-capture-vi: fatal: error recovery failed
[15009.765313] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 5000 ms
[15009.765433] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 5000 ms
[15009.765630] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 5000 ms
[15009.765638] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[15009.766209] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[15009.766477] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[15009.767031] tegra194-vi5 15c10000.vi: vi_capture_release: control failed, errno 1
[15009.767157] video4linux video0: vi capture release failed
[15009.776596] tegra194-vi5 15c10000.vi: vi_capture_release: control failed, errno 1
[15009.776605] tegra-camrtc-capture-vi tegra-capture-vi: fatal: error recovery failed
[15009.776624] tegra194-vi5 15c10000.vi: vi_capture_release: control failed, errno 1
[15009.793098] video4linux video1: vi capture release failed
[15009.806212] tegra-camrtc-capture-vi tegra-capture-vi: fatal: error recovery failed
[15009.822277] video4linux video3: vi capture release failed
[15009.822518] tegra-camrtc-capture-vi tegra-capture-vi: fatal: error recovery failed
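
When comparing several long-run logs like this one, it can help to pull out just two facts: whether the RCE watchdog message appears, and which video nodes reported a failed release. A minimal Python sketch, where `summarize_failure` and the regular expression are hypothetical helpers and the excerpt is a subset of the lines above:

```python
import re

# Subset of the dmesg lines from the four-camera failure above.
DMESG_EXCERPT = """\
[15004.761299] [RCE] BUG: core/watchdog/heartbeat-task.c:162 [heartbeat_halt_execution] "*** RCE WATCHDOG FAILURE: HALTING ***"
[15009.762346] video4linux video2: vi capture release failed
[15009.767157] video4linux video0: vi capture release failed
[15009.793098] video4linux video1: vi capture release failed
[15009.822277] video4linux video3: vi capture release failed
"""

RELEASE_FAILED_RE = re.compile(
    r"video4linux (video\d+): vi capture release failed"
)


def summarize_failure(dmesg_text):
    """Return (watchdog_seen, sorted list of video nodes that failed release)."""
    watchdog_seen = "RCE WATCHDOG FAILURE" in dmesg_text
    devices = sorted(set(RELEASE_FAILED_RE.findall(dmesg_text)))
    return watchdog_seen, devices


if __name__ == "__main__":
    print(summarize_failure(DMESG_EXCERPT))
```

For this log the watchdog message is present and all four nodes (video0 to video3) fail to release about 5 s after it, so the failure here affects every camera at once rather than a single channel.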

I have attached the dmesg logs as well.

@ShaneCCC, I have added a mode with 2592x1944 resolution to the camera's device tree (dtb). Might this addition help solve the issue?

And do you have any idea about the above error when 4 cameras are used?

Maybe but I can’t tell.

Okay, do you have any other ideas to solve this?