About tegra-camrtc-capture-vi and rtcpu_nvcsi_intr errors on capture timeout

Hello. We are getting a tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms error on a xavier nx with custom board and wanted to ask some question about rtcpu_nvcsi_intr errors. Our sensor is a ov64b in jetpack 5.1.5 with a external provided mclk.

Clock are boosted with:

echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate > /sys/kernel/debug/bpmp/debug/clk/vi/rate
cat /sys/kernel/debug/bpmp/debug/clk/isp/max_rate >  /sys/kernel/debug/bpmp/debug/clk/isp/rate
cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/max_rate > /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate

after capture with v4l2-ctl --device=/dev/video0 --set-fmt-video=width=1920,height=1080,pixelformat=RG10 --stream-mmap=0 --stream-count=1 --stream-to=frame.raw --verbose in traces we found the following errors on repeat:

trace
   v4l2-ctl-1047    [002] ....    45.686858: tegra_channel_open: vi-output, ov64b 2-0036

        v4l2-ctl-1047    [002] ....    45.693192: tegra_channel_set_power: ov64b 2-0036 : 0x1

        v4l2-ctl-1047    [002] ....    45.693204: camera_common_s_power: status : 0x1

        v4l2-ctl-1047    [002] ....    45.716346: tegra_channel_set_power: 13e10000.host1x:nvcsi@15a00000- : 0x1

        v4l2-ctl-1047    [002] ....    45.716350: csi_s_power: enable : 0x1

        v4l2-ctl-1047    [002] ....    45.716719: tegra_channel_capture_setup: vnc_id 0 W 1920 H 1080 fmt c4

        v4l2-ctl-1047    [001] ....    45.717969: tegra_channel_set_stream: enable : 0x1

        v4l2-ctl-1047    [001] ....    45.719560: tegra_channel_set_stream: 13e10000.host1x:nvcsi@15a00000- : 0x1

        v4l2-ctl-1047    [001] ....    45.719564: csi_s_stream: enable : 0x1

        v4l2-ctl-1047    [001] ....    45.719956: tegra_channel_set_stream: ov64b 2-0036 : 0x1

     kworker/2:2-122     [002] ....    45.736261: rtcpu_vinotify_event: tstamp:1923149562 cch:0 vi:0 tag:VIFALC_TDSTATE channel:0x23 frame:0 vi_tstamp:61530742976 data:0xcd9ce50010000000

     kworker/2:2-122     [002] ....    45.736263: rtcpu_vinotify_event: tstamp:1923149719 cch:0 vi:0 tag:VIFALC_TDSTATE channel:0x23 frame:0 vi_tstamp:61530755456 data:0x0000000031000001

     kworker/2:2-122     [002] ....    45.736264: rtcpu_vinotify_event: tstamp:1923149893 cch:0 vi:0 tag:VIFALC_TDSTATE channel:0x23 frame:0 vi_tstamp:61530810528 data:0xcd9ce20010000000

     kworker/2:2-122     [002] ....    45.736265: rtcpu_vinotify_event: tstamp:1923150045 cch:0 vi:0 tag:VIFALC_TDSTATE channel:0x23 frame:0 vi_tstamp:61530822464 data:0x0000000031000002

     kworker/2:2-122     [002] ....    46.240265: rtcpu_nvcsi_intr: tstamp:1937916976 class:GLOBAL type:PHY_INTR0 phy:1 cil:0 st:0 vc:0 status:0x00000022

     kworker/2:2-122     [002] ....    46.240269: rtcpu_nvcsi_intr: tstamp:1937916976 class:GLOBAL type:PHY_INTR0 phy:1 cil:1 st:0 vc:0 status:0x00000020

     kworker/2:2-122     [002] ....    46.240270: rtcpu_nvcsi_intr: tstamp:1937916976 class:CORRECTABLE_ERR type:PHY_INTR phy:1 cil:0 st:0 vc:0 status:0x00000132

     kworker/2:2-122     [002] ....    46.240271: rtcpu_nvcsi_intr: tstamp:1937916976 class:CORRECTABLE_ERR type:PHY_INTR phy:1 cil:1 st:0 vc:0 status:0x00000120

     kworker/2:2-122     [002] ....    46.240272: rtcpu_nvcsi_intr: tstamp:1938976541 class:GLOBAL type:PHY_INTR0 phy:1 cil:0 st:0 vc:0 status:0x00000042

     kworker/2:2-122     [002] ....    46.240272: rtcpu_nvcsi_intr: tstamp:1938976541 class:GLOBAL type:PHY_INTR0 phy:1 cil:1 st:0 vc:0 status:0x00000040

     kworker/2:2-122     [002] ....    46.240273: rtcpu_nvcsi_intr: tstamp:1938976541 class:CORRECTABLE_ERR type:PHY_INTR phy:1 cil:0 st:0 vc:0 status:0x00000062

     kworker/2:2-122     [002] ....    46.240274: rtcpu_nvcsi_intr: tstamp:1938976541 class:CORRECTABLE_ERR type:PHY_INTR phy:1 cil:1 st:0 vc:0 status:0x00000040

     kworker/2:2-122     [002] ....    46.240274: rtcpu_nvcsi_intr: tstamp:1938977400 class:GLOBAL type:PHY_INTR0 phy:1 cil:0 st:0 vc:0 status:0x00000022

     kworker/2:2-122     [002] ....    46.240275: rtcpu_nvcsi_intr: tstamp:1938977837 class:GLOBAL type:PHY_INTR0 phy:1 cil:0 st:0 vc:0 status:0x00000044

     kworker/2:2-122     [002] ....    46.240276: rtcpu_nvcsi_intr: tstamp:1938977837 class:GLOBAL type:PHY_INTR0 phy:1 cil:1 st:0 vc:0 status:0x00000060

     kworker/2:2-122     [002] ....    46.240276: rtcpu_nvcsi_intr: tstamp:1938977837 class:CORRECTABLE_ERR type:PHY_INTR phy:1 cil:0 st:0 vc:0 status:0x00000066

     kworker/2:2-122     [002] ....    46.240277: rtcpu_nvcsi_intr: tstamp:1938977837 class:CORRECTABLE_ERR type:PHY_INTR phy:1 cil:1 st:0 vc:0 status:0x00000060

     kworker/2:2-122     [002] ....    46.240278: rtcpu_nvcsi_intr: tstamp:1938978683 class:GLOBAL type:PHY_INTR0 phy:1 cil:0 st:0 vc:0 status:0x00000022

     kworker/2:2-122     [002] ....    46.240279: rtcpu_nvcsi_intr: tstamp:1938979132 class:GLOBAL type:PHY_INTR0 phy:1 cil:0 st:0 vc:0 status:0x00000044

     kworker/2:2-122     [002] ....    46.240279: rtcpu_nvcsi_intr: tstamp:1938979132 class:GLOBAL type:PHY_INTR0 phy:1 cil:1 st:0 vc:0 status:0x00000040

     kworker/2:2-122     [002] ....    46.240280: rtcpu_nvcsi_intr: tstamp:1938979132 class:CORRECTABLE_ERR type:PHY_INTR phy:1 cil:0 st:0 vc:0 status:0x00000046

     kworker/2:2-122     [002] ....    46.240281: rtcpu_nvcsi_intr: tstamp:1938979132 class:CORRECTABLE_ERR type:PHY_INTR phy:1 cil:1 st:0 vc:0 status:0x00000040

     kworker/2:2-122     [002] ....    46.240281: rtcpu_nvcsi_intr: tstamp:1938979968 class:GLOBAL type:PHY_INTR0 phy:1 cil:0 st:0 vc:0 status:0x00000042

     kworker/2:2-122     [002] ....    46.240282: rtcpu_nvcsi_intr: tstamp:1938980426 class:GLOBAL type:PHY_INTR0 phy:1 cil:0 st:0 vc:0 status:0x00000024

     kworker/2:2-122     [002] ....    46.240283: rtcpu_nvcsi_intr: tstamp:1938980426 class:GLOBAL type:PHY_INTR0 phy:1 cil:1 st:0 vc:0 status:0x00000040

     kworker/2:2-122     [002] ....    46.240283: rtcpu_nvcsi_intr: tstamp:1938980426 class:CORRECTABLE_ERR type:PHY_INTR phy:1 cil:0 st:0 vc:0 status:0x00000064

     kworker/2:2-122     [002] ....    46.240284: rtcpu_nvcsi_intr: tstamp:1938980426 class:CORRECTABLE_ERR type:PHY_INTR phy:1 cil:1 st:0 vc:0 status:0x00000040

     kworker/2:2-122     [002] ....    46.240285: rtcpu_nvcsi_intr: tstamp:1938981266 class:GLOBAL type:PHY_INTR0 phy:1 cil:0 st:0 vc:0 status:0x00000044

     kworker/2:2-122     [002] ....    46.240286: rtcpu_nvcsi_intr: tstamp:1938981734 class:GLOBAL type:PHY_INTR0 phy:1 cil:0 st:0 vc:0 status:0x00000022

From TRM we decode the errors to:

phy 1, cil 0
============
status: 0x00000022
------------------
bit val  description
  5   1  intr_cil_data_lane_sot_sb_err1_a
  1   1  intr_cil_data_lane_sot_sb_err0_a

status: 0x00000024
------------------
bit val  description
  5   1  intr_cil_data_lane_sot_sb_err1_a
  2   1  intr_cil_data_lane_sot_mb_err0_a

status: 0x00000042
------------------
bit val  description
  6   1  intr_cil_data_lane_sot_mb_err1_a
  1   1  intr_cil_data_lane_sot_sb_err0_a

status: 0x00000044
------------------
bit val  description
  6   1  intr_cil_data_lane_sot_mb_err1_a
  2   1  intr_cil_data_lane_sot_mb_err0_a

status: 0x00000046
------------------
bit val  description
  6   1  intr_cil_data_lane_sot_mb_err1_a
  2   1  intr_cil_data_lane_sot_mb_err0_a
  1   1  intr_cil_data_lane_sot_sb_err0_a

status: 0x00000062
------------------
bit val  description
  6   1  intr_cil_data_lane_sot_mb_err1_a
  5   1  intr_cil_data_lane_sot_sb_err1_a
  1   1  intr_cil_data_lane_sot_sb_err0_a

status: 0x00000064
------------------
bit val  description
  6   1  intr_cil_data_lane_sot_mb_err1_a
  5   1  intr_cil_data_lane_sot_sb_err1_a
  2   1  intr_cil_data_lane_sot_mb_err0_a

status: 0x00000066
------------------
bit val  description
  6   1  intr_cil_data_lane_sot_mb_err1_a
  5   1  intr_cil_data_lane_sot_sb_err1_a
  2   1  intr_cil_data_lane_sot_mb_err0_a
  1   1  intr_cil_data_lane_sot_sb_err0_a

status: 0x00000132
------------------
bit val  description
  8   1  intr_cil_data_lane_rxfifo_full_err1_a
  5   1  intr_cil_data_lane_sot_sb_err1_a
  4   1  intr_cil_data_lane_rxfifo_full_err0_a
  1   1  intr_cil_data_lane_sot_sb_err0_a

phy 1, cil 1
============
status: 0x00000020
------------------
bit val  description
  5   1  intr_cil_data_lane_sot_sb_err1_a

status: 0x00000040
------------------
bit val  description
  6   1  intr_cil_data_lane_sot_mb_err1_a

status: 0x00000060
------------------
bit val  description
  6   1  intr_cil_data_lane_sot_mb_err1_a
  5   1  intr_cil_data_lane_sot_sb_err1_a

status: 0x00000120
------------------
bit val  description
  8   1  intr_cil_data_lane_rxfifo_full_err1_a
  5   1  intr_cil_data_lane_sot_sb_err1_a

Our relevant DTB values for the camera mode are:

========= DTB Values ==========
width          1920
heigth         1080
pix_clk_hz     320000000
discont clk    no
line_length    2200
cil_settletime 160
embedded_metadata_height 0
discontinuous_clk no
===============================

And deskew is not enable on sensor.

Configuration mode is 1920x1080 10bit with a datarate of 800mbps from sensor documentation. pix_clk is calculated as 800Mbps * 4 lanes /10= 320000000hz

Logs:

CamTrace.log (5.9 KB)

dmesg.log (2.3 KB)

With that we have the following questions:

  1. Why don’t we always get the intr_cil_data_lane_rxfifo_full_err1_a error?
  2. How the combination of intr_cil_data_lane_sot_sb_err1_a and intr_cil_data_lane_sot_mb_err1_a works? are the error on differents SOT?
  3. How can we access the data obtained during the SOT?
  4. Is it possible to dump the data from the FIFO buffer?
  5. Can NVIDIA provide a debug camera firmware for jetpack 5.1.5.? We see that one is available for 1.5.3 in Making sure you're not a bot!
  1. Why you expected get intr_cil_data_lane_rxfifo_full_err1_a error?
  2. No sure of it. Could be in different lanes.
  3. No possible. But you can dump the nvcsi REG for referencing.
  4. No way to do it.
  5. Check below for debug RCE firmware.
[RCE] JetPack-5.1.4/l4t-r35.6.0 camera firmware with debug flag enabled.
https://forums.developer.nvidia.com/t/320332/11/