Regression with CSI capture reliability in JetPack 6 RTCPU firmware

I am developing a product that utilises the Orin NX module. One of the camera inputs into the SOM is a 2160p30 input that connects to the Orin NX using an LT6911UXC. It is connected to CSI#2 as 4 lanes. The CSI clock from the LT6911UXC has been measured at 648MHz. The firmware in the LT6911 was supplied by Lontium.

The camera stream is captured using the GStreamer plugin nvv4l2camerasrc. The product does not use Argus.

The product was working fine with JetPack 5.1, but after upgrading to JetPack 6.1 I found that the camera capture would start OK but then intermittently fail, usually after approximately 10 seconds, although sometimes it can run for approximately 30 seconds before failing. When the pipeline stops there is often visible break-up in the camera image just before it fails. There is some variation between our hardware, with some units being less reliable than others. All hardware worked with JetPack 5.1.

I have subsequently upgraded to JetPack 6.2, but the same issue persists.

The kernel log repeatedly shows the following error message:

[ 4242.911182] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms

Sometimes I also get the following messages:

[ 3645.556476] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 0, flags: 0, err_data 131072
[ 3645.762939] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 1, flags: 0, err_data 64

In an attempt to track down the source of the problem, I have been trying different RTCPU firmware images. I have tried the debug build for JetPack 6.1 that was posted by @ShaneCCC. This still shows the problem, so I have attached the log files from a failure with that firmware.

kernel_trace.log (5.7 MB)
kernel_dmesg.log (8.1 KB)

As a more radical experiment, I took the RTCPU firmware from JetPack 5.1.4 (L4T 35.6.0) and loaded it onto my JetPack 6.2 build. In this configuration the camera works, and so far I have not seen it fail, which seems to indicate the issue relates to the RTCPU firmware in JetPack 6.

Do the log files indicate the source of the problem, or do you need any more info?

Have you tried boosting the clocks?

sudo su
echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/vi/rate
cat /sys/kernel/debug/bpmp/debug/clk/isp/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/isp/rate
cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate
cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/emc/rate
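
To confirm the boost took effect, you can read the resulting rates back from the same debugfs nodes (values are in Hz), for example:

for c in vi isp nvcsi emc; do
    echo -n "$c: "
    cat /sys/kernel/debug/bpmp/debug/clk/$c/rate
done

Note that these debugfs settings do not persist across a reboot.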


From the log below, it looks like the FE (frame end) packet is being lost. If boosting the clocks doesn't help, could you confirm whether the same messages appear with the RTCPU firmware from JetPack 5.1.4?

     kworker/1:3-459     [001] .......   522.622978: rtcpu_vinotify_event: tstamp:17108042188 cch:0 vi:1 tag:CSIMUX_FRAME channel:0x00 frame:2 vi_tstamp:547456806240 data:0x00000003000000a2
     kworker/1:3-459     [001] .......   522.622979: rtcpu_vinotify_event: tstamp:17108042335 cch:0 vi:1 tag:CHANSEL_SHORT_FRAME 

Boosting the clocks does appear to resolve the problem. With a bit of experimentation, it seems that only the nvcsi clock needs to be increased. I also found I didn't need to run at max_rate: I could run at 183685714 Hz, which is approximately double the automatically selected rate of 85720000 Hz.
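
For reference, this is roughly the workaround I am applying (the 183685714 figure is simply the rate that worked on my unit, so it may need adjusting on other hardware):

sudo su
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
echo 183685714 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate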

I ran a check with the R35.6.0 RTCPU firmware: the nvcsi clock rate is also 85720000 Hz, yet it doesn't have the problem. The trace log didn't show any CHANSEL_SHORT_FRAME tags, but I think I may need a debug version of that firmware, as it doesn't show any rtcpu_vinotify_event messages.

I suppose you can adjust the pix_clk_hz in the device tree to allocate more bandwidth for NVCSI/VI for this problem.
But you need to be careful if the output data rate is > 1.5Gbps:

Skew calibration is required if the sensor or deserializer is using DPHY and the output data rate is > 1.5Gbps.
An initiation deskew signal should be sent by the sensor or deserializer to perform the skew calibration. If the deskew signal is not sent, the receiver will stall and the capture will time out.
You can calculate the output data rate with the following equation:

Output data rate = (sensor or deserializer pixel clock in hertz) * (bits per pixel) / (number of CSI lanes)
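
For example, assuming the LT6911UXC outputs standard 4K30 timing (pixel clock of roughly 297 MHz) as YUV 4:2:2 (16 bits per pixel) over 4 lanes (these figures are illustrative assumptions, not measured values from your setup):

# hypothetical 4K30 case: 297 MHz pixel clock, 16 bpp, 4 CSI lanes
echo $(( 297000000 * 16 / 4 ))    # prints 1188000000, i.e. ~1.19 Gbps per lane, below the 1.5Gbps deskew threshold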