Argus capture issue with TX2 NX and FPD-Link III

Hi,

We are facing issues capturing images with two cameras on a TX2 NX.

Setup:

  • JetPack 4.6.4
  • L4T 32.7.4
  • 2x OV9782 cameras at 20fps connected via de-/serializer (ds90ub954 / ds90ub953)
  • deserializer is connected with 4 lanes running at 1.6Gbps
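
As a rough sanity check on the link budget, the raw pixel payload of this setup can be compared with the CSI capacity. This is only a back-of-the-envelope sketch: it assumes that 1.6 Gbps is the per-lane rate and it ignores blanking and CSI-2 protocol overhead. The function names are illustrative.

```python
def payload_bps(width, height, bits_per_pixel, fps):
    """Raw pixel payload of one camera in bits per second (no blanking, no overhead)."""
    return width * height * bits_per_pixel * fps


def link_capacity_bps(lanes, lane_rate_bps):
    """Aggregate raw capacity of the CSI link in bits per second."""
    return lanes * lane_rate_bps


cameras = 2
per_camera = payload_bps(1280, 800, 10, 20)     # values from the setup list above
total = cameras * per_camera
capacity = link_capacity_bps(4, 1_600_000_000)  # assumption: 1.6 Gbps per lane

print(f"per camera:  {per_camera / 1e6:.1f} Mbps")
print(f"total:       {total / 1e6:.1f} Mbps")
print(f"capacity:    {capacity / 1e9:.1f} Gbps")
print(f"utilization: {100 * total / capacity:.1f} %")
```

Under these assumptions the two streams together use well under a tenth of the raw link capacity, so raw bandwidth alone does not look like the limiting factor.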

The device-tree modes for the two cameras look like this:

#camera 0
mode0 {
    tegra_sinterface = "serial_a";
    vc_id = "1"; 
    num_lanes = "4";

    active_w = "1280";
    active_h = "800";
    cil_settletime = "0";
    csi_pixel_bit_depth = "10";
    discontinuous_clk = "no";
    dpcm_enable = "false";
    dynamic_pixel_bit_depth = "10";
    embedded_metadata_height = "0";
    inherent_gain = "1";
    line_length = "1295";
    max_exp_time = "10000";
    max_framerate = "20";
    default_framerate = "20";
    default_gain = "1.0"; /* 0x3509 default = 0x10 */
    max_gain_val = "15.9375"; /* 255/16 */
    max_hdr_ratio = "1";
    mclk_khz = "24000";
    mclk_multiplier = "170.0";
    min_exp_time = "9";
    min_framerate = "1";
    min_gain_val = "1.0";
    min_hdr_ratio = "1.0";
    mode_type = "bayer";
    pix_clk_hz = "160000000";
    pixel_phase = "bggr";
    readout_orientation = "0";
    serdes_pix_clk_hz = "4000000000";
    phy_mode = "DPHY";
};
#camera 1
mode0 {
    tegra_sinterface = "serial_a";
    vc_id = "0"; 
    num_lanes = "4";

    active_w = "1280";
    active_h = "800";
    cil_settletime = "0";
    csi_pixel_bit_depth = "10";
    discontinuous_clk = "no";
    dpcm_enable = "false";
    dynamic_pixel_bit_depth = "10";
    embedded_metadata_height = "0";
    inherent_gain = "1";
    line_length = "1295";
    max_exp_time = "10000";
    max_framerate = "20";
    default_framerate = "20";
    default_gain = "1.0"; /* 0x3509 default = 0x10 */
    max_gain_val = "15.9375"; /* 255/16 */
    max_hdr_ratio = "1";
    mclk_khz = "24000";
    mclk_multiplier = "170.0";
    min_exp_time = "9";
    min_framerate = "1";
    min_gain_val = "1.0";
    min_hdr_ratio = "1.0";
    mode_type = "bayer";
    pix_clk_hz = "160000000";
    pixel_phase = "bggr";
    readout_orientation = "0";
    serdes_pix_clk_hz = "4000000000";
    phy_mode = "DPHY";
};
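
One value that may be worth cross-checking in these modes is serdes_pix_clk_hz. If I read NVIDIA's sensor driver programming guide correctly, it is derived as (deserializer output data rate in Hz) * (number of CSI lanes) / (bits per pixel). The sketch below applies that formula, assuming it is quoted correctly and that 1.6 Gbps is the per-lane output rate of the deserializer. The function name is illustrative.

```python
def serdes_pix_clk_hz(lane_rate_hz, num_lanes, bits_per_pixel):
    """SerDes pixel clock per the formula quoted above (assumed, not verified here)."""
    return lane_rate_hz * num_lanes // bits_per_pixel


# Assumption: 1.6 Gbps per lane, 4 lanes, 10 bits per pixel (from the modes above).
expected = serdes_pix_clk_hz(1_600_000_000, 4, 10)
configured = 4_000_000_000  # serdes_pix_clk_hz as set in both modes

print("formula result:", expected)
print("configured:    ", configured)
print("match:         ", expected == configured)
```

With these inputs the formula gives 640 MHz rather than the configured 4 GHz. Whether that difference has anything to do with the capture error is not established here.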

The issue appears randomly while streaming with our own application, which uses libargus and nvargus-daemon.

nvargus-daemon[3896]: CAM: serial no file already exists, skips storing again=== gst-launch-1.0[4214]: CameraProvider initialized (0x7f70c22c80)CAM: serial no file already exists, skips storing againCAM: serial no file already exists, skips storing againCAM: serial no file already exists, skips storin>
nvargus-daemon[3896]: NvViErrorDecode Stream 0.1 failed: ts 35068306656 frame 151 error 2 data 0x000000a0
nvargus-daemon[3896]: NvViErrorDecode CaptureError: CsimuxFrameError (2)
nvargus-daemon[3896]: NvViErrorDecode See https://wiki.nvidia.com/wmpwiki/index.php/Camera_Debugging/CaptureError_debugging for more information and links to documents.
nvargus-daemon[3896]: CsimuxFrameError_Regular : 0x000000a0
nvargus-daemon[3896]:     Stream ID                [ 2: 0]: 0
nvargus-daemon[3896]:         
nvargus-daemon[3896]:     VPR state from fuse block    [ 3]: 0
nvargus-daemon[3896]:         
nvargus-daemon[3896]:     Frame end (FE)              [ 5]: 1
nvargus-daemon[3896]:         A frame end has been found on a regular mode stream.
nvargus-daemon[3896]:     FS_FAULT                    [ 7]: 1
nvargus-daemon[3896]:         A FS packet was found for a virtual channel that was already in frame. An errored FE packet was injected before FS was allowed through.
nvargus-daemon[3896]: captureErrorCallback Stream 0.1 capture 149 failed: ts 35068306656 frame 151 error 2 data 0x000000a0
nvargus-daemon[3896]: Error: waitCsiFrameEnd timeout guid 1
nvargus-daemon[3896]: VI Stream Id = 0 Virtual Channel = 1
nvargus-daemon[3896]: ************VI Debug Registers**********
nvargus-daemon[3896]: VI_CSIMUX_STAT_FRAME_1         = 0x000000b4
nvargus-daemon[3896]: VI_CSIMUX_FRAME_STATUS_0         = 0x00000000
nvargus-daemon[3896]: VI_CFG_INTERRUPT_STATUS_0         = 0x3f000000
nvargus-daemon[3896]: VI_ISPBUFA_ERROR_0         = 0x00000000
nvargus-daemon[3896]: VI_FMLITE_ERROR_0         = 0x00000000
nvargus-daemon[3896]: VI_NOTIFY_ERROR_0         = 0x00000000
nvargus-daemon[3896]: *****************************************
nvargus-daemon[3896]: CSI Stream Id = 0 Brick Id = 0
nvargus-daemon[3896]: ************CSI Debug Registers**********
nvargus-daemon[3896]: CILA_INTR_STATUS_CILA[0x10400]         = 0x08000000
nvargus-daemon[3896]: CILB_INTR_STATUS_CILB[0x10c00]         = 0x00000000
nvargus-daemon[3896]: INTR_STATUS[0x100a4]         = 0x00010000
nvargus-daemon[3896]: INTR_STATUS[0x100a4]         = 0x00010000
nvargus-daemon[3896]: ERR_INTR_STATUS[0x100ac]         = 0x00010000
nvargus-daemon[3896]: ERROR_STATUS2VI_VC0[0x10094]         = 0x00000000
nvargus-daemon[3896]: ERROR_STATUS2VI_VC1[0x10098]         = 0x00000000
nvargus-daemon[3896]: ERROR_STATUS2VI_VC2[0x1009c]         = 0x00000000
nvargus-daemon[3896]: ERROR_STATUS2VI_VC3[0x100a0]         = 0x00000000
nvargus-daemon[3896]: *****************************************
nvargus-daemon[3896]: SCF: Error Timeout: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 998)
nvargus-daemon[3896]: SCF: Error Timeout:  (propagating from src/common/Utils.cpp, function workerThread(), line 116)
nvargus-daemon[3896]: SCF: Error Timeout: Worker thread ViCsiHw frameComplete failed (in src/common/Utils.cpp, function workerThread(), line 133)
kernel: fence timeout on [ffffffc0b4d38240] after 1500ms
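
For reference, the error word 0x000000a0 can be decoded by hand using the bit positions that nvargus-daemon prints in the log above. The sketch below only covers the four fields that appear in that log; the meaning of the remaining bits is not known from this output, and the names of the table and function are my own.

```python
# (field name, lowest bit, width in bits), taken from the log output above.
CSIMUX_FRAME_ERROR_FIELDS = [
    ("Stream ID", 0, 3),                  # bits [2:0]
    ("VPR state from fuse block", 3, 1),  # bit [3]
    ("Frame end (FE)", 5, 1),             # bit [5]
    ("FS_FAULT", 7, 1),                   # bit [7]
]


def decode_csimux_frame_error(data):
    """Extract the fields listed above from the 32-bit error data word."""
    return {
        name: (data >> shift) & ((1 << width) - 1)
        for name, shift, width in CSIMUX_FRAME_ERROR_FIELDS
    }


decoded = decode_csimux_frame_error(0x000000A0)
for name, value in decoded.items():
    print(f"{name:28s} = {value}")
```

0xa0 is binary 1010 0000, so only bits 5 and 7 are set, which matches the FE and FS_FAULT flags reported by the daemon.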

The error can also be reproduced with GStreamer:

gst-launch-1.0 nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1280, height=800, framerate=20/1, format=NV12' ! fakesink nvarguscamerasrc sensor-id=1 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=1280, height=800, framerate=20/1, format=NV12' ! fakesink

In my tests, the issue usually appears within the first hour of streaming.
However, it can also appear right at the start, or take up to 4 hours.

Once the error appears with GStreamer, the pipeline stops and nvargus-daemon either segfaults or has to be restarted.
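
To collect statistics on when the failure hits during long test runs, the captureErrorCallback line from the daemon log can be parsed, for example from saved journal output. This is a minimal sketch based only on the line format seen in the log above. The keys of the result are my own labels, and reading "Stream 0.1" as stream 0 / virtual channel 1 is an inference from the later "VI Stream Id = 0 Virtual Channel = 1" line.

```python
import re

# Pattern derived from the single captureErrorCallback line in the log above.
CAPTURE_ERROR_RE = re.compile(
    r"captureErrorCallback Stream (\d+)\.(\d+) capture (\d+) failed: "
    r"ts (\d+) frame (\d+) error (\d+) data (0x[0-9a-fA-F]+)"
)


def parse_capture_error(line):
    """Return the fields of a captureErrorCallback log line, or None if it does not match."""
    match = CAPTURE_ERROR_RE.search(line)
    if match is None:
        return None
    stream, vc, capture, ts, frame, error, data = match.groups()
    return {
        "stream": int(stream),
        "virtual_channel": int(vc),  # inferred meaning, see lead-in
        "capture": int(capture),
        "ts": int(ts),               # unit not stated in the log
        "frame": int(frame),
        "error": int(error),
        "data": int(data, 16),
    }


sample = (
    "nvargus-daemon[3896]: captureErrorCallback Stream 0.1 capture 149 failed: "
    "ts 35068306656 frame 151 error 2 data 0x000000a0"
)
result = parse_capture_error(sample)
print(result)
```

Feeding every log line through parse_capture_error and recording the frame number of each hit gives a simple time-to-failure measure across runs.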

I have also tried boosting the capture clocks, but that does not make any difference:

echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/vi/rate
cat /sys/kernel/debug/bpmp/debug/clk/isp/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/isp/rate
cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate
cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/emc/rate

So I'm wondering how we can fix the underlying issue that causes the error.
If it can't be fixed, we need a way to recover that does not require restarting nvargus-daemon, to keep the downtime as short as possible.


It could be that the FPD-Link sends an FS (frame start) packet during the frame time. Check whether you can bypass the FPD-Link to confirm it.

nvargus-daemon[3896]: CsimuxFrameError_Regular : 0x000000a0
nvargus-daemon[3896]:     Stream ID                [ 2: 0]: 0
nvargus-daemon[3896]:         
nvargus-daemon[3896]:     VPR state from fuse block    [ 3]: 0
nvargus-daemon[3896]:         
nvargus-daemon[3896]:     Frame end (FE)              [ 5]: 1
nvargus-daemon[3896]:         A frame end has been found on a regular mode stream.
nvargus-daemon[3896]:     FS_FAULT                    [ 7]: 1
nvargus-daemon[3896]:         A FS packet was found for a virtual channel that was already in frame. An errored FE packet was injected before FS was allowed through.

Hi @ShaneCCC, thank you very much for your suggestion. We have custom-made hardware, so it is not possible to bypass the FPD-Link at the moment. Is there any way to debug this further?

No, you need to confirm that the FPD-Link delivers correct sensor data to the Jetson.
You could enable the FPD-Link test pattern to verify the FPD-Link side.
