Drop Corrupted VI Frames Early in tegra_capture_ivc_recv_msg()

Hi NVIDIA Support Team,

We’re seeing intermittent corrupted frames (truncated data, packer overflows, CSI CRC faults, etc.) on JetPack 5.3 with the IMX219 sensor, which cause Argus/V4L2 pipelines to output torn images (half of the image comes from the previous frame). To keep userspace from ever seeing these bad buffers/frames, we tried dropping them as early as possible in the VI driver.

Specifically, in capture-ivc.c we:

  1. Pull in the exact same status codes and flag bits that the RTCPU firmware sets by including:
#include <soc/tegra/camrtc-capture.h>
  2. In the IVC worker callback, on the capture channel only, we check:
  • If msg->resp is NULL (to avoid a NULL-deref), we return.
  • If status.status != CAPTURE_STATUS_SUCCESS or any of these fatal flags is set:
CAPTURE_STATUS_FLAG_CHANNEL_IN_ERROR
CAPTURE_STATUS_NOTIFY_BIT_ATOMP_FRAME_TRUNCATED
CAPTURE_STATUS_NOTIFY_BIT_ATOMP_FRAME_TOSSED
CAPTURE_STATUS_NOTIFY_BIT_CSIMUX_FRAME_FS_FAULT

we log a warning and return without invoking the client callback.

static inline void tegra_capture_ivc_recv_msg(
    struct tegra_capture_ivc            *civc,
    uint32_t                             id,
    const struct tegra_capture_ivc_resp *msg)
{
    struct device                   *dev  = &civc->chan->dev;
    const struct capture_descriptor *desc = msg->resp;

    if (civc == __scivc_capture) {
        if (!desc) {
            dev_warn(dev, "capture-ivc: missing payload for capture ch=%u\n", id);
            return;
        }
        if (desc->status.status != CAPTURE_STATUS_SUCCESS ||
            (desc->status.flags & (
                CAPTURE_STATUS_FLAG_CHANNEL_IN_ERROR            |
                CAPTURE_STATUS_NOTIFY_BIT_ATOMP_FRAME_TRUNCATED |
                CAPTURE_STATUS_NOTIFY_BIT_ATOMP_FRAME_TOSSED    |
                CAPTURE_STATUS_NOTIFY_BIT_CSIMUX_FRAME_FS_FAULT
            ))) {
            dev_warn(dev,
                     "capture-ivc: drop corrupted frame ch=%u status=%u flags=0x%x\n",
                     id,
                     desc->status.status,
                     desc->status.flags);
            return;
        }
    }

    if (civc->cb_ctx[id].cb_func)
        civc->cb_ctx[id].cb_func(msg, civc->cb_ctx[id].priv_context);
}

After rebuilding and flashing the kernel on a Xavier NX, we ran the same GStreamer command:

gst-launch-1.0 nvarguscamerasrc sensor-id=3 ! \
  "video/x-raw(memory:NVMM),width=1640,height=1232,framerate=15/1" ! \
  queue ! nvvidconv ! autovideosink sync=false

This command worked correctly before the change above, but it now produces a black screen and eventually a kernel panic in the IVC worker (see attached log).

Questions / Requests:

  1. Feasibility: Is it architecturally sound to drop corrupt frames in the VI driver at this level?
  2. Recommendations: What adjustments would you suggest to:
  • Avoid intercepting control-plane messages or essential callbacks?
  • Maintain proper buffer/IVC state so the worker thread doesn’t panic?
  • Or, would you recommend a different hook (e.g. in V4L2 or vb2) for early frame filtering?

Thank you very much for your guidance!

I suppose vi5_fops.c already does this:

			goto uncorr_err;
		} else if (descr->status.status != CAPTURE_STATUS_SUCCESS) {
			if ((descr->status.flags
					& CAPTURE_STATUS_FLAG_CHANNEL_IN_ERROR) != 0) {
				chan->queue_error = true;
				dev_err(vi->dev, "uncorr_err: flags %d, err_data %d\n",
					descr->status.flags, descr->status.err_data);
			} else {
				dev_warn(vi->dev,
					"corr_err: discarding frame %d, flags: %d, "
					"err_data %d\n",
					descr->status.frame_id, descr->status.flags,
					descr->status.err_data);
				frame_err = true;
			}

Hi Shane,
Thanks for your response.
I noticed that the logic you posted only handles one error flag, CHANNEL_IN_ERROR (bit 1), and doesn’t account for other potentially serious conditions:
/**
 * @defgroup CaptureStatusFlags Capture status flags
 * @{
 */

/** Channel encountered unrecoverable error and must be reset */
#define CAPTURE_STATUS_FLAG_CHANNEL_IN_ERROR MK_BIT32(1)

/** Spurious CSI packet seen before SOF (doesn’t necessarily corrupt frame) */
#define CAPTURE_STATUS_FLAG_CSIMUX_STREAM_SPURIOUS MK_BIT32(2)

/** CSI FIFO saw a bad packet (may or may not corrupt frame) */
#define CAPTURE_STATUS_FLAG_CSIMUX_FIFO_BADPKT MK_BIT32(3)

/** Forced frame end (frame cut-off) */
#define CAPTURE_STATUS_FLAG_CSIMUX_FRAME_FORCE_FE MK_BIT32(4)

/** Single-bit ECC error (corrected in hardware) */
#define CAPTURE_STATUS_FLAG_CSIMUX_FRAME_ECC_SBE MK_BIT32(5)

/** CSI protocol fault (CRC/timing) */
#define CAPTURE_STATUS_FLAG_CSIMUX_FRAME_CSI_FAULT MK_BIT32(6)

/** No matching channel for some earlier frames (info only) */
#define CAPTURE_STATUS_FLAG_CHANSEL_NO_MATCH MK_BIT32(7)

/** Frame truncated in the ATOMP packer */
#define CAPTURE_STATUS_FLAG_ERROR_ATOMP_FRAME_TRUNC MK_BIT32(8)

/** Frame tossed by the ATOMP packer */
#define CAPTURE_STATUS_FLAG_ERROR_ATOMP_FRAME_TOSSED MK_BIT32(9)

/** @} */

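Bits 2, 4, 7 and 8 are especially relevant in our case.
They indicate, respectively, a spurious CSI packet before SOF, a forced frame end (cut-off), no matching channel for earlier frames, and a frame truncated in the ATOMP packer.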

Should we extend our drop logic to include these (and any other relevant) flags as well?
Would you recommend treating bits 2, 4, 7 and 8 as bad/corrupted-frame drop conditions in the driver?

Thanks again for your guidance!

Some additional thoughts/notes here:

  • Bit 4 (FORCE_FE) and bit 8 (FRAME_TRUNC) clearly indicate cut-off or truncated frames and should be treated as drop conditions.
  • Bit 2 (STREAM_SPURIOUS) often means extra CSI data arrived before SOF; it can still corrupt the first lines of the image, so dropping on it is reasonable.
  • Bit 7 (CHANSEL_NO_MATCH) indicates that frames were lost earlier on this channel; if the pipeline can’t tolerate those gaps, dropping the current frame keeps things in sync.
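If these bits were added as drop conditions, the check could combine the status code with a flag mask. The sketch below is a hypothetical standalone version (`should_drop_frame` and `PROPOSED_DROP_MASK` are illustrative names). The flag values are copied from the CaptureStatusFlags excerpt above, assuming `MK_BIT32(n)` expands to `1u << n`; in the driver, the `CAPTURE_STATUS_*` definitions would come from `soc/tegra/camrtc-capture.h` rather than being redefined locally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the values quoted above (assumption: MK_BIT32(n) == 1u << n). */
#define CAPTURE_STATUS_SUCCESS                      1u
#define CAPTURE_STATUS_FLAG_CHANNEL_IN_ERROR        (1u << 1)
#define CAPTURE_STATUS_FLAG_CSIMUX_STREAM_SPURIOUS  (1u << 2)
#define CAPTURE_STATUS_FLAG_CSIMUX_FRAME_FORCE_FE   (1u << 4)
#define CAPTURE_STATUS_FLAG_CHANSEL_NO_MATCH        (1u << 7)
#define CAPTURE_STATUS_FLAG_ERROR_ATOMP_FRAME_TRUNC (1u << 8)

/* Hypothetical mask: bit 1 plus bits 2, 4, 7 and 8 as proposed in this thread. */
#define PROPOSED_DROP_MASK                           \
	(CAPTURE_STATUS_FLAG_CHANNEL_IN_ERROR |       \
	 CAPTURE_STATUS_FLAG_CSIMUX_STREAM_SPURIOUS | \
	 CAPTURE_STATUS_FLAG_CSIMUX_FRAME_FORCE_FE |  \
	 CAPTURE_STATUS_FLAG_CHANSEL_NO_MATCH |       \
	 CAPTURE_STATUS_FLAG_ERROR_ATOMP_FRAME_TRUNC)

/*
 * Drop when the status code is not SUCCESS, or when any proposed flag bit
 * is set, even on a frame whose status code is SUCCESS.
 */
static bool should_drop_frame(uint32_t status, uint32_t flags)
{
	return status != CAPTURE_STATUS_SUCCESS ||
	       (flags & PROPOSED_DROP_MASK) != 0;
}
```

The mask matches only the bits proposed above; widening it (for example with bit 6, CSI_FAULT) is a one-line change, while bit 5 (ECC_SBE, corrected in hardware per the header comment) is left out.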

The current CHANNEL_IN_ERROR check covers all uncorrectable errors.

No, I don’t agree. CAPTURE_STATUS_SUCCESS is just one of the many status codes the RTCPU firmware can return. All the other codes (CSIMUX_FRAME, CSIMUX_STREAM, CHANNEL_COLLISION, SHORT_FRAME, PACKER_OVERFLOW, FRAME_TRUNCATED, FRAME_TOSSED, etc.) are distinct values in the status.status field.
If you think it covers all errors, could you please point me to the related code and supporting documentation?

CHANNEL_IN_ERROR signals an unrecoverable CSI-channel failure; in that branch the driver sets queue_error and logs a dev_err.
The other branch (the else { ... discarding frame } part) only runs under the condition status.status != CAPTURE_STATUS_SUCCESS.
So I don’t think it covers the other cases in which a frame should be discarded.

Unless you can show me documentation or evidence that the bit-1 flag (CHANNEL_IN_ERROR) covers status codes 2, 3 and 4:
#define CAPTURE_STATUS_SUCCESS       MK_U32(1)
#define CAPTURE_STATUS_CSIMUX_FRAME  MK_U32(2)
#define CAPTURE_STATUS_CSIMUX_STREAM MK_U32(3)
#define CAPTURE_STATUS_CHANSEL_FAULT MK_U32(4)

Logically, if our check truly catches every non-SUCCESS status, no corrupted frames would ever reach the application layer—they’d all be dropped early. But in practice we’re still seeing bad frames, which tells me the current logic isn’t filtering out all error cases correctly!

In ../include/camrtc-capture.h:

/** @{ */
	/** Channel encountered unrecoverable error and must be reset */
#define CAPTURE_STATUS_FLAG_CHANNEL_IN_ERROR			MK_BIT32(1)
/** @} */

To be clearer, there are two status fields:

status.status

#define CAPTURE_STATUS_UNKNOWN       MK_U32(0)
#define CAPTURE_STATUS_SUCCESS       MK_U32(1)
#define CAPTURE_STATUS_CSIMUX_FRAME  MK_U32(2)  /* covered? */
#define CAPTURE_STATUS_CSIMUX_STREAM MK_U32(3)  /* covered? */
#define CAPTURE_STATUS_CHANSEL_FAULT MK_U32(4)  /* covered? */
/* …etc… */

status.flags

#define CAPTURE_STATUS_FLAG_CHANNEL_IN_ERROR    MK_BIT32(1)
#define CAPTURE_STATUS_FLAG_CSIMUX_STREAM_SPURIOUS  MK_BIT32(2)
#define CAPTURE_STATUS_FLAG_CSIMUX_FRAME_FORCE_FE   MK_BIT32(4)
#define CAPTURE_STATUS_FLAG_CHANSEL_NO_MATCH       MK_BIT32(7)
#define CAPTURE_STATUS_FLAG_ERROR_ATOMP_FRAME_TRUNC MK_BIT32(8)

My question is: does the check **status.status** != CAPTURE_STATUS_SUCCESS truly cover all other codes (2, 3, 4, etc.)? If the answer is yes, then in theory none of those frames should reach userspace—but in practice we’re still seeing frames with status codes 2, 3, and 4 making it through, so the logic doesn’t appear to be discarding them.
So I just want to confirm: should the code look like this?
status.status != CAPTURE_STATUS_SUCCESS | CAPTURE_STATUS_CSIMUX_FRAME | CAPTURE_STATUS_CSIMUX_STREAM | CAPTURE_STATUS_CHANSEL_FAULT
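Two properties of this comparison can be checked with a few lines of standalone C (`proposed_check` and `existing_check` are hypothetical helper names). Because status.status holds a single enumerated value, `status.status != CAPTURE_STATUS_SUCCESS` is already true for codes 2, 3 and 4. And because `!=` binds more tightly than `|` in C, the proposed expression parses as `(status.status != CAPTURE_STATUS_SUCCESS) | 2 | 3 | 4`, which is nonzero for every input, including SUCCESS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status codes as quoted in this thread: distinct values, not bit flags. */
#define CAPTURE_STATUS_SUCCESS       1u
#define CAPTURE_STATUS_CSIMUX_FRAME  2u
#define CAPTURE_STATUS_CSIMUX_STREAM 3u
#define CAPTURE_STATUS_CHANSEL_FAULT 4u

/*
 * The proposed expression, verbatim. It parses as
 * (status != SUCCESS) | 2 | 3 | 4, i.e. always 7, so it is always true.
 */
static bool proposed_check(uint32_t status)
{
	return status != CAPTURE_STATUS_SUCCESS | CAPTURE_STATUS_CSIMUX_FRAME |
	       CAPTURE_STATUS_CSIMUX_STREAM | CAPTURE_STATUS_CHANSEL_FAULT;
}

/* The existing check: any code other than SUCCESS compares unequal. */
static bool existing_check(uint32_t status)
{
	return status != CAPTURE_STATUS_SUCCESS;
}
```

GCC typically flags the first form with a -Wparentheses warning (enabled by -Wall), which is a useful hint that it does not mean what it appears to.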

What do you mean?

In the red circle there is not just one condition (bit 1); it should also include bits 2, 4, 7 and 8.
/** Capture status success */
#define CAPTURE_STATUS_SUCCESS MK_U32(1)

/** CSI-MUX frame error (maps to VI CSIMUX_FRAME event) */
#define CAPTURE_STATUS_CSIMUX_FRAME MK_U32(2)

/** Data-specific fault in a channel (maps to VI CHANSEL_FAULT FE event) */
#define CAPTURE_STATUS_CHANSEL_FAULT MK_U32(4)

/** Forced frame-end (cut-off) in the CHANSEL engine */
#define CAPTURE_STATUS_CHANSEL_SHORT_FRAME MK_U32(7)

/** ATOMP packer overflow (frame overflow/dropped) */
#define CAPTURE_STATUS_ATOMP_PACKER_OVERFLOW MK_U32(8)

So have you printed status.status to confirm this?
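One way to answer that empirically is to log the status code by name next to the raw fields. A minimal sketch follows; `capture_status_name` is a hypothetical helper that covers only the codes quoted in this thread (0 to 4, 7 and 8) and reports anything else as UNLISTED, since the full list in camrtc-capture.h is elided here.

```c
#include <stdint.h>

/*
 * Map a status.status value to a readable name. Only the codes quoted in
 * this thread are listed; every other value is reported as "UNLISTED".
 */
static const char *capture_status_name(uint32_t status)
{
	switch (status) {
	case 0: return "UNKNOWN";
	case 1: return "SUCCESS";
	case 2: return "CSIMUX_FRAME";
	case 3: return "CSIMUX_STREAM";
	case 4: return "CHANSEL_FAULT";
	case 7: return "CHANSEL_SHORT_FRAME";
	case 8: return "ATOMP_PACKER_OVERFLOW";
	default: return "UNLISTED";
	}
}
```

For example, next to the dev_warn in the vi5_fops.c excerpt, one could log `dev_warn(vi->dev, "frame %u status=%u (%s) flags=0x%x\n", descr->status.frame_id, descr->status.status, capture_status_name(descr->status.status), descr->status.flags);`, assuming those fields are unsigned 32-bit values. Logging every frame, including SUCCESS ones, would show whether torn frames arrive with status.status == SUCCESS and only flag bits set.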