Kernel crash when resetting capture channel

Hi,
we sometimes observe kernel crashes in the camera subsystem when receiving bad frames.
We are using two cameras, each with three virtual channels.
We are running kernel 6.6 on R36.4.0/JP 6.1 on a Jetson Orin NX 8 GB.

The crashes mostly follow this rough pattern:

ERROR tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms

WARNING tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel

WARNING channel context at 2 is busy
WARNING: CPU: 1 PID: 1385 at /usr/src/debug/nvidia-kernel-oot/36.4.0/nvidia-oot/drivers/platform/tegra/rtcpu/capture-ivc.c:195 tegra_capture_ivc_notify_chan_id+0x188/0x1b0 [capture_ivc]

ERROR tegra-camrtc-capture-vi tegra-capture-vi: vi capture setup failed
ERROR tegra-camrtc-capture-vi tegra-capture-vi: fatal: error recovery failed

WARNING refcount_t: addition on 0; use-after-free.
WARNING: CPU: 1 PID: 1040 at /lib/refcount.c:25 refcount_warn_saturate+0x120/0x148

FATAL Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
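
Because the crash is sporadic, scanning saved kernel logs for the sequence above may help to find affected devices. The following is only a minimal sketch: the names `SIGNATURE`, `match_crash_pattern` and `is_full_match` are made up for illustration, the substrings are copied from the excerpt above, and it assumes the messages always appear in this order.

```python
# Substrings taken from the log excerpt above, in the order they were observed.
# Assumption: other occurrences of the crash log the same messages in the same order.
SIGNATURE = [
    "uncorr_err: request timed out",
    "err_rec: attempting to reset the capture channel",
    "channel context at",
    "vi capture setup failed",
    "fatal: error recovery failed",
    "refcount_t: addition on 0; use-after-free",
    "NULL pointer dereference",
]


def match_crash_pattern(log_text):
    """Return the signature messages found, in order, in log_text."""
    found = []
    idx = 0
    for line in log_text.splitlines():
        # A single log line may contain more than one signature message.
        while idx < len(SIGNATURE) and SIGNATURE[idx] in line:
            found.append(SIGNATURE[idx])
            idx += 1
    return found


def is_full_match(log_text):
    """True if every signature message occurs in the expected order."""
    return len(match_crash_pattern(log_text)) == len(SIGNATURE)
```

For example, `is_full_match(open("crash-on-corrupted-frames_0.txt").read())` could be run on each collected log file; a partial result from `match_crash_pattern` shows how far a device got through the sequence.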

The kernel should not crash just because some frames failed to be captured.
Can you tell us how to solve this issue?

It seems that this or similar issues have already been reported on this forum over the past few years.
I also found this pinned post:

Does it address the issue we observe here?

Here are some logs of the issue:
crash-on-corrupted-frames_0.txt (12.0 KB)
crash-on-corrupted-frames_1.txt (26.2 KB)

Thanks for any help

Please apply 0001-capture-ivc-fix-multi-cam-race-condition.patch only to verify.

Thanks

So you think it is probably the same issue?
I’ll try the patch. However, since the issue occurs only sporadically on our devices, it may take some time before I find a device that shows it and can verify that the patch fixes it.

Thanks

Is this still an issue that needs support? Are there any results you can share?

Hi, as we cannot reliably reproduce this issue, we can’t say anything yet about the effectiveness of this patch.
I will report back here if we observe the bug with the patch applied.