Orin vi5_channel_stop kernel panic on jetpack5.1.1

hi, nvidia team.

I have encountered a system crash caused by vi5.

My video path is ar0231 -> max96705 -> max96712 -> Orin 32G.
To reproduce the problem, I use v4l2-ctl to check the frame rate of all four cameras, for example:

v4l2-ctl --set-fmt-video=width=1920,height=1080 --stream-mmap -d /dev/video0

Then I ran an i2c tool to reconfigure the max96712 and max96705. At this point the four v4l2-ctl commands
stop refreshing, because reconfiguring the camera interrupts the video stream.

After the i2c tool finishes the configuration, the four v4l2-ctl commands start refreshing the frame rate again.
In some cases, however, a v4l2-ctl command does not resume refreshing after the reconfiguration completes.
If I press Ctrl+C to stop that v4l2-ctl command, the system freezes. The log from the freeze is the same as in the topic below

We are using Jetpack 5.0.2. Following the steps above, we performed the same operation on the Orin 64G version (Jetpack 5.1.1) and found that the system still freezes.
The corresponding logs are attached below.
Please help us find out how to solve this problem.
dmesg.log (155.6 KB)
kern.log (213.4 KB)

hello mmcly,

as you can see, there are timeout failures when you interrupt the video stream.

Jul  6 15:22:23 orin-master kernel: [249373.354268] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
Jul  6 15:22:23 orin-master kernel: [249373.354270] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
...

may I know the failure rate?
there is an internal 2500 ms timeout, and it may take a couple of seconds to restore the capture engine.
may I also confirm how long you waited before concluding that the v4l2 command could not resume refreshing the capture?

besides, could you please also test again by adding the bypass_mode setting to your v4l2 pipeline?
for example,
$ v4l2-ctl -d /dev/video0 --set-fmt-video=width=1920,height=1080,pixelformat=RG10 --set-ctrl bypass_mode=0 --stream-mmap
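
One rough way to put a number on the failure rate is to count how often the timeout message appears in a saved kernel log. The sketch below is only an illustration: `count_vi_timeouts` is a hypothetical helper name, the message string is copied from the log excerpt above, and it assumes the whole log has already been read into memory as one string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Message text copied from the tegra-capture-vi lines in the log excerpt. */
static const char TIMEOUT_MSG[] = "uncorr_err: request timed out";

/*
 * Count occurrences of the timeout message in a kernel log held in memory.
 * Hypothetical helper for illustration; not part of any NVIDIA tool.
 */
size_t count_vi_timeouts(const char *log_text)
{
    size_t count = 0;
    const char *p;

    assert(log_text != NULL);

    /* Walk the buffer, resuming the search just past each match. */
    for (p = strstr(log_text, TIMEOUT_MSG); p != NULL;
         p = strstr(p + sizeof(TIMEOUT_MSG) - 1, TIMEOUT_MSG))
        count++;

    return count;
}
```

Dividing the count by the number of reconfiguration attempts in the same log window would give an approximate rate per attempt.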

I didn’t keep detailed statistics, but I think the frequency is quite high. Actually, I didn’t intend to reconfigure the camera registers while reading frame rates. Our Orin crashes and restarts with a very low probability in normal use, and I just found that this operation makes it crash much more often.

From a system perspective, even if the Orin cannot read images, it should not crash.

Usually, the 96712 is connected to four cameras. After I reconfigure the registers, several of them continue to refresh the fps normally, but one of them does not. I wait a few seconds and then press Ctrl+C.

With bypass_mode, the system still crashes when I press Ctrl+C on a v4l2-ctl command that has stopped refreshing.

is it always the same node that stops sending frames?

No, this is random

The abnormal process shown by the top command is as follows. By the way, I think this is a system issue, not just related to video, right?

hello mmcly,

that’s right, it looks like a system-level issue, which occupies 100% of the CPU resources.
could you force-stop the service with $ sudo kill <pid>?

it didn’t have any effect

@JerryChang

any update about this issue?

BTW, maybe you can refer to my method to reproduce this problem?

here are the test steps in brief. we cannot reproduce this locally.
step1) launch gst pipeline to enable camera preview
$ gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),framerate=30/1,format=NV12' ! nvvidconv ! xvimagesink
step2) send commands in the terminal to shut down the stream,
# cd /sys/kernel/debug/camera-video0
# echo 0 > streaming

hi @JerryChang,
I don’t know how similar or different our two operations are at the level of the low-level system driver. I couldn’t find the streaming node under my debug directory, so I can’t follow your steps.

But as I understand it, this should be a low-level system error. Can you tell anything from the logs I provided? And what else can I do to help troubleshoot this issue?

hello mmcly,

please download the attached camera firmware, Jul13_camera-rtcpu-t234-rce.img (525.3 KB)
assuming you already have the Jetpack environment set up, you should update the rce-fw binary file, $OUT/Linux_for_Tegra/bootloader/camera-rtcpu-t234-rce.img
after that, you may perform a partition flash to update the camera firmware on your AGX Orin,
i.e. $ sudo ./flash.sh -r -k A_rce-fw jetson-agx-orin-devkit mmcblk0p1

please reproduce the issue again, and also gather the complete kernel logs for reference,
thanks

hi @JerryChang
We have tested the new firmware you sent, but unfortunately the problem still exists. The corresponding logs are attached below. Please help investigate.

Jul13_camera-rtcpu-t234-rce-debug.log (135.4 KB)
Jul13_camera-rtcpu-t234-rce-dmesg.log (129.6 KB)

may I know how you integrated it into JP-5.1.1?
it looks like you’re still using JP-5.0.2 according to the logs.

[ 30.377691] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for aarch64 35.1.0 Release Build (buildbrain@mobile-u64-5273-d7000) Wed Aug 10 20:32:41 PDT 2022
...
[ 310.033064] channel context at 0 is busy
[ 310.033099] WARNING: CPU: 0 PID: 4641 at /home/ly/nvidia/nvidia_sdk/JetPack_5.0.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/source/public/kernel_src/kernel/nvidia/drivers/platform/tegra/rtcpu/capture-ivc.c:176

yes, I ran the test on jetpack 5.0.2. in fact, I found that the issue shows the same result on jetpack 5.0.2 and jetpack 5.1.1, so I just replaced the firmware on jetpack 5.0.2 (last week, our Jetpack 5.1.1 machine was delivered to the project site). must the firmware be tested on jetpack 5.1.1?

since changes have been checked in to address some issues, please use the firmware binary version that matches the Jetpack release.

ok, we have run the test on jetpack 5.1.1. unfortunately, it doesn’t help. here are the logs
Jul13_camera-rtcpu-t234-rce-jp5.1.1-debug.log (132.6 KB)
Jul13_camera-rtcpu-t234-rce-jp5.1.1-dmesg.log (111.3 KB)

hello mmcly,

would you please update the kernel image to add the patch below for capture-ivc.c?
for example,

diff --git a/drivers/platform/tegra/rtcpu/capture-ivc-priv.h b/drivers/platform/tegra/rtcpu/capture-ivc-priv.h
index 04c95aef3..f6cb92497 100644
--- a/drivers/platform/tegra/rtcpu/capture-ivc-priv.h
+++ b/drivers/platform/tegra/rtcpu/capture-ivc-priv.h
@@ -37,6 +37,7 @@ struct tegra_capture_ivc_cb_ctx {
        tegra_capture_ivc_cb_func cb_func;
        /** Private context of a VI/ISP capture context */
        const void *priv_context;
+       struct semaphore sem_ch;
 };
 
 /**
diff --git a/drivers/platform/tegra/rtcpu/capture-ivc.c b/drivers/platform/tegra/rtcpu/capture-ivc.c
index c5e1bc519..7aae4920e 100644
--- a/drivers/platform/tegra/rtcpu/capture-ivc.c
+++ b/drivers/platform/tegra/rtcpu/capture-ivc.c
@@ -28,11 +28,15 @@
 #include <linux/tegra-ivc.h>
 #include <linux/tegra-ivc-bus.h>
 #include <linux/nospec.h>
-
+#include <linux/semaphore.h>
 #include <asm/barrier.h>
 
 #include "capture-ivc-priv.h"
 
+
+/* Timeout for acquiring channel-id */
+#define TIMEOUT_ACQUIRE_CHANNEL_ID 120
+
 static int tegra_capture_ivc_tx(struct tegra_capture_ivc *civc,
                                const void *req, size_t len)
 {
@@ -165,6 +169,11 @@ int tegra_capture_ivc_notify_chan_id(uint32_t chan_id, uint32_t trans_id)
 
        civc = __scivc_control;
 
+       if (down_timeout(&civc->cb_ctx[chan_id].sem_ch,
+                       TIMEOUT_ACQUIRE_CHANNEL_ID)) {
+               return -EBUSY;
+       }
+
        mutex_lock(&civc->cb_ctx_lock);
 
        if (WARN(civc->cb_ctx[trans_id].cb_func == NULL,
@@ -269,6 +278,7 @@ int tegra_capture_ivc_unregister_control_cb(uint32_t id)
        civc->cb_ctx[id].priv_context = NULL;
 
        mutex_unlock(&civc->cb_ctx_lock);
+       up(&civc->cb_ctx[id].sem_ch);
 
        /*
         * If it's trans_id, client encountered an error before or during
@@ -415,6 +425,9 @@ static int tegra_capture_ivc_probe(struct tegra_ivc_channel *chan)
        mutex_init(&civc->cb_ctx_lock);
        mutex_init(&civc->ivc_wr_lock);
 
+       for (i = 0; i < TOTAL_CHANNELS; i++)
+               sema_init(&civc->cb_ctx[i].sem_ch, 1);
+
        /* Initialize ivc_work */
        INIT_WORK(&civc->work, tegra_capture_ivc_worker);
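
For readers unfamiliar with the pattern: going by the hunk headers, the patch gives each callback context a semaphore initialized to 1 in probe, takes it with a timeout in tegra_capture_ivc_notify_chan_id, and releases it in tegra_capture_ivc_unregister_control_cb. Below is a userspace sketch of the same acquire-with-timeout and release idea using POSIX semaphores. It is not the driver code: `channels_init`, `channel_acquire`, `channel_release` and the value of `TOTAL_CHANNELS` are hypothetical, and the timeout here is in milliseconds, whereas the kernel's down_timeout() takes its timeout in jiffies.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <time.h>

#define TOTAL_CHANNELS 4  /* hypothetical value; the driver defines its own */

static sem_t chan_sem[TOTAL_CHANNELS];

/* Analogue of the sema_init() loop in probe: every channel slot starts free. */
void channels_init(void)
{
    for (int i = 0; i < TOTAL_CHANNELS; i++)
        sem_init(&chan_sem[i], 0, 1);
}

/* Analogue of down_timeout(): wait up to timeout_ms for the slot, else -EBUSY. */
int channel_acquire(unsigned int chan_id, long timeout_ms)
{
    struct timespec deadline;

    assert(chan_id < TOTAL_CHANNELS);

    /* sem_timedwait() expects an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    while (sem_timedwait(&chan_sem[chan_id], &deadline) != 0) {
        if (errno == EINTR)
            continue;      /* interrupted by a signal: retry until the deadline */
        return -EBUSY;     /* timed out: the previous owner never released */
    }
    return 0;
}

/* Analogue of up() in the unregister path: hand the slot back. */
void channel_release(unsigned int chan_id)
{
    assert(chan_id < TOTAL_CHANNELS);
    sem_post(&chan_sem[chan_id]);
}
```

With this shape, a second request for a channel that is still held fails with -EBUSY after the timeout instead of proceeding on a busy context, and it succeeds again once the holder calls the release function.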

It works!!! thanks @JerryChang
