Kernel Crash during Camera Module Reloading with More than One Video Device

Hi,

I’m working on camera driver development on JetPack-4.5.1 and I have detected a memory access issue when removing and reloading a camera driver module that registers 2 video devices. I haven’t seen it with only one video device.

I have observed the issue with a robust camera driver that creates up to 12 video devices. However, I was able to reproduce it by implementing a simple driver (based on V4L2 Kernel Driver Version 2.0) that only creates the video device, and loading 2 instances of this driver.

The issue doesn’t appear every time. I identified the problem by using a script to remove and load the module in a loop until the issue appears. I can usually see it after a few minutes, but sometimes it takes 30 minutes or more to reproduce.
I have noticed that the issue appears faster when stressing the CPU (using stress-ng --cpu 12 --atomic 12).

This is the kernel error:

[ 1651.357405] tegra-vi4 15700000.vi: subdev camdummy 1-0052 unbind
[ 1651.365466] tegra-vi4 15700000.vi: subdev camdummy 1-0050 unbind
[ 1653.520046] camdummy 1-0050: probing v4l2 sensor
[ 1653.520141] camdummy 1-0050: tegracam sensor driver:camdummy_v2.0.6
[ 1653.520177] tegra-vi4 15700000.vi: subdev camdummy 1-0050 bound
[ 1653.541496] camdummy 1-0050: Detected CAMDUMMY sensor
[ 1653.541562] camdummy 1-0052: probing v4l2 sensor
[ 1653.541654] camdummy 1-0052: tegracam sensor driver:camdummy_v2.0.6
[ 1653.541684] tegra-vi4 15700000.vi: subdev camdummy 1-0052 bound
[ 1653.549194] Unable to handle kernel paging request at virtual address 30303735312d95
[ 1653.549598] camdummy 1-0052: Detected CAMDUMMY sensor
[ 1653.570865] Mem abort info:
[ 1653.573899]   ESR = 0x96000004
[ 1653.576971]   Exception class = DABT (current EL), IL = 32 bits
[ 1653.609166]   SET = 0, FnV = 0
[ 1653.612217]   EA = 0, S1PTW = 0
[ 1653.625221] Data abort info:
[ 1653.628099]   ISV = 0, ISS = 0x00000004
[ 1653.637195]   CM = 0, WnR = 0
[ 1653.640157] [0030303735312d95] address between user and kernel address ranges
[ 1653.653197] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 1653.658760] Modules linked in: cam_dummy bnep fuse zram overlay bcmdhd cfg80211 userspace_alert nvgpu bluedroid_pm ip_tables x_tables [last unloaded: cam_dummy]
[ 1653.673240] CPU: 0 PID: 24933 Comm: v4l_id Not tainted 4.9.201 #1
[ 1653.679319] Hardware name: quill (DT)
[ 1653.682973] task: ffffffc1e3d1e200 task.stack: ffffffc16b05c000
[ 1653.688885] PC is at read_phy_mode_from_dt+0x4c/0xb8
[ 1653.693839] LR is at csi4_mipi_cal+0x34/0x230
[ 1653.698187] pc : [<ffffff8008b4aa74>] lr : [<ffffff8008b4baac>] pstate: 20400045
[ 1653.705566] sp : ffffffc16b05f690
[ 1653.708871] x29: ffffffc16b05f690 x28: ffffffc1e3d1e200 
[ 1653.714194] x27: 0000000000000000 x26: 0000000000000000 
[ 1653.719517] x25: 0000000000000080 x24: ffffffc1e7552418 
[ 1653.724840] x23: ffffffc1eb650410 x22: ffffffc1e91b1028 
[ 1653.730160] x21: ffffffc1e91b1e58 x20: ffffffc1e91b1028 
[ 1653.735480] x19: 3030303735312d6d x18: 00000000000012be 
[ 1653.740800] x17: 0000000000000001 x16: 0000000000000000 
[ 1653.746122] x15: 00000000000002de x14: 0000000000004057 
[ 1653.751443] x13: 00000000014e8de5 x12: 00000000011eeadb 
[ 1653.756764] x11: 0000000000000000 x10: 0000000000000a10 
[ 1653.762084] x9 : ffffffc16b05f430 x8 : ffffffc1e3d1ec70 
[ 1653.767402] x7 : 00000000afb50401 x6 : 00000000000000bd 
[ 1653.772723] x5 : 0000000000000000 x4 : 000000000001459a 
[ 1653.778043] x3 : 00000000762e3030 x2 : 0000000000000000 
[ 1653.783364] x1 : ffffffc1eb650410 x0 : 0000000000000158 

[ 1653.790169] Process v4l_id (pid: 24933, stack limit = 0xffffffc16b05c000)
[ 1653.796943] Call trace:
[ 1653.799386] [<ffffff8008b4aa74>] read_phy_mode_from_dt+0x4c/0xb8
[ 1653.805380] [<ffffff8008b4baac>] csi4_mipi_cal+0x34/0x230
[ 1653.810767] [<ffffff8008b4ac20>] tegra_csi_mipi_calibrate+0x80/0xd0
[ 1653.817023] [<ffffff8008558594>] nvcsi_finalize_poweron+0x4c/0x98
[ 1653.823105] [<ffffff800852bef4>] nvhost_module_runtime_resume+0xbc/0x280
[ 1653.829794] [<ffffff800878becc>] pm_generic_runtime_resume+0x3c/0x58
[ 1653.836138] [<ffffff8008799d30>] __genpd_runtime_resume+0x38/0xa0
[ 1653.842220] [<ffffff800879c4a4>] genpd_runtime_resume+0xa4/0x210
[ 1653.848214] [<ffffff800878e214>] __rpm_callback+0x74/0xa0
[ 1653.853599] [<ffffff800878e274>] rpm_callback+0x34/0x98
[ 1653.858811] [<ffffff800878f710>] rpm_resume+0x470/0x710
[ 1653.864024] [<ffffff800878f9fc>] __pm_runtime_resume+0x4c/0x70
[ 1653.869843] [<ffffff800852ae2c>] nvhost_module_busy+0x5c/0x168
[ 1653.875663] [<ffffff8008b4c0c8>] csi4_power_on+0x20/0x58
[ 1653.880965] [<ffffff8008b491d8>] tegra_csi_power+0x38/0x158
[ 1653.886525] [<ffffff8008b49324>] tegra_csi_s_power+0x2c/0x38
[ 1653.892173] [<ffffff8008b3e104>] tegra_channel_set_power+0x84/0x198
[ 1653.898428] [<ffffff8008b44e58>] vi4_power_on+0x80/0xa0
[ 1653.903642] [<ffffff8008b3c118>] tegra_channel_open+0x80/0x180
[ 1653.909464] [<ffffff8008b0f9a8>] v4l2_open+0x80/0x118
[ 1653.914506] [<ffffff8008261f6c>] chrdev_open+0x94/0x198
[ 1653.919721] [<ffffff8008258918>] do_dentry_open+0x1d8/0x340
[ 1653.925282] [<ffffff8008259ed0>] vfs_open+0x58/0x88
[ 1653.930149] [<ffffff800826d3b0>] do_last+0x530/0xfd0
[ 1653.935102] [<ffffff800826dee0>] path_openat+0x90/0x378
[ 1653.940316] [<ffffff800826f450>] do_filp_open+0x70/0xe8
[ 1653.945531] [<ffffff800825a394>] do_sys_open+0x174/0x258
[ 1653.950831] [<ffffff800825a4fc>] SyS_openat+0x3c/0x50
[ 1653.955875] [<ffffff800808395c>] __sys_trace_return+0x0/0x4
[ 1653.961436] ---[ end trace 94da04ded20bebe6 ]---

Have you seen this problem?
Do you know if there’s a fix for this bug?

Thanks,
-Enrique

Did you verify with the reference sensor ov5693?

Hi @ShaneCCC

The Jetson TX2 devkit supports a single ov5693 camera. As this problem appears only when registering 2 or more video devices, it’s not possible to reproduce the issue with this sensor.

Do you have any idea about what could be the problem?

-Enrique

Please provide detailed steps to reproduce so I can check with a multiple-sensor board.

@ShaneCCC

You just need to remove and re-load the module several times.

I wrote a script that performs the module reload loop; you just need to change the module name:
modules_reload_loop.sh (225 Bytes)
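
In essence it is just a loop along these lines (a minimal sketch, not the exact attached script; the module name, sudo usage, and delay are assumptions you should adjust to your setup):

#!/bin/bash
# Minimal sketch of a module reload loop (adjust MODULE and the delay to your setup)
MODULE=cam_dummy
while true; do
    sudo rmmod "$MODULE"
    sleep 1
    sudo modprobe "$MODULE"
    sleep 1
done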

I also needed to stress the system (at least for the dummy driver setup) with this command while reloading the modules:
stress-ng --cpu 12 --atomic 12

During the module reloads, I monitor the kernel messages until the issue appears:
dmesg -w

The issue can take some time to appear.

Let me know if you need more information.

Thanks

Is there any reason for reloading the module continuously?

That’s an endurance test. If this is a race condition it could appear at any time, so we need to make sure this problem will not occur.

Any news on this topic? Have you tried to reproduce it?

With the simple (dummy) driver I don’t see it very frequently, which is why I’m stressing the system. However, with the robust driver that we are developing I see the issue very often. Sometimes it appears on the first reload (without running the stress-ng command).

We were able to see the issue and are going to figure out a solution.

Great! Please keep me updated on any finding.

Thanks!

Developer resources are currently tight, so progress may be slow.
Do you see the issue often during boot, rather than under stress?

I have never seen the issue after boot; it only appears after removing and re-loading the sensor module.

With our robust driver, the issue appears without using the stress command, but in that case we have real cameras, 12 streams, and higher power consumption.
Using the dummy driver, I have only reproduced the issue when using the stress command.

Could you run a long reboot test to confirm? If rebooting works without problems, I think we can give this issue low priority.

@ShaneCCC

I can confirm that this issue doesn’t appear at boot time. However, I disagree that this is a low-priority issue: Loadable Kernel Modules (LKM) are a feature that you support, and it is broken because re-loading can fail at any time.

-Enrique

I have already reported this to the developer, but currently we can’t get resources to investigate and figure out the root cause.
Also, if a normal boot works without problems, I think this should be a low-risk issue.

It looks like this issue is fixed by the new release JetPack 4.6.1 (r32.7.1).
