Hi,
I’m working on camera driver development on JetPack 4.5.1 and I have found a memory access issue when removing and reloading a camera driver module that registers 2 video devices. I haven’t seen it with only one video device.
I first observed the issue with a robust camera driver that creates up to 12 video devices. However, I was also able to reproduce it with a simple driver (V4L2 Kernel Driver Version 2.0) that only creates the video device, by loading 2 instances of that driver.
The issue doesn’t appear every time. I found it by using a script that removes and loads the module in a loop until the issue appears. It usually shows up after a few minutes, but sometimes it can take 30 minutes or more to reproduce.
I have noticed the issue appears faster when stressing the CPU (using stress-ng --cpu 12 --atomic 12).
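For anyone who wants to reproduce this, the reload loop can be sketched roughly as follows. This is a minimal sketch and not the exact script I used: the function names are made up for illustration, it assumes the module can be loaded with modprobe under the name cam_dummy (as shown in "Modules linked in"), and it must run as root. Clear the kernel log first (dmesg -C) so that an old Oops is not matched.

```python
import subprocess
import time

MODULE = "cam_dummy"  # assumed module name, taken from "Modules linked in"

def kernel_log_has_oops(log_text):
    """Return True if the kernel log text contains the crash signature."""
    return ("Internal error: Oops" in log_text
            or "Unable to handle kernel paging request" in log_text)

def run(cmd):
    """Run a command and raise if it fails (needs root for modprobe)."""
    subprocess.run(cmd, check=True)

def read_dmesg():
    """Return the current kernel log as text."""
    result = subprocess.run(["dmesg"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout

def reload_until_oops(module=MODULE, max_iterations=10000, settle_s=2.0,
                      run_cmd=run, read_log=read_dmesg, sleep=time.sleep):
    """Remove and load the module until an Oops shows up in the log.

    Returns the iteration number at which the Oops was seen, or None if
    max_iterations was reached without one.
    """
    for iteration in range(1, max_iterations + 1):
        run_cmd(["modprobe", "-r", module])
        sleep(settle_s)  # give udev time to process the removal
        run_cmd(["modprobe", module])
        sleep(settle_s)  # give udev helpers (e.g. v4l_id) time to open the nodes
        if kernel_log_has_oops(read_log()):
            return iteration
    return None

# Example (as root, after "dmesg -C"):
#   print(reload_until_oops())
```

The settle time of 2 seconds is a guess; the log shows about 2 seconds between unbind and probe, but other values may hit the race more or less often.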
This is the kernel error:
[ 1651.357405] tegra-vi4 15700000.vi: subdev camdummy 1-0052 unbind
[ 1651.365466] tegra-vi4 15700000.vi: subdev camdummy 1-0050 unbind
[ 1653.520046] camdummy 1-0050: probing v4l2 sensor
[ 1653.520141] camdummy 1-0050: tegracam sensor driver:camdummy_v2.0.6
[ 1653.520177] tegra-vi4 15700000.vi: subdev camdummy 1-0050 bound
[ 1653.541496] camdummy 1-0050: Detected CAMDUMMY sensor
[ 1653.541562] camdummy 1-0052: probing v4l2 sensor
[ 1653.541654] camdummy 1-0052: tegracam sensor driver:camdummy_v2.0.6
[ 1653.541684] tegra-vi4 15700000.vi: subdev camdummy 1-0052 bound
[ 1653.549194] Unable to handle kernel paging request at virtual address 30303735312d95
[ 1653.549598] camdummy 1-0052: Detected CAMDUMMY sensor
[ 1653.570865] Mem abort info:
[ 1653.573899] ESR = 0x96000004
[ 1653.576971] Exception class = DABT (current EL), IL = 32 bits
[ 1653.609166] SET = 0, FnV = 0
[ 1653.612217] EA = 0, S1PTW = 0
[ 1653.625221] Data abort info:
[ 1653.628099] ISV = 0, ISS = 0x00000004
[ 1653.637195] CM = 0, WnR = 0
[ 1653.640157] [0030303735312d95] address between user and kernel address ranges
[ 1653.653197] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 1653.658760] Modules linked in: cam_dummy bnep fuse zram overlay bcmdhd cfg80211 userspace_alert nvgpu bluedroid_pm ip_tables x_tables [last unloaded: cam_dummy]
[ 1653.673240] CPU: 0 PID: 24933 Comm: v4l_id Not tainted 4.9.201 #1
[ 1653.679319] Hardware name: quill (DT)
[ 1653.682973] task: ffffffc1e3d1e200 task.stack: ffffffc16b05c000
[ 1653.688885] PC is at read_phy_mode_from_dt+0x4c/0xb8
[ 1653.693839] LR is at csi4_mipi_cal+0x34/0x230
[ 1653.698187] pc : [<ffffff8008b4aa74>] lr : [<ffffff8008b4baac>] pstate: 20400045
[ 1653.705566] sp : ffffffc16b05f690
[ 1653.708871] x29: ffffffc16b05f690 x28: ffffffc1e3d1e200
[ 1653.714194] x27: 0000000000000000 x26: 0000000000000000
[ 1653.719517] x25: 0000000000000080 x24: ffffffc1e7552418
[ 1653.724840] x23: ffffffc1eb650410 x22: ffffffc1e91b1028
[ 1653.730160] x21: ffffffc1e91b1e58 x20: ffffffc1e91b1028
[ 1653.735480] x19: 3030303735312d6d x18: 00000000000012be
[ 1653.740800] x17: 0000000000000001 x16: 0000000000000000
[ 1653.746122] x15: 00000000000002de x14: 0000000000004057
[ 1653.751443] x13: 00000000014e8de5 x12: 00000000011eeadb
[ 1653.756764] x11: 0000000000000000 x10: 0000000000000a10
[ 1653.762084] x9 : ffffffc16b05f430 x8 : ffffffc1e3d1ec70
[ 1653.767402] x7 : 00000000afb50401 x6 : 00000000000000bd
[ 1653.772723] x5 : 0000000000000000 x4 : 000000000001459a
[ 1653.778043] x3 : 00000000762e3030 x2 : 0000000000000000
[ 1653.783364] x1 : ffffffc1eb650410 x0 : 0000000000000158
[ 1653.790169] Process v4l_id (pid: 24933, stack limit = 0xffffffc16b05c000)
[ 1653.796943] Call trace:
[ 1653.799386] [<ffffff8008b4aa74>] read_phy_mode_from_dt+0x4c/0xb8
[ 1653.805380] [<ffffff8008b4baac>] csi4_mipi_cal+0x34/0x230
[ 1653.810767] [<ffffff8008b4ac20>] tegra_csi_mipi_calibrate+0x80/0xd0
[ 1653.817023] [<ffffff8008558594>] nvcsi_finalize_poweron+0x4c/0x98
[ 1653.823105] [<ffffff800852bef4>] nvhost_module_runtime_resume+0xbc/0x280
[ 1653.829794] [<ffffff800878becc>] pm_generic_runtime_resume+0x3c/0x58
[ 1653.836138] [<ffffff8008799d30>] __genpd_runtime_resume+0x38/0xa0
[ 1653.842220] [<ffffff800879c4a4>] genpd_runtime_resume+0xa4/0x210
[ 1653.848214] [<ffffff800878e214>] __rpm_callback+0x74/0xa0
[ 1653.853599] [<ffffff800878e274>] rpm_callback+0x34/0x98
[ 1653.858811] [<ffffff800878f710>] rpm_resume+0x470/0x710
[ 1653.864024] [<ffffff800878f9fc>] __pm_runtime_resume+0x4c/0x70
[ 1653.869843] [<ffffff800852ae2c>] nvhost_module_busy+0x5c/0x168
[ 1653.875663] [<ffffff8008b4c0c8>] csi4_power_on+0x20/0x58
[ 1653.880965] [<ffffff8008b491d8>] tegra_csi_power+0x38/0x158
[ 1653.886525] [<ffffff8008b49324>] tegra_csi_s_power+0x2c/0x38
[ 1653.892173] [<ffffff8008b3e104>] tegra_channel_set_power+0x84/0x198
[ 1653.898428] [<ffffff8008b44e58>] vi4_power_on+0x80/0xa0
[ 1653.903642] [<ffffff8008b3c118>] tegra_channel_open+0x80/0x180
[ 1653.909464] [<ffffff8008b0f9a8>] v4l2_open+0x80/0x118
[ 1653.914506] [<ffffff8008261f6c>] chrdev_open+0x94/0x198
[ 1653.919721] [<ffffff8008258918>] do_dentry_open+0x1d8/0x340
[ 1653.925282] [<ffffff8008259ed0>] vfs_open+0x58/0x88
[ 1653.930149] [<ffffff800826d3b0>] do_last+0x530/0xfd0
[ 1653.935102] [<ffffff800826dee0>] path_openat+0x90/0x378
[ 1653.940316] [<ffffff800826f450>] do_filp_open+0x70/0xe8
[ 1653.945531] [<ffffff800825a394>] do_sys_open+0x174/0x258
[ 1653.950831] [<ffffff800825a4fc>] SyS_openat+0x3c/0x50
[ 1653.955875] [<ffffff800808395c>] __sys_trace_return+0x0/0x4
[ 1653.961436] ---[ end trace 94da04ded20bebe6 ]---
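One detail in the trace that may be relevant: x19 and the low 32 bits of x3 hold values that look like ASCII text rather than pointers. A small Python sketch (the helper name reg_to_ascii is only for illustration) decodes them as little-endian bytes:

```python
def reg_to_ascii(value, width=8):
    """Decode a register value as little-endian bytes, '.' for non-printables."""
    raw = value.to_bytes(width, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)

print(reg_to_ascii(0x3030303735312D6D))    # x19 -> m-157000
print(reg_to_ascii(0x762E3030, width=4))   # low 32 bits of x3 -> 00.v
```

The decoded bytes resemble the device name 15700000.vi that appears earlier in the log. This might mean the pointer dereferenced in read_phy_mode_from_dt was overwritten with string data, for example through a use-after-free, but that is only a guess.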
Have you seen this problem?
Do you know if there’s a fix for this bug?
Thanks,
-Enrique