V4L2 timeout leads to NULL pointer dereference in the kernel in JetPack 4.6

Steps to reproduce:
Xavier NX Devkit
JetPack 4.6
Using Raspi Cam V2.1 (IMX219 driver) on CAM0

For testing, start a stream; this works fine:

$ v4l2-ctl --stream-mmap
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 30.09 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 30.04 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 30.03 fps

Then, in parallel, cause a timeout by stopping the camera:

$ sudo su
$ cd /sys/kernel/debug/imx219_c
$ echo "0x100 0x00" > d

This leads to the following error in the kernel log:

[  539.228448] tegra194-vi5 15c10000.vi: no reply from camera processor
[  539.228612] tegra194-vi5 15c10000.vi: uncorr_err: request timed out after 2500 ms
[  539.228786] tegra194-vi5 15c10000.vi: err_rec: attempting to reset the capture channel
[  539.232009] Unable to handle kernel NULL pointer dereference at virtual address 00000010
[  539.232215] Mem abort info:
[  539.232270]   ESR = 0x96000005
[  539.232332]   Exception class = DABT (current EL), IL = 32 bits
[  539.232509]   SET = 0, FnV = 0
[  539.232579]   EA = 0, S1PTW = 0
[  539.232644] Data abort info:
[  539.232723]   ISV = 0, ISS = 0x00000005
[  539.232794]   CM = 0, WnR = 0
[  539.232856] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffc15c4ba000
[  539.232966] [0000000000000010] *pgd=0000000000000000, *pud=0000000000000000
[  539.233109] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  539.233210] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack br_netfilter zram overlay bnep rtk_btusb rtl8822ce btusb btrtl btbcm btintel cfg80211 userspace_alert binfmt_misc nvgpu ip_tables x_tables
[  539.248020] CPU: 2 PID: 9637 Comm: vi-output, imx2 Tainted: G        W       4.9.253-tegra #1
[  539.256573] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[  539.263130] task: ffffffc1f2d50e00 task.stack: ffffffc1f293c000
[  539.269177] PC is at _raw_write_lock+0x30/0x58
[  539.273202] LR is at destroy_buffer_table+0x40/0xd8
[  539.278266] pc : [<ffffff8008f6c6f0>] lr : [<ffffff8008b4d280>] pstate: 20c00045
[  539.285440] sp : ffffffc1f293fc70
[  539.288508] x29: ffffffc1f293fc70 x28: 0000000000000000
[  539.294367] x27: 0000000000000000 x26: 0000000000000000
[  539.299963] x25: 0000000000000010 x24: 0000000000000098
[  539.304525] x23: 0000000000000018 x22: ffffff8009087458
[  539.310039] x21: 0000000000000000 x20: ffffffc1dcbc3800
[  539.315800] x19: 0000000000000010 x18: 0000000000000000
[  539.321240] x17: 0000000000000002 x16: 0000000000000003
[  539.327263] x15: 000000000000002a x14: 00060000000bfeb7
[  539.332950] x13: 0006000000000000 x12: ffffff800c000064
[  539.338639] x11: 0000000000000400 x10: 0000000000000a10
[  539.344328] x9 : ffffffc1f293fa80 x8 : 0000000000000000
[  539.350103] x7 : ffffffc192b7f2c0 x6 : ffffffc163720d41
[  539.355615] x5 : ffffff800852d904 x4 : ffffffbf058dc810
[  539.360702] x3 : 0000000000000000 x2 : ffffffc163720d40
[  539.366292] x1 : 0000000000000000 x0 : 0000000080000000

[  539.372779] Process vi-output, imx2 (pid: 9637, stack limit = 0xffffffc1f293c000)
[  539.380027] Call trace:
[  539.382572] [<ffffff8008f6c6f0>] _raw_write_lock+0x30/0x58
[  539.387646] [<ffffff8008b4d280>] destroy_buffer_table+0x40/0xd8
[  539.392988] [<ffffff8008b3f58c>] vi_capture_shutdown+0xd4/0x130
[  539.398846] [<ffffff8008b3fbe4>] vi_channel_close_ex+0x34/0x88
[  539.404445] [<ffffff8008b410f0>] vi5_channel_error_recover+0x48/0x1c8
[  539.410397] [<ffffff8008b361c0>] tegra_channel_error_recover+0x58/0x90
[  539.416261] [<ffffff8008b419b0>] tegra_channel_kthread_capture_dequeue+0xf8/0x1c0
[  539.423176] [<ffffff80080db09c>] kthread+0xec/0xf0
[  539.427988] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[  539.433322] ---[ end trace 3d8f3e796590f570 ]---
[  539.453705] note: vi-output, imx2[9637] exited with preempt_count 1

Cancelling the v4l2-ctl command makes it even worse:

[  641.372865] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  641.373043] Mem abort info:
[  641.373124]   ESR = 0x96000005
[  641.373185]   Exception class = DABT (current EL), IL = 32 bits
[  641.373289]   SET = 0, FnV = 0
[  641.373344]   EA = 0, S1PTW = 0
[  641.373401] Data abort info:
[  641.373452]   ISV = 0, ISS = 0x00000005
[  641.373541]   CM = 0, WnR = 0
[  641.373603] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffc1a1e98000
[  641.373713] [0000000000000000] *pgd=0000000000000000, *pud=0000000000000000
[  641.373854] Internal error: Oops: 96000005 [#2] PREEMPT SMP
[  641.373959] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack br_netfilter zram overlay bnep rtk_btusb rtl8822ce btusb btrtl btbcm btintel cfg80211 userspace_alert binfmt_misc nvgpu ip_tables x_tables
[  641.375726] CPU: 5 PID: 9635 Comm: v4l2-ctl Tainted: G      D W       4.9.253-tegra #1
[  641.378608] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[  641.384911] task: ffffffc1936baa00 task.stack: ffffffc147250000
[  641.391213] PC is at exit_creds+0x2c/0x78
[  641.395148] LR is at __put_task_struct+0x4c/0x140
[  641.399520] pc : [<ffffff80080de15c>] lr : [<ffffff80080aee9c>] pstate: 60400045
[  641.406869] sp : ffffffc147253a00
[  641.410364] x29: ffffffc147253a00 x28: ffffffc1936baa00
[  641.415875] x27: ffffff8008f82000 x26: ffffffc1ba25b180
[  641.421476] x25: ffffffc1c4c3d1e8 x24: ffffffc1e05b1598
[  641.426214] x23: 0000000000000001 x22: ffffffc1e2ac4018
[  641.431800] x21: ffffffc1f2d50e30 x20: 000000000000000b
[  641.437311] x19: ffffffc1f2d50e00 x18: 0000007f7cb36a70
[  641.442912] x17: 0000007f7caa9d40 x16: ffffff8008274b68
[  641.448436] x15: 0000000000000000 x14: 000000000003be1d
[  641.454373] x13: 0000000000000000 x12: 0000000000000000
[  641.459638] x11: ffffff8008f899d0 x10: 0000000000000a10
[  641.465664] x9 : ffffffc147253850 x8 : ffffffc1936bb470
[  641.471436] x7 : 0000000000000400 x6 : 0000000000000000
[  641.476702] x5 : 0000000000000800 x4 : 0000000000000000
[  641.482288] x3 : 00000000000000bf x2 : 0000000000000000
[  641.487375] x1 : 0000000000000000 x0 : 00000000ffffffff

[  641.494120] Process v4l2-ctl (pid: 9635, stack limit = 0xffffffc147250000)
[  641.501015] Call trace:
[  641.503125] [<ffffff80080de15c>] exit_creds+0x2c/0x78
[  641.508193] [<ffffff80080aee9c>] __put_task_struct+0x4c/0x140
[  641.513531] [<ffffff80080dbbfc>] kthread_stop+0x1e4/0x1e8
[  641.518610] [<ffffff8008b41308>] vi5_channel_stop_kthreads+0x40/0x58
[  641.524475] [<ffffff8008b413ec>] vi5_channel_stop_streaming+0xcc/0xd0
[  641.530603] [<ffffff8008b3305c>] tegra_channel_stop_streaming+0x34/0x48
[  641.537069] [<ffffff8008b2b374>] __vb2_queue_cancel+0x34/0x188
[  641.542409] [<ffffff8008b2c89c>] vb2_core_queue_release+0x2c/0x58
[  641.548531] [<ffffff8008b2ef1c>] _vb2_fop_release+0x84/0xa0
[  641.553613] [<ffffff8008b349f4>] tegra_channel_close+0x64/0x140
[  641.559299] [<ffffff8008b082e8>] v4l2_release+0x48/0xa0
[  641.564637] [<ffffff800825dcb8>] __fput+0x90/0x1d0
[  641.569442] [<ffffff800825de70>] ____fput+0x20/0x30
[  641.574000] [<ffffff80080d8dac>] task_work_run+0xbc/0xd8
[  641.579770] [<ffffff80080b8408>] do_exit+0x2e0/0xa88
[  641.584668] [<ffffff80080b8c40>] do_group_exit+0x40/0xa8
[  641.590007] [<ffffff80080c64ec>] get_signal+0xbc/0x750
[  641.595171] [<ffffff800808add8>] do_signal+0x130/0x500
[  641.600160] [<ffffff800808b320>] do_notify_resume+0x90/0xb0
[  641.605762] [<ffffff800808379c>] work_pending+0x8/0x10
[  641.610841] ---[ end trace 3d8f3e796590f571 ]---
[  641.630038] Fixing recursive fault but reboot is needed!

This problem was not present with JetPack 4.5.1 and prior.
It does not occur when streaming via nvarguscamerasrc.

Even though this might look contrived, having a timeout not crash the kernel is essential for running a camera in trigger mode.

hello tim.cassens ,

could you please share more details on why you program a register to stop the stream?
if you want a way to simulate a stopped stream, please use the command below to force-stop the sensor stream,
for example,
echo 0 > /sys/kernel/debug/camera-video0/streaming

Thanks for the hint about the shortcut, but that doesn’t exist in the IMX219 driver.

Regarding the reasoning, I wrote it at the very bottom of my post: not crashing on a timeout is essential for running a camera in trigger mode. Stopping the sensor via the register simulates behavior of cameras that are not necessarily available for you guys to test with.

A crash on timeout disables an important feature required for industrial camera applications. Also, (very) long exposure times are made impossible.

Additionally, there might be other situations where the camera unexpectedly does not deliver images (e.g. during driver development), and having the kernel crash immediately in those cases is not exactly helpful.

I guess the whole channel_error_recover-thing in the source code is there for exactly that reason… it just should not crash.
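
For context, a V4L2 application driving a triggered camera would typically wait for each frame with its own timeout and treat an expiry as a normal event, not a failure. The following is a minimal sketch of such a wait, assuming a capture file descriptor that is already streaming; wait_for_frame is a hypothetical helper name and is not part of v4l2-ctl or the Tegra driver.

```c
#include <errno.h>
#include <poll.h>

/*
 * Wait until fd becomes readable (for a V4L2 capture device this means
 * a filled buffer can be dequeued) or until timeout_ms expires.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * Note: the wait is simply restarted with the full timeout after EINTR.
 */
static int wait_for_frame(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;              /* poll() itself failed */
    if (rc == 0)
        return 0;               /* no frame yet: caller may simply retry */
    return (pfd.revents & POLLIN) ? 1 : -1;  /* POLLERR/POLLHUP count as errors */
}
```

With a helper like this, a return value of 0 just means "no trigger arrived yet" and the application can loop; that only works if the kernel side survives its own timeout handling.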

hello tim.cassens ,

you should test with an application that has error handling implemented.
for example, Argus/samples/userAutoExposure
it uses the EVENT_TYPE_ERROR event so that the application side can shut down the camera app gracefully.

            } else if (iEvent->getEventType() == EVENT_TYPE_ERROR) {
                const IEventError* iEventError =
                    interface_cast<const IEventError>(event);
                EXIT_IF_NOT_OK(iEventError->getStatus(), "ERROR event");
            }

Hello Jerry,

quoting myself again: "It does not occur when streaming via nvarguscamerasrc."

So I very much think there is no problem when using anything Argus.

It only happens when capturing images using the v4l2 interface.

hello tim.cassens,

I would like to emphasize that force-stopping the stream is not a usual use case.
however,
when you force-stop the stream, please dig into the kernel sources to check whether the function below has been called,
i.e. tegra_channel_error_recover(),
thanks

As you can see in the call trace in my original post, it is indeed calling tegra_channel_error_recover.

A look at the source suggests that deep inside that call, destroy_buffer_table is called with tab = NULL (since then &tab->hlock yields 16, i.e. 0x10, and the access violation happens at virtual address 00000010).

This implies that in vi_capture_shutdown, capture->buf_ctx is NULL. I don’t understand enough of how that code is supposed to work to understand why; but I think that is where whoever wrote it should have a look.
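
To illustrate the arithmetic: taking the address of a member through a NULL struct pointer yields just the member's offset within the struct. The sketch below uses a made-up layout (fake_buffer_table, with two pointers in front of the lock) that is only an assumption chosen to match the fault address on a 64-bit build; it is not copied from the driver source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's rwlock_t. */
struct fake_rwlock {
    uint32_t raw;
};

/*
 * Hypothetical layout, NOT the real driver struct: two pointers ahead
 * of the lock put hlock at offset 16 when pointers are 8 bytes wide.
 */
struct fake_buffer_table {
    void *dev;                  /* offset 0 */
    void *cache;                /* offset 8 */
    struct fake_rwlock hlock;   /* offset 16 on a 64-bit target */
};

/* Address the locking code would touch for a given table pointer. */
static uintptr_t hlock_address(uintptr_t tab)
{
    return tab + offsetof(struct fake_buffer_table, hlock);
}
```

On a 64-bit target, hlock_address(0) evaluates to 0x10, which would fit the faulting address and the value 0000000000000010 seen in x19 and x25 of the register dump.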

hello tim.cassens,

we’re able to reproduce the same failure as yours.
it’s now under internal discussion; we will update this thread once we have conclusions.
thanks

hello tim.cassens,

please update the kernel sources as below to fix the NULL pointer dereference,
for example,

diff --git a/drivers/media/platform/tegra/camera/vi/capture.c b/drivers/media/platform/tegra/camera/vi/capture.c

@@ -205,7 +205,9 @@ void vi_capture_shutdown(struct tegra_vi_channel *chan)
                }

                capture_common_unpin_memory(&capture->requests);
-               destroy_buffer_table(capture->buf_ctx);
+               if (capture->buf_ctx != NULL)
+                       destroy_buffer_table(capture->buf_ctx);
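
The patch above guards the call site. As a side note, the same effect could be had by making the destructor itself tolerate NULL, in the spirit of free(NULL). The following user-space sketch shows that pattern under stated assumptions: buffer_table, capture_ctx, destroy_table and capture_shutdown are hypothetical stand-ins, and this is not what the posted patch does.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's types. */
struct buffer_table {
    int entries;
};

struct capture_ctx {
    struct buffer_table *buf_ctx;
};

/* Counts real teardowns, for demonstration only. */
static int destroy_calls;

/* Destructor that is a no-op for NULL, so callers need no guard. */
static void destroy_table(struct buffer_table *tab)
{
    if (tab == NULL)
        return;
    destroy_calls++;
    free(tab);
}

/* Shutdown path that is safe to run with no table and safe to repeat. */
static void capture_shutdown(struct capture_ctx *capture)
{
    destroy_table(capture->buf_ctx);
    capture->buf_ctx = NULL;    /* a second shutdown becomes harmless */
}
```

Clearing the pointer after teardown also covers the case where an error-recovery path and a later close both end up running the shutdown routine.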

Hello Jerry,

thank you.

Will this change get into the official kernel binaries at some point in the future?

hello tim.cassens,

the change is going through code review; it’ll be merged into the rel-32 code-line once that is complete.

Hi Jerry,
I got the same log, as shown below,

user@localhost:~$ [ 473.931631] tegra194-vi5 15c10000.vi: no reply from camera processor
[ 473.931790] tegra194-vi5 15c10000.vi: uncorr_err: request timed out after 2500 ms
[ 473.931971] tegra194-vi5 15c10000.vi: err_rec: attempting to reset the capture channel
[ 473.935223] Unable to handle kernel NULL pointer dereference at virtual address 00000010
[ 473.935391] Mem abort info:
[ 473.935478] ESR = 0x96000005
[ 473.935539] Exception class = DABT (current EL), IL = 32 bits
[ 473.935735] SET = 0, FnV = 0
[ 473.935791] EA = 0, S1PTW = 0
[ 473.935849] Data abort info:
[ 473.935901] ISV = 0, ISS = 0x00000005
[ 473.936001] CM = 0, WnR = 0
[ 473.936076] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffc1b9cd5000
[ 473.936190] [0000000000000010] *pgd=0000000000000000, *pud=0000000000000000
[ 473.936338] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 473.936436] Modules linked in: xt_multiport veth ip6t_MASQUERADE nf_nat_masquerade_ipv6 xt_nat ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_mark xt_comment ip6table_filter ip6table_mangle iptable_mangle ip6s
[ 473.976906] CPU: 1 PID: 29944 Comm: vi-output, lt69 Tainted: G O 4.9.253-tegra #1
[ 473.984958] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[ 473.991083] task: ffffffc1f4d94600 task.stack: ffffffc1bb678000
[ 473.996775] PC is at _raw_write_lock+0x30/0x58
[ 474.000536] LR is at destroy_buffer_table+0x40/0xd8
[ 474.005607] pc : [] lr : [] pstate: 20c00045
[ 474.012518] sp : ffffffc1bb67bc70
[ 474.015851] x29: ffffffc1bb67bc70 x28: 0000000000000000
[ 474.021702] x27: 0000000000000000 x26: 0000000000000000
[ 474.026948] x25: 0000000000000010 x24: 0000000000000098
[ 474.032110] x23: 0000000000000018 x22: ffffff8009085898
[ 474.037201] x21: 0000000000000000 x20: ffffffc140498380
[ 474.042962] x19: 0000000000000010 x18: 00000000000062bc
[ 474.048737] x17: 0000000000000002 x16: 0000000000000003
[ 474.054512] x15: 0000000000000376 x14: 0000000000000322
[ 474.060197] x13: 0000000000000538 x12: 0000000000000400
[ 474.065641] x11: 0000000000000400 x10: 0000000000000000
[ 474.071327] x9 : ffffffc1bb67ba80 x8 : 0000000000000000
[ 474.077348] x7 : ffffffc1a639a9c0 x6 : ffffffc1968ca141
[ 474.082613] x5 : ffffff800852c254 x4 : ffffffbf065a3290
[ 474.087948] x3 : 0000000000000000 x2 : ffffffc1968ca140
[ 474.093537] x1 : 0000000000000000 x0 : 0000000080000000
[ 474.098623]
[ 474.100280] Process vi-output, lt69 (pid: 29944, stack limit = 0xffffffc1bb678000)
[ 474.107278] Call trace:
[ 474.109825] [] _raw_write_lock+0x30/0x58
[ 474.114899] [] destroy_buffer_table+0x40/0xd8
[ 474.120236] [] vi_capture_shutdown+0xd4/0x130
[ 474.126093] [] vi_channel_close_ex+0x34/0x88
[ 474.131432] [] vi5_channel_error_recover+0x48/0x1c8
[ 474.137647] [] tegra_channel_error_recover+0x58/0x90
[ 474.143772] [] tegra_channel_kthread_capture_dequeue+0xf8/0x1c0
[ 474.150424] [] kthread+0xec/0xf0
[ 474.155319] [] ret_from_fork+0x10/0x30
[ 474.160579] ---[ end trace a125e88de5168183 ]---
[ 474.179126] Kernel panic - not syncing: Fatal exception
[ 474.179253] SMP: stopping secondary CPUs
[ 474.179343] Kernel Offset: disabled
[ 474.179420] Memory Limit: none
[ 474.180774] trusty-log panic notifier - trusty version Built: 12:20:34 Jul 26 2021
[ 474.200425] Rebooting in 10 seconds…
Shutdown state requested 1
Rebooting system …

This only happens when no camera is connected; the system then reboots automatically.
Once the camera is connected and detected, it works well.
What should I do?

Thanks!

hello YHuang0915,

did you apply the changes in comment #12 to reproduce this?
may I know the actual use case for enabling the stream without a camera sensor?
could you please share the pipeline for reference?
thanks

Hi jerry

Thanks for your reply. After picking those changes, the problem is fixed :)
