How to solve a kernel crash in the VI driver on Jetson Xavier?

Hi,
I am using a UYVY camera sensor and can stream perfectly on Xavier, but sometimes, when something goes wrong on the sensor side, I cannot stop the application and the kernel crashes. The only way to recover is to reboot.

[ 1284.506478] ------------[ cut here ]------------
[ 1284.510977] WARNING: CPU: 2 PID: 8668 at /home/ritesh/PROJECT/kernel/kernel-4.9/drivers/media/v4l2-core/videobuf2-core.c:899 vb2_buffer_done+0x1f0/0x230
[ 1284.527847] Modules linked in: bnep fuse nvs_bmi160 nvs ar1335(O) pwm_pca9685 bluedroid_pm ip_tables x_tables

[ 1284.527891] CPU: 2 PID: 8668 Comm: vi-output, ar13 Tainted: G W O 4.9.108-g696b0c1 #61
[ 1284.527894] Hardware name: jetson-xavier (DT)
[ 1284.527898] task: ffffffc3e9234600 task.stack: ffffffc349400000
[ 1284.527911] PC is at vb2_buffer_done+0x1f0/0x230
[ 1284.527916] LR is at free_ring_buffers+0x6c/0x138
[ 1284.527919] pc : [] lr : [] pstate: 20c00045
[ 1284.527920] sp : ffffffc349403d00
[ 1284.527922] x29: ffffffc349403d00 x28: 0000000000000002
[ 1284.527925] x27: 0000000000000000 x26: ffffffc3d681beb8
[ 1284.527928] x25: ffffff8009e87000 x24: 431bde82d7b634db
[ 1284.527930] x23: 0000000000000007 x22: ffffffc3d681bb48
[ 1284.527933] x21: ffffffc3b123b400 x20: ffffffc3b123b400
[ 1284.527935] x19: ffffffc3d681b018 x18: 0000000000000010
[ 1284.527937] x17: 0000007fab76ef60 x16: 0000000000000001
[ 1284.527940] x15: 0000000000002f06 x14: 0000000000000000
[ 1284.527942] x13: 0000000000000000 x12: 0000000000002552
[ 1284.527944] x11: 0000000000000400 x10: 000000000004fda8
[ 1284.527946] x9 : 0000000000000313 x8 : 000000000000021a
[ 1284.527949] x7 : 0000000000000000 x6 : 0000000000000000
[ 1284.527951] x5 : fffffffffffffff0 x4 : ffffffc3d681bd00
[ 1284.527954] x3 : 0000000000000003 x2 : ffffffc39fd76818
[ 1284.527956] x1 : 0000000000000007 x0 : 0000000000000007

[ 1284.527960] ---[ end trace 1dcac0298b5ebd3f ]---
[ 1284.532306] Call trace:
[ 1284.532316] [] vb2_buffer_done+0x1f0/0x230
[ 1284.532320] [] free_ring_buffers+0x6c/0x138
[ 1284.532326] [] tegra_channel_kthread_capture_dequeue+0x184/0x390
[ 1284.532334] [] kthread+0xe8/0x100
[ 1284.532342] [] ret_from_fork+0x10/0x50

NOTE: The above crash repeats continuously, and if I leave the Xavier board running after the crash, its memory fills up completely.
I found that the crash occurs because free_ring_buffers is called when chan->num_buffers is negative, so I tried to fix it with the following change in the tegra_channel_capture_dequeue function.

if (chan->num_buffers >= (chan->capture_queue_depth - 1)) {
    chan->buffer_state[chan->free_index] = buf->vb2_state;
    free_ring_buffers(chan, 1);
}
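
To illustrate the idea behind this kind of guard, here is a minimal, self-contained sketch of a ring-buffer release routine that never lets the buffer count go negative. It is a simplified model and not the actual NVIDIA driver code: `struct ring_state` and `release_frames` are hypothetical names, and only the field names `num_buffers`, `free_index` and `capture_queue_depth` are borrowed from the snippet above.

```c
/* Hypothetical, simplified stand-in for the channel's ring bookkeeping. */
struct ring_state {
    int num_buffers;          /* buffers currently held in the ring */
    int free_index;           /* next slot to release */
    int capture_queue_depth;  /* ring size */
};

/*
 * Release up to `frames` buffers and return how many were actually
 * released. The count is clamped so num_buffers never drops below zero.
 */
static int release_frames(struct ring_state *rs, int frames)
{
    int released = 0;

    if (rs->num_buffers <= 0 || frames <= 0)
        return 0;                   /* nothing held: touch no buffer */

    if (frames > rs->num_buffers)
        frames = rs->num_buffers;   /* clamp to what is actually held */

    while (frames-- > 0) {
        /* A real driver would hand the buffer back to vb2 here. */
        rs->free_index = (rs->free_index + 1) % rs->capture_queue_depth;
        rs->num_buffers--;
        released++;
    }
    return released;
}
```

In this model, asking for more frames than the ring holds releases only the held ones, and calling it on an empty ring is a no-op. Whether the same clamping is appropriate inside the real free_ring_buffers depends on how the driver uses num_buffers elsewhere.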

After adding the num_buffers check to tegra_channel_capture_dequeue, the first crash no longer occurs and I can stop the application.
But now I get a new kernel crash:
[ 1902.374303] WARNING: CPU: 2 PID: 8632 at /home/ritesh/PROJECT/kernel/kernel-4.9/drivers/media/v4l2-core/videobuf2-core.c:1659 __vb2_queue_cancel+0x114/0x170
[ 1902.374589] Modules linked in: bnep fuse nvs_bmi160 nvs ar1335(O) pwm_pca9685 bluedroid_pm ip_tables x_tables

[ 1902.374644] CPU: 2 PID: 8632 Comm: multicam.elf Tainted: G W O 4.9.108-g696b0c1 #88
[ 1902.374653] Hardware name: jetson-xavier (DT)
[ 1902.374658] task: ffffffc397b18000 task.stack: ffffffc3693d8000
[ 1902.374663] PC is at __vb2_queue_cancel+0x114/0x170
[ 1902.374667] LR is at __vb2_queue_cancel+0x3c/0x170
[ 1902.374670] pc : [] lr : [] pstate: 80400045
[ 1902.374672] sp : ffffffc3693dbae0
[ 1902.374675] x29: ffffffc3693dbae0 x28: 0000000000000009
[ 1902.374681] x27: 0000007f6d0871f0 x26: ffffffc3af52bb80
[ 1902.374687] x25: ffffffc369211710 x24: 0000000000000001
[ 1902.374696] x23: ffffffc3b5c30048 x22: ffffffc3d7d7cb48
[ 1902.374702] x21: ffffffc3d7d7c030 x20: ffffffc3d7d7cba8
[ 1902.374707] x19: ffffffc3d7d7cb48 x18: 0000000000000000
[ 1902.374712] x17: 0000007f8fdf2a78 x16: 000000000000d45e
[ 1902.374717] x15: ffffff800a16e2c0 x14: 2065736e6f707365
[ 1902.374722] x13: 0000000000000000 x12: 071c71c71c71c71c
[ 1902.374727] x11: 000000000000000b x10: 00000000000009d0
[ 1902.374732] x9 : ffffffc3693d8000 x8 : ffffffc397b18a30
[ 1902.374741] x7 : fefefeff646c606d x6 : 0000000000000000
[ 1902.374746] x5 : fffffffffffffffc x4 : 0000000000000040
[ 1902.374751] x3 : 0000000000000001 x2 : 0000000000000000
[ 1902.374756] x1 : 0000000000000001 x0 : 0000000000000004

[ 1902.374763] ---[ end trace 9c12d2f948c860f4 ]---
[ 1902.374852] Call trace:
[ 1902.374860] [] __vb2_queue_cancel+0x114/0x170
[ 1902.374866] [] vb2_core_queue_release+0x2c/0x58
[ 1902.374875] [] _vb2_fop_release+0x88/0xa8
[ 1902.374881] [] tegra_channel_close+0x58/0x120
[ 1902.374886] [] v4l2_release+0x48/0xa0
[ 1902.374893] [] __fput+0x94/0x1d0
[ 1902.374897] [] ____fput+0x20/0x30
[ 1902.374905] [] task_work_run+0xc0/0xe0
[ 1902.374912] [] do_exit+0x2d4/0xa18
[ 1902.374917] [] do_group_exit+0x40/0xa8
[ 1902.374924] [] get_signal+0x3d8/0x5c8
[ 1902.374931] [] do_signal+0x17c/0x4f8
[ 1902.374940] [] do_notify_resume+0x98/0xb8
[ 1902.374945] [] work_pending+0x8/0x10
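
For context, the warning in the videobuf2 core's __vb2_queue_cancel fires when the driver still owns buffers after its stop_streaming callback has run, and vb2_buffer_done warns when a buffer is completed while it is not in the active state. The sketch below models that ownership accounting in plain C. It is only an illustration under those assumptions: `struct model_queue`, `give_to_driver`, `buffer_done` and `stop_streaming_return_all` are hypothetical names, not kernel or NVIDIA APIs.

```c
#define MAX_BUFS 8

enum buf_state { BUF_DEQUEUED, BUF_QUEUED, BUF_ACTIVE, BUF_ERROR };

struct model_queue {
    enum buf_state bufs[MAX_BUFS];
    int num_bufs;
    int owned_by_drv;   /* models vb2's count of driver-owned buffers */
};

/* Models the framework handing buffer i to the driver. */
static void give_to_driver(struct model_queue *q, int i)
{
    q->bufs[i] = BUF_ACTIVE;
    q->owned_by_drv++;
}

/* Models completing a buffer: only valid for a buffer the driver owns. */
static int buffer_done(struct model_queue *q, int i, enum buf_state st)
{
    if (q->bufs[i] != BUF_ACTIVE)
        return -1;      /* the real framework would warn here */
    q->bufs[i] = st;
    q->owned_by_drv--;
    return 0;
}

/* Models a stop_streaming that returns every buffer still owned. */
static void stop_streaming_return_all(struct model_queue *q)
{
    for (int i = 0; i < q->num_bufs; i++)
        if (q->bufs[i] == BUF_ACTIVE)
            buffer_done(q, i, BUF_ERROR);
}
```

If the model's owned_by_drv is zero after stop_streaming_return_all, the cancel path has nothing left to complain about. A change that returns a buffer twice, or skips one, breaks that balance, which may be what the second trace is reporting.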

So, can anyone suggest a solution for the second crash, assuming the first fix is correct?

Thanks

What command do you use to reproduce this issue?

Hi Shane,

I have the same problem on JetPack 4.2.1 (L4T 32.2).
If something goes wrong on the camera side, the kernel sometimes crashes, and the only way to recover or stop the application is to reboot.
Any help would be much appreciated.

Thanks and Best Regards,
Vu Nguyen

@forever3000
Your case is probably different: a bad sensor signal could be causing the capture error and timeout.
You may need to make sure the MIPI output signal complies with the MIPI specification.

Thanks for your reply.
I know that the problem comes from the sensor signal. But it has happened only a few times, so I want to ask whether there is any way, in the kernel code, to prevent the kernel from crashing.

Thanks and Best Regards,

Hi ShaneCCC,

More information about my current issue.

Thanks and Best Regards,

@forever3000
Please try the patch below and see if it helps.

https://devtalk.nvidia.com/default/topic/1068235/jetson-tx2/easy-way-to-get-submit-patches-to-linux-4-9-or-linux-nvidia/post/5410976/#5410976

Hi ShaneCCC,

It looks helpful.

Thanks and Best Regards,