We’re running into a kernel panic that matches the crash pattern reported by zhang.pei.xing in Thread #343830.
Our crash signature:
WARNING: CPU: 4 PID: 4389 at kernel/kthread.c:83 kthread_stop+0x58/0x290
Unable to handle kernel paging request at virtual address 0000ffffbab130e0
Internal error: Oops: 96000006 [#1] PREEMPT SMP
Call trace:
kthread_stop+0xa4/0x290
vi5_channel_stop_kthreads+0x44/0x60
vi5_channel_stop_streaming+0x128/0x140
Kernel panic - not syncing: Oops: Fatal exception
Same function, same offset, same error code. In post #19, zhang.pei.xing confirmed that the root cause was identified and a fix was already available internally at NVIDIA. Could we get access to that patch?
The kernel panic we’re seeing is caused by a race condition in the RTCPU firmware. When multiple cameras enter error recovery concurrently, the RTCPU returns the wrong channel_id in channel_setup_resp. Recovery then fails and the kernel ultimately crashes. This is the same root cause that zhang.pei.xing identified on the NVIDIA forum.
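To make the failure mode concrete, here is a toy Python model of it. This is only a sketch and not the driver or firmware code: `SetupResp`, `check_setup_resp`, the channel numbers, and the assumption that the two replies come back swapped are all hypothetical, chosen just to illustrate a response carrying a channel_id that does not match the request.

```python
from dataclasses import dataclass


@dataclass
class SetupResp:
    """Hypothetical stand-in for the channel_setup_resp message."""
    channel_id: int
    result: int = 0  # 0 means the firmware reported success (assumption)


def check_setup_resp(requested_id: int, resp: SetupResp) -> bool:
    """Return True only if the reply succeeded and refers to the requested channel."""
    return resp.result == 0 and resp.channel_id == requested_id


# Two channels recover at the same time and, in this toy scenario,
# each one receives the reply that was meant for the other.
requests = [3, 5]
swapped_replies = [SetupResp(channel_id=5), SetupResp(channel_id=3)]

outcome = {
    req: check_setup_resp(req, resp)
    for req, resp in zip(requests, swapped_replies)
}
print(outcome)  # {3: False, 5: False}
```

In this model both recoveries are rejected even though each reply reports success, which is the kind of mismatch we believe leads to the failed recovery described above.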
We’re on Xavier NX with JetPack 5.1.3 (JP513), kernel 5.10.192-tegra #279, running 8 cameras via FPD-Link III.
Any help would be greatly appreciated. Thank you!
===================================================
======Kernel log=====================================
Message from syslogd@vision at Feb 23 21:06:32 …
kernel:[16076.942816] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[16076.939741] snd_soc_tegra210_adsp(E) snd_soc_tegra210_ahub(E) snd_soc_tegra_utils(E) snd_soc_simple_card_utils(E) nvadsp(E) tegra210_adma(E) snd_hda_codec(E) snd_hda_core(E) spi_tegra114(E) loop(E) binfmt_misc(E) ina3221(E) pwm_fan(E) nvgpu(E) nvmap(E) ip_tables(E) x_tables(E) [last unloaded: mtd]
[16076.939924] CPU: 2 PID: 10266 Comm: symbot_server-1 Tainted: G W E 5.10.192-tegra #279
[16076.939940] Hardware name: Symbot (DT)
[16076.939953] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[16076.939970] pc : kthread_stop+0x58/0x290
[16076.939979] lr : kthread_stop+0x4c/0x290
[16076.939985] sp : ffff80001cfdba40
[16076.939991] x29: ffff80001cfdba40 x28: ffff243a75273600
[16076.940033] x27: 0000000040045613 x26: 0000000000000000
[16076.940048] x25: ffff80001cfdbd08 x24: ffff2439c31cc1f8
[16076.940063] x23: ffff243a41540000 x22: ffff80001cfdbd08
[16076.940078] x21: ffff243a8a01d8b0 x20: ffff2439c31cc550
[16076.940108] x19: ffff243a8a01d880 x18: 0000000000000010
[16076.940122] x17: 0000000000000000 x16: ffffb8bcef0e50f0
[16076.940137] x15: ffff243a41540570 x14: ffffffffffffffff
[16076.940151] x13: ffff80009cfdb667 x12: ffff80001cfdb66f
[16076.940173] x11: 0000000000000001 x10: 0000000000000ab0
[16076.940194] x9 : ffff80001cfdba30 x8 : 612d657375203b30
[16076.940209] x7 : 206e6f206e6f6974 x6 : c0000000ffffefff
[16076.940224] x5 : ffff243b3fde1978 x4 : ffffb8bcf0eb7ba8
[16076.940238] x3 : 0000000000000001 x2 : ffff243b3fde1980
[16076.940252] x1 : 0000000000000000 x0 : 0000000000408004
[16076.940267] Call trace:
[16076.940276] kthread_stop+0x58/0x290
[16076.940288] vi5_channel_stop_kthreads+0x44/0x60
[16076.940302] vi5_channel_stop_streaming+0x128/0x140
[16076.940318] tegra_channel_stop_streaming+0x3c/0x70
[16076.940328] __vb2_queue_cancel+0x40/0x220
[16076.940337] vb2_core_streamoff+0x34/0xd0
[16076.940346] vb2_streamoff+0x34/0x80
[16076.940354] vb2_ioctl_streamoff+0x58/0x70
[16076.940364] v4l_streamoff+0x40/0x50
[16076.940372] __video_do_ioctl+0x188/0x400
[16076.940380] video_usercopy+0x280/0x7e0
[16076.940388] video_ioctl2+0x40/0x100
[16076.940395] v4l2_ioctl+0x68/0x90
[16076.940406] __arm64_sys_ioctl+0xac/0xf0
[16076.940417] el0_svc_common.constprop.0+0x80/0x1d0
[16076.940426] do_el0_svc+0x38/0xc0
[16076.940439] el0_svc+0x1c/0x30
[16076.940453] el0_sync_handler+0xa8/0xb0
[16076.940472] el0_sync+0x16c/0x180
[16076.940481] ---[ end trace 8cf42039d1202bf8 ]---
[16076.940680] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[16076.940956] Mem abort info:
[16076.941065] ESR = 0x96000004
[16076.941209] EC = 0x25: DABT (current EL), IL = 32 bits
[16076.941674] SET = 0, FnV = 0
[16076.941830] EA = 0, S1PTW = 0
[16076.941951] Data abort info:
[16076.942033] ISV = 0, ISS = 0x00000004
[16076.942157] CM = 0, WnR = 0
[16076.942269] user pgtable: 4k pages, 48-bit VAs, pgdp=000000017fa23000
[16076.942478] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[16076.942816] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[16076.943027] Modules linked in: veth(E) xt_nat(E) xt_tcpudp(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) nfnetlink(E) xt_addrtype(E) iptable_filter(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) br_netfilter(E) lzo_rle(E) lzo_compress(E) zram(E) overlay(E) snd_soc_tegra186_asrc(E) snd_soc_tegra210_ope(E) snd_soc_tegra186_arad(E) snd_soc_tegra186_dspk(E) snd_soc_tegra210_iqc(E) snd_soc_tegra210_afc(E) snd_soc_tegra210_mvc(E) snd_soc_tegra210_dmic(E) snd_soc_tegra210_amx(E) snd_soc_tegra210_adx(E) snd_soc_tegra210_admaif(E) snd_soc_tegra210_i2s(E) snd_soc_tegra_pcm(E) snd_soc_tegra210_mixer(E) snd_soc_tegra210_sfc(E) aes_ce_blk(E) crypto_simd(E) cryptd(E) aes_ce_cipher(E) ghash_ce(E) sha2_ce(E) sha256_arm64(E) sha1_ce(E) snd_soc_spdif_tx(E) snd_soc_tegra_machine_driver(E) leds_gpio(E) max77620_thermal(E) ramoops(E) reed_solomon(E) realtek(E) snd_hda_codec_hdmi(E) tegra_bpmp_thermal(E) userspace_alert(E) snd_hda_tegra(E)
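
In case it helps with triage, here is a small Python sketch that decodes the two Oops codes that appear in this post (96000004 in the full log, 96000006 in the summary). It assumes the standard Arm ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], with WnR in ISS bit 6 and DFSC in ISS bits [5:0]). The helper name `decode_esr` and the small DFSC table are ours, and the table only covers the translation-fault codes relevant here.

```python
# DFSC values for translation faults (subset; other codes are reported as "other").
DFSC_NAMES = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into the fields the kernel prints under 'Mem abort info'."""
    iss = esr & 0x1FFFFFF          # ISS: bits [24:0]
    dfsc = iss & 0x3F              # DFSC: ISS bits [5:0]
    return {
        "ec": (esr >> 26) & 0x3F,  # exception class: bits [31:26]
        "il": (esr >> 25) & 0x1,   # instruction length: bit 25 (1 = 32-bit)
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,   # 0 = fault on read, 1 = fault on write
        "dfsc": DFSC_NAMES.get(dfsc, f"other (0x{dfsc:02x})"),
    }


for value in (0x96000004, 0x96000006):
    print(hex(value), decode_esr(value))
```

Both values decode to EC 0x25 (data abort at the current exception level) with WnR = 0, so both are faulting reads. They differ only in the translation level at which the fault was taken (level 0 for 96000004, level 2 for 96000006).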
