Enclosed are the kernel log and the Kconfig options used to build the L4T kernel. The kernel was built with the RT patchset applied and some debug features enabled to validate its stability. It was running on an Orin AGX dev kit.
Note that CONFIG_LOCALVERSION is not set, but the kernel and modules (including the overlay drivers) were built and installed as a separate set of binaries from the default L4T kernel binaries.
With CONFIG_DEBUG_SPINLOCK=y, the kernel intermittently produces the following splat related to the rtl8822ce IRQ handler:
[ 553.147483] BUG: spinlock wrong owner on CPU#11, wpa_supplicant/749
[ 553.147535] lock: 0xffff4b56cd55ba18, .magic: dead4ead, .owner: RTW_CMD_THREAD/1036, .owner_cpu: -1
[ 553.147601] CPU: 11 PID: 749 Comm: wpa_supplicant Not tainted 5.10.104-RedHawk-8.4.4-debug #1
[ 553.147627] Hardware name: /, BIOS 1.0-d7fb19b 08/10/2022
[ 553.147637] Call trace:
[ 553.147643] dump_backtrace+0x0/0x1d0
[ 553.147683] show_stack+0x20/0x30
[ 553.147699] dump_stack+0xf8/0x168
[ 553.147736] spin_dump+0x9c/0xb0
[ 553.147777] do_raw_spin_unlock+0xd8/0x100
[ 553.147824] _raw_spin_unlock_irqrestore+0x48/0xb0
[ 553.147865] rtl8822ce_tx_poll_handler+0x8c/0xa0 [rtl8822ce]
[ 553.148503] rtw_tx_poll_timeout_handler+0x34/0x70 [rtl8822ce]
[ 553.149062] timer_hdl+0x18/0x24 [rtl8822ce]
[ 553.149582] call_timer_fn+0xd8/0x420
[ 553.149615] __run_timers.part.0+0x294/0x3f0
[ 553.149632] run_timer_softirq+0x50/0x90
[ 553.149650] __do_softirq+0x17c/0x6d0
[ 553.149669] irq_exit+0x1b0/0x1d0
[ 553.149705] __handle_domain_irq+0xa0/0x110
[ 553.149734] gic_handle_irq+0x60/0x130
[ 553.149747] el0_irq_naked+0x50/0x58
I have not found a reliable way to reproduce the issue. I have run various CPU and networking stress loads, and the splat may appear anywhere from a few minutes to 24 hours into a run.
Other times the splat appears with the following:
[68656.040290] BUG: spinlock wrong owner on CPU#11, wpa_supplicant/659
[68656.040341] lock: 0xffff5eec48913a18, .magic: dead4ead, .owner: /-1, .owner_cpu: -1
[68656.040406] CPU: 11 PID: 659 Comm: wpa_supplicant Not tainted 5.10.104-RedHawk-8.4.4-debug #1
The primary concern is the “bad” values within the spinlock’s struct: in both splats .owner does not match the task releasing the lock (wpa_supplicant), and .owner_cpu is -1. This suggests the spinlock’s data is being modified after the lock is taken and before it is released, which would indicate that one of the functions within the rtl8822ce IRQ handler “rtl8822ce_tx_poll_handler()” is unintentionally corrupting memory owned by the spinlock.
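To make the concern concrete, here is a minimal user-space sketch of the unlock-time checks as I understand them from kernel/locking/spinlock_debug.c. It is a simplified model, not the kernel code: `struct dbg_lock`, the `dbg_lock_*` helpers, and the integer task/CPU IDs are hypothetical stand-ins for `raw_spinlock_t`, `do_raw_spin_lock()`/`do_raw_spin_unlock()`, and `current`. Only the order of the checks and the magic value `dead4ead` (visible in the log above) are meant to mirror the real thing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPINLOCK_MAGIC 0xdead4eadu  /* matches ".magic: dead4ead" in the splat */
#define OWNER_NONE     (-1)         /* stand-in for SPINLOCK_OWNER_INIT */

/* Hypothetical, simplified model of the debug fields in raw_spinlock_t. */
struct dbg_lock {
    int          locked;     /* stand-in for the architecture lock word */
    unsigned int magic;
    int          owner;      /* task ID of the holder, OWNER_NONE when free */
    int          owner_cpu;  /* CPU of the holder, -1 when free */
};

static void dbg_lock_init(struct dbg_lock *l)
{
    assert(l != NULL);
    memset(l, 0, sizeof(*l));
    l->magic = SPINLOCK_MAGIC;
    l->owner = OWNER_NONE;
    l->owner_cpu = -1;
}

/* Record the holder, as the debug code does after taking the lock. */
static void dbg_lock_acquire(struct dbg_lock *l, int task, int cpu)
{
    assert(l != NULL);
    l->locked = 1;
    l->owner_cpu = cpu;
    l->owner = task;
}

/*
 * Run the unlock-time checks in the same order as the kernel's debug code.
 * Returns NULL when everything is consistent, otherwise the name of the
 * first check that failed. The lock is released either way.
 */
static const char *dbg_lock_release(struct dbg_lock *l, int task, int cpu)
{
    const char *msg = NULL;

    assert(l != NULL);
    if (l->magic != SPINLOCK_MAGIC)
        msg = "bad magic";
    else if (!l->locked)
        msg = "already unlocked";
    else if (l->owner != task)
        msg = "wrong owner";
    else if (l->owner_cpu != cpu)
        msg = "wrong CPU";

    l->owner = OWNER_NONE;
    l->owner_cpu = -1;
    l->locked = 0;
    return msg;
}
```

In this model, acquiring the lock as task 749 on CPU 11 and then overwriting `owner` with 1036 before the release makes `dbg_lock_release()` return "wrong owner", which is the shape of the first splat. Because "already unlocked" is checked before "wrong owner", a report with `.owner: /-1` would mean the lock word still read as locked while the owner fields had already been reset. The sketch only shows what the check observes; it does not say which code path changed the fields.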
Please note that networking still works after the splat and the system appears to be usable.
Whatever is happening to corrupt the spinlock’s data may impact performance. I am concerned about the stability of this driver.
l4t-rtl8822ce-spinlock-error-report-dmesg.txt (84.4 KB)
debug (218.8 KB)