[80864.598452] BUG: scheduling while atomic: perception/9656/0x00000302

Hello,

I am encountering a kernel panic causing my system to reset. This is a critical issue for me since it is bringing down our systems in the field.

Here is the call trace:

[80864.598452] BUG: scheduling while atomic: perception/9656/0x00000302
[80864.599169] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ieee80211_crypto_ccmp_decrypt+0x354/0x360 [mac80211]
[80864.599495] CPU: 0 PID: 9656 Comm: perception Tainted: G        W  OE     5.10.104-tegra #2
[80864.599712] Hardware name: Unknown Jetson AGX Orin/Jetson AGX Orin, BIOS 2.1-32413640 01/24/2023
[80864.599935] Call trace:
[80864.600005]  dump_backtrace+0x0/0x1d0
[80864.600105]  show_stack+0x30/0x40
[80864.600199]  dump_stack+0xd8/0x138
[80864.600295]  panic+0x17c/0x384
[80864.600381]  __stack_chk_fail+0x30/0x40
[80864.600496]  ieee80211_crypto_ccmp_decrypt+0x354/0x360 [mac80211]
[80864.600666]  ieee80211_rx_handlers+0xdac/0x2240 [mac80211]
[80864.600809]  sugov_deferred_update+0x28/0xa0
[80864.600929]  _raw_spin_unlock+0x1c/0x60
[80864.601086]  sugov_update_shared+0x1d4/0x210
[80864.601741]  enqueue_top_rt_rq+0x88/0x160
[80864.602358]  resched_curr+0x1c/0xa0
[80864.602894]  check_preempt_curr+0x64/0xb0
[80864.603511]  task_woken_rt+0x20/0x90
[80864.604049]  ttwu_do_wakeup+0x68/0x1b0
[80864.607054]  ttwu_do_activate+0xa0/0x150
[80864.610904]  try_to_wake_up+0x244/0x720
[80864.614666]  wake_up_process+0x2c/0x40
[80864.618605]  __irq_wake_thread+0x84/0xb0
[80864.622456]  __handle_irq_event_percpu+0xa0/0x2a0
[80864.627091]  handle_irq_event_percpu+0x94/0xa0
[80864.631467]  sched_clock_cpu+0x10/0x20
[80864.635229]  irqtime_account_irq+0x5c/0x160
[80864.639430]  irq_exit+0x24/0xe0
[80864.642579]  __handle_domain_irq+0x74/0xd0
[80864.646605]  gic_handle_irq+0x68/0x134
[80864.650454]  el1_irq+0xd0/0x180
[80864.653432] SMP: stopping secondary CPUs
[80864.657203] Kernel Offset: 0x4f10305d0000 from 0xffff800010000000
[80864.663316] PHYS_OFFSET: 0xffffa81e00000000
[80864.667344] CPU features: 0x0040006,4a80aa38
[80864.671717] Memory Limit: none
[80864.679433] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ieee80211_crypto_ccmp_decrypt+0x354/0x360 [mac80211] ]---

This has some similarities to the other kernel bug I reported, along with a proposed patch, here: Debugging BCCPLEXWDT reset source - #9 by bsirang1

In both cases the call stack goes through ieee80211_crypto_ccmp_decrypt. Despite the similarities, it seems different enough to warrant another forum thread.
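For anyone comparing traces from several resets, the check described above (does a given trace pass through ieee80211_crypto_ccmp_decrypt?) can be scripted. This is only a rough sketch: the names `parse_frames` and `trace_contains` are made up for illustration, the sample is a few lines copied from the trace above, and the regex assumes the `symbol+0xoff/0xsize [module]` frame format shown in this dmesg output.

```python
import re

# Matches dmesg call-trace frames such as
# "[80864.600496]  ieee80211_crypto_ccmp_decrypt+0x354/0x360 [mac80211]"
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def parse_frames(log_text):
    """Return a (symbol, module) tuple for each call-trace frame in log_text.

    module is None for symbols that are not attributed to a loadable module.
    """
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((match.group("symbol"), match.group("module")))
    return frames


def trace_contains(log_text, symbol):
    """True if any frame in the trace belongs to the given symbol."""
    return any(name == symbol for name, _ in parse_frames(log_text))


# A few lines copied from the trace in this post.
SAMPLE = """\
[80864.599935] Call trace:
[80864.600381]  __stack_chk_fail+0x30/0x40
[80864.600496]  ieee80211_crypto_ccmp_decrypt+0x354/0x360 [mac80211]
[80864.600666]  ieee80211_rx_handlers+0xdac/0x2240 [mac80211]
[80864.600809]  sugov_deferred_update+0x28/0xa0
"""

if __name__ == "__main__":
    for symbol, module in parse_frames(SAMPLE):
        print(symbol, module or "(built-in)")
```

Running the same function over the saved logs from both crashes gives a quick way to confirm which frames they share.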

@DaneLLL & @DaveYYY , can you look into this?

Thanks.

Hi,

is there a specific use case in which this issue will be triggered?
Looks like a stack overflow happened in ieee80211_crypto_ccmp_decrypt,
but as this function is from the upstream Linux kernel, maybe you should also report it to Linux kernel team/Ubuntu:
https://help.ubuntu.com/stable/ubuntu-help/report-ubuntu-bug.html.en

Similar to the other kernel bug, I do not have a reproduction case. It seems to happen “at random”. This is on a system that does go in and out of Wi-Fi range, so it may be related to weak signal and/or reassociation with the AP.

Hi,

did the situation happen after you applied your own patch?
I will share our patch that is merged into our code base, but has yet to be released publicly, in your previous post. Not sure if the situation will be the same, but you may give it a try.

This was with our patch, yes. We have since applied your patch. It doesn’t seem related to this crash, though.

Hi,

sorry for the late reply.
Are you using the WiFi card shipped with the AGX Orin DevKit (Intel 8265NGW)?
If that’s the case, we may have our developers look into it.
Otherwise, please contact the vendor of the WiFi card.

Yes we are using that card

0007:01:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78)
	Subsystem: Intel Corporation Dual Band Wireless-AC 8265
	Flags: bus master, fast devsel, latency 0, IRQ 304
	Memory at 3228000000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: <access denied>
	Kernel driver in use: iwlwifi
	Kernel modules: iwlwifi

Hi,

we’ve been discussing this internally with our developers, but it’s still hard to debug if we cannot reproduce it.
Have you encountered this issue again recently, and is there any use case in which this kernel panic is more likely to be triggered?

Hi,

can you enable CONFIG_DEBUG_STACKOVERFLOW in the kernel config for more debug info?
Also try KASAN (Kernel Address Sanitizer):

CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y or CONFIG_KASAN_OUTLINE=y  (The former inserts KASAN instrumentation code inline, making the kernel larger but 1.1x-2.0x faster than KASAN_OUTLINE. The latter is slower but has negligible impact on kernel size.)
CONFIG_FRAME_WARN=4096
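
After rebuilding, it may help to confirm that the options above actually ended up in the final kernel config. The following is a minimal sketch, not an official tool: `parse_kconfig`, `missing_options`, and `kasan_mode` are hypothetical helper names, and the sample text is invented for illustration rather than taken from a real Jetson config.

```python
import re

# Options requested in this post, with the values they should have.
WANTED = {
    "CONFIG_DEBUG_STACKOVERFLOW": "y",
    "CONFIG_KASAN": "y",
    "CONFIG_FRAME_WARN": "4096",
}


def parse_kconfig(text):
    """Parse kernel .config text into a {option: value} dict.

    Lines of the form "# CONFIG_FOO is not set" are recorded as "n".
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        unset = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if unset:
            options[unset.group(1)] = "n"
            continue
        assigned = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if assigned:
            options[assigned.group(1)] = assigned.group(2).strip('"')
    return options


def missing_options(text, wanted=WANTED):
    """Return {option: actual_value} for every wanted option that does not match.

    actual_value is None when the option does not appear in the config at all.
    """
    options = parse_kconfig(text)
    return {
        name: options.get(name)
        for name, value in wanted.items()
        if options.get(name) != value
    }


def kasan_mode(text):
    """Return "inline", "outline", or None depending on which KASAN mode is set."""
    options = parse_kconfig(text)
    if options.get("CONFIG_KASAN_INLINE") == "y":
        return "inline"
    if options.get("CONFIG_KASAN_OUTLINE") == "y":
        return "outline"
    return None


# Invented sample config, for illustration only.
SAMPLE = """\
CONFIG_KASAN=y
CONFIG_KASAN_OUTLINE=y
# CONFIG_KASAN_INLINE is not set
CONFIG_FRAME_WARN=2048
"""

if __name__ == "__main__":
    print("not as requested:", missing_options(SAMPLE))
    print("KASAN mode:", kasan_mode(SAMPLE))
```

The config text can come from the `.config` in the build tree, or from `/proc/config.gz` on the running system if the kernel was built with CONFIG_IKCONFIG_PROC.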

I believe I’ve narrowed this down to the usage of the bgscan parameter in wpa_supplicant.conf. I do not have anything definitive, I just haven’t seen a mac80211 related kernel panic in over a week after removing bgscan.
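
In case it helps others who want to try the same workaround, here is a rough sketch of how a wpa_supplicant.conf could be checked for active bgscan lines and have them commented out. The helper names `find_bgscan` and `comment_out_bgscan` are made up, and the sample network block (including the `simple:30:-45:300` value) is only an example, not the configuration used on our systems.

```python
import re

# An active (uncommented) "bgscan=..." line, with optional leading whitespace.
BGSCAN_RE = re.compile(r"^\s*bgscan\s*=\s*(?P<value>.*?)\s*$")


def find_bgscan(conf_text):
    """Return (line_number, value) for each active bgscan line in conf_text."""
    hits = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        match = BGSCAN_RE.match(line)
        if match:
            hits.append((number, match.group("value")))
    return hits


def comment_out_bgscan(conf_text):
    """Return conf_text with every active bgscan line prefixed by '#'."""
    lines = []
    for line in conf_text.splitlines():
        lines.append("#" + line if BGSCAN_RE.match(line) else line)
    return "\n".join(lines) + "\n"


# Example network block, for illustration only.
SAMPLE = """\
network={
    ssid="example-ap"
    psk="example-passphrase"
    bgscan="simple:30:-45:300"
}
"""

if __name__ == "__main__":
    print("active bgscan lines:", find_bgscan(SAMPLE))
    print(comment_out_bgscan(SAMPLE), end="")
```

The functions only work on text, so the file would need to be read and written back separately, and wpa_supplicant restarted or reconfigured for the change to take effect.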

