Debugging BCCPLEXWDT reset source

It may not be worth checking patches from years ago. Orin AGX is already moving to rel-35, while those old posts are still on rel-32, which uses kernel 4.9.

Please treat it as a new issue and share the steps to reproduce with us.

We are using the following Intel card as a WiFi client via wpa_supplicant:

0007:01:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78)

We are running wpa_supplicant on our wlan interface:

/sbin/wpa_supplicant -c/etc/wpa_supplicant/wpa_supplicant.conf -Dnl80211 -iwlan0

Our wpa_supplicant.conf just has a couple of network blocks that look like this:

network={
ssid="<omitted>"
priority=1
freq_list=2412 2417 2422 2427 2432 2437 2442 2447 2452 2457 2462
psk=<omitted>
}

We are not doing anything special to reproduce this issue. We’ve managed to reproduce it after leaving a system idle for 12-24 hours. Sometimes it happens after the system has been running for less than an hour. We don’t have any clear way to reproduce it.

Thanks for the reply.
We’ll set up devices and see if we can catch the same error.

We believe triggering the buggy code path depends on the access point configuration, specifically on CCMP being the encryption type used for key negotiation.

wpa_supplicant[509]: wlan0: WPA: Key negotiation completed with xx:xx:xx:xx:xx:xx [PTK=CCMP GTK=CCMP]

A week ago, when I reported this issue, we patched the kernel, replacing GFP_KERNEL with in_interrupt() ? GFP_ATOMIC : GFP_KERNEL in the various DMA function calls inside tegra-se-nvhost.c.

We have not had a crash since, so it may have resolved the issue. We can submit a kernel patch if Nvidia can review it and confirm it is an appropriate fix. I’d like some confirmation on this.
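
As a rough illustration of the substitution described above, the following userspace C sketch mimics the in_interrupt() ? GFP_ATOMIC : GFP_KERNEL choice. It is not the attached patch: every mock_* and MOCK_* name is a hypothetical stand-in for the kernel's in_interrupt() and GFP flags, introduced only so the selection logic can be compiled outside the kernel tree.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's GFP_KERNEL and GFP_ATOMIC flags.
 * The numeric values are arbitrary and exist only for this sketch. */
enum mock_gfp {
    MOCK_GFP_KERNEL = 1, /* allocation is allowed to sleep */
    MOCK_GFP_ATOMIC = 2  /* allocation must not sleep */
};

/* Hypothetical stand-in for in_interrupt(); toggled by hand in this sketch. */
static bool mock_irq_context = false;

static bool mock_in_interrupt(void)
{
    return mock_irq_context;
}

/* Mirrors the expression: in_interrupt() ? GFP_ATOMIC : GFP_KERNEL */
static enum mock_gfp pick_gfp_by_context(void)
{
    return mock_in_interrupt() ? MOCK_GFP_ATOMIC : MOCK_GFP_KERNEL;
}
```

One caveat with this approach: in_interrupt() only reports hard and soft interrupt context. It does not detect other atomic sections, such as code running with a spinlock held, so a check like this may still pick GFP_KERNEL in a place where sleeping is not allowed.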

Hi,

we’ve run the system for more than 60 hours and have not seen any kernel panic.
It would be great if you could share the patch with us so that our developers can review it.
Thanks.

Attached is the patch that we applied. Note that we found and changed every instance where GFP_KERNEL was passed as an argument to the dma_xxx APIs. That was likely overkill, but we are not sure which code paths are reachable from interrupt context and which are not.

kernel-tegra-se.patch (16.8 KB)

Hi @DaveYYY, do you have an update on this?

Hi @bsirang1,

our developers are evaluating the patch, and we’ll update you on any progress.

Any update on a fix for this? I am hitting the same issue: with a USB WiFi device attached to either the Orin dev board or my custom board, I get the same kernel panic. If I boot with the device attached, the system never fully boots. If I plug it in after booting, it immediately kernel panics.

I have not tried the patch above yet.

Yes, I believe it’s already fixed in our internal code base, and the fix might make it into the next L4T release.

Is there anything we can download to test ahead of time?

Thanks!
Rick

Hi,

35.4.1 will be released soon, so please wait for the public release.
(Again, I’m only saying the fix might make it into the next L4T release.)
We usually don’t release patches on the forum.

Okay. Thanks for the heads up! Should I check back here when it drops to see whether it includes those updates?

Feel free to do so.

I just reported another kernel bug that may be related: [80864.598452] BUG: scheduling while atomic: perception/9656/0x00000302
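
For readers decoding that message: the last field of a "scheduling while atomic" line is the task's preempt count. The C sketch below splits such a value into its preemption, softirq, and hardirq fields. The masks are an assumption based on the layout documented in include/linux/preempt.h for kernels of this generation, and the SKETCH_* names, the struct, and the function are hypothetical helpers, not kernel APIs.

```c
#include <stdint.h>

/* Field masks, assumed from the layout described in include/linux/preempt.h.
 * The SKETCH_ prefix marks them as local to this example. */
#define SKETCH_PREEMPT_MASK 0x000000ffu
#define SKETCH_SOFTIRQ_MASK 0x0000ff00u
#define SKETCH_HARDIRQ_MASK 0x000f0000u

/* Hypothetical container for the decoded fields. */
struct preempt_fields {
    unsigned int preempt_depth; /* nested preempt_disable() depth */
    unsigned int softirq_field; /* softirq / bottom-half disable field */
    unsigned int hardirq_depth; /* nested hard interrupt depth */
};

/* Split a raw preempt count into its three low fields. */
static struct preempt_fields decode_preempt_count(uint32_t count)
{
    struct preempt_fields f;

    f.preempt_depth = count & SKETCH_PREEMPT_MASK;
    f.softirq_field = (count & SKETCH_SOFTIRQ_MASK) >> 8;
    f.hardirq_depth = (count & SKETCH_HARDIRQ_MASK) >> 16;
    return f;
}
```

For the reported value 0x00000302, this gives a preemption depth of 2, a softirq field of 3, and a hardirq depth of 0. Under the assumed layout, nonzero preemption and softirq fields mean the task tried to schedule while it was not allowed to sleep.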

Hi @bsirang1 @enc0der,

please try this patch.

diff --git a/drivers/crypto/tegra-se-nvhost.c b/drivers/crypto/tegra-se-nvhost.c
index 62e005d..c077db7 100644
--- a/drivers/crypto/tegra-se-nvhost.c
+++ b/drivers/crypto/tegra-se-nvhost.c
@@ -4,7 +4,7 @@
  *
  * Support for Tegra Security Engine hardware crypto algorithms.
  *
- * Copyright (c) 2015-2022, NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (c) 2015-2023, NVIDIA CORPORATION.  All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -5219,8 +5219,9 @@
 		total += ilen;
 
 		/* 2.2 - Copy adata and map it */
-		adata = dma_alloc_coherent(se_dev->dev, assoclen,
-						&adata_addr, GFP_KERNEL);
+		adata = dma_alloc_coherent(se_dev->dev, assoclen, &adata_addr,
+					(req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) ?
+					GFP_KERNEL : GFP_ATOMIC);
 		num_sgs = tegra_se_count_sgs(sg, assoclen);
 		sg_copy_to_buffer(sg, num_sgs, adata, assoclen);
 
@@ -5415,7 +5416,7 @@
 	 * cryptlen case.
 	 */
 	dst_buf = dma_alloc_coherent(se_dev->dev, cryptlen+1, &dst_buf_dma_addr,
-		GFP_KERNEL);
+		(req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) ? GFP_KERNEL : GFP_ATOMIC);
 	if (!dst_buf)
 		return -ENOMEM;
 
@@ -5955,7 +5956,8 @@
 	 * cryptlen case.
 	 */
 	dst_buf = dma_alloc_coherent(se_dev->dev, cryptlen+1, &dst_buf_dma_addr,
-					GFP_KERNEL);
+				    (req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) ?
+				    GFP_KERNEL : GFP_ATOMIC);
 	if (!dst_buf)
 		return -ENOMEM;
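
The patch above picks the allocation flag from the crypto request's own flags rather than from the execution context. A minimal userspace C sketch of that selection follows. The struct, the MOCK_* constants, and the function name are hypothetical stand-ins for the request, CRYPTO_TFM_REQ_MAY_SLEEP, and the GFP flags; the numeric values are illustrative only.

```c
#include <stdint.h>

/* Hypothetical stand-ins for GFP_KERNEL and GFP_ATOMIC; values are arbitrary. */
enum sketch_gfp {
    SKETCH_GFP_KERNEL = 1, /* allocation is allowed to sleep */
    SKETCH_GFP_ATOMIC = 2  /* allocation must not sleep */
};

/* Hypothetical stand-in for CRYPTO_TFM_REQ_MAY_SLEEP. */
#define MOCK_REQ_MAY_SLEEP 0x200u

/* Hypothetical stand-in for the flags carried in req->base.flags. */
struct mock_request {
    uint32_t flags;
};

/* Mirrors: (req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) ? GFP_KERNEL : GFP_ATOMIC */
static enum sketch_gfp pick_gfp_by_request(const struct mock_request *req)
{
    return (req->flags & MOCK_REQ_MAY_SLEEP) ? SKETCH_GFP_KERNEL
                                             : SKETCH_GFP_ATOMIC;
}
```

With this design the caller of the crypto API states whether sleeping is permitted, so the driver does not have to guess from its current context.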

After applying the patch, we did encounter another kernel panic. We’re not sure whether it’s a coincidence or related.

[ 4636.915356] BUG: scheduling while atomic: foxglove/69724/0x00000302
[ 4636.916228] Unable to handle kernel execute from non-executable memory at virtual address ffff800010003920
[ 4636.916516] Mem abort info:
[ 4636.916584]   ESR = 0x8600000f
[ 4636.916674]   EC = 0x21: IABT (current EL), IL = 32 bits
[ 4636.916814]   SET = 0, FnV = 0
[ 4636.916897]   EA = 0, S1PTW = 0
[ 4636.916978] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000032f3c1000
[ 4636.917153] [ffff800010003920] pgd=00000001000b3003, p4d=00000001000b3003, pud=00000001000b4003, pmd=00000001000b5003, pte=00680001000b2f03
[ 4636.917477] Internal error: Oops: 8600000f [#1] PREEMPT SMP
[ 4636.917630] Modules linked in: nvidia_modeset(OE) xt_mark(E) xt_tcpudp(E) veth(E) nf_conntrack_netlink(E) nfnetlink(E) br_netfilter(E) binfmt_misc(E) ip6table_nat(E) overlay(E) micrel(E) lzo_rle(E) lzo_compress(E) zram(E) ip6table_filter(E) ip6_tables(E) xt_state(E) xt_conntrack(E) iptable_filter(E) xt_MASQUERADE(E) xt_nat(E) xt_multiport(E) xt_addrtype(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) ramoops(E) reed_solomon(E) loop(E) nf_defrag_ipv4(E) libcrc32c(E) nvgpu(E) iwlmvm(E) mac80211(E) iwlwifi(E) aes_ce_blk(E) crypto_simd(E) cryptd(E) aes_ce_cipher(E) ghash_ce(E) sha2_ce(E) cfg80211(E) sha256_arm64(E) sha1_ce(E) pwm_fan(E) btusb(E) ftdi_sio(E) btrtl(E) usbserial(E) btbcm(E) tegra_bpmp_thermal(E) userspace_alert(E) nct1008(E) btintel(E) ina3221(E) spi_tegra114(E) nvmap(E) nvidia(OE) gs_usb(E) mttcan(E) can_dev(E) can_raw(E) can(E) ip_tables(E) x_tables(E) [last unloaded: mtd]
[ 4636.961770] CPU: 0 PID: 145 Comm: irq/150-host_sy Tainted: G        W  OE     5.10.104-tegra #1
[ 4636.970693] Hardware name: Unknown Jetson AGX Orin/Jetson AGX Orin, BIOS 2.1-32413640 01/24/2023
[ 4636.979618] pstate: 80c00089 (Nzcv daIf +PAN +UAO -TCO BTYPE=--)
[ 4636.985659] pc : 0xffff800010003920
[ 4636.989335] lr : __wake_up_common+0x90/0x150
[ 4636.993791] sp : ffff8000121fba20
[ 4636.997204] x29: ffff8000121fba20 x28: ffffc24172f75370 
[ 4637.002720] x27: 0000000000000000 x26: 0000000000000000 
[ 4637.008229] x25: 0000000000000003 x24: 0000000000000000 
[ 4637.013742] x23: 0000000000000001 x22: ffff8000121fbad0 
[ 4637.019254] x21: ffff70fc97b4dbd8 x20: 00000000843a6580 
[ 4637.024766] x19: ffffc2410377e4fc x18: 0000000000000000 
[ 4637.030105] x17: 0000000000018021 x16: 0000000000018020 
[ 4637.035617] x15: 0000000000018001 x14: 0000000000000000 
[ 4637.040953] x13: 0000000000000020 x12: 0101010101010101 
[ 4637.046379] x11: 0000000000000001 x10: 0000000000000004 
[ 4637.051891] x9 : 000000003af18906 x8 : 0000000000000000 
[ 4637.057229] x7 : ffff8000121fbc58 x6 : ffff8000100038a0 
[ 4637.062567] x5 : ffff800010003920 x4 : 0000000000000000 
[ 4637.067991] x3 : 0000000000000000 x2 : 0000000000000000 
[ 4637.073329] x1 : 0000000000000003 x0 : ffff8000100038a0 
[ 4637.078667] Call trace:
[ 4637.081119]  0xffff800010003920
[ 4637.084270]  __wake_up_common_lock+0x88/0xe0
[ 4637.088554]  __wake_up+0x44/0x60
[ 4637.091793]  action_wakeup+0x54/0x80
[ 4637.095293]  run_handlers+0xd8/0x190
[ 4637.098966]  process_wait_list+0x264/0x300
[ 4637.102992]  nvhost_syncpt_thresh_fn+0x54/0x90
[ 4637.107281]  syncpt_thresh_cascade_isr+0x1b0/0x2b0
[ 4637.112093]  irq_thread_fn+0x34/0xa0
[ 4637.115767]  irq_thread+0x158/0x250
[ 4637.119267]  kthread+0x148/0x170
[ 4637.122506]  ret_from_fork+0x10/0x24
[ 4637.126182] Code: 48ee1a00 ffff70fe 64769500 c68d9f13 (100039b0) 
[ 4637.132312] ---[ end trace 3026889686f40ac2 ]---
[ 4637.141597] Kernel panic - not syncing: Oops: Fatal exception
[ 4637.142458] SMP: stopping secondary CPUs
[ 4637.146049] Kernel Offset: 0x424162850000 from 0xffff800010000000
[ 4637.152342] PHYS_OFFSET: 0xffff8f0680000000
[ 4637.156369] CPU features: 0x0040006,4a80aa38
[ 4637.160567] Memory Limit: none
[ 4637.168292] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---

Notice nvhost_syncpt_thresh_fn in the call stack.

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Can this issue still be reproduced with JetPack 5.1.2/L4T 35.4.1?