Xavier Kernel Panic on JetPack 5.1.1

I applied the patch mentioned in this forum post, but I still hit this issue occasionally. The kernel ring buffer log (dmesg) is shown below.

[   79.414360] tegra-gpcdma 2600000.dma: DMA pause timed out
[   79.414539] tegra-gpcdma 2600000.dma: slave id already in use
[   79.414674] serial-tegra 3100000.serial: Not able to get desc for Tx
[   79.414844] tegra-gpcdma 2600000.dma: slave id already in use
[   79.414970] serial-tegra 3100000.serial: Not able to get desc for Tx
[   79.422428] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
[   79.422690] Mem abort info:
[   79.422771]   ESR = 0x96000004
[   79.422851]   EC = 0x25: DABT (current EL), IL = 32 bits
[   79.423049]   SET = 0, FnV = 0
[   79.423154]   EA = 0, S1PTW = 0
[   79.423281] Data abort info:
[   79.423365]   ISV = 0, ISS = 0x00000004
[   79.423477]   CM = 0, WnR = 0
[   79.423597] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000100398000
[   79.423773] [0000000000000004] pgd=0000000000000000, p4d=0000000000000000
[   79.423994] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[   79.424145] Modules linked in: overlay lzo_rle lzo_compress zram mttcan ramoops reed_solomon nfnetlink can_dev ip6table_nat can_raw ip6table_filter ip6_tables can iptable_nat nf_nat nf_conntrack micrel nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter binfmt_misc]
[   79.452368] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.104-pmx+ #1
[   79.458744] Hardware name: Unknown Jetson-AGX/Jetson-AGX, BIOS 3.1-32827747 03/19/2023
[   79.466541] pstate: 20400009 (nzCv daif +PAN -UAO -TCO BTYPE=--)
[   79.472585] pc : tegra_uart_tx_dma_complete+0x5c/0xf0
[   79.477670] lr : tegra_uart_tx_dma_complete+0x4c/0xf0
[   79.482900] sp : ffff800010003d50
[   79.486314] x29: ffff800010003d50 x28: ffffd75a8f5a9000 
[   79.492083] x27: 0000000000000007 x26: ffff800010003e08 
[   79.497596] x25: ffffd75a8f8b2740 x24: dead000000000100 
[   79.503109] x23: dead000000000122 x22: ffffd75a8f8b2740 
[   79.508364] x21: ffff76074321f000 x20: 0000000000000080 
[   79.513875] x19: ffff760745e07880 x18: 0000000000000000 
[   79.519213] x17: 0000000000000000 x16: 0000000000000000 
[   79.524983] x15: 0000000000000000 x14: 0000000000000077 
[   79.530495] x13: 0000000000000105 x12: 0000000000000024 
[   79.535576] x11: 0000000000000040 x10: ffffd75a8f937b40 
[   79.541089] x9 : ffffd75a8f937b38 x8 : ffff760740526b80 
[   79.546424] x7 : 0000000000000000 x6 : 00000000205f2d68 
[   79.552020] x5 : 00ffffffffffffff x4 : 0000000000000015 
[   79.557186] x3 : 0000000000000000 x2 : 0000000000000000 
[   79.562782] x1 : 0000000000000005 x0 : ffff760745e07880 
[   79.568120] Call trace:
[   79.570319]  tegra_uart_tx_dma_complete+0x5c/0xf0
[   79.574867]  vchan_complete+0x1fc/0x230
[   79.579061]  tasklet_action_common.isra.0+0x138/0x180
[   79.583890]  tasklet_action+0x30/0x40
[   79.587378]  __do_softirq+0x140/0x3e8
[   79.591135]  irq_exit+0xc0/0xe0
[   79.594028]  __handle_domain_irq+0x74/0xd0
[   79.598052]  efi_header_end+0xb0/0xf0
[   79.601552]  el1_irq+0xd0/0x180
[   79.604962]  cpuidle_enter_state+0xb8/0x410
[   79.608985]  cpuidle_enter+0x40/0x60
[   79.612228]  call_cpuidle+0x44/0x80
[   79.615901]  do_idle+0x208/0x270
[   79.619398]  cpu_startup_entry+0x2c/0x70
[   79.623161]  rest_init+0xdc/0xe8
[   79.626573]  arch_call_rest_init+0x18/0x20
[   79.630342]  start_kernel+0x514/0x54c
[   79.633845] Code: b94043e3 f9413662 aa1303e0 b9428274 (b9400441) 
[   79.640231] ---[ end trace a5edba4cf95c949d ]---
[   79.652877] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[   79.653073] SMP: stopping secondary CPUs
[   79.655726] Kernel Offset: 0x575a7db40000 from 0xffff800010000000
[   79.661750] PHYS_OFFSET: 0xffff89f9c0000000
[   79.666122] CPU features: 0x8240002,03802a30
[   79.670324] Memory Limit: none
[   79.679951] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
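For readers unfamiliar with the "Mem abort info" block above: those fields are bit fields of the AArch64 exception syndrome value (ESR = 0x96000004). The sketch below is a hypothetical helper (the function name is illustrative, not part of any kernel or NVIDIA tool) that splits an ESR value into the same fields, assuming the standard ESR_ELx data-abort layout. For this log it gives EC 0x25 (data abort at the current EL), WnR 0 (a read), and DFSC 0x04 (a level-0 translation fault), which is consistent with the `pgd=0000000000000000` line and the read of address 0x4.

```python
def decode_esr_data_abort(esr: int) -> dict:
    """Split an AArch64 ESR_ELx value into the fields the kernel prints
    under "Mem abort info" / "Data abort info" (data-abort layout assumed)."""
    iss = esr & 0x1FFFFFF              # ISS: bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,       # 1 = 32-bit instruction
        "ISS": iss,
        "ISV": (iss >> 24) & 0x1,      # instruction syndrome valid
        "SET": (iss >> 11) & 0x3,      # synchronous error type, bits [12:11]
        "FnV": (iss >> 10) & 0x1,      # FAR not valid
        "EA": (iss >> 9) & 0x1,        # external abort type
        "CM": (iss >> 8) & 0x1,        # cache maintenance
        "S1PTW": (iss >> 7) & 0x1,     # stage-1 page table walk
        "WnR": (iss >> 6) & 0x1,       # 0 = fault caused by a read
        "DFSC": iss & 0x3F,            # 0x04 = translation fault, level 0
    }


fields = decode_esr_data_abort(0x96000004)
print({k: hex(v) for k, v in fields.items()})
```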

Hi SanjayD,

Are you using the devkit or a custom carrier board for the AGX Xavier?
Could you verify with the latest JP5.1.2 (R35.4.1) whether the issue still exists?

Hi @KevinFFF ,

I’m using a Connect Tech Rogue Carrier Board. I cannot test with JP5.1.2 yet because the latest BSP from Connect Tech is based on JP5.1.1.

-Sanjay

Did you apply that patch and compile the kernel image for the custom carrier board?
Since you are using a custom carrier board, how did you get its sources and apply the patch?

Do you have a devkit on which you could try to reproduce the same issue?

@KevinFFF Yes, I apply the patch on Connect Tech’s BSP sources, and compile the kernel image.

I haven’t tried this out on a devkit yet.

Please check whether this issue can be reproduced on the devkit.

@KevinFFF
The issue was not reproducible on the devkit. However, I don't have any devices connected to the devkit, whereas on the custom carrier I have a device connected over UART to /dev/ttyTHS0.

After disconnecting the UART device I was not able to reproduce the issue on the Rogue Carrier Board either.

I then reconnected the device and disabled the nvgetty systemd service, since it opens /dev/ttyTHS0 and the kernel crash usually occurred around the time the nvgetty service was starting up. I'm no longer seeing the issue. I will update the thread if I come across a related kernel panic in the next few days.
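When watching for a recurrence over several boots, it can help to scan saved logs for the same failure signature seen in the first post: the tegra-gpcdma and serial-tegra warnings followed by a crash in `tegra_uart_tx_dma_complete`. The sketch below is a hypothetical helper; the function name and the choice of signature strings are assumptions taken from the log in this thread, not an NVIDIA or Connect Tech tool.

```python
import re
from typing import Iterable, List, Tuple

# Messages that preceded the panic in this thread's log (assumption: these
# are the ones worth flagging; adjust for other boards or UART ports).
PRECURSORS = (
    "DMA pause timed out",
    "slave id already in use",
    "Not able to get desc for Tx",
)
# Function that appears in the pc/lr lines and the call trace of the Oops.
CRASH_MARKER = "tegra_uart_tx_dma_complete"

# Matches the "[   79.414360] message" prefix that dmesg prints.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def find_uart_dma_events(lines: Iterable[str]) -> List[Tuple[float, str]]:
    """Return (timestamp, message) pairs for precursor warnings and crash lines."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines without a dmesg timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if any(p in msg for p in PRECURSORS) or CRASH_MARKER in msg:
            events.append((ts, msg))
    return events
```

For example, save the log with `dmesg > boot.log` after each boot and call `find_uart_dma_events(open("boot.log"))`; an empty list means none of the signature messages appeared. A panic that reboots the board will not survive in a fresh dmesg, so the saved copy has to come from before the crash or from a persistent log.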

Do you mean the issue is coming from the nvgetty service?

Yes, if there’s a device connected to UART1 (/dev/ttyTHS0).

The chances of reproducing the issue are higher when the Xavier is in power mode 7 and while it is booting up.

What do you mean by “power mode 7”?

If the issue is coming from the nvgetty service, you could simply disable it.

Could you share the full dmesg for further checking?
