My team has a collection of NVIDIA Jetson Orin NX modules running JetPack 5.1.2. They use an Aetina AIB SN-41 carrier board.
We have recently started observing repeated kernel failures on some of our newer devices when sending data over the serial port “/dev/ttyTHS1”. The failures are semi-random: the exact time varies, but the failure always occurs eventually.
The failure can be reproduced with the following Python script:
import serial
import time
ser = serial.Serial("/dev/ttyTHS1", 115200, timeout=1)
ser.flush()
i = 0
while True:
msg= "456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456"
i+=1
if ser.writable():
ser.write(msg.encode('ascii'))
print('writing: ' + str(i))
time.sleep(0.03333)
This script uses the pyserial library, which can be installed with "pip install pyserial".
After about 2000 iterations, the syslog should start showing serial failures of the form:
[ 1340.194894] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1340.706891] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1341.218873] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1341.730481] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1341.737943] tegra-gpcdma 2600000.gpcdma: slave id already in use
[ 1341.744127] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1341.751401] serial-tegra 3110000.serial: Not able to get desc for Rx
[ 1342.274470] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1342.281351] tegra-gpcdma 2600000.gpcdma: slave id already in use
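To compare runs (for example, different message sizes or baud rates), it can help to count how often each of these messages appears. The following is a minimal sketch, not part of our original test setup; the function name count_serial_errors is hypothetical, and it assumes kernel log lines in the "[ timestamp] driver device: message" layout shown above.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
# [ 1340.194894] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"(?P<driver>serial-tegra|tegra-gpcdma)\s+\S+:\s+(?P<msg>.+)$"
)


def count_serial_errors(lines):
    """Count serial-tegra / tegra-gpcdma messages by (driver, message)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("driver"), match.group("msg"))] += 1
    return counts


# Demo on a few lines copied from the log excerpt above.
sample_lines = [
    "[ 1340.194894] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed",
    "[ 1341.730481] serial-tegra 3110000.serial: RxData PIO to tty layer failed",
    "[ 1341.737943] tegra-gpcdma 2600000.gpcdma: slave id already in use",
    "[ 1341.744127] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed",
    "[ 1341.751401] serial-tegra 3110000.serial: Not able to get desc for Rx",
]
for (driver, msg), n in count_serial_errors(sample_lines).most_common():
    print(f"{n:4d}  {driver}: {msg}")
```

On a device, the same function could be fed the output of dmesg read line by line instead of the sample list.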
This eventually results in an OS crash and system reboot.
The crash log from the debug UART port is shown below:
nvidia@orinnx32: ~nvidia@orinnx32:~$ [ 1339.656643] serial-tegra 3110000.serid
[ 1340.194894] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1340.706891] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1341.218873] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1341.730481] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1341.737943] tegra-gpcdma 2600000.gpcdma: slave id already in use
[ 1341.744127] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1341.751401] serial-tegra 3110000.serial: Not able to get desc for Rx
[ 1342.274470] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1342.281351] tegra-gpcdma 2600000.gpcdma: slave id already in use
[ 1342.287548] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1342.294825] serial-tegra 3110000.serial: Not able to get desc for Rx
[ 1342.818457] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1342.825343] tegra-gpcdma 2600000.gpcdma: slave id already in use
[ 1342.831524] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1342.838802] serial-tegra 3110000.serial: Not able to get desc for Rx
[ 1343.362819] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1343.874795] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1344.390414] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1344.899117] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1345.410400] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1345.420471] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1345.954382] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1345.973754] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1346.498733] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1347.010726] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1347.522346] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1347.529222] tegra-gpcdma 2600000.gpcdma: slave id already in use
[ 1347.535410] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1347.542683] serial-tegra 3110000.serial: Not able to get desc for Rx
[ 1348.068075] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1348.590573] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1349.124097] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1349.640055] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1350.178269] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1350.207440] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1350.722628] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1351.240907] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1351.778612] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1352.294597] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1352.802589] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1353.314202] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1353.321092] tegra-gpcdma 2600000.gpcdma: slave id already in use
[ 1353.327278] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1353.334560] serial-tegra 3110000.serial: Not able to get desc for Rx
[ 1353.858185] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1353.865607] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1353.874486] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1354.407554] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1354.946168] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1354.957516] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1355.490760] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1356.002137] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1356.009033] tegra-gpcdma 2600000.gpcdma: slave id already in use
[ 1356.015224] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1356.022518] serial-tegra 3110000.serial: Not able to get desc for Rx
[ 1356.546505] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1357.058118] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1357.065942] tegra-gpcdma 2600000.gpcdma: slave id already in use
[ 1357.072150] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1357.079415] serial-tegra 3110000.serial: Not able to get desc for Rx
[ 1357.602103] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1357.608999] tegra-gpcdma 2600000.gpcdma: slave id already in use
[ 1357.615188] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1357.622472] serial-tegra 3110000.serial: Not able to get desc for Rx
[ 1358.146092] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1358.152974] tegra-gpcdma 2600000.gpcdma: slave id already in use
[ 1358.159173] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1358.166441] serial-tegra 3110000.serial: Not able to get desc for Rx
[ 1358.690478] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1359.206580] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1359.714419] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[ 1360.226035] serial-tegra 3110000.serial: RxData PIO to tty layer failed
[ 1360.232934] tegra-gpcdma 2600000.gpcdma: slave id already in use
[ 1360.239142] serial-tegra 3110000.serial: Not able to get desc for Rx
[ 1360.245709] Unable to handle kernel NULL pointer dereference at virtual addr4
[ 1360.245710] Mem abort info:
[ 1360.245712] ESR = 0x96000004
[ 1360.245714] EC = 0x25: DABT (current EL), IL = 32 bits
[ 1360.245715] SET = 0, FnV = 0
[ 1360.245715] EA = 0, S1PTW = 0
[ 1360.245716] Data abort info:
[ 1360.245717] ISV = 0, ISS = 0x00000004
[ 1360.245717] CM = 0, WnR = 0
[ 1360.245720] user pgtable: 4k pages, 48-bit VAs, pgdp=000000013ba65000
[ 1360.245721] [0000000000000004] pgd=0000000000000000, p4d=0000000000000000
[ 1360.245727] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 1360.245730] Modules linked in: nvidia_modeset(O) nf_conntrack_netlink nfnetlt
[ 1360.245797] snd_soc_tegra210_ahub nvidia(O) spi_tegra114 binfmt_misc nvmap ]
[ 1360.245811] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 5.10.17
[ 1360.245812] Hardware name: Unknown NVIDIA Orin NX Developer Kit/NVIDIA Orin 3
[ 1360.245815] pstate: 60400089 (nZCv daIf +PAN -UAO -TCO BTYPE=--)
[ 1360.245826] pc : tegra_uart_rx_buffer_push+0x38/0x190
[ 1360.245828] lr : tegra_uart_rx_buffer_push+0x30/0x190
[ 1360.245828] sp : ffff800010003570
[ 1360.245829] x29: ffff800010003570 x28: 0000000000000002
[ 1360.245831] x27: 0000000000000005 x26: ffff4157c366f1b0
[ 1360.245832] x25: ffffffffffffffca x24: 0000000000000001
[ 1360.245834] x23: 0000000000000005 x22: ffff4157c366f1b0
[ 1360.245835] x21: ffff41580094ec00 x20: 0000000000000fa0
[ 1360.245837] x19: ffff4157c6838c80 x18: 0000000000000010
[ 1360.245838] x17: 0000000000000000 x16: ffffb3f323225220
[ 1360.245840] x15: ffffb3f324f22bf0 x14: ffffffffffffffff
[ 1360.245841] x13: ffff800090003917 x12: ffff80001000391f
[ 1360.245843] x11: 0000000000000040 x10: ffffb3f324fa7b60
[ 1360.245845] x9 : ffffb3f324fa7b58 x8 : ffff4157c0400b90
[ 1360.245846] x7 : 0000000000000000 x6 : 0000000a01d9cfd1
[ 1360.245847] x5 : ffff4157c682e088 x4 : ffff415b2e796140
[ 1360.245849] x3 : 0000000000000000 x2 : ffffb3f3233c0d70
[ 1360.245851] x1 : 0000000000000000 x0 : ffff41580094ec00
[ 1360.245853] Call trace:
[ 1360.245855] tegra_uart_rx_buffer_push+0x38/0x190
[ 1360.245857] tegra_uart_terminate_rx_dma+0x84/0xe0
[ 1360.245859] tegra_uart_isr+0x41c/0x4a0
[ 1360.245865] __handle_irq_event_percpu+0x68/0x2a0
[ 1360.245867] handle_irq_event_percpu+0x40/0xa0
[ 1360.245869] handle_irq_event+0x50/0xf0
[ 1360.245871] handle_fasteoi_irq+0xc0/0x170
[ 1360.245873] generic_handle_irq+0x40/0x60
[ 1360.245875] __handle_domain_irq+0x70/0xd0
[ 1360.245878] gic_handle_irq+0x68/0x134
[ 1360.245879] el1_irq+0xd0/0x180
[ 1360.245881] console_unlock+0x36c/0x540
[ 1360.245883] vprintk_emit+0x124/0x2a0
[ 1360.245887] dev_vprintk_emit+0x154/0x184
[ 1360.245888] dev_printk_emit+0x80/0xa8
[ 1360.245889] __dev_printk+0x7c/0xa4
[ 1360.245890] _dev_err+0x74/0x9c
[ 1360.245892] tegra_uart_start_rx_dma+0x128/0x140
[ 1360.245893] tegra_uart_rx_error_handle_timer+0xe4/0xf0
[ 1360.245896] call_timer_fn+0x3c/0x200
[ 1360.245897] run_timer_softirq+0x50c/0x5e0
[ 1360.245898] __do_softirq+0x140/0x3e8
[ 1360.245901] irq_exit+0xc0/0xe0
[ 1360.245903] __handle_domain_irq+0x74/0xd0
[ 1360.245903] gic_handle_irq+0x68/0x134
[ 1360.245904] el1_irq+0xd0/0x180
[ 1360.245909] cpuidle_enter_state+0xb8/0x410
[ 1360.245911] cpuidle_enter+0x40/0x60
[ 1360.245913] call_cpuidle+0x44/0x80
[ 1360.245914] do_idle+0x208/0x270
[ 1360.245915] cpu_startup_entry+0x30/0x70
[ 1360.245918] rest_init+0xdc/0xe8
[ 1360.245922] arch_call_rest_init+0x18/0x20
[ 1360.245924] start_kernel+0x500/0x538
[ 1360.245928] Code: aa1603e0 97ff1e09 f9413a61 aa0003f5 (b9400420)
[ 1360.245937] ---[ end trace 7f5703c452e99bb1 ]---
[ 1360.250143] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[ 1360.250149] SMP: stopping secondary CPUs
[ 1360.250156] Kernel Offset: 0x33f313200000 from 0xffff800010000000
[ 1360.250157] PHYS_OFFSET: 0xffffbea940000000
[ 1360.250160] CPU features: 0x08040006,4a80aa38
[ 1360.250162] Memory Limit: none
[ 1360.713147] ---[ end Kernel panic - not syncing: Oops: Fatal exception in in-
The JetPack version is shown below:
nvidia@orinnx32:~$ sudo apt-cache show nvidia-jetpack
Package: nvidia-jetpack
Version: 5.1.2-b104
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-jetpack-runtime (= 5.1.2-b104), nvidia-jetpack-dev (= 5.1.2-b104)
Homepage: http://developer.nvidia.com/jetson
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_5.1.2-b104_arm64.deb
Size: 29304
SHA256: fda2eed24747319ccd9fee9a8548c0e5dd52812363877ebe90e223b5a6e7e827
SHA1: 78c7d9e02490f96f8fbd5a091c8bef280b03ae84
MD5sum: 6be522b5542ab2af5dcf62837b34a5f0
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8
nvidia@orinnx32:~$ cat /etc/nv_tegra_release
# R35 (release), REVISION: 4.1, GCID: 33958178, BOARD: t186ref, EABI: aarch64, DATE: Tue Aug 1 19:57:35 UTC 2023
# OS Version
Ubuntu 20.04.6 LTS (GNU/Linux 5.10.120 aarch64)
We have noticed that reducing the message size or increasing the baud rate seems to mitigate the issue. However, the serial errors still occasionally appear in the log, so we are not confident that raising the baud rate is a permanent fix.
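Since both message size and baud rate affect how busy the line is, one quick check is how much of each write period is needed just to clock the payload out. The sketch below is an illustration only, not a diagnosis of the Rx errors. The helper name line_utilization and the payload sizes in the loop are hypothetical, and it assumes 8N1 framing (10 bits on the wire per byte), which is what pyserial uses when no framing arguments are passed, as in the script above.

```python
def line_utilization(payload_bytes, baud, period_s, bits_per_byte=10):
    """Fraction of one write period spent transmitting the payload.

    bits_per_byte=10 assumes 8N1 framing: 1 start + 8 data + 1 stop bit.
    A result above 1.0 means data is queued faster than the UART can send it.
    """
    transmit_time_s = payload_bytes * bits_per_byte / baud
    return transmit_time_s / period_s


# Hypothetical payload sizes; substitute len(msg.encode("ascii")) from the
# reproduction script to check the real value.
for size in (128, 256, 384, 512):
    utilization = line_utilization(size, baud=115200, period_s=0.03333)
    print(f"{size:4d} bytes -> {utilization:.0%} of each period")
```

Running the same calculation with a higher baud rate or a smaller payload shows how much headroom each change buys.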
We are also reaching out to Aetina about this issue.