AGX Xavier reboots by itself without load

Hi, I’m a laboratory technician at UPV university (Spain). We have seven AGX Xavier units for development. One of them (bought in June ’21) now has a problem: without any load, the system reboots by itself at random. If you run “dmesg --follow”, you can see the following on the console just before the reboot:

[ 1084.572288] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1084.572522] 0-…: (1 GPs behind) idle=c81/2/0 softirq=11518/11537 fqs=145
[ 1084.572669] (detected by 1, t=5754 jiffies, g=3618, c=3617, q=133)
[ 1084.572815] Task dump for CPU 0:
[ 1084.572825] swapper/0 R running task 0 0 0 0x00000002
[ 1084.572842] Call trace:
[ 1084.572877] [] __switch_to+0x9c/0xc0
[ 1084.572896] [] cpuidle_enter_state+0xa0/0x380
[ 1084.572904] [] cpuidle_enter+0x34/0x48
[ 1084.572915] [] call_cpuidle+0x44/0x70
[ 1084.572923] [] cpu_startup_entry+0x1b0/0x200
[ 1084.572937] [] rest_init+0x84/0x90
[ 1084.572956] [] start_kernel+0x370/0x384
[ 1084.572964] [] __primary_switched+0x80/0x94
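
Since the console output disappears when the board reboots, it may help to save the kernel log to a file and check it afterwards. A minimal sketch, assuming the log is saved to a file such as `dmesg.log` (a hypothetical name); `count_rcu_stalls` is a helper made up for illustration, not an existing tool:

```shell
# Count RCU stall reports in a saved kernel log.
# Usage: count_rcu_stalls dmesg.log
count_rcu_stalls() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function usable under "set -e".
    grep -c 'rcu_preempt detected stalls' "$1" || true
}

# Example capture on the board (not run here; output may be buffered, so the
# very last lines before a sudden reboot can still be lost):
#   dmesg --follow | tee -a dmesg.log
```

If the systemd journal is persistent on the board, `journalctl -k -b -1` should also show the kernel messages of the previous boot.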

System information:

Ubuntu 18.04.6 LTS
Linux xavier 4.9.140-tegra #1 SMP PREEMPT Tue Oct 27 21:02:46 PDT 2020 aarch64 aarch64 aarch64 GNU/Linux

cat /etc/nv_tegra_release:

R32 (release), REVISION: 4.4, GCID: 23942405, BOARD: t186ref, EABI: aarch64, DATE: Fri Oct 16 19:37:08 UTC 2020
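
The release line above corresponds to L4T 32.4.4 (major release R32, revision 4.4). A small sketch of extracting that version string; `l4t_version` is a hypothetical helper, and it assumes the line keeps the `R<major> (release), REVISION: <minor>,` layout shown above, optionally preceded by `# `:

```shell
# Print the L4T version (e.g. "32.4.4") from nv_tegra_release text on stdin.
l4t_version() {
    # First expression drops an optional leading "# "; the second rewrites
    # "R32 (release), REVISION: 4.4, ..." to "32.4.4" and prints only
    # lines where that pattern matched.
    sed -n -e 's/^# *//' \
           -e 's/^R\([0-9][0-9]*\) (release), REVISION: \([0-9.][0-9.]*\),.*/\1.\2/p'
}

# On the board (not run here):
#   l4t_version < /etc/nv_tegra_release
```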

lshw:

description: Computer
product: Jetson-AGX
serial: 1421921012841

lscpu:

Architecture: aarch64
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 4
Vendor ID: Nvidia
Model: 0
Model name: ARMv8 Processor rev 0 (v8l)
Stepping: 0x0
CPU max MHz: 2265,6001
CPU min MHz: 115,2000
BogoMIPS: 62.50
L1d cache: 64K
L1i cache: 128K
L2 cache: 2048K
L3 cache: 4096K
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp

Any help would be appreciated.

Best regards.

hello Ricardo_M,

May I know what kernel modifications you’ve made? It looks like a call stack that forces a reboot of the target.

Hello Jerry,

I haven’t made any changes to the kernel. I’ve installed BBqueue and the network simulator that we use. I’ve also set the board power mode to the 30W one using the nvpmodel command.
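
For reference, the active power mode can be checked with `sudo nvpmodel -q` and changed with `sudo nvpmodel -m <ID>`; the mode IDs are defined in /etc/nvpmodel.conf. A minimal sketch of reading the mode name from the query output; `power_mode_name` is a hypothetical helper, and the `NV Power Mode: <name>` line format is an assumption about what `nvpmodel -q` prints on this release:

```shell
# Print the power mode name from "nvpmodel -q" output given on stdin.
# Assumption: the output contains a line of the form "NV Power Mode: <name>".
power_mode_name() {
    sed -n 's/^NV Power Mode: *//p'
}

# On the board (not run here):
#   sudo nvpmodel -q | power_mode_name
```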

Best regards.

hello Ricardo_M,

Please narrow down the issue by removing all those peripheral devices.
Thanks.

Hello Jerry,

I’ve removed the apps (BBqueue and network simulator) and reverted the power mode to default.

Best regards.

hello Ricardo_M,

Do you still see a kernel panic that reboots the system?
Could you please re-flash the target with the native JetPack release (or move to the latest release version) to confirm?
Thanks.

Hello Jerry,

The system is stable after removing the apps. I’m waiting for a coworker to re-flash the board.

Best regards.
