OrinAGX 64G module frequently encounters reboot hangs

Test environment: JetPack 5.1.2
Test module: jetpack-Orin-AGX 64G
Test command: reboot

When the Orin AGX executes reboot, it prints a call trace with an RCU stall error, becomes unresponsive and fails to proceed, and is then restarted by the watchdog after about two minutes.

[2025-07-18 09:20:50] [660914.576990] rcu: INFO: rcu_preempt self-detected stall on CPU
[2025-07-18 09:21:14] [660914.576996] rcu: 0-…: (5241 ticks this GP) idle=e36/1/0x4000000000000004 softirq=6119522/6119522 fqs=1894
[2025-07-18 09:21:14] [660914.577002] (t=5251 jiffies g=17710697 q=4268)
[2025-07-18 09:21:14] [660914.577005] Task dump for CPU 0:
[2025-07-18 09:21:14] [660914.577007] task:swapper/0 state:R running task stack: 0 pid: 0 ppid: 0 flags:0x0000002a
[2025-07-18 09:21:14] [660914.577013] Call trace:
[2025-07-18 09:21:14] [660914.577014] dump_backtrace+0x0/0x1d0
[2025-07-18 09:21:14] [660914.577026] show_stack+0x30/0x40
[2025-07-18 09:21:14] [660914.577030] sched_show_task+0x148/0x170
[2025-07-18 09:21:14] [660914.577036] dump_cpu_task+0x4c/0x58
[2025-07-18 09:21:14] [660914.577041] rcu_dump_cpu_stacks+0xb8/0xf4
[2025-07-18 09:21:14] [660914.577044] rcu_sched_clock_irq+0xb14/0xec0
[2025-07-18 09:21:14] [660914.577049] update_process_times+0x68/0xa0
[2025-07-18 09:21:14] [660914.577053] tick_sched_handle.isra.0+0x38/0x70
[2025-07-18 09:21:14] [660914.577056] tick_sched_timer+0x54/0xb0
[2025-07-18 09:21:14] [660914.577057] __hrtimer_run_queues+0x148/0x360
[2025-07-18 09:21:14] [660914.577060] hrtimer_interrupt+0xf0/0x250
[2025-07-18 09:21:14] [660914.577062] arch_timer_handler_phys+0x40/0x50
[2025-07-18 09:21:14] [660914.577068] handle_percpu_devid_irq+0x90/0x280
[2025-07-18 09:21:14] [660914.577071] generic_handle_irq+0x40/0x60
[2025-07-18 09:21:14] [660914.577073] __handle_domain_irq+0x70/0xd0
[2025-07-18 09:21:14] [660914.577074] gic_handle_irq+0x68/0x134
[2025-07-18 09:21:14] [660914.577077] el1_irq+0xd0/0x1c0
[2025-07-18 09:21:14] [660914.577078] cpuidle_enter_state+0xb8/0x430
[2025-07-18 09:21:14] [660914.577081] cpuidle_enter+0x40/0x60
[2025-07-18 09:21:14] [660914.577083] call_cpuidle+0x44/0x80
[2025-07-18 09:21:14] [660914.577086] do_idle+0x208/0x270
[2025-07-18 09:21:14] [660914.577088] cpu_startup_entry+0x30/0x70
[2025-07-18 09:21:14] [660914.577090] rest_init+0xdc/0xe8
[2025-07-18 09:21:14] [660914.577093] arch_call_rest_init+0x18/0x20
[2025-07-18 09:21:14] [660914.577098] start_kernel+0x4f8/0x530
[2025-07-18 09:21:14] [660914.577100] rcu: ====For debug only: Start Printing Blocked
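As a rough cross-check of the stall report above, the `t=5251 jiffies` field can be converted to seconds. The sketch below assumes a tick rate of 250 Hz, which is only a guess at this kernel's CONFIG_HZ; under that assumption the stall is about 21 seconds, which would match the usual default RCU stall timeout. The function name `stall_seconds` is made up for illustration.

```python
import re

# Matches the "(t=5251 jiffies g=17710697 q=4268)" detail line of an RCU stall report.
STALL_RE = re.compile(r"\(t=(\d+) jiffies g=(\d+) q=(\d+)\)")


def stall_seconds(log_text, hz=250):
    """Return the stall duration in seconds for each RCU stall detail line.

    hz is an ASSUMPTION standing in for the kernel's CONFIG_HZ; confirm the
    real value on the board before trusting the result.
    """
    return [int(m.group(1)) / hz for m in STALL_RE.finditer(log_text)]


sample = "[2025-07-18 09:21:14] [660914.577002] (t=5251 jiffies g=17710697 q=4268)"
print(stall_seconds(sample))
```

If the real tick rate differs, pass it explicitly, for example `stall_seconds(log_text, hz=1000)`.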

0707版本reboot日志.log (154.5 KB)

Hi DKaiF,

Are you using the devkit or a custom board for AGX Orin?

Is this issue 100% reproducible when you run the reboot command?

It seems you are using JP5.1.2.
Have you verified with the latest JetPack 5.1.5?
Please also share the output of the following commands on your board.

$ cat /etc/nv_boot_control.conf
$ cat /etc/nv_tegra_release

Hi KevinFFF,
We are using a custom board for the AGX Orin 64G with JetPack 5.1.2, and we have applied the RT (real-time) kernel patch.
The issue is not 100% reproducible, but it occurs with a fairly high probability.
Even when the reboot command completes normally, the following log is always printed; I'm not sure whether it is related to the hang.

root@ubuntu:~# reboot
[10689.875030] tegra-xudc 3550000.xudc: setup request failed: -22
[10692.737613] watchdog: watchdog0: watchdog did not stop!
[10693.070063] WARNING: CPU: 7 PID: 1 at kernel/workqueue.c:3047 __flush_work.isra.0+0x20c/0x220
[10693.070371] ---[ end trace 20b6eb2cf4c400d6 ]---
[10694.533277] arm-smmu 8000000.iommu: disabling translation
[10694.533347] arm-smmu 10000000.iommu: disabling translation
[10694.533377] arm-smmu 12000000.iommu: disabling translation
[10694.533589] reboot: Restarting system
Shutdown state requested 1
Rebooting system ...

Below are the outputs of the commands:

root@ubuntu:~# cat /etc/nv_boot_control.conf
TNSPEC 3701-501-0004-D.0-1-0-jetson-agx-orin-intelligent_control-
COMPATIBLE_SPEC 3701--0004--1--jetson-agx-orin-intelligent_control-
TEGRA_LEGACY_UPDATE false
TEGRA_BOOT_STORAGE nvme0n1
TEGRA_EMMC_ONLY false
TEGRA_CHIPID 0x23
TEGRA_OTA_BOOT_DEVICE /dev/mtdblock0
TEGRA_OTA_GPT_DEVICE /dev/mtdblock0

root@ubuntu:~# cat /etc/nv_tegra_release
R35 (release), REVISION: 4.1, GCID: 33958178, BOARD: t186ref, EABI: aarch64, DATE: Tue Aug 1 19:57:35 UTC 2023
root@ubuntu:~# jetson_release
Software part of jetson-stats 4.3.2 - (c) 2024, Raffaello Bonghi
Model: Jetson AGX Orin Developer Kit - Jetpack 5.1.2 [L4T 35.4.1]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
  • P-Number: p3701-0004
  • Module: NVIDIA Jetson AGX Orin (32GB ram)
Platform:
  • Distribution: Ubuntu 20.04 Focal Fossa
  • Release: 5.10.120-rt70-tegra
jtop:
  • Version: 4.3.2
  • Service: Active
Libraries:
  • CUDA: 11.4.315
  • cuDNN: 8.6.0.166
  • TensorRT: 8.5.2.2
  • VPI: 2.3.9
  • Vulkan: 1.3.204
  • OpenCV: 4.5.4 - with CUDA: NO

[2025-07-18 09:21:15]  [660914.577418] task:swapoff         state:D stack:    0 pid:33776 ppid:     1 flags:0x00000000

From the failure log, the hang appears to be caused by the swapoff task, which is stuck in uninterruptible sleep (state D).
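To look for other blocked tasks in the full reboot log, something like the following sketch may help. It scans for task dump lines of the form shown above and keeps those in state D. The function name `blocked_tasks` and the regular expression are illustrative assumptions based only on the two task lines visible in this thread, not an official log format definition.

```python
import re

# Matches kernel task dump lines such as:
#   task:swapoff   state:D stack:    0 pid:33776 ppid:     1 flags:0x00000000
TASK_RE = re.compile(r"task:(\S+)\s+state:(\w)\b.*?\bpid:\s*(\d+)\s+ppid:\s*(\d+)")


def blocked_tasks(log_text):
    """Return (name, pid, ppid) for every dumped task in state D."""
    found = []
    for line in log_text.splitlines():
        m = TASK_RE.search(line)
        if m and m.group(2) == "D":
            found.append((m.group(1), int(m.group(3)), int(m.group(4))))
    return found


sample = (
    "[660914.577007] task:swapper/0 state:R running task stack: 0 pid: 0 ppid: 0 flags:0x0000002a\n"
    "[660914.577418] task:swapoff         state:D stack:    0 pid:33776 ppid:     1 flags:0x00000000\n"
)
print(blocked_tasks(sample))
```

Running it over the whole attached log would show whether swapoff is the only task in state D or whether other tasks are blocked alongside it.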

Please help to check the following cases to clarify the issue.

  1. Can it be reproduced on the devkit?
  2. If so, please also verify with the latest JetPack 5.1.5 (r35.6.2).
  3. Is the issue related to the RT kernel patch? (i.e., please verify whether you can reproduce the issue without the RT kernel patch applied)
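Since the log points at swapoff, it may also be worth recording which swap devices are active, and how much of each is in use, just before each reboot attempt in the cases above. The sketch below parses /proc/swaps for that purpose; the helper name `parse_swaps` is hypothetical, and this is only one possible way to collect the data, not a confirmed diagnosis step.

```python
def parse_swaps(text):
    """Parse the contents of /proc/swaps into a list of dicts."""
    entries = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        name, kind, size_kb, used_kb, priority = fields[:5]
        entries.append({
            "name": name,
            "type": kind,
            "size_kb": int(size_kb),
            "used_kb": int(used_kb),
            "priority": int(priority),
        })
    return entries


if __name__ == "__main__":
    try:
        with open("/proc/swaps") as fh:
            for entry in parse_swaps(fh.read()):
                print(entry)
    except OSError:
        print("/proc/swaps is not available on this system")
```

Comparing the output from a reboot that hangs with one that succeeds could show whether heavy swap usage correlates with the stall.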

I have conducted tests on the devkit and found that the aforementioned error log does not appear on the devkit. It can reboot normally.

We also tested on the custom board: both the real-time and the non-real-time kernel report errors on reboot, and the issue described above still occurs.

Can you provide a direction for troubleshooting? Which module might be causing this?

From the results you shared, the issue seems to be specific to your custom carrier board rather than to the RT kernel.
To clarify further, could you also verify with the latest JetPack 5.1.5 (r35.6.2) on your custom carrier board?

Are there any more logs before the RCU-related messages appear?
Also, do you run any custom applications on your board?
