"shutdown now" error from the graphical interface

Jetson AGX Orin + JetPack 5.1.2

Issue description:
On Jetson Orin (Ubuntu 20.04, JetPack 5.1.2), the system fails to shut down cleanly from the graphical interface. After issuing a shutdown command, the console stalls for about one minute and then prints RCU-related warnings and stack traces.
From the kernel logs, multiple tasks (sugov, swapoff, cleanup_net) are stuck in synchronize_rcu(). The traces typically look like:

tegra194_cpufreq_set_target → icc_set_bw → mutex_lock → … → RCU

swapoff → synchronize_rcu

cleanup_net → synchronize_rcu

This suggests that the system hang is caused by an RCU grace period that never completes. As a result, any subsystem that calls synchronize_rcu() during shutdown (cpufreq, interconnect bandwidth control, swapoff, network cleanup) gets permanently stuck.
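When the hang reproduces, the blocked-task state can be captured from a serial console with the kernel's sysrq interface. A rough sketch (needs root; the commands are harmless no-ops where sysrq is unavailable):

```shell
# Sketch: dump blocked/all task stacks while the shutdown hang is in progress.
# Run as root from a serial console.
echo 1 > /proc/sys/kernel/sysrq 2>/dev/null || true   # enable all sysrq functions
echo w > /proc/sysrq-trigger 2>/dev/null || true      # dump tasks in uninterruptible (D) state
echo t > /proc/sysrq-trigger 2>/dev/null || true      # dump every task's stack
# Count how many tasks in the ring buffer are parked in an RCU grace-period wait:
dmesg 2>/dev/null | grep -c synchronize_rcu || true
```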

Analysis

The root cause appears to be a kernel-level RCU + cpufreq/ICC + swap interaction bug on the Tegra194/Orin platform, rather than a direct failure of a single external peripheral driver (Wi-Fi, Bluetooth, Ethernet).

External devices and their teardown paths (e.g., netdev cleanup) may exacerbate the issue but are not the primary cause; even without loading certain Wi-Fi/BT modules, the RCU stalls still occur.

Disabling swap or locking the CPU governor to performance can sometimes mitigate the issue, but this is not a true fix.
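For reference, a sketch of those mitigations as commands (the nvzramconfig service name is the stock JetPack zram setup; treat the exact names as assumptions for your image). It defaults to a dry run that only prints the commands; set DRY_RUN=0 and run as root to apply them:

```shell
#!/bin/sh
# Sketch of the mitigations: turn swap off and pin the performance governor.
# DRY_RUN=1 (the default) only prints what would be run.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

run swapoff -a                      # avoid swapoff's synchronize_rcu() at shutdown
run systemctl disable nvzramconfig  # stock JetPack zram swap service (name assumed)
for gov in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    run sh -c "echo performance > $gov"   # keep sugov from hitting icc_set_bw
done
```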

Request for Guidance

Could NVIDIA please confirm:

Is this a known shutdown hang/RCU bug on JetPack 5.1.2 (L4T kernel for Orin)?

Is there an upstream patch, BSP update, or kernel workaround available to resolve this issue?

Are there recommended mitigations (e.g., disabling swap, using a specific CPU governor, disabling ICC paths) until an official fix is provided?

System details:

Hardware: Jetson Orin

OS: Ubuntu 20.04

JetPack: 5.1.2

Kernel: 5.10.120-rt70-tegra

error log:
[ 423.165816] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 423.165826] rcu: 0-…: (5371 ticks this GP) idle=ab6/1/0x4000000000000002 softirq=8944/8945 fqs=2013
[ 423.165833] (t=5250 jiffies g=13105 q=6037)
[ 423.165836] Task dump for CPU 0:
[ 423.165838] task:migration/0 state:R running task stack: 0 pid: 14 ppid: 2 flags:0x0000002a
[ 423.165845] Call trace:
[ 423.165846] dump_backtrace+0x0/0x1d0
[ 423.165859] show_stack+0x30/0x40
[ 423.165863] sched_show_task+0x148/0x170
[ 423.165871] dump_cpu_task+0x4c/0x58
[ 423.165879] rcu_dump_cpu_stacks+0xb8/0xf4
[ 423.165882] rcu_sched_clock_irq+0xb14/0xec0
[ 423.165888] update_process_times+0x68/0xa0
[ 423.165893] tick_sched_handle.isra.0+0x38/0x70
[ 423.165897] tick_sched_timer+0x54/0xb0
[ 423.165899] __hrtimer_run_queues+0x148/0x360
[ 423.165902] hrtimer_interrupt+0xf0/0x250
[ 423.165906] arch_timer_handler_phys+0x40/0x50
[ 423.165913] handle_percpu_devid_irq+0x90/0x280
[ 423.165917] generic_handle_irq+0x40/0x60
[ 423.165920] __handle_domain_irq+0x70/0xd0
[ 423.165922] gic_handle_irq+0x68/0x134
[ 423.165924] el1_irq+0xd0/0x1c0
[ 423.165926] _raw_spin_unlock_irq+0x2c/0x70
[ 423.165930] __schedule.constprop.0+0x814/0x8c0
[ 423.165934] schedule+0x8c/0x100
[ 423.165937] smpboot_thread_fn+0x25c/0x270
[ 423.165941] kthread+0x16c/0x190
[ 423.165947] ret_from_fork+0x10/0x24
[ 423.165950] rcu: ====For debug only: Start Printing Blocked Tasks====<print_cpu_stall>
[ 423.166010] task:kworker/u24:1 state:D stack: 0 pid: 128 ppid: 2 flags:0x00000028
[ 423.166017] Workqueue: netns cleanup_net
[ 423.166024] Call trace:
[ 423.166024] __switch_to+0xc8/0x120
[ 423.166028] __schedule.constprop.0+0x320/0x8c0
[ 423.166031] schedule+0x8c/0x100
[ 423.166035] schedule_timeout+0x2c0/0x320
[ 423.166036] wait_for_completion+0x8c/0x120
[ 423.166040] __wait_rcu_gp+0x184/0x190
[ 423.166044] synchronize_rcu+0x8c/0xa0
[ 423.166047] cleanup_net+0x218/0x390
[ 423.166048] process_one_work+0x1c4/0x490
[ 423.166051] worker_thread+0x54/0x430
[ 423.166053] kthread+0x16c/0x190
[ 423.166056] ret_from_fork+0x10/0x24
[ 423.166132] task:sugov:8 state:D stack: 0 pid: 543 ppid: 2 flags:0x00000028
[ 423.166136] Call trace:
[ 423.166136] __switch_to+0xc8/0x120
[ 423.166139] __schedule.constprop.0+0x320/0x8c0
[ 423.166143] schedule+0x8c/0x100
[ 423.166146] schedule_preempt_disabled+0x2c/0x50
[ 423.166149] __mutex_lock.isra.0+0x18c/0x560
[ 423.166153] __mutex_lock_slowpath+0x28/0x40
[ 423.166157] mutex_lock+0x60/0x70
[ 423.166160] icc_set_bw+0x54/0x2d0
[ 423.166165] tegra194_cpufreq_set_target+0x120/0x150
[ 423.166172] __cpufreq_driver_target+0x1b0/0x5c0
[ 423.166175] sugov_work+0x64/0x80
[ 423.166178] kthread_worker_fn+0xa0/0x170
[ 423.166181] kthread+0x16c/0x190
[ 423.166184] ret_from_fork+0x10/0x24
[ 423.166187] task:sugov:4 state:D stack: 0 pid: 547 ppid: 2 flags:0x00000028
[ 423.166191] Call trace:
[ 423.166191] __switch_to+0xc8/0x120
[ 423.166195] __schedule.constprop.0+0x320/0x8c0
[ 423.166198] schedule+0x8c/0x100
[ 423.166201] schedule_preempt_disabled+0x2c/0x50
[ 423.166204] __mutex_lock.isra.0+0x18c/0x560
[ 423.166207] __mutex_lock_slowpath+0x28/0x40
[ 423.166211] mutex_lock+0x60/0x70
[ 423.166214] icc_set_bw+0x54/0x2d0
[ 423.166216] tegra194_cpufreq_set_target+0x120/0x150
[ 423.166219] __cpufreq_driver_target+0x1b0/0x5c0
[ 423.166222] sugov_work+0x64/0x80
[ 423.166223] kthread_worker_fn+0xa0/0x170
[ 423.166226] kthread+0x16c/0x190
[ 423.166229] ret_from_fork+0x10/0x24
[ 423.166258] task:swapoff state:D stack: 0 pid: 3625 ppid: 1 flags:0x00000000
[ 423.166261] Call trace:
[ 423.166261] __switch_to+0xc8/0x120
[ 423.166264] __schedule.constprop.0+0x320/0x8c0
[ 423.166267] schedule+0x8c/0x100
[ 423.166270] schedule_timeout+0x2c0/0x320
[ 423.166272] wait_for_completion+0x8c/0x120
[ 423.166276] __wait_rcu_gp+0x184/0x190
[ 423.166278] synchronize_rcu+0x8c/0xa0
[ 423.166281] __arm64_sys_swapoff+0x230/0x650
[ 423.166286] el0_svc_common.constprop.0+0x80/0x1d0
[ 423.166292] do_el0_svc+0x38/0xb0
[ 423.166295] el0_svc+0x1c/0x30
[ 423.166299] el0_sync_handler+0xa8/0xb0
[ 423.166302] el0_sync+0x16c/0x180
[ 423.166305] task:swapoff state:D stack: 0 pid: 3626 ppid: 1 flags:0x00000000
[ 423.166308] Call trace:
[ 423.166308] __switch_to+0xc8/0x120
[ 423.166312] __schedule.constprop.0+0x320/0x8c0
[ 423.166315] schedule+0x8c/0x100
[ 423.166318] schedule_timeout+0x2c0/0x320
[ 423.166319] wait_for_completion+0x8c/0x120
[ 423.166323] __wait_rcu_gp+0x184/0x190
[ 423.166325] synchronize_rcu+0x8c/0xa0
[ 423.166328] __arm64_sys_swapoff+0x230/0x650
[ 423.166329] el0_svc_common.constprop.0+0x80/0x1d0
[ 423.166333] do_el0_svc+0x38/0xb0
[ 423.166336] el0_svc+0x1c/0x30
[ 423.166339] el0_sync_handler+0xa8/0xb0
[ 423.166342] el0_sync+0x16c/0x180
[ 423.166345] task:swapoff state:D stack: 0 pid: 3627 ppid: 1 flags:0x00000000
[ 423.166348] Call trace:
[ 423.166348] __switch_to+0xc8/0x120
[ 423.166352] __schedule.constprop.0+0x320/0x8c0
[ 423.166355] schedule+0x8c/0x100
[ 423.166358] schedule_timeout+0x2c0/0x320
[ 423.166359] wait_for_completion+0x8c/0x120
[ 423.166362] __wait_rcu_gp+0x184/0x190
[ 423.166365] synchronize_rcu+0x8c/0xa0
[ 423.166367] __arm64_sys_swapoff+0x230/0x650
[ 423.166369] el0_svc_common.constprop.0+0x80/0x1d0
[ 423.166372] do_el0_svc+0x38/0xb0
[ 423.166376] el0_svc+0x1c/0x30
[ 423.166379] el0_sync_handler+0xa8/0xb0
[ 423.166381] el0_sync+0x16c/0x180
[ 423.166384] task:swapoff state:D stack: 0 pid: 3628 ppid: 1 flags:0x00000000
[ 423.166386] Call trace:
[ 423.166387] __switch_to+0xc8/0x120
[ 423.166390] __schedule.constprop.0+0x320/0x8c0
[ 423.166393] schedule+0x8c/0x100
[ 423.166396] schedule_timeout+0x2c0/0x320
[ 423.166397] wait_for_completion+0x8c/0x120
[ 423.166400] __wait_rcu_gp+0x184/0x190
[ 423.166403] synchronize_rcu+0x8c/0xa0
[ 423.166405] __arm64_sys_swapoff+0x230/0x650
[ 423.166407] el0_svc_common.constprop.0+0x80/0x1d0
[ 423.166410] do_el0_svc+0x38/0xb0
[ 423.166413] el0_svc+0x1c/0x30
[ 423.166416] el0_sync_handler+0xa8/0xb0
[ 423.166418] el0_sync+0x16c/0x180
[ 423.166421] task:swapoff state:D stack: 0 pid: 3629 ppid: 1 flags:0x00000000
[ 423.166423] Call trace:
[ 423.166424] __switch_to+0xc8/0x120
[ 423.166427] __schedule.constprop.0+0x320/0x8c0
[ 423.166430] schedule+0x8c/0x100
[ 423.166433] schedule_timeout+0x2c0/0x320
[ 423.166435] wait_for_completion+0x8c/0x120
[ 423.166438] __wait_rcu_gp+0x184/0x190
[ 423.166440] synchronize_rcu+0x8c/0xa0
[ 423.166442] __arm64_sys_swapoff+0x230/0x650
[ 423.166444] el0_svc_common.constprop.0+0x80/0x1d0
[ 423.166447] do_el0_svc+0x38/0xb0
[ 423.166450] el0_svc+0x1c/0x30
[ 423.166453] el0_sync_handler+0xa8/0xb0
[ 423.166456] el0_sync+0x16c/0x180
[ 423.166460] task:swapoff state:D stack: 0 pid: 3630 ppid: 1 flags:0x00000000
[ 423.166462] Call trace:
[ 423.166463] __switch_to+0xc8/0x120
[ 423.166467] __schedule.constprop.0+0x320/0x8c0
[ 423.166470] schedule+0x8c/0x100
[ 423.166473] schedule_timeout+0x2c0/0x320
[ 423.166474] wait_for_completion+0x8c/0x120
[ 423.166477] __wait_rcu_gp+0x184/0x190
[ 423.166480] synchronize_rcu+0x8c/0xa0
[ 423.166482] __arm64_sys_swapoff+0x230/0x650
[ 423.166484] el0_svc_common.constprop.0+0x80/0x1d0
[ 423.166487] do_el0_svc+0x38/0xb0
[ 423.166491] el0_svc+0x1c/0x30
[ 423.166493] el0_sync_handler+0xa8/0xb0
[ 423.166496] el0_sync+0x16c/0x180
[ 423.166499] task:swapoff state:D stack: 0 pid: 3631 ppid: 1 flags:0x00000000
[ 423.166501] Call trace:
[ 423.166502] __switch_to+0xc8/0x120
[ 423.166505] __schedule.constprop.0+0x320/0x8c0
[ 423.166508] schedule+0x8c/0x100
[ 423.166512] schedule_timeout+0x2c0/0x320
[ 423.166513] wait_for_completion+0x8c/0x120
[ 423.166516] __wait_rcu_gp+0x184/0x190
[ 423.166519] synchronize_rcu+0x8c/0xa0
[ 423.166521] __arm64_sys_swapoff+0x230/0x650
[ 423.166523] el0_svc_common.constprop.0+0x80/0x1d0
[ 423.166526] do_el0_svc+0x38/0xb0
[ 423.166529] el0_svc+0x1c/0x30
[ 423.166531] el0_sync_handler+0xa8/0xb0
[ 423.166534] el0_sync+0x16c/0x180
[ 423.166537] task:swapoff state:D stack: 0 pid: 3632 ppid: 1 flags:0x00000000
[ 423.166539] Call trace:
[ 423.166539] __switch_to+0xc8/0x120
[ 423.166543] __schedule.constprop.0+0x320/0x8c0
[ 423.166546] schedule+0x8c/0x100
[ 423.166549] schedule_timeout+0x2c0/0x320
[ 423.166551] wait_for_completion+0x8c/0x120
[ 423.166554] __wait_rcu_gp+0x184/0x190
[ 423.166556] synchronize_rcu+0x8c/0xa0
[ 423.166558] __arm64_sys_swapoff+0x230/0x650
[ 423.166560] el0_svc_common.constprop.0+0x80/0x1d0
[ 423.166563] do_el0_svc+0x38/0xb0
[ 423.166566] el0_svc+0x1c/0x30
[ 423.166569] el0_sync_handler+0xa8/0xb0
[ 423.166571] el0_sync+0x16c/0x180
[ 423.166575] task:swapoff state:D stack: 0 pid: 3633 ppid: 1 flags:0x00000000
[ 423.166576] Call trace:
[ 423.166577] __switch_to+0xc8/0x120
[ 423.166580] __schedule.constprop.0+0x320/0x8c0
[ 423.166583] schedule+0x8c/0x100
[ 423.166586] schedule_timeout+0x2c0/0x320
[ 423.166588] wait_for_completion+0x8c/0x120
[ 423.166591] __wait_rcu_gp+0x184/0x190
[ 423.166593] synchronize_rcu+0x8c/0xa0
[ 423.166595] __arm64_sys_swapoff+0x230/0x650
[ 423.166597] el0_svc_common.constprop.0+0x80/0x1d0
[ 423.166601] do_el0_svc+0x38/0xb0
[ 423.166604] el0_svc+0x1c/0x30
[ 423.166606] el0_sync_handler+0xa8/0xb0
[ 423.166609] el0_sync+0x16c/0x180
[ 423.166613] task:swapoff state:D stack: 0 pid: 3634 ppid: 1 flags:0x00000000
[ 423.166615] Call trace:
[ 423.166615] __switch_to+0xc8/0x120
[ 423.166619] __schedule.constprop.0+0x320/0x8c0
[ 423.166622] schedule+0x8c/0x100
[ 423.166625] schedule_timeout+0x2c0/0x320
[ 423.166626] wait_for_completion+0x8c/0x120
[ 423.166630] __wait_rcu_gp+0x184/0x190
[ 423.166632] synchronize_rcu+0x8c/0xa0
[ 423.166634] __arm64_sys_swapoff+0x230/0x650
[ 423.166636] el0_svc_common.constprop.0+0x80/0x1d0
[ 423.166639] do_el0_svc+0x38/0xb0
[ 423.166642] el0_svc+0x1c/0x30
[ 423.166645] el0_sync_handler+0xa8/0xb0
[ 423.166647] el0_sync+0x16c/0x180
[ 423.166650] task:swapoff state:D stack: 0 pid: 3635 ppid: 1 flags:0x00000000
[ 423.166652] Call trace:
[ 423.166652] __switch_to+0xc8/0x120
[ 423.166656] __schedule.constprop.0+0x320/0x8c0
[ 423.166659] schedule+0x8c/0x100
[ 423.166662] schedule_timeout+0x2c0/0x320
[ 423.166663] wait_for_completion+0x8c/0x120
[ 423.166666] __wait_rcu_gp+0x184/0x190
[ 423.166669] synchronize_rcu+0x8c/0xa0
[ 423.166671] __arm64_sys_swapoff+0x230/0x650
[ 423.166673] el0_svc_common.constprop.0+0x80/0x1d0
[ 423.166676] do_el0_svc+0x38/0xb0
[ 423.166679] el0_svc+0x1c/0x30
[ 423.166682] el0_sync_handler+0xa8/0xb0
[ 423.166684] el0_sync+0x16c/0x180
[ 423.166688] task:swapoff state:D stack: 0 pid: 3636 ppid: 1 flags:0x00000000
[ 423.166690] Call trace:
[ 423.166690] __switch_to+0xc8/0x120
[ 423.166693] __schedule.constprop.0+0x320/0x8c0
[ 423.166696] schedule+0x8c/0x100
[ 423.166699] schedule_timeout+0x2c0/0x320
[ 423.166701] wait_for_completion+0x8c/0x120
[ 423.166704] __wait_rcu_gp+0x184/0x190
[ 423.166706] synchronize_rcu+0x8c/0xa0
[ 423.166708] __arm64_sys_swapoff+0x230/0x650
[ 423.166710] el0_svc_common.constprop.0+0x80/0x1d0
[ 423.166714] do_el0_svc+0x38/0xb0
[ 423.166717] el0_svc+0x1c/0x30
[ 423.166720] el0_sync_handler+0xa8/0xb0
[ 423.166722] el0_sync+0x16c/0x180
[ 423.166725] rcu: ====For debug only: End Printing Blocked Tasks====<print_cpu_stall>

Do you have any I2C devices on your Jetson? Does this issue only happen with the RT kernel enabled?

BTW, Orin is T234 Tegra. T194 is Xavier.

I have already manually disabled the I²C peripherals in the device tree (set their status to “disabled”). However, when I run sudo shutdown now, the system still reports errors.

Previously, the error was an RCU warning, but after disabling the I²C peripherals, it changed to a timeout error.
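To double-check that the device tree overrides actually landed, the live device tree can be inspected on the running system. A rough sketch (the node paths are assumptions; adjust the globs for your tree):

```shell
# Sketch: print each I2C controller's status from the live device tree.
# A missing status property means "okay" by device tree convention.
list_i2c_status() {
    for node in /proc/device-tree/i2c@* /proc/device-tree/bus@*/i2c@*; do
        [ -d "$node" ] || continue
        status=$(tr -d '\0' < "$node/status" 2>/dev/null)
        echo "${node#/proc/device-tree/}: ${status:-okay (no status property)}"
    done
}
list_i2c_status   # on a non-Tegra build host this simply prints nothing
```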

I also tried switching from the RT (real-time) kernel to the non-RT kernel. In this case, sudo shutdown now still triggers errors. The error message is again timeout related, not RCU.

In summary:

With I²C peripherals disabled, I get timeout errors.

With the non-RT kernel, I also get the same timeout errors.

The timeout errors are consistent across both conditions.

The detailed timeout error log is as follows:

[ 59.171614] WARNING: CPU: 5 PID: 1 at kernel/workqueue.c:3047 __flush_work.isra.0+0x20c/0x220
[ 59.181571] ---[ end trace 553c6becc12cf778 ]---
[ 60.552824] CPU:0, Error: cbb-fabric@0x13a00000, irq=33
[ 60.559326] **************************************
[ 60.565343] CPU:0, Error:cbb-fabric, Errmon:2
[ 60.570903] Error Code : TIMEOUT_ERR
[ 60.576014] Overflow : Multiple TIMEOUT_ERR
[ 60.581771]
[ 60.584366] Error Code: TIMEOUT_ERR

[ 60.589471] MASTER_ID : CCPLEX
[ 60.594036] Address : 0x3a080082
[ 60.598795] Cache : 0x1 -- Bufferable
[ 60.604180] Protection : 0x2 -- Unprivileged, Non-Secure, Data Access
[ 60.612266] Access_Type : Read
[ 60.616873] Access_ID : 0x12
[ 60.616876] Fabric : cbb-fabric
[ 60.625962] Slave_Id : 0x16
[ 60.630262] Burst_length : 0x0
[ 60.634817] Burst_type : 0x1
[ 60.639190] Beat_size : 0x1

[ 60.643493] VQC : 0x0
[ 60.647360] GRPSEC : 0x7e
[ 60.651502] FALCONSEC : 0x0
[ 60.655837] **************************************
[ 60.662011] WARNING: CPU: 0 PID: 211 at drivers/soc/tegra/cbb/tegra234-cbb.c:577 tegra234_cbb_isr+0x130/0x170
[ 60.673838] ---[ end trace 553c6becc12cf779 ]---
[ 61.579331] CPU:0, Error: cbb-fabric@0x13a00000, irq=33
[ 61.585939] **************************************
[ 61.592124] CPU:0, Error:cbb-fabric, Errmon:2
[ 61.597881] Error Code : TIMEOUT_ERR
[ 61.603189] Overflow : Multiple TIMEOUT_ERR
[ 61.609126]
[ 61.611909] Error Code : TIMEOUT_ERR
[ 61.617208] MASTER_ID : CCPLEX
[ 61.621940] Address : 0x3a080082
[ 61.626851] Cache : 0x1 -- Bufferable
[ 61.632388] Protection : 0x2 -- Unprivileged, Non-Secure, Data Access
[ 61.640623] Access_Type : Read
[ 61.645374] Access_ID : 0x13
[ 61.645377] Fabric : cbb-fabric
[ 61.654773] Slave_Id : 0x16
[ 61.659220] Burst_length : 0x0
[ 61.663932] Burst_type : 0x1
[ 61.668466] Beat_size : 0x1
[ 61.672915] VQC : 0x0
[ 61.676889] GRPSEC : 0x7e
[ 61.681156] FALCONSEC : 0x0
[ 61.685598] **********

I have no idea what the actual issue is, but the faulting code is running at the EL0 exception level. That points to a user-space process failing, not a driver. Perhaps something related to I2C is being shut down before the software depending on it has finished.

Here are our test results regarding the shutdown issue:

JetPack 5.1.2, using HDMI-to-VGA adapter with a monitor: tested on 5 devices, a total of 40 shutdown attempts, no abnormal behavior observed.

JetPack 5.1.2, using HDMI directly connected to a monitor: tested on 4 devices, 30 shutdown attempts in total; 17 of them hit the errors described above, with CPU hangs and watchdog resets.

JetPack 5.1.2, without any monitor connected: tested 15 times, all shutdowns were successful without issues.

JetPack 6.0, tested 5 times, all shutdowns were successful. In addition, this is the version we are currently using internally, and we have not observed such issues in actual usage.

Could you please advise on possible debugging or troubleshooting directions for this issue?

VGA does not normally provide EDID data, and HDMI-to-VGA adapters frequently fail to forward the monitor's EDID over DDC. (Displays that do report EDID today generally use EDID 1.3/1.4; the EDID 2.0 variant was never widely adopted.) The assumption here is that the HDMI-to-VGA adapter broke automatic configuration, so the system fell back to a default display mode.

In the other cases the I2C protocol is occurring over the DDC wire, perhaps during boot or after a hot-plug event is detected. Failures here are almost always due to a bad device tree configuration (power rails, hot-plug detect, and the DDC channel all have to be configured correctly). That shouldn't be the case on a dev kit with a dev kit carrier board, although other I2C issues might interfere even when the DDC is correct in the device tree. The EL0 in the stack frame suggests this is a user-space software issue, and that the kernel-space drivers themselves are not part of the problem.
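One quick way to see whether the kernel actually received EDID over the DDC wire is to check the connector nodes in sysfs. A sketch (a zero-length edid file on the HDMI connector would support the default-display-mode theory):

```shell
# Sketch: report how many EDID bytes the kernel read for each DRM connector.
report_edid() {
    for c in /sys/class/drm/card[0-9]*-*; do
        [ -e "$c/edid" ] || continue
        printf '%s: %s bytes of EDID (status: %s)\n' \
            "$(basename "$c")" "$(wc -c < "$c/edid")" "$(cat "$c/status")"
    done
}
report_edid   # prints nothing if no DRM connectors exist (e.g. a headless host)
```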

There were probably bugs in earlier L4T R35.x releases. There have been a couple of releases since then, so I'm going to suggest trying a newer release.

We have confirmed that the shutdown issue was caused by our HDMI driver submission and is not related to NVIDIA. We sincerely appreciate your support and assistance.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.