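On both JP7.0 and JP7.1, running the RT kernel on Thor AGX causes the system to hang. With no other changes, I flashed the RT kernel image, booted into the system, and ran the CPU stress tool stress, with a monitor connected over HDMI. After 5 minutes, when the monitor blanks, Xorg goes to 100% CPU usage as shown in the screenshot, and about 1 minute later the system reboots.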
Hi,
Do you observe it on AGX Thor developer kit? Please also attach dmesg or uart log for reference.
Yes, the issue above occurs on the Thor developer kit.
[ 6974.831829] nvethernet a808a10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[ 6975.887829] nvethernet a808a10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[ 6979.059829] nvethernet a808a10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[ 6986.451830] nvethernet a808a10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[ 7033.423830] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[ 7066.849815] rcu: INFO: rcu_sched self-detected stall on CPU
[ 7066.849819] rcu: 0-…: (5247 ticks this GP) idle=0744/1/0x4000000000000000 softirq=52553/52553 fqs=2099
[ 7066.849822] rcu: (t=5250 jiffies g=119649 q=911 ncpus=14)
[ 7066.849825] CPU: 0 PID: 2987 Comm: Xorg Tainted: G O 6.8.12-rt-tegra #1
[ 7066.849827] Hardware name: NVIDIA NVIDIA Jetson AGX Thor Developer Kit/Jetson, BIOS 38.4.0-gcid-43443517 12/30/2025
[ 7066.849827] pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 7066.849829] pc : format_decode+0xec/0x598
[ 7066.849837] lr : vsnprintf+0x7c/0x6e0
[ 7066.849839] sp : ffff80008d46ad40
[ 7066.849840] x29: ffff80008d46ad50 x28: 0000000000000800 x27: 00000000ffffffe8
[ 7066.849843] x26: ffff80008d46aeb0 x25: 0000000000000064 x24: ffff80008d46adf4
[ 7066.849844] x23: ffff80008d46aeb0 x22: ffff80008d46ae58 x21: ffffc95cd4a6ad68
[ 7066.849846] x20: ffff80008d46adf4 x19: ffffc95cd4a6e638 x18: ffffffffffffffff
[ 7066.849847] x17: ffff80008d46af6c x16: ffffc95d5092f170 x15: ffff80008d46acc0
[ 7066.849849] x14: ffff80008d46ae58 x13: 2e676e6979727465 x12: 72203a6465747075
[ 7066.849850] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[ 7066.849852] x8 : ffff80008d46ae58 x7 : 0000000000000000 x6 : 692074696177205d
[ 7066.849853] x5 : 00000000ffffffe8 x4 : ffffc95cd4a6e640 x3 : ffffc95cd4a6e640
[ 7066.849854] x2 : 0000000000000025 x1 : ffff80008d46ad40 x0 : 0000000000000008
[ 7066.849856] Call trace:
[ 7066.849857] format_decode+0xec/0x598
[ 7066.849859] dce_os_log_msg+0x8c/0x124 [tegra_dce]
[ 7066.849869] dce_client_ipc_wait+0xc0/0x188 [tegra_dce]
[ 7066.849873] dce_ipc_send_message_sync+0x90/0x288 [tegra_dce]
[ 7066.849877] tegra_dce_client_ipc_send_recv+0x94/0x1d0 [tegra_dce]
[ 7066.849881] nv_tegra_dce_client_ipc_send_recv+0x38/0x64 [nvidia]
[ 7066.850115] dceclientSendRpc_IMPL+0x64/0xe0 [nvidia]
[ 7066.850329] _dceRpcIssueAndWait.isra.0+0x80/0x100 [nvidia]
[ 7066.850521] rpcRmApiControl_dce+0xc8/0x1b0 [nvidia]
[ 7066.850704] rmresControl_Prologue_IMPL+0xb4/0x1c0 [nvidia]
[ 7066.850893] resControl_IMPL+0xec/0x1d0 [nvidia]
[ 7066.851077] serverControl+0x3b8/0x4a0 [nvidia]
[ 7066.851255] _rmapiRmControl+0x474/0x6a0 [nvidia]
[ 7066.851433] rmapiControlWithSecInfo+0xa8/0x150 [nvidia]
[ 7066.851610] rmapiControlWithSecInfoTls+0x74/0xe0 [nvidia]
[ 7066.851785] _nv04ControlWithSecInfo.constprop.0+0x80/0xa0 [nvidia]
[ 7066.851959] Nv04ControlKernel+0x50/0x60 [nvidia]
[ 7066.852131] nvkms_call_rm+0x58/0x94 [nvidia_modeset]
[ 7066.852177] nvRmApiControl+0x50/0x70 [nvidia_modeset]
[ 7066.852214] __arm64_sys_ioctl+0xac/0xf0
[ 7066.852220] invoke_syscall+0x48/0x114
[ 7066.852225] el0_svc_common.constprop.0+0xc0/0xe0
[ 7066.852227] do_el0_svc+0x1c/0x28
[ 7066.852229] el0_svc+0x30/0xa8
[ 7066.852232] el0t_64_sync_handler+0x120/0x12c
[ 7066.852234] el0t_64_sync+0x194/0x198
[ 7076.719829] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
The above is the log dumped from dmesg.
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
INFO: END TASK:MB▒▒
INFO: enter idle task.
▒▒[ 7066.849815] rcu: INFO: rcu_sched self-detected stall on CPU
[ 7066.849819] rcu: 0-…: (5247 ticks this GP) idle=0744/1/0x4000000000000000 softirq=52553/52553 fqs=2099
[ 7066.849822] rcu: (t=5250 jiffies g=119649 q=911 ncpus=14)
[ 7129.861815] rcu: INFO: rcu_sched self-detected stall on CPU
[ 7129.861817] rcu: 0-…: (20999 ticks this GP) idle=0744/1/0x4000000000000000 softirq=52553/52553 fqs=8225
[ 7129.861820] rcu: (t=21003 jiffies g=119649 q=2967 ncpus=14)
[ 7192.873815] rcu: INFO: rcu_sched self-detected stall on CPU
[ 7192.873817] rcu: 0-…: (36751 ticks this GP) idle=0744/1/0x4000000000000000 softirq=52553/52553 fqs=14451
[ 7192.873819] rcu: (t=36756 jiffies g=119649 q=4949 ncpus=14)
The above is the debug UART log captured when the issue occurred.
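The three stall reports in the UART log all carry the same grace-period number (g=119649), and the `t=` counter keeps growing between them, which suggests a single stall on CPU 0 that never clears rather than three separate ones. A minimal sketch for pulling those numbers out of a saved log is shown here; the function names `parse_stalls` and `estimate_hz` are hypothetical helpers, and the tick-rate estimate assumes the log timestamps and the jiffies counter advance together.

```python
import re

# Matches the summary line of an RCU stall report, e.g.
# "[ 7066.849822] rcu: (t=5250 jiffies g=119649 q=911 ncpus=14)"
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+rcu:\s+\(t=(?P<jiffies>\d+) jiffies g=(?P<gp>-?\d+)"
)

# The three summary lines copied from the UART log in this thread.
SAMPLE = """\
[ 7066.849822] rcu: (t=5250 jiffies g=119649 q=911 ncpus=14)
[ 7129.861820] rcu: (t=21003 jiffies g=119649 q=2967 ncpus=14)
[ 7192.873819] rcu: (t=36756 jiffies g=119649 q=4949 ncpus=14)
"""


def parse_stalls(log_text):
    """Return (timestamp_seconds, stall_jiffies, grace_period) per stall report."""
    return [
        (float(m["ts"]), int(m["jiffies"]), int(m["gp"]))
        for m in STALL_RE.finditer(log_text)
    ]


def estimate_hz(stalls):
    """Estimate the tick rate from the first and last report (needs >= 2 reports)."""
    (t0, j0, _), (t1, j1, _) = stalls[0], stalls[-1]
    return (j1 - j0) / (t1 - t0)


if __name__ == "__main__":
    stalls = parse_stalls(SAMPLE)
    hz = estimate_hz(stalls)
    first_ts, first_jiffies, _ = stalls[0]
    same_gp = len({gp for _, _, gp in stalls}) == 1
    print(f"reports: {len(stalls)}, same grace period: {same_gp}")
    print(f"estimated tick rate: {hz:.1f} Hz")
    # Time at which the stalled grace period began, per the first report.
    print(f"stall began near t={first_ts - first_jiffies / hz:.2f} s")
```

Running the same parser over a full dmesg capture can show whether later stalls involve a different grace period or a different CPU.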
Hi,
Please apply the patch and rebuild the RT kernel:
diff --git a/scripts/build_src_release/generic_rt_build.sh b/scripts/build_src_release/generic_rt_build.sh
index 4f8ba12..3647e8a 100755
--- a/source/generic_rt_build.sh
+++ b/source/generic_rt_build.sh
@@ -1,6 +1,6 @@
#!/bin/bash
-# SPDX-FileCopyrightText: Copyright (c) 2017-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2017-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: GPL-2.0-only
#
# This program is free software; you can redistribute it and/or modify it
@@ -62,7 +62,10 @@
--enable PREEMPT_RT --disable DEBUG_PREEMPT\
--disable KVM\
--enable EMBEDDED\
+ --enable EXPERT\
--enable NAMESPACES\
+ --enable OSNOISE_TRACER\
+ --enable TIMERLAT_TRACER\
--disable CPU_IDLE_TEGRA18X\
--disable CPU_FREQ_GOV_INTERACTIVE\
--disable CPU_FREQ_TIMES \
@@ -72,7 +75,10 @@
--enable PREEMPT_RT --disable DEBUG_PREEMPT\
--disable KVM\
--enable EMBEDDED\
+ --enable EXPERT\
--enable NAMESPACES\
+ --enable OSNOISE_TRACER\
+ --enable TIMERLAT_TRACER\
--disable CPU_IDLE_TEGRA18X\
--disable CPU_FREQ_GOV_INTERACTIVE\
--disable CPU_FREQ_TIMES \
And see if the issue is still present.
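Before retesting, it may help to confirm that the rebuilt kernel is the one actually running and that the options touched by the patch took effect. The sketch below reads the running kernel's configuration and reports the relevant symbols. It assumes the kernel exposes `/proc/config.gz` (which requires `CONFIG_IKCONFIG_PROC`, not confirmed for this build); `parse_kconfig` and `report` are hypothetical helper names.

```python
import gzip

# Symbols that the patched generic_rt_build.sh enables, plus PREEMPT_RT itself.
WANTED = (
    "CONFIG_PREEMPT_RT",
    "CONFIG_EXPERT",
    "CONFIG_OSNOISE_TRACER",
    "CONFIG_TIMERLAT_TRACER",
)


def parse_kconfig(text):
    """Parse kernel config text into {symbol: value}; comment lines are skipped."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value
    return opts


def report(opts, wanted=WANTED):
    """Map each wanted symbol to its value, or 'not set' when it is absent."""
    return {name: opts.get(name, "not set") for name in wanted}


def load_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (assumes the file exists on this kernel)."""
    with gzip.open(path, "rt") as fh:
        return fh.read()


if __name__ == "__main__":
    try:
        text = load_running_config()
    except OSError as exc:
        print(f"could not read kernel config: {exc}")
    else:
        for name, value in report(parse_kconfig(text)).items():
            print(f"{name}={value}")
```

If `/proc/config.gz` is not available, the same parser can be pointed at the `.config` file in the kernel build output directory instead.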
After applying the patch and rebuilding as described above, the hang still occurs.
Hi,
So you can boot to the Ubuntu desktop and it works fine. Only when running the CPU stress test, with all CPU cores at full load, does it hit the RCU stall issue. Is this understanding correct?
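Yes, that is correct. Likewise, on a third-party carrier board running the RT kernel with a camera driver added, the system occasionally fails to reach the Ubuntu desktop at boot. Because the official Thor developer kit has no CSI interface, the only symptom I can reproduce on it so far is the RCU stall under full CPU stress load.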
