Thor realtime kernel hang

Hi NVIDIA,

Our custom board, running Linux R38.2.1, intermittently hits a kernel hang after the real-time patches are enabled in the kernel. According to the logs, this may be related to the GPU driver.

When it happens, the monitor shows only the mouse cursor on a black background. The serial port is still usable, but the board restarts automatically some time after it is powered on.

Could you help us find a solution?

[ 52.348520] CPU: 4 PID: 1963 Comm: Xorg.wrap Tainted: G W O 6.8.12-rt-tegra #1 e35859025127624cc06208d684f4718
[ 52.348523] Hardware name: NVIDIA NVIDIA Jetson AGX Thor Developer Kit/Jetson, BIOS r38.2-899cdbc9-dirty 11/12/2025

[ 78.203276] NVRM: GPU at PCI:0000:01:00: GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600
[ 78.203296] NVRM: Xid (PCI:0000:01:00): 109, pid=2029, name=gnome-shell, channel 0x00000004, errorString CTX SWITCH TIMEOUT,
[ 78.210412] NVRM: nvAssertFailedNoLog: Assertion failed: KGSP service called when no KGSP interrupt pending
[ 78.210412] @ kernel_gsp_tu102.c:1121
[ 81.127217] NVRM: Xid (PCI:0000:01:00): 13, pid=2052, name=mutter-x11-fram, Graphics SM Warp Exception on (GPC 0, TPC 0, SM 0): TEX FORMAT Errors
[ 81.127404] NVRM: Xid (PCI:0000:01:00): 13, pid=2052, name=mutter-x11-fram, Graphics SM Warp Exception on (GPC 0, TPC 0, SM 1): TEX FORMAT Errors
[ 81.139468] NVRM: Xid (PCI:0000:01:00): 13, pid=2052, name=mutter-x11-fram, Graphics SM Warp Exception on (GPC 0, TPC 1, SM 0): TEX FORMAT Errors

1224_realtime_kernel_hang.zip (169.7 KB)

HDMI-display-error.log (971.9 KB)

The regular kernel (without the real-time patch) also reproduces the problem. When it occurs, the screen shows only the mouse cursor, while both the UART and SSH terminals keep working.

Hi,
Display issues have been found on r38.2.1. We are looking into them; please wait for the next Jetpack 7 release.

Hi DaneLLL,

Will the RT kernel patch and Xorg work properly if we port the latest code from the Git repository nv-unified-gpu-display-driver.git?

When will the next version be released?

Thanks.

Hi DaneLLL,

I have merged the jetpack7.1 code, but the GPU driver still reports many errors.

Note that we have modified the DCB file, and I am not sure whether the difference in our hardware design is what causes this series of GPU errors.

Do you have any suggestions?

jetpack7.1_kernel_log_0114.txt (864.6 KB)

Hi,
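Please follow the steps in this post and try again:
R38.4.0 real-time kernel hangs on power cycle after a display is connected (original title: R38.4.0 实时内核在接显示器之后 上下电会卡住) - #6 by DaneLLL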

We have validated this on the AGX Thor developer kit. It should work if you follow the steps one by one.

There are still GPU errors. Should I open a separate topic for these issues?

NVRM: GPU at PCI:0000:01:00: GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600
NVRM: Xid (PCI:0000:01:00): 109, pid=2422, name=gnome-shell, channel 0x00000004, errorString CTX SWITCH TIMEOUT, Info 0x5c003

NVRM: nvAssertFailedNoLog: Assertion failed: KGSP service called when no KGSP interrupt pending
 @ kernel_gsp_tu102.c:1121
NVRM: Xid (PCI:0000:01:00): 13, pid=2483, name=mutter-x11-fram, Graphics SM Warp Exception on (GPC 0, TPC 0, SM 0): TEX FORMAT Errors
NVRM: Xid (PCI:0000:01:00): 13, pid=2483, name=mutter-x11-fram, Graphics SM Global Exception on (GPC 0, TPC 0, SM 0): Multiple Warp Errors
NVRM: Xid (PCI:0000:01:00): 13, pid=2483, name=mutter-x11-fram, Graphics Exception: ESR 0x505730=0x9000018 0x505734=0x4 0x505728=0x1f81fb60 0x50572c=0x1174
NVRM: Xid (PCI:0000:01:00): 13, pid=2483, name=mutter-x11-fram, Graphics SM Warp Exception on (GPC 0, TPC 0, SM 1): TEX FORMAT Errors
NVRM: Xid (PCI:0000:01:00): 13, pid=2483, name=mutter-x11-fram, Graphics SM Global Exception on (GPC 0, TPC 0, SM 1): Multiple Warp Errors
NVRM: Xid (PCI:0000:01:00): 13, pid=2483, name=mutter-x11-fram, Graphics Exception: ESR 0x5057b0=0x9000018 0x5057b4=0x4 0x5057a8=0x1f81fb60 0x5057ac=0x1174
NVRM: Xid (PCI:0000:01:00): 13, pid=2483, name=mutter-x11-fram, Graphics SM Warp Exception on (GPC 0, TPC 1, SM 0): TEX FORMAT Errors
NVRM: Xid (PCI:0000:01:00): 13, pid=2483, name=mutter-x11-fram, Graphics SM Global Exception on (GPC 0, TPC 1, SM 0): Multiple Warp Errors
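For anyone triaging a long log like the attached ones, here is a minimal Python sketch that counts NVRM Xid events per Xid code and process name. The regular expression is only inferred from the lines quoted in this thread, so other driver versions may format these messages differently; the function name `summarize_xids` and the sample list are illustrative and not part of any NVIDIA tool.

```python
import re
from collections import Counter

# Pattern inferred from the NVRM lines quoted in this thread (an assumption,
# not an official format specification).
XID_RE = re.compile(
    r"NVRM: Xid \((?P<dev>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>\d+), name=(?P<name>[^,]+), (?P<detail>.*)"
)


def summarize_xids(lines):
    """Count Xid events keyed by (xid code, process name).

    Lines that do not look like Xid reports (for example the
    nvAssertFailedNoLog line) are ignored.
    """
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(int(match.group("xid")), match.group("name"))] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the log above, used only as a demonstration.
    sample = [
        "NVRM: Xid (PCI:0000:01:00): 109, pid=2422, name=gnome-shell, "
        "channel 0x00000004, errorString CTX SWITCH TIMEOUT, Info 0x5c003",
        "NVRM: Xid (PCI:0000:01:00): 13, pid=2483, name=mutter-x11-fram, "
        "Graphics SM Warp Exception on (GPC 0, TPC 0, SM 0): TEX FORMAT Errors",
    ]
    for (xid, name), count in sorted(summarize_xids(sample).items()):
        print(f"Xid {xid}: {name} x{count}")
```

`summarize_xids` accepts any iterable of strings, so an opened log file can be passed directly. A summary like this makes it easier to see whether the Xid 13 reports always follow an Xid 109 from the same session.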

Hi,
Do you observe the issue on the AGX Thor developer kit, or only on the custom board? We would like to clarify whether this is more likely a SW issue or a HW issue.

The error was reported by our custom board.

Hi,
So it should not depend on whether the kernel is RT or non-RT. Please compare your design with the AGX Thor developer kit. If there are deviations, you need to modify the device tree and rebuild the DTB file, because the default DTB file is for the developer kit. Please check the adaptation guide:

Jetson Thor Adaptation and Bring-Up — NVIDIA Jetson Linux Developer Guide
