Hello,
I’m experiencing frequent full-system freezes on my Jetson AGX Thor Developer Kit, running JetPack 7 installed via SDK Manager.
🔴 Problem Summary
- System randomly freezes, both:
  - During normal desktop usage
  - During Docker build operations (which make the issue easier to reproduce)
- In the freeze state:
  - The GUI is completely unresponsive
  - Mouse and keyboard stop working
  - A hard power-off via the power button is required
- Issue persists even after a clean JetPack reinstallation:
  - I re-flashed the device twice with the same JetPack SDK using SDK Manager.
  - Both setups showed the exact same problem.
🐞 Errors Seen in Logs (dmesg)
- NVIDIA driver assertion:
NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
- Realtek WiFi driver (rtl8852ce) warnings:
WARNING: CPU: … rtw_scan.c …
🛠️ System Info
- Device: Jetson AGX Thor Developer Kit
- BIOS: 38.2.0-gcid-41844464 (08/22/2025)
- Install Method: NVIDIA SDK Manager
- # of Reinstall Attempts: 2 (clean reflashes)
📌 Notes
- Issue happens randomly even when system is idle or doing basic tasks.
- No high CPU or memory usage when freeze occurs.
- journalctl -b -1 shows “no persistent journal”, so logs don’t persist across reboots.
- dmesg consistently shows nvAssertFailed before or around freeze times.
- WiFi driver (rtl8852ce) also shows repeated kernel warnings.
❓ Questions
- Is this a known issue with Jetson AGX Thor or the JetPack SDK?
- Could the GPU or Realtek driver stack be causing the system to lock up?
- Any known workaround or kernel patch available?
- How can I enable persistent journaling for deeper post-mortem debugging? (I’ve sketched what I plan to try just below.)
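For reference, here is what I plan to try for persistent journaling, assuming the stock systemd-journald behavior (with the default Storage=auto, creating /var/log/journal is enough to enable persistence):

# create the persistent journal directory; journald uses it automatically
sudo mkdir -p /var/log/journal
sudo systemctl restart systemd-journald
# after the next freeze and reboot, the previous boot should be readable:
journalctl -b -1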
I’m happy to share full logs, reproduce the issue, or try debugging steps.
Thanks in advance for your help!
I do not yet have a Thor to look at, but if you can log in to the Thor over a serial console before things go bad and run the command “dmesg --follow”, then the PC which monitors the serial console can save a log after everything has crashed (the serial console essentially logs to the alternate computer, and that other computer hasn’t failed). You can also set the serial console to log at the start of the session, so everything it sees prior to locking up should be available by posting the serial console log. Hopefully the log shows a stack frame.
You might also mention if there is anything known to be going on with networking at the time, e.g., a web browser is running, a software update or download is running, a multimedia stream is running, and so on. Prior to things going bad, while the serial console is logging, you might also run the commands “ip -s addr” and “ip route” so the logs see this prior to starting “dmesg --follow”.
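As a concrete sketch (assuming the devkit shows up on the host PC as /dev/ttyUSB0 at 115200 baud; adjust both for your setup), the host side could look like this:

# on the host PC: open the serial console and capture everything to a file
minicom -D /dev/ttyUSB0 -b 115200 -C thor-serial.log

# then, logged in on the Thor over that console:
ip -s addr
ip route
sudo dmesg --follow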
Thanks for the suggestions.
At this point, I haven’t connected via serial console yet; the dmesg --follow output below was captured directly from the terminal while the system was still responsive. Here’s a snippet of what I see repeatedly:
openzeka@localhost:~$ sudo dmesg --follow
[20304.176120] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20305.229946] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20306.286218] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20307.341958] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20308.398228] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20309.454217] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20309.680250] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[20310.510099] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20311.566105] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20312.621946] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20313.678138] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20314.733931] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20315.790092] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20316.845926] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20317.902103] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20318.957923] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20320.013932] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20321.069927] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20322.126090] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20323.181924] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20324.238078] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20325.294092] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20326.349905] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20327.406067] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20328.461902] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20329.518065] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20330.573912] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20331.632190] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20332.685904] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20333.742222] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20334.193572] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[20334.799902] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
[20335.854091] nvethernet a808e10000.ethernet: [xpcs_lane_bring_up][827][type:0x4][loga-0x0] PCS block lock SUCCESS
These messages continue in a loop, and after some time the system becomes unresponsive. This behavior is the same whether the system is connected to the internet or completely offline — no browser, no downloads, no updates.
I will try to capture the same output from a serial console next, as you suggested, to ensure logs are preserved even after the system hangs. If you have any insights based on the current log, or thoughts about the repeated PCS block lock and nvAssertFailed messages, I’d appreciate your input.
Hi all,
I have the same problem on my Devkit. The operating system occasionally freezes during normal operation, and dmesg shows a large number of errors.
[ 78.948508] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 118.463230] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 137.467584] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 182.264999] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 202.660041] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[  225.663493] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 240.174990] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 262.661706] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[  282.175939] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 296.683480] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
Since this code is triggered by the unifiedgpudisp driver, I suspected a graphical desktop issue and switched the system to text-only mode (see the sketch after the attached log for how). However, the hangs persisted in text-only mode. The relevant log is attached.
Thor_Devkit_hang_dmesg.log (129.8 KB)
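For anyone wanting to reproduce the text-only test, this is the standard systemd way I used (stock Ubuntu targets, nothing Jetson-specific):

# boot into text-only mode (no graphical session)
sudo systemctl set-default multi-user.target
sudo reboot
# restore the desktop later with:
sudo systemctl set-default graphical.target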
I don’t have internal information on the Ethernet, but it looks like it might be a race condition or an insufficient device tree setting. I see lots of this or related:
[ 9.571570] -->macsec_probe()
[ 9.572428] nvethernet a808a10000.ethernet: DT info about vlan in clear is missing setting default-disabled
[ 9.572436] -->macsec_get_platform_res()
[ 9.580266] <--macsec_get_platform_res()
[ 9.580273] -->macsec_enable_car()
[ 9.582170] <--macsec_enable_car()
[ 9.582199] <--macsec_probe()
[ 9.582202] nvethernet a808a10000.ethernet: Macsec: Reduced MTU: 1466 Max: 9000
[ 9.627268] nvethernet a808a10000.ethernet: mgbe0_0 (HW ver: 42) created with 4 DMA channels
[ 9.644053] nvethernet a808b10000.ethernet: Adding to iommu group 32
[ 9.647016] nvethernet a808b10000.ethernet: Virtualization is not enabled
[ 9.647025] nvethernet a808b10000.ethernet: failed to read skip mac reset flag, default 0
[ 9.647029] nvethernet a808b10000.ethernet: failed to read MDIO address
[ 9.647038] nvethernet a808b10000.ethernet: Failed to read nvida,pause_frames, so setting to default support as disable
[ 9.647040] nvethernet a808b10000.ethernet: Failed to read nvida,disable-rx-checksum, so setting to default - rx checksum offload enabled
[ 9.647044] nvethernet a808b10000.ethernet: setting to default DMA bit mask
[ 9.660545] nvethernet a808b10000.ethernet: failed to read or invalid MDC CR - default to 5
[ 9.660696] nvethernet a808b10000.ethernet: failed to get phy reset gpio error: -2
[ 9.672493] nvethernet a808b10000.ethernet: Ethernet MAC address: 3c:6d:66:e3:fb:d0
[ 9.672781] nvethernet a808b10000.ethernet: VM IRQ is handled by Camera CPU: 4
This isn’t clearly a particular cause, but the failure to read some of the information above could conceivably indicate a device tree issue. The driver then appears to retry “fixing” the issue over and over at different device-tree physical-address-style locations (this could be completely unrelated, but it is suspicious). One way to check what the kernel actually received is sketched below.
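To inspect the live device tree for comparison against what the driver expects (dtc comes from the device-tree-compiler package; the node name below is a guess based on the a808b10000.ethernet unit address in the log):

# dump the flattened device tree the running kernel is using
sudo dtc -I fs -O dts -o /tmp/extracted.dts /proc/device-tree
grep -A 5 'ethernet@a808b10000' /tmp/extracted.dts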
For reference, here is the actual kernel stack frame and some content just prior to the stack frame:
[ 13.237538] enP2p1s0: 0xffff800086f10000, 3c:6d:66:f9:57:ce, IRQ 317
[ 13.271140] r8126 0002:01:00.0 enP2p1s0: registered PHC device on enP2p1s0
[ 13.271146] r8126 0002:01:00.0 enP2p1s0: reset PHC clock
[ 13.279514] camrtc-coe tegra-capture-coe0: netdev event 14 dev mgbe0_0
[ 13.301813] camrtc-coe tegra-capture-coe0: netdev event 1 dev mgbe0_0
[ 13.301895] camrtc-coe tegra-capture-coe0: netdev event 4 dev mgbe0_0
[ 13.305960] camrtc-coe tegra-capture-coe1: netdev event 14 dev mgbe1_0
[ 13.322107] camrtc-coe tegra-capture-coe1: netdev event 1 dev mgbe1_0
[ 13.322159] camrtc-coe tegra-capture-coe1: netdev event 4 dev mgbe1_0
[ 13.325187] camrtc-coe tegra-capture-coe2: netdev event 14 dev mgbe2_0
[ 13.343300] nv_platform 8808c00000.display: Adding to iommu group 59
[ 13.344692] platform 8808c00000.display:nvdisplay-niso: Adding to iommu group 60
[ 13.346136] NVRM: devm_reset_control_get failed, err: -2
[ 13.346139] NVRM: devm_reset_control_get failed, err: -2
[ 13.346141] NVRM: mipi_cal devm_reset_control_get failed, err: -2
[ 13.346153] ------------[ cut here ]------------
[ 13.346155] WARNING: CPU: 12 PID: 938 at drivers/reset/core.c:766 __reset_control_get_internal+0x68/0x16c
[ 13.346168] Modules linked in: qrtr bridge stp llc usb_f_ncm usb_f_mass_storage nvidia(O+) usb_f_acm u_serial governor_pod_scaling(O) usb_f_rndis u_ether libcomposite algif_hash algif_skcipher af_alg bnep snd_soc_tegra_controls(O) snd_soc_tegra_utils(O) snd_soc_tegra210_admaif snd_soc_tegra_pcm snd_soc_tegra186_arad(O) snd_soc_tegra210_mvc snd_soc_tegra210_mixer snd_soc_tegra210_ope snd_soc_tegra186_asrc snd_soc_tegra210_sfc snd_soc_tegra210_adx snd_soc_tegra210_amx snd_soc_tegra210_i2s rtk_btusb(O) btusb btrtl btintel btmtk btbcm bluetooth ecdh_generic ecc snd_soc_tegra210_ahub rtl8852ce(O) spidev nvhost_nvcsi(O) tegra210_adma nvhost_vi5(O) nvhost_capture(O) tegra_se(O) tegra_se_kds(O) nvhost_pva(O) ina238 crypto_engine ina3221 r8126(O) snd_hda_codec_hdmi tegra_cactmon_mc_all(O) tegra23x_psc(O) tegra_aconnect snd_hda_tegra at24 snd_hda_codec snd_hda_core pwm_tegra_tachometer(O) snd_soc_rt5640 mttcan(O) snd_soc_rl6231 host1x_fence(O) can_dev spi_tegra114 tegra_capture_coe(O) cfg80211 rfkill nvvrs_pseq_rtc(O) lm90
[ 13.346217] nvidia_vrs_pseq(O) crct10dif_ce nvethernet(O) sm3_ce sm3 nvpps(O) coresight_trbe sha3_ce nvmap(O) snd_soc_tegra_audio_graph_card sha512_ce coresight sha512_arm64 nvsciipc(O) snd_soc_audio_graph_card ivc_cdev(O) snd_soc_simple_card_utils nvidia_cspmu tegra234_oc_event(O) nvpmodel_clk_cap(O) arm_spe_pmu ramoops tegra_dce(O) reed_solomon thermal_trip_event(O) arm_cspmu_module tpm_ftpm_tee camera_diagnostics(O) nvhost_isp5(O) tegra_capture_isp(O) tegra_camera(O) v4l2_dv_timings host1x_nvhost(O) tegra_drm(O) tegra_wmark(O) nvhwpm(O) drm_display_helper drm_dp_aux_bus cec drm_kms_helper host1x(O) tegra_camera_platform(O) mc_utils(O) capture_ivc(O) v4l2_fwnode v4l2_async videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc camchar(O) rtcpu_debug(O) tegra_camera_rtcpu(O) ivc_bus(O) hsp_mailbox_client(O) nvme_fabrics fuse drm nfnetlink ip_tables x_tables ipv6 pwm_fan pwm_tegra tegra_bpmp_thermal tegra_xudc uas ucsi_ccg typec_ucsi typec nvme nvme_core phy_tegra194_p2u pcie_tegra194
[ 13.346270] ufs_tegra(O) pcie_tegra264(O)
[ 13.346273] CPU: 12 PID: 938 Comm: modprobe Tainted: G O 6.8.12-tegra #1
[ 13.346276] Hardware name: NVIDIA NVIDIA Jetson AGX Thor Developer Kit/Jetson, BIOS 38.2.0-gcid-41844464 08/22/2025
[ 13.346277] pstate: 21400009 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 13.346279] pc : __reset_control_get_internal+0x68/0x16c
[ 13.346283] lr : __of_reset_control_get+0x190/0x1d8
[ 13.346286] sp : ffff80008f6335c0
[ 13.346287] x29: ffff80008f6335c0 x28: 0000000000000004 x27: ffff0000933d8110
[ 13.346290] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000001
[ 13.346293] x23: 0000000000000000 x22: ffff0000888c8d50 x21: 000000000000001f
[ 13.346295] x20: ffff0000888c8d70 x19: ffff0000935c2600 x18: 00000000fffffffe
[ 13.346298] x17: ffff80008f632db8 x16: ffffbe918510e800 x15: ffff80008f633300
[ 13.346300] x14: ffffffffffffffff x13: 0000000000000018 x12: 0101010101010101
[ 13.346303] x11: 7f7f7f7f7f7f7f7f x10: ffffc291887de399 x9 : 0000000000000018
[ 13.346306] x8 : 0101010101010101 x7 : 00000000736c6c65 x6 : 00000080a3f0ef77
[ 13.346308] x5 : ffff80008f633618 x4 : fffffbfffde272dc x3 : 0000000000000001
[ 13.346310] x2 : 000000000000001f x1 : 000000000000001f x0 : 0000000000000000
[ 13.346313] Call trace:
[ 13.346315] __reset_control_get_internal+0x68/0x16c
[ 13.346318] __of_reset_control_get+0x190/0x1d8
[ 13.346321] __reset_control_get+0x48/0x1cc
[ 13.346324] __devm_reset_control_get+0x78/0xd4
[ 13.346327] nvlink_core_init+0x3a0af0/0x3a0d58 [nvidia]
[ 13.346511] platform_probe+0x68/0xe0
[ 13.346516] really_probe+0x150/0x2c8
[ 13.346519] __driver_probe_device+0x78/0x134
[ 13.346522] driver_probe_device+0x3c/0x164
[ 13.346524] __driver_attach+0x98/0x1c4
[ 13.346527] bus_for_each_dev+0x7c/0xf4
[ 13.346530] driver_attach+0x24/0x38
[ 13.346532] bus_add_driver+0xec/0x218
[ 13.346534] driver_register+0x5c/0x13c
[ 13.346536] __platform_driver_register+0x28/0x3c
[ 13.346539] nv_platform_register_driver+0x34/0x40 [nvidia]
[ 13.346703] init_module+0x174/0x5a0 [nvidia]
[ 13.346848] do_one_initcall+0x58/0x318
[ 13.346852] do_init_module+0x58/0x1ec
[ 13.346856] load_module+0x1f04/0x2000
[ 13.346859] init_module_from_file+0x88/0xd4
[ 13.346862] __arm64_sys_finit_module+0x148/0x330
[ 13.346864] invoke_syscall+0x48/0x134
[ 13.346868] el0_svc_common.constprop.0+0x40/0xf0
[ 13.346871] do_el0_svc+0x1c/0x30
[ 13.346874] el0_svc+0x30/0xb8
[ 13.346878] el0t_64_sync_handler+0x130/0x13c
[ 13.346880] el0t_64_sync+0x194/0x198
[ 13.346882] ---[ end trace 0000000000000000 ]---
Something closer to an exact failure point:
[ 78.948508] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
Someone from NVIDIA can likely go straight to that code and get an idea of what is failing. Still, it would be useful to know: have you done any apt-style update? Has anything at all been customized yet? (A quick way to check is sketched below.)
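On a stock Ubuntu-based image, recent package operations are recorded in the apt history log (the path is the Ubuntu default; it may differ on customized setups):

# list the most recent apt operations
grep -E '^(Start-Date|Commandline)' /var/log/apt/history.log | tail -n 20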
Hi,
Please report your issues separately.
For issue 1, “NVIDIA driver assertion”, it is a known issue.
But issue 2 is not. Please file a separate topic for that wifi driver issue and provide a method to reproduce. Thanks.
How can the “NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706” issue be solved? I use the nvidia/pytorch:25.08-py3 Docker image and run Gemma 3 27B via HF Transformers; the generated content is in an uncertain state, sometimes producing garbage text.
Update: I discovered a strange temporary workaround for this issue (sketched below):
- Run ‘nvidia-smi dmon’ in a terminal.
- Then run the Transformers Python inference for Gemma 27B. Every time, the generation was correct and no garbage text was output. The ‘nvAssertFailed: Assertion failed’ error was also gone.
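A minimal sketch of the workaround as I run it (my_infer.py stands in for whatever inference script you use; the name is just an example):

# terminal 1: keep GPU monitoring running for the whole session
nvidia-smi dmon

# terminal 2: run the inference as usual
python3 my_infer.py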
Your hack worked for me too; the freezing is gone. To be fair, I used “watch -n 1 nvidia-smi”, but it has the same effect. I compiled llama.cpp in NVIDIA’s CUDA container and use it without serious trouble.
Same problem here, and I found that switching to the Wayland desktop also solved it (sketched below). Another problem: memory isn’t released when the Ollama model is closed. I can’t find any leftover process, but the memory is still in use. I think the driver has many problems.
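For anyone who wants to try the same switch on a stock Ubuntu desktop with GDM (other display managers differ; this assumes the default /etc/gdm3/custom.conf):

# make sure Wayland sessions are not disabled in GDM
sudo sed -i 's/^WaylandEnable=false/#WaylandEnable=false/' /etc/gdm3/custom.conf
sudo systemctl restart gdm3
# then pick “Ubuntu on Wayland” from the gear icon at the login screen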