ASUS Ascent GX10 (GB10) hard shutdown under heavy vLLM load | of_root node is NULL and EM: CPUs must have same capacity dmesg errors

parallelArchitect · May 14, 2026, 5:20am

Looking at your journalctl excerpt, three signals stand out:

1. CDI Device Injection Failure (20:05:33)

dockerd: CDI device injection failed: unresolvable CDI devices nvidia.com/gpu=all

Docker could not resolve the CDI (Container Device Interface) device name nvidia.com/gpu=all, so your container launched without proper GPU device mapping. This happened at container startup, about 7 minutes before the crash.
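One way to follow up on this signal is to check whether any CDI spec file on the host declares the nvidia.com/gpu device kind. The sketch below is only an illustration: it assumes the default CDI spec directories (/etc/cdi and /var/run/cdi), which may differ if your Docker daemon sets its own "cdi-spec-dirs", and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: default CDI spec directories. Adjust if your Docker daemon
# is configured with different "cdi-spec-dirs".
DEFAULT_CDI_DIRS = ("/etc/cdi", "/var/run/cdi")


def find_cdi_specs(spec_dirs=DEFAULT_CDI_DIRS):
    """Return CDI spec files (.yaml or .json) found in the given directories."""
    specs = []
    for d in spec_dirs:
        p = Path(d)
        if not p.is_dir():
            continue  # a missing directory simply contributes no specs
        specs.extend(
            f for f in sorted(p.iterdir())
            if f.is_file() and f.suffix in {".yaml", ".json"}
        )
    return specs


def mentions_kind(spec_path, kind="nvidia.com/gpu"):
    """Crude text check: does the spec file mention the expected device kind?"""
    return kind in Path(spec_path).read_text(errors="replace")
```

Calling find_cdi_specs() with no arguments scans the default directories. An empty result, or no file for which mentions_kind() returns True, would be consistent with the "unresolvable CDI devices" message. In that case the NVIDIA Container Toolkit's nvidia-ctk cdi generate command is the usual way to create a spec; check the toolkit documentation for the exact invocation on your release.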

2. Memory Allocation Failure (20:12:10–20:12:11)

kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051)
returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1359

Two consecutive failures. The driver could not allocate memory for a memory descriptor. On GB10, the CPU and GPU share a single LPDDR5X pool, with no boundary between system memory and GPU memory. A driver-level allocation failure therefore points to that shared pool being exhausted.

3. Cable Removal (20:02:52)

cx7-pcie-hotplug MTKP0001:00: Cable removal

This signal does not appear related to the shutdown sequence.

The failure chain: CDI device injection fails at container startup → about 7 minutes later, under vLLM load, the driver can no longer allocate from the shared LPDDR5X pool → allocation failures cascade over 20:12:10–20:12:11 → shutdown.
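To make the ordering and gaps explicit, here is a small sketch that sorts the three signals by timestamp. The timestamps and messages are copied from the excerpt above. Two things are assumptions: that all events fall on the same day, and that the second allocation failure was logged at 20:12:11 (the end of the quoted range). The function names are illustrative only.

```python
from datetime import datetime

# (time, source, message) copied from the journalctl excerpt discussed above.
# Assumption: the second NVRM failure is placed at the end of the
# 20:12:10-20:12:11 range, and all events occur on the same day.
EVENTS = [
    ("20:05:33", "dockerd", "CDI device injection failed"),
    ("20:12:10", "kernel", "NVRM: NV_ERR_NO_MEMORY"),
    ("20:12:11", "kernel", "NVRM: NV_ERR_NO_MEMORY"),
    ("20:02:52", "cx7-pcie-hotplug", "Cable removal"),
]


def to_time(hms):
    """Parse an HH:MM:SS string into a datetime (date part is a placeholder)."""
    return datetime.strptime(hms, "%H:%M:%S")


def timeline(events):
    """Sort events chronologically and attach seconds since the first event."""
    ordered = sorted(events, key=lambda e: to_time(e[0]))
    start = to_time(ordered[0][0])
    return [
        (int((to_time(t) - start).total_seconds()), t, src, msg)
        for t, src, msg in ordered
    ]


def gap_seconds(events, first_msg, second_msg):
    """Seconds between the first occurrences of two message substrings."""
    ordered = sorted(events, key=lambda e: to_time(e[0]))
    first = next(to_time(t) for t, _, msg in ordered if first_msg in msg)
    second = next(to_time(t) for t, _, msg in ordered if second_msg in msg)
    return int((second - first).total_seconds())


if __name__ == "__main__":
    for offset, t, src, msg in timeline(EVENTS):
        print(f"+{offset:>4}s  {t}  {src}: {msg}")
    gap = gap_seconds(EVENTS, "CDI device injection failed", "NV_ERR_NO_MEMORY")
    print(f"CDI failure to first allocation failure: {gap}s")  # 397s (6 min 37 s)
```

Sorted this way, the cable removal comes first (20:02:52), the CDI failure follows 161 seconds later, and the first allocation failure arrives 397 seconds after the CDI failure.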

This pattern matches a broader failure class on GB10 when unified memory is exhausted. See: System crashes when memory is full
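If you want to watch for this condition before it becomes a hard shutdown, a rough sketch is to track MemAvailable from /proc/meminfo while vLLM is running. This assumes host-side MemAvailable is a reasonable proxy for the shared pool on GB10, which this thread has not verified, and the 10% threshold is a hypothetical value rather than a recommendation.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_fraction(fields):
    """Fraction of total memory the kernel estimates is still available."""
    return fields["MemAvailable"] / fields["MemTotal"]


def headroom_ok(fields, min_fraction=0.10):
    """True if available memory is at or above the threshold.

    min_fraction is a hypothetical threshold chosen for illustration.
    """
    return available_fraction(fields) >= min_fraction


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print(f"available: {available_fraction(info):.1%}, ok: {headroom_ok(info)}")
```

Running this in a loop alongside the workload, and logging the result with a timestamp, would show whether available memory was trending toward zero in the minutes before the NVRM errors.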
