From the bug report data this appears to be a driver allocation failure occurring under unified memory pressure, which would explain why the system becomes unresponsive instead of returning a normal CUDA out-of-memory error.
The kernel log repeatedly shows failures in the driver’s internal allocation path:
NV_ERR_NO_MEMORY
_memdescAllocInternal @ mem_desc.c:1359
This function allocates internal driver memory descriptors used for GPU objects. Because this allocation occurs below the CUDA runtime layer, the application may not receive a normal cudaErrorMemoryAllocation when this path fails.
Once those descriptor allocations fail, the log shows the GPU context allocation path failing as well: kgrctxAllocMainCtxBuffer at kernel_graphics_context.c:1387, cascading to kgrctxAllocCtxBuffers at kernel_graphics_object.c:214. After that point the nvidia-modeset kernel thread enters uninterruptible sleep (D-state) for more than 122 seconds, waiting on a resource that cannot be satisfied. That blocked display thread lines up with the freeze described earlier in this thread.
In simplified form:
system memory pressure
→ driver descriptor allocation fails (NV_ERR_NO_MEMORY)
→ GPU context creation fails
→ nvidia-modeset blocks in D-state (122+ seconds)
→ system becomes unresponsive
This pattern appears multiple times across separate boot cycles in the report.
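For anyone who wants to check for the same pattern on another system, these entries can usually be pulled straight out of the kernel log (the exact message text may vary between driver and kernel versions):

# search the current boot's kernel log for the driver allocation failures
sudo dmesg | grep -iE 'NV_ERR_NO_MEMORY|memdescAllocInternal|kgrctxAlloc'

# the hung-task warning for the blocked nvidia-modeset thread
sudo dmesg | grep -i 'blocked for more than'

# list earlier boots, then inspect a previous one the same way
journalctl --list-boots
sudo journalctl -k -b -1 | grep -iE 'NV_ERR_NO_MEMORY|blocked for more than'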
Memory state
The memory snapshot in the report shows an interesting pattern:
MemTotal: ~125 GB
MemFree: ~1 GB
MemAvailable: ~103 GB
Cached: ~98 GB (102,386,816 kB)
Inactive(file): ~94 GB
Slab: ~6.4 GB
Only about 1 GB is truly free, while nearly 100 GB is held in file cache.
Linux counts the page cache as reclaimable and therefore reports a large MemAvailable, but those pages still have to be reclaimed before new allocations can use them. If reclaim latency becomes high during a burst of allocations, a driver allocation path may fail even though MemAvailable appears large.
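One way to check whether this is actually happening during the workload is to watch the direct-reclaim counters in /proc/vmstat; if they climb while the allocation failures appear, allocations are stalling on reclaim despite the large MemAvailable (counter names differ slightly across kernel versions):

# rising allocstall / pgscan_direct values mean allocations are
# waiting on direct reclaim rather than being served from free pages
grep -E 'allocstall|pgscan_direct|pgsteal_direct' /proc/vmstat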
Unified memory architecture on GB10
One important difference on DGX Spark is that the GPU does not have a dedicated framebuffer. GPU allocations come from the same system memory pool used by the CPU.
On traditional discrete GPU systems memory allocation typically looks like this:
CPU processes → system RAM
GPU kernels → VRAM
page cache → system RAM
GPU memory pressure and Linux system memory pressure are largely independent.
NVIDIA describes this model in the CUDA Unified Memory documentation: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#unified-memory-introduction
On DGX Spark the Grace CPU and Blackwell GPU access a shared system memory pool through the NVLink-C2C coherent interconnect. The following consumers all draw from the same pool:
CPU processes
filesystem page cache
kernel slab allocations
GPU allocations (driver descriptors, contexts, user buffers)
Under heavy workloads these components can compete for the same physical memory resources.
NVIDIA documents the DGX Spark hardware architecture here: https://docs.nvidia.com/dgx/dgx-spark/hardware.html
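A rough way to see the shared pool in action (an illustration, not an exact accounting) is to watch the system memory counters while a large GPU workload starts, since on this platform GPU allocations show up as a drop in MemAvailable rather than as framebuffer usage:

# snapshot before launching the workload
grep -E 'MemTotal|MemFree|MemAvailable|Cached' /proc/meminfo

# launch the GPU workload in another terminal, then sample again;
# MemAvailable should fall as GPU allocations are made
grep -E 'MemTotal|MemFree|MemAvailable|Cached' /proc/meminfo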
Monitoring limitations
Traditional GPU monitoring tools do not expose unified memory usage on this platform. The report shows:
TotalDedicatedGPUMemory → Operation not supported
UsedDedicatedGPUMemory → Operation not supported
FB Memory Total/Used/Free → N/A
This is expected on UMA platforms where the GPU does not expose a discrete framebuffer.
The PCIe link information reported by nvidia-smi (Gen1 x1) is also expected on GB10 systems, since the GPU communicates with the Grace CPU through the NVLink-C2C interconnect rather than a conventional PCIe link.
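Both of these can be confirmed locally; since there is no dedicated framebuffer to query, the system-wide counters are the signal that actually matters here:

# FB memory fields report N/A on this platform, as in the bug report
nvidia-smi -q -d MEMORY

# the shared CPU/GPU pool is what to watch instead
free -h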
System configuration factors
A few aspects of the environment may be relevant when investigating memory pressure on this platform.
Swap
The report indicates swap was enabled earlier and later disabled. On systems where the GPU shares system memory, many users prefer to keep swap disabled to simplify reclaim behavior. The report already shows swap disabled, so that step appears to have been taken.
cat /proc/swaps
grep swap /etc/fstab
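If swap ever needs to be turned off again, the usual steps are roughly the following (this assumes the swap entry lives in /etc/fstab; adjust if it is provided by a systemd swap unit instead):

# disable all active swap immediately
sudo swapoff -a
# then remove or comment out the swap line in /etc/fstab
# so it does not come back on the next reboot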
Docker container limits
The thread also mentions that the crash occurs even when the workload runs inside a Docker container with a memory limit such as --memory=100g. Docker memory limits apply to container processes through Linux cgroups: https://docs.docker.com/engine/containers/resource_constraints/
Since the failure still occurs under those conditions, the pressure involved here is probably not fully accounted against the container's cgroup; kernel-side allocations made by the GPU driver, in particular, are typically not charged to the container's memory limit.
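For reference, a run along those lines plus a check of the limit the kernel actually enforces might look like this (the image name is a placeholder, and the path assumes cgroup v2; cgroup v1 uses a different layout):

# hypothetical workload image, limited to 100 GB by the memory cgroup
docker run --rm --gpus all --memory=100g my-workload-image

# inside the container: the enforced limit in bytes (cgroup v2)
cat /sys/fs/cgroup/memory.max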
Page cache reclaim
Some users report that freeing page cache before launching very large workloads can reduce memory pressure:
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
This drops clean page cache (and, with the value 3, reclaimable slab objects such as dentries and inodes), returning that memory to the free pool. It should be treated as a temporary workaround, not a long-term solution.
Related discussion: https://forums.developer.nvidia.com/t/how-to-automatically-free-shared-system-memory/363178
Observability
Diagnosing unified-memory pressure can be difficult because standard tools cannot directly show:
- GPU residency in shared system memory
- unified memory migration pressure
- driver allocation pressure
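As a coarse stopgap, sampling the system-wide counters over time at least shows when the free pool is collapsing and whether direct reclaim is kicking in, even though it cannot attribute the pressure to the GPU specifically (interval and fields are just an example):

# log free/available memory and direct-reclaim stalls once per second
while true; do
  date '+%T'
  grep -E 'MemFree|MemAvailable' /proc/meminfo
  grep allocstall /proc/vmstat
  sleep 1
done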
To explore those aspects I have been experimenting with unified-memory diagnostics aimed at making that behavior more visible:
https://github.com/parallelArchitect/cuda-unified-memory-analyzer
The goal is simply to provide additional visibility into unified memory pressure so issues like this can be investigated earlier.
Summary
The failure pattern in the report does not look like a typical CUDA runtime OOM. Instead it appears that a driver-level allocation path fails while the system is under unified-memory pressure, after which the display stack becomes blocked waiting on that driver operation.
Given the architecture involved (shared system memory, NVLink-C2C interconnect, ATS addressing), further investigation may be needed to determine how memory reclaim interacts with the driver allocation path on this platform.
If additional diagnostics or traces would help narrow this down further, I would be interested to see them.