NVML Support for DGX Spark Grace Blackwell Unified Memory - Community Solution

I’ve been working with the DGX Spark Grace Blackwell GB10 and ran into a significant issue: standard NVML queries fail because the GB10 uses a unified memory architecture (128 GB shared between CPU and GPU) rather than a discrete GPU with a dedicated framebuffer.

Impact:

  • MAX Engine can’t detect the GPU: No supported "gpu" device available
  • PyTorch/TensorFlow GPU monitoring fails
  • pynvml library returns NVML_ERROR_NOT_SUPPORTED
  • nvidia-smi shows: Driver/library version mismatch
  • DGX Dashboard telemetry broken

This affects any tool expecting standard NVML on unified memory systems.
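
For example, a minimal pynvml query fails before the shim is installed. This is a sketch assuming pynvml is present; the exact call that fails may vary by driver and pynvml version:

import pynvml

# On a GB10 without the shim, one of these calls fails with
# NVML_ERROR_NOT_SUPPORTED (the exact failure point may vary).
try:
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(info.total, info.used, info.free)
except pynvml.NVMLError as err:
    print("NVML error:", err)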


Community Solution

I’ve developed an open-source NVML library replacement that solves this:

GitHub repository: https://github.com/CINOAdam/nvml-unified-shim (NVML unified memory shim for NVIDIA DGX Spark Grace Blackwell GB10; enables MAX Engine, PyTorch, and GPU monitoring)

Implementation:

  • Drop-in replacement for libnvidia-ml.so.1
  • Uses CUDA Runtime API + /proc/meminfo for unified memory queries
  • 16 core NVML functions implemented
  • Works with Python ctypes, C/C++ applications
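
To illustrate the approach rather than the shim’s actual code, here is a minimal Python ctypes sketch that assembles memory numbers the same way: device totals from the CUDA Runtime, system-wide usage from /proc/meminfo. The soname libcudart.so and the choice of MemAvailable are assumptions:

import ctypes

# Ask the CUDA Runtime for the device memory pool. On unified memory
# systems this reflects the shared CPU+GPU pool, not a framebuffer.
libcudart = ctypes.CDLL("libcudart.so")  # assumed soname; may be versioned
free_b = ctypes.c_size_t()
total_b = ctypes.c_size_t()
libcudart.cudaMemGetInfo(ctypes.byref(free_b), ctypes.byref(total_b))

def meminfo_bytes(field):
    # Read a field such as 'MemTotal' or 'MemAvailable' from /proc/meminfo.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1]) * 1024  # values are in KiB
    return None

total = meminfo_bytes("MemTotal")
used = total - meminfo_bytes("MemAvailable")
print(f"cuda total={total_b.value} system total={total} used={used}")

A shim answering nvmlDeviceGetMemoryInfo can populate the nvmlMemory_t fields from exactly these two sources.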

What’s Working:
✅ MAX Engine GPU detection and inference
✅ PyTorch/TensorFlow GPU monitoring
✅ pynvml library
✅ nvidia-smi wrapper
✅ DGX Dashboard telemetry

Installation: CAUTION, this replaces the system NVML library, so please only use it if you know what you are doing :-)

git clone https://github.com/CINOAdam/nvml-unified-shim.git
cd nvml-unified-shim
make -f Makefile.python
sudo make -f Makefile.python install

Verification:

python3 -c "from max.driver import Accelerator; print(Accelerator())"
# Output: Device(type=gpu,id=0) ✅
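
A second check through pynvml should now succeed as well; this is a hedged example, and the reported total is assumed to reflect the unified pool:

python3 -c "import pynvml; pynvml.nvmlInit(); h = pynvml.nvmlDeviceGetHandleByIndex(0); print(pynvml.nvmlDeviceGetMemoryInfo(h).total)"
# Expected: total bytes of the unified pool (roughly 128 GB on this system)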

Questions for NVIDIA

This is a working solution for the community, but I’d love guidance from the NVIDIA team:

  1. Official Support: Is NVIDIA planning native NVML support for unified memory architectures (GB10, GH200, GB200)?

  2. Recommended Approach: Is using CUDA Runtime + /proc/meminfo the right long-term approach, or is there a better API?

  3. Semantics: How should GPU utilization be reported on unified memory systems? (The shim currently returns 0%, since traditional framebuffer-based metrics don’t apply.)

  4. Collaboration: Would NVIDIA be interested in collaborating on official support or reviewing this implementation?

Technical details: https://github.com/CINOAdam/nvml-unified-shim/blob/main/NVIDIA_COLLABORATION.md


Hardware Tested

  • System: NVIDIA DGX Spark (Grace Blackwell GB10)
  • Memory: 128GB LPDDR5x unified
  • CUDA: 12.8 / 13.0
  • OS: Ubuntu 24.04 LTS
  • Software: MAX Engine 26.2.0, PyTorch 2.x, TensorFlow 2.x

It should also work on other NVIDIA unified memory systems (GH200, GB200).


Looks cool! I will move this over to GB10 projects.