I’ve been working with the DGX Spark Grace Blackwell GB10 and ran into a significant issue: standard NVML queries fail because the GB10 uses a unified memory architecture (128 GB shared between CPU and GPU) rather than a discrete GPU with a dedicated framebuffer.
Impact:
- MAX Engine can’t detect the GPU: `No supported "gpu" device available`
- PyTorch/TensorFlow GPU monitoring fails
- pynvml library returns `NVML_ERROR_NOT_SUPPORTED`
- nvidia-smi shows `Driver/library version mismatch`
- DGX Dashboard telemetry is broken
This affects any tool expecting standard NVML on unified memory systems.
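For anyone who wants to reproduce this, a minimal pynvml query looks like the sketch below (standard NVML calls, nothing exotic); depending on driver state, the failure can surface at `nvmlInit()` or at the memory query:

```python
import pynvml

# Minimal repro: a standard NVML memory query on a GB10 system.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"total={mem.total} free={mem.free} used={mem.used}")
except pynvml.NVMLError as err:
    # On unified-memory parts this surfaces as NVML_ERROR_NOT_SUPPORTED.
    print("NVML query failed:", err)
finally:
    pynvml.nvmlShutdown()
```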
Community Solution
I’ve developed an open-source NVML library replacement that solves this:
GitHub Repository: https://github.com/CINOAdam/nvml-unified-shim (NVML unified memory shim for NVIDIA DGX Spark Grace Blackwell GB10; enables MAX Engine, PyTorch, and GPU monitoring)
Implementation:
- Drop-in replacement for `libnvidia-ml.so.1`
- Uses the CUDA Runtime API + `/proc/meminfo` for unified memory queries (see the sketch after this list)
- 16 core NVML functions implemented
- Works with Python ctypes and C/C++ applications
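To make the approach concrete, here is a minimal Python sketch of the two data sources the shim combines (the shim itself is a native library, so this is illustrative, not the actual implementation): `cudaMemGetInfo` from the CUDA Runtime, called via ctypes, plus `/proc/meminfo` for the unified system memory view:

```python
import ctypes

# Illustrative only: the two data sources the shim draws on.
# Adjust the CUDA runtime soname to match your installation.
cudart = ctypes.CDLL("libcudart.so.12")

# cudaError_t cudaMemGetInfo(size_t *free, size_t *total)
free_b, total_b = ctypes.c_size_t(), ctypes.c_size_t()
err = cudart.cudaMemGetInfo(ctypes.byref(free_b), ctypes.byref(total_b))
print(f"cudaMemGetInfo: err={err} free={free_b.value} total={total_b.value}")

# On unified-memory systems, "GPU memory" is system memory, so
# /proc/meminfo provides the totals NVML would normally report.
meminfo = {}
with open("/proc/meminfo") as f:
    for line in f:
        key, rest = line.split(":", 1)
        meminfo[key] = int(rest.split()[0]) * 1024  # kB -> bytes

print(f"MemTotal={meminfo['MemTotal']} MemAvailable={meminfo['MemAvailable']}")
```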
What’s Working:
✅ MAX Engine GPU detection and inference
✅ PyTorch/TensorFlow GPU monitoring
✅ pynvml library
✅ nvidia-smi wrapper
✅ DGX Dashboard telemetry
Installation (CAUTION: please only use this if you know what you are doing :-)):

```bash
git clone https://github.com/CINOAdam/nvml-unified-shim.git
cd nvml-unified-shim
make -f Makefile.python
sudo make -f Makefile.python install
```
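After installing, a quick ctypes smoke test (my own sanity check, not a documented repo step, and assuming the shim exports the standard init/shutdown entry points) confirms the replacement library loads:

```python
import ctypes

# Sanity check: load the installed shim the way pynvml would and
# initialize it. A return value of 0 means NVML_SUCCESS.
nvml = ctypes.CDLL("libnvidia-ml.so.1")
ret = nvml.nvmlInit_v2()
print("nvmlInit_v2 returned", ret)
nvml.nvmlShutdown()
```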
Verification:
```bash
python3 -c "from max.driver import Accelerator; print(Accelerator())"
# Output: Device(type=gpu,id=0) ✅
```
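For the pynvml path, the same kind of query that failed before should now return real values (assuming the calls involved are among the 16 implemented functions):

```python
import pynvml

# The query that previously raised NVML_ERROR_NOT_SUPPORTED should now
# succeed through the shim.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
print(pynvml.nvmlDeviceGetName(handle))
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"total={mem.total} free={mem.free} used={mem.used}")
pynvml.nvmlShutdown()
```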
Questions for NVIDIA
This is a working solution for the community, but I’d love guidance from the NVIDIA team:
- Official Support: Is NVIDIA planning native NVML support for unified memory architectures (GB10, GH200, GB200)?
- Recommended Approach: Is using the CUDA Runtime + /proc/meminfo the right long-term approach, or is there a better API?
- Semantics: How should GPU utilization be reported on unified memory? (Currently returning 0%, since the traditional metric doesn’t apply.)
- Collaboration: Would NVIDIA be interested in collaborating on official support or reviewing this implementation?
Technical Details: https://github.com/CINOAdam/nvml-unified-shim/blob/main/NVIDIA_COLLABORATION.md
Hardware Tested
- System: NVIDIA DGX Spark (Grace Blackwell GB10)
- Memory: 128GB LPDDR5x unified
- CUDA: 12.8 / 13.0
- OS: Ubuntu 24.04 LTS
- Software: MAX Engine 26.2.0, PyTorch 2.x, TensorFlow 2.x
It should also work on other Grace unified-memory systems (GH200, GB200).