Hello NVIDIA Engineers!
I’m profiling LLM inference workloads on a DGX Spark (GB10, SM121) using Nsight Systems 2025.x. Profiling works well overall, but Nsight Systems reports that Unified Memory (UM) tracing is not supported on this platform:
`CUDA device 0: Unified Memory trace is not supported by the current driver version or configuration.`
Environment:
- Hardware: NVIDIA GB10 (DGX Spark)
- GPU Architecture: SM121 (Blackwell)
- Driver: 580.95.05
- CUDA: 13.1
- Nsight Systems: 2026.1.1.204-261137176666v0 OSX.
Use Case:
My LLM inference workload on GB10 relies heavily on Unified Memory for CPU-GPU data movement. Without UM tracing support, I’m unable to profile three critical aspects of my application’s performance:
1. Page fault analysis - Understanding when and where page faults occur during inference
2. Memory migration patterns - Identifying bottlenecks in data movement between CPU and GPU
3. Prefetch effectiveness - Validating whether prefetching strategies are working as intended
This is a significant gap in profiling capability for workloads that rely on Unified Memory on the GB10 platform.
Questions:
1. Is UM tracing support planned for SM121/GB10 in a future driver or Nsight Systems release?
2. Is this a hardware limitation of the GB10, or a driver/software limitation that could be addressed?
3. Are there alternative profiling approaches you’d recommend for understanding memory access patterns on GB10?
Workaround Attempts:
I’ve tried enabling the options explicitly via `--cuda-um-cpu-page-faults=true` and `--cuda-um-gpu-page-faults=true` on `nsys launch`, but the same limitation message appears.
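For anyone who wants to reproduce this, here is a minimal Python sketch of the check I have in mind. It builds the command line with both UM page-fault flags and scans the captured output for the warning quoted above. Several things here are assumptions rather than anything confirmed: it uses `nsys profile` instead of `nsys launch` to keep the example to a single command, `./run_inference` and the `um_report` output name are hypothetical placeholders, and I am assuming the warning is printed to stdout or stderr in exactly the form shown in this post.

```python
import re
import subprocess

# Warning text copied from the message quoted in this post; the exact
# wording may differ between Nsight Systems versions (assumption).
UM_UNSUPPORTED = re.compile(
    r"CUDA device (\d+): Unified Memory trace is not supported"
)


def um_unsupported_devices(log_text: str) -> list[int]:
    """Return the sorted CUDA device indices for which the warning appears."""
    return sorted({int(m.group(1)) for m in UM_UNSUPPORTED.finditer(log_text)})


def build_nsys_command(app_argv: list[str], output: str = "um_report") -> list[str]:
    """Build an `nsys profile` command line with both UM page-fault flags set."""
    return [
        "nsys", "profile",
        "--trace=cuda",
        "--cuda-um-cpu-page-faults=true",
        "--cuda-um-gpu-page-faults=true",
        "-o", output,      # hypothetical report name
        *app_argv,         # the application to profile and its arguments
    ]


def profile_and_check(app_argv: list[str]) -> list[int]:
    """Run the profiler and report which devices lack UM trace support."""
    proc = subprocess.run(
        build_nsys_command(app_argv), capture_output=True, text=True
    )
    # Scan both streams, since which one carries the warning is an assumption.
    return um_unsupported_devices(proc.stdout + proc.stderr)
```

Calling `profile_and_check(["./run_inference"])` on the GB10 should return a list containing `0` if the limitation message is emitted as quoted above, and an empty list on a platform where UM tracing is available.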
Thank you for your continued work on Nsight Systems - it’s an invaluable tool for CUDA performance optimization. Any guidance on UM tracing support for Blackwell desktop GPUs would be appreciated.