Hello,
Thank you for the detailed feedback and for articulating your concerns so clearly. We understand the expectations that come with deploying NVIDIA systems such as DGX Spark, and we appreciate the opportunity to clarify both the current state and the roadmap.
Below we address your points and questions directly, with additional context.
Clarifications on the Reported Issues
1. PyTorch (CUDA 13.0, ARM64 wheels)
PyTorch wheels are distributed via a custom PyTorch index URL by design. This is not specific to SM121 or GB10. PyPI does not support publishing multiple CUDA variants of the same package, which affects all major frameworks (PyTorch, vLLM, SGLang, etc.).
This does not indicate a lack of compatibility. On the contrary, best practices strongly recommend using the official framework indexes to ensure you receive fully validated, CUDA-enabled builds.
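As an illustration, a CUDA-enabled ARM64 build is installed directly from the PyTorch index rather than PyPI. (The exact `cu130` index path below is an assumption based on PyTorch's usual CUDA-version naming; please confirm the current URL with the selector on pytorch.org.)

```shell
# Install PyTorch from the official CUDA index rather than PyPI.
# The cu130 suffix follows PyTorch's usual CUDA-version naming and
# should be confirmed against the install selector on pytorch.org.
pip install torch --index-url https://download.pytorch.org/whl/cu130
```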
Additional context:
- CUDA kernels are compiled at the major architecture family level (sm12x), not per individual SKU.
- Only certain Tensor Core–specific kernels require conditional compilation, which is already handled in the codebase.
- PyTorch 2.10, scheduled for release on January 21, includes FBGEMM and CUTLASS matmul integrations, further improving performance on sm12x platforms.
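To make the family-level compilation point concrete, here is a hedged sketch of compiling a kernel once for the sm12x family rather than per SKU. (`kernel.cu` is a placeholder file name; the flags use standard nvcc `-gencode` syntax.)

```shell
# Compile once for the compute_120 family. Embedding PTX for
# compute_120 as well lets the driver JIT-compile for any sm12x SKU
# (GB10 included) at load time.
nvcc -gencode arch=compute_120,code=sm_120 \
     -gencode arch=compute_120,code=compute_120 \
     -o kernel.o -c kernel.cu
```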
2. Triton
Triton operates as an independent project, and NVIDIA actively collaborates and contributes upstream.
The issue you reference follows a known pattern seen on prior architectures: the relevant bugs were addressed in Triton 3.6.0, which resolves the sm12x handling concerns.
3. FlashInfer
Support for sm12x was added starting in FlashInfer v0.5.2.
Notably, the wheels are now explicitly built targeting sm12x, ensuring compatibility with GB10-class devices.
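In practice, upgrading to a release with sm12x support looks like the following. (The PyPI package name `flashinfer-python` is our assumption; check the project's install documentation if pip cannot resolve it.)

```shell
# Upgrade to a FlashInfer release with sm12x support (v0.5.2 or newer).
# The PyPI package name is assumed to be flashinfer-python.
pip install "flashinfer-python>=0.5.2"
```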
4. CUTLASS
CUTLASS fully supports sm12x today.
Additional optimizations, including new MMA functions, are landing in the CUTLASS v4.4.x series to further enhance performance on Blackwell-class GPUs.
5. MoE Kernels
This is an area of active development. Optimized configurations for GB10 are being worked on and will be introduced incrementally in upcoming releases.
6. vLLM
NGC container versioning will be made more explicit in upcoming releases and on the Build & Spark documentation pages.
In the meantime, please refer to the release notes of each NGC container tag for the exact vLLM version it ships.
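For reproducibility, we recommend pulling a pinned NGC tag rather than a floating one. (The registry path and tag format below are illustrative assumptions; confirm the exact container name and available tags on the NGC catalog.)

```shell
# Pull a pinned tag rather than :latest so the bundled vLLM version is
# reproducible. Registry path and tag format are assumptions; confirm
# both on the NGC catalog before use.
docker pull nvcr.io/nvidia/vllm:25.09-py3
```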
7. SGLang
SGLang runs correctly on DGX Spark today using its official custom wheel distribution.
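The documented install path is a standard pip command. (The `[all]` extra is our assumption for pulling in the CUDA-enabled runtime dependencies; consult the SGLang install docs for the variant that matches your setup.)

```shell
# Install SGLang from its official wheel distribution. The [all] extra
# is assumed to include the CUDA-enabled runtime dependencies.
pip install "sglang[all]"
```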
Responses to Your Direct Questions
1. When will SM121 receive native support instead of SM80 fallbacks?
sm80-class kernels can execute on DGX Spark because its Tensor Core behavior is very similar, particularly for GEMM/MMA operations (closer to the GeForce Ampere-style MMA model). Unlike Jetson Thor or GB200, DGX Spark does not include tcgen05 instructions; that die area is instead allocated to RT Cores and DLSS hardware.
This is a compatibility feature, not a permanent fallback. Native sm12x-optimized kernels are being introduced progressively across libraries.
Example reference:
https://github.com/NVIDIA/cutlass/blob/main/examples/python/CuTeDSL/ampere/flash_attention_v2.py
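The fallback behavior described above can be sketched with a small selection function. (`pick_kernel` and the kernel names here are purely illustrative, not actual library internals.)

```shell
# Hypothetical sketch: select a kernel family from a compute capability,
# preferring a native sm12x kernel and falling back to the compatible
# sm80 path while native coverage is still landing.
pick_kernel() {
  cc="$1"; native_available="$2"
  case "$cc" in
    12.*)
      if [ "$native_available" = "yes" ]; then
        echo "sm12x"            # native kernel, once it has landed
      else
        echo "sm80-fallback"    # compatible Ampere-style path
      fi ;;
    8.*) echo "sm80" ;;         # Ampere-class devices
    *)   echo "unsupported" ;;
  esac
}

pick_kernel 12.1 no   # prints sm80-fallback
```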
2. What is the official roadmap for GB10 software parity with SM120 (RTX 50xx)?
GB10 already has software parity with RTX 50xx.
Both platforms belong to the same sm12x architecture family, and the software stack is aligned at that level.
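One way to observe this parity at the architecture level is to query the reported compute capability on both systems (this assumes a driver recent enough to support the `compute_cap` query field):

```shell
# Both GB10 and RTX 50xx report a 12.x compute capability, which is
# the level at which the software stack is aligned.
nvidia-smi --query-gpu=name,compute_cap --format=csv
```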
3. Who at NVIDIA owns DGX Spark software readiness?
DGX Spark software readiness is a cross-functional responsibility.
Multiple NVIDIA teams—CUDA, frameworks, libraries, NGC, and systems—work together to deliver and validate the end-to-end experience.
Closing
We recognize that deploying DGX Spark at scale requires not only hardware capability but also a mature and transparent software ecosystem. Your feedback is valuable and is actively influencing prioritization across teams.
We remain committed to delivering enterprise-grade software support that matches the expectations of our customers.