My recap of what CUDA 13.2 brings that matters for GB10:
The Big Wins for DGX Spark / SM121
cuBLASLt: NVFP4 and MXFP8 performance improvements on DGX Spark. This is the headline item: cuBLASLt now delivers up to a 3× speedup for NVFP4 and MXFP8 data types on DGX Spark systems for large M and N problem sizes. Separately, cuBLASLt's experimental Grouped GEMM API now supports MXFP8 inputs on GPUs with compute capability 10.x and 11.0.
Critical bug fix: a cublasLtMatmul issue that could produce incorrect results when running concurrently with another kernel that uses Tensor Memory has been fixed. It affected compute capability 10.x and 11.x since cuBLAS 12.8. This could be related to the quality degradation people were seeing, though the affected range as stated covers 10.x and 11.x, not SM121 (12.1).
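If you want to check whether your currently installed toolkit sits in the range mentioned for that bug, here is a rough Python sketch. It assumes the toolkit release reported by `nvcc --version` is a good enough proxy for the bundled cuBLAS version, which may not hold for pip or container installs. The helper names are made up for illustration.

```python
import re
import shutil
import subprocess

# Range taken from the note above: affected since cuBLAS 12.8, fixed in 13.2.
# Assumption: the toolkit release number stands in for the cuBLAS version.
AFFECTED_SINCE = (12, 8)
FIXED_IN = (13, 2)


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def in_affected_range(version):
    """True if the version falls in the half-open range [12.8, 13.2)."""
    return AFFECTED_SINCE <= version < FIXED_IN


def installed_toolkit_version():
    """Return the local toolkit version, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    version = installed_toolkit_version()
    if version is None:
        print("nvcc not found on PATH (or version not recognised)")
    elif in_affected_range(version):
        print(version, "is inside the range quoted for the cublasLtMatmul bug")
    else:
        print(version, "is outside the range quoted for the cublasLtMatmul bug")
```

This only tells you which toolkit is installed, not whether your workload ever hit the bug.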
CUDA Tile — Now on SM120/SM121
CUDA Tile is now supported on compute capability 8.x (Ampere and Ada) as well as 10.x and 12.x (Blackwell). This is the tile-based programming model NVIDIA introduced in CUDA 13.1. cuTile Python (the Python DSL) now supports recursive functions, closures, custom reductions, and enhanced array slicing. This could eventually become the cleaner path to writing optimized NVFP4 kernels for SM121, compared with the current CUTLASS patch-and-pray approach.
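To keep the compute-capability ranges quoted so far straight, here is a small Python sketch that encodes them as a lookup. The ranges are copied from this post and not independently verified, and the function and key names are hypothetical. SM121 corresponds to compute capability 12.1.

```python
# Compute-capability ranges as quoted in this post (not independently verified).
CUDA_TILE_MAJORS = {8, 10, 12}   # 8.x, 10.x, 12.x
TENSOR_MEMORY_FIX_MAJORS = {10, 11}  # 10.x, 11.x


def cuda_tile_supported(major, minor):
    """CUDA Tile: listed for any minor version of majors 8, 10 and 12."""
    return major in CUDA_TILE_MAJORS


def grouped_gemm_mxfp8_supported(major, minor):
    """Experimental Grouped GEMM with MXFP8 inputs: listed for 10.x and 11.0."""
    return major == 10 or (major, minor) == (11, 0)


def tensor_memory_fix_applies(major, minor):
    """cublasLtMatmul / Tensor Memory fix: listed for 10.x and 11.x."""
    return major in TENSOR_MEMORY_FIX_MAJORS


def summarize(major, minor):
    """Collect the three checks for one compute capability."""
    return {
        "cuda_tile": cuda_tile_supported(major, minor),
        "grouped_gemm_mxfp8": grouped_gemm_mxfp8_supported(major, minor),
        "tensor_memory_fix": tensor_memory_fix_applies(major, minor),
    }


if __name__ == "__main__":
    # GB10 / SM121 is compute capability 12.1.
    for name, applies in summarize(12, 1).items():
        print(f"{name}: {applies}")
```

For 12.1 only the CUDA Tile entry comes back true, so going by the quoted ranges alone, that is the item in this list that lands directly on the Spark.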
Unified Tegra + Desktop Toolkit
CUDA 13.2 delivers a single unified toolkit for Tegra and desktop GPUs, reducing overhead for containers and libraries. This is relevant for DGX Spark since GB10 is an aarch64 Tegra-derived SoC: fewer divergences between the Tegra and desktop CUDA paths should mean less chance of hitting SM121-specific bugs that only appear on the Spark.
Other Notable Items
PTX ISA 9.2 — new PTX features, worth checking if there are any SM121-specific instruction improvements.
Compiler: support for new host compilers including VS 2026, plus improved nvcc host compilation support on aarch64 systems, including fixes for ARM Neon intrinsics when using newer GCC versions.
CUDA_DISABLE_PERF_BOOST env var added: lets you disable GPU power state boosting, which is useful for power management, for example in a rack enclosure.
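For the CUDA_DISABLE_PERF_BOOST item, a minimal Python sketch of launching a workload with the variable set only for that child process. I am assuming that the value `1` is what turns the boost off, so check the release notes before relying on it. The helper names are made up, and the demo child just echoes the variable back instead of running a real CUDA job.

```python
import os
import subprocess
import sys


def env_without_perf_boost(base=None):
    """Copy an environment mapping and add CUDA_DISABLE_PERF_BOOST to it."""
    env = dict(os.environ if base is None else base)
    # Assumption: "1" disables boosting; confirm against the CUDA 13.2 docs.
    env["CUDA_DISABLE_PERF_BOOST"] = "1"
    return env


def run_without_perf_boost(cmd):
    """Run cmd with the variable set for the child only, not for this shell."""
    return subprocess.run(
        cmd, env=env_without_perf_boost(), capture_output=True, text=True
    )


if __name__ == "__main__":
    # Stand-in for a real CUDA workload: the child prints what it inherited.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ['CUDA_DISABLE_PERF_BOOST'])",
    ]
    print(run_without_perf_boost(child).stdout.strip())
```

Setting it per process like this keeps the rest of the system on the default boosting behaviour, which makes A/B power measurements easier.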
Is it worth upgrading to CUDA 13.2?
What do you guys think?