Been out of the forum for a couple of weeks, and I humbly apologize for not keeping up with it. I tried going through the threads and am still confused, so here goes: what’s the current status of NVFP4 support on the Spark? Is it at least as fast as FP8, with a lower memory footprint?
It works fine for me. I think there is some disillusionment about it, partly because of how int4 models perform compared to nvfp4 ones, but I think that’s largely explained by the fact that many “nvfp4” models actually keep huge chunks of the network at bf16, so it’s not truly an apples-to-apples comparison.
At least, that was my initial frustration.
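If you want to verify that on a given checkpoint rather than take the model card’s word for it, a quick byte-count by dtype does the job. This is just a minimal sketch: `model.safetensors` is a placeholder path, and on NVFP4 exports the packed 4-bit weights typically show up as uint8 tensors (two values per byte) alongside whatever was left at bf16.

```python
# Tally stored bytes per dtype to see how much of an "nvfp4" checkpoint
# is actually quantized. "model.safetensors" is a placeholder path.
from collections import Counter
from safetensors import safe_open

bytes_by_dtype = Counter()
with safe_open("model.safetensors", framework="pt") as f:
    for name in f.keys():
        t = f.get_tensor(name)  # loads each tensor; fine for a one-off audit
        bytes_by_dtype[str(t.dtype)] += t.numel() * t.element_size()

total = sum(bytes_by_dtype.values())
for dtype, nbytes in bytes_by_dtype.most_common():
    print(f"{dtype:>14}: {nbytes / total:6.1%} of checkpoint bytes")
```

If a big share of the bytes is still torch.bfloat16, the footprint (and the speed) won’t look anything like a fully quantized model.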
Kernel-wise, everything works in vLLM. The two key flaws that really screwed up how sm120/121 operated have been resolved. There was the “illegal instruction” issue caused by consumer Blackwell’s lack of tcgen05 support, but that was fixed in CUDA 12.9 and now emulates cleanly (though I’m not sure how performant the emulation is). And then there was the issue of SMEM being smaller on the consumer-grade cards (99 KB vs 224 KB) and kernels not adapting to that limitation. That also seems to be fixed now across TRT-LLM, vLLM, CUTLASS, etc.
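For reference, you can read that SMEM ceiling straight off the device. This assumes a recent PyTorch that exposes these cudaDeviceProp fields; older builds may not surface them, in which case cudaDeviceGetAttribute from the CUDA runtime is the fallback.

```python
import torch

# Shared-memory limits a kernel has to fit under. Field names assume a
# recent PyTorch that surfaces these cudaDeviceProp entries.
props = torch.cuda.get_device_properties(0)
print(f"device: {props.name} (sm_{props.major}{props.minor})")
print(f"smem per SM:             {props.shared_memory_per_multiprocessor // 1024} KB")
print(f"smem per block (opt-in): {props.shared_memory_per_block_optin // 1024} KB")
```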
There’s also the b12x kernel now, first-party from NVIDIA (well, written by an NVIDIA engineer), which improves performance, and I’m sure more improvements are on the way. The reality is that the Spark is a nice little machine, and it’s the cheapest way to get this much VRAM in a CUDA ecosystem, but the physics of memory bandwidth remain somewhat of a limitation. I’d love to see a NUMA version where the same GPU also had access to 16GB of faster VRAM, but tbh the Spark is still great, and for me personally it facilitates a ton of things I wouldn’t otherwise be able to do.
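To put napkin math on the bandwidth point: during decode, every generated token has to stream essentially all the weights through the memory bus once, so tokens/sec can’t exceed bandwidth divided by model bytes. The ~273 GB/s figure is the commonly quoted Spark spec, and the model size is illustrative; plug in your own numbers.

```python
# Upper bound on decode throughput: bandwidth / bytes of weights read per token.
# 273 GB/s is the commonly quoted Spark LPDDR5x spec -- treat it as an assumption.
BANDWIDTH_GBS = 273.0

def decode_ceiling_tok_s(params_billion: float, bytes_per_param: float) -> float:
    model_gb = params_billion * bytes_per_param
    return BANDWIDTH_GBS / model_gb

# nvfp4 ~ 4 bits per value + an FP8 scale per 16-value block = 4.5 bits/param
for label, bpp in [("bf16", 2.0), ("fp8", 1.0), ("nvfp4 (~4.5 bits)", 0.5625)]:
    print(f"70B @ {label:<18}: ~{decode_ceiling_tok_s(70, bpp):4.1f} tok/s ceiling")
```

Real decode rates land below these ceilings once KV-cache and activation traffic are counted, which is exactly why the smaller NVFP4 footprint matters so much on this box.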
There’s also the d12 kernel now
d12? b12x?
whoops!