How to enable nvfp4

vida_vakil · November 3, 2025, 10:45pm

Hello,

I’m trying to use nvfp4 on DGX Spark. From the errors I get, and from the issues and PRs in the TransformerEngine repo (e.g., #2255 and #2279), I gather that I have to build transformer-engine for the right sm_xxxa target (NVTE_CUDA_ARCHS=121a or 120a?). But nothing I have tried has worked so far.

If/when this feature is supported, could you please share exactly how to build TE with nvfp4 support for DGX Spark and how to properly set the NVFP4BlockScaling() flags? Also, is there an NGC container to use?

vida_vakil · November 5, 2025, 7:36am

Please ignore. I meant to post this in the DGX Spark user forum.
Trying to delete the post gives me permission errors.

aniculescu · November 5, 2025, 11:38pm

DGX Spark is sm121; I'm not sure whether TransformerEngine supports it, however. Will investigate.
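
For reference, here is a minimal sketch of how a compute capability such as 12.1 maps to the `121a`-style value asked about above for `NVTE_CUDA_ARCHS`. The helper name `arch_string` is hypothetical and is not part of TransformerEngine. The trailing `a` is the CUDA convention for an architecture-specific target. Whether a TE build for that target actually yields working NVFP4 kernels on DGX Spark is still unresolved in this thread.

```python
def arch_string(major: int, minor: int, arch_specific: bool = True) -> str:
    """Format a CUDA compute capability as an arch string, e.g. (12, 1) -> "121a".

    Hypothetical helper for illustration only; not a TransformerEngine API.
    """
    if major < 1 or not 0 <= minor <= 9:
        raise ValueError(f"unexpected compute capability: {major}.{minor}")
    # The "a" suffix requests the architecture-specific feature set.
    suffix = "a" if arch_specific else ""
    return f"{major}{minor}{suffix}"


if __name__ == "__main__":
    # DGX Spark (GB10) is reported as sm121 in this thread, i.e. capability 12.1.
    arch = arch_string(12, 1)
    # Print the assignment one might export before building from source.
    print(f"NVTE_CUDA_ARCHS={arch}")
```

On a machine with PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple that could be passed to this helper instead of hard-coding it.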

raphael.amorim · November 5, 2025, 11:45pm

Check this: Running nvidia/Nemotron-Nano-VL-12B-V2-NVFP4-QAD on your Spark. You could use the same dependencies.

vida_vakil · November 6, 2025, 12:25am

Thanks for your response and link.
There are other resources too, for nvfp4-quantized inference, such as the following two.
However, my use case is different. I am training a custom model (a recurrent variant of the Transformer). The model is bandwidth-limited, and hence its training throughput drops on DGX Spark, in spite of Spark’s 128GB memory (my baseline is a Titan RTX workstation). My understanding is that support for the feature (nvfp4) has to come from the transformer-engine package, but so far it does not seem to be available for DGX Spark (sm_121).

ScottEllis · November 6, 2025, 6:24pm

We moved it to the right forum, no need to delete @vida_vakil :-)

ScottE
