Why 273 GB/s? Less Is More, Until It Isn’t

flash3 · February 11, 2026, 1:28pm

It's about loss of information. FACT: you lose something the LLM was glad to have learned in training: nuances. There is another thread concerning FP4, especially NVFP4. NVIDIA has a new star in the firmament: PPXZPXUT? (After QAD, QAT and other quant-aware techniques.) And they all ended up in retraining.
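To make the "loss of information" concrete, here is a minimal sketch that squeezes some random weights into 4-bit integers and back, then measures what got lost. Assumptions: this is a plain symmetric integer scheme with one scale per block of 16 values, not NVFP4's actual format, and the Gaussian weights, the block size, and the function names are made up for illustration.

```python
import random

def quantize_int4_block(values):
    """Symmetric 4-bit quantization of one block: integer codes in [-7, 7] plus one scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale

def dequantize_block(codes, scale):
    """Map the integer codes back to approximate real values."""
    return [c * scale for c in codes]

def roundtrip_error(weights, block_size=16):
    """Mean absolute error after quantizing and dequantizing block by block."""
    total = 0.0
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        codes, scale = quantize_int4_block(block)
        restored = dequantize_block(codes, scale)
        total += sum(abs(a - b) for a, b in zip(block, restored))
    return total / len(weights)

random.seed(0)
# Hypothetical "trained" weights: small Gaussian values, purely illustrative.
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]
err = roundtrip_error(weights)
mean_abs = sum(abs(w) for w in weights) / len(weights)
print(f"mean |w| = {mean_abs:.5f}, mean round-trip error = {err:.5f}")
```

Each block can only take 15 distinct levels, so whatever fine structure sat between those levels is gone after the round trip. How much that hurts a real model depends on the model and the task; this only shows where the loss comes from.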

Others train LLMs in INT4 from the start (Moonshot). So 4-bit is the way, and good enough for some applications, but it shows degradation if it is derived from larger words (16-bit)! And to give you an example of the cost: if you research solution competence vs. the number of weights a model should have, you will notice that doubling the weights means only +2% competence. So… if you lose 10% or more by quantizing to 4-bit, it hurts, mentally of course.
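A back-of-the-envelope version of that cost argument, as a sketch. Assumptions: the +2% per doubling and the 10% loss are just the rule-of-thumb figures from this post, not measurements, and the gains are read as simply additive.

```python
# Rule-of-thumb figures from the post, in percentage points (not measured values).
GAIN_PER_DOUBLING_PCT = 2   # doubling the weights -> +2% competence
QUANT_LOSS_PCT = 10         # competence lost by quantizing to 4-bit

# Additive reading: how many doublings of weights is the quantization loss worth?
doublings = QUANT_LOSS_PCT / GAIN_PER_DOUBLING_PCT
weight_factor = 2 ** doublings

print(f"{doublings} doublings, i.e. {weight_factor}x the weights")
# prints "5.0 doublings, i.e. 32.0x the weights"
```

Under that reading, a 10% loss throws away what roughly 32x more weights would have bought, which is why it hurts.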

I think this is piloting, with a huge portion of hope, and it is application-driven. NVIDIA should optimize GLM 4.7 to keep BF16 accuracy in NVFP4. Mission (prove the) impossible! I’d buy another two sparkies if it works.

