The story so far: a hardware-limited little Spark, compute-bound, network-bound, memory-bound. NVIDIA itself already pitches it as “better buy two.” Lots of enthusiasts in the forums. The dead horse “NVFP4” gets praised as the big hit. Two nodes don’t make anything faster; they’re only good for…

Why 200 tok/s Is the New Normal: TP=2 Does Scale After All

flash3, March 19, 2026, 3:24pm (post #21)

Qwen3.5-122B int4 AutoRound, after 1 h of DevOps and code-tree analysis. 10% never used.

Topic	Category (tags)	Replies	Views	Activity
Qwen3.5-397B-A17B-int4-AutoRound - 4 x db10 node - updated results 37 - 94 tok/s	DGX Spark / GB10 (clustering, spark)	26	1747	April 28, 2026
Does Qwen3.5-35B-A3B on GB10 leave a lot of performance on the table?	DGX Spark / GB10 (agentic-ai)	40	5285	March 16, 2026
Qwen3.5-122B-A10B on single Spark: up to 51 tok/s (v2.1 — patches + quick-start + benchmark)	DGX Spark / GB10 (cuda, performance, docker, performance-tuning, llm)	399	14517	May 10, 2026
Help: Running NVFP4 model on 2x DGX Spark with vLLM + Ray (multi-node)	DGX Spark / GB10 (mistral-large)	18	2420	December 25, 2025
From 20 to 35 TPS on Qwen3-Next-NVFP4 w/ FlashInfer 12.1f	DGX Spark / GB10	10	1613	January 7, 2026
Install and Use vLLM for Inference on two Sparks does not work	DGX Spark / GB10	159	5148	December 9, 2025
Qwen3.5-122B-A10B NVFP4 Quantized for DGX Spark — 234GB → 75GB, Runs on 128GB	DGX Spark / GB10 Projects	44	9862	April 9, 2026
Two multi-node DGX Spark wins: RoCE 2× inference throughput + Qwen3.5-397B-A17B-NVFP4 serving (with SM121 CUTLASS patch)	DGX Spark / GB10 Projects	4	623	April 16, 2026
Two-Spark cluster with vLLM using tensor-parallel-size 2 causes one node to drop while the other's GPU goes 100% forever	DGX Spark / GB10	36	1385	February 13, 2026
Qwen/Qwen3.6-35B-A3B (and FP8) has landed	DGX Spark / GB10 (agentic-ai)	237	18483	May 10, 2026