Qwen3.5-122B-A10B on single Spark: up to 51 tok/s (v2.1 — patches + quick-start + benchmark)

Albond · April 5, 2026, 12:09pm

About the llama-benchy numbers — the difference is real, just measured differently.

Think of it this way: without MTP (multi-token prediction), 1 decode step = 1 token. With MTP, 1 decode step produces ~2 tokens: 1 regular + 1 speculative, and the speculative one is accepted about 95% of the time.

llama-benchy measures decode steps per second — how fast the model runs forward passes. That’s ~20 steps/sec, and each step is actually a tiny bit slower now because of the MTP head overhead. So llama-benchy sees no improvement or even a slight slowdown.

bench_qwen35.sh and real chat measure what you actually get — tokens out divided by wall-clock time. 20 steps/sec × ~1.95 accepted tokens per step = ~39 tok/s. That’s the number you feel when using the model.
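Here is the same arithmetic as a minimal Python sketch, in case it helps to plug in your own numbers. The function name and parameters are made up for illustration (they are not part of llama-benchy or bench_qwen35.sh), and it assumes exactly one speculative token per step, with the rounded 20 steps/sec and 95% acceptance figures from above.

```python
def effective_tokens_per_second(steps_per_second, acceptance_rate):
    """Tokens delivered per second with one speculative (MTP) token per decode step.

    Hypothetical helper for illustration only. Assumes every step yields
    1 regular token plus 1 speculative token that is kept with
    probability `acceptance_rate`.
    """
    tokens_per_step = 1 + acceptance_rate
    return steps_per_second * tokens_per_step


# What a step-counting benchmark sees: the raw decode step rate.
steps_per_second = 20

# acceptance_rate=0.0 models "no MTP": one token per step.
without_mtp = effective_tokens_per_second(steps_per_second, acceptance_rate=0.0)
with_mtp = effective_tokens_per_second(steps_per_second, acceptance_rate=0.95)

print(f"decode steps/s: {steps_per_second}")
print(f"without MTP:    {without_mtp:.1f} tok/s")
print(f"with MTP:       {with_mtp:.1f} tok/s")
```

The step rate stays the same in both cases; only the tokens delivered per step changes, which is why one tool shows no gain and the other shows roughly double.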

Both are correct:

~20 tok/s (llama-benchy) = how fast the engine runs (decode steps per second)
~38-40 tok/s (bench_qwen35.sh, real chat) = how fast you get your answer (effective throughput)

I see the same thing in my daily use: the same prompt that used to take 26 seconds now finishes in 17 seconds.
