And yet another candidate to be tested. Still struggling to get stable performance in terms of tool calling with Qwen3.5 (may be I just missed a fix) and/or Gemma4… and they pushed out already the next. 😅 [image] Qwen/Qwen3.6-35B-A3B-FP8 · Hugging Face We’re on a journey to advance and …

Looks like standard speed for this model without dflash. Try “max-num-seqs: 1” instead of 8

Interesting that it’s slow for you. What’s your output for: tool-eval-bench --spec-bench –-base-url http://xxxx:xxx I’m getting between 40 and 100t/s depending the workload. [image]

Is that results with the same configuration? context size is 256k and kv-cache-dtype is NOT fp8?

It’s weird, it’s not my feeling on VSCode Copilot, that’s a clue, maybe because of temperature who is 1.0 by default, with 0,6 there is more drafted tokens… [Capture d’écran 2026-04-29 à 20.59.45]

I didn’t feel any difference but I’ve posted a screenshot, apparently it’s fast…

Thank you I’ll test that :) Can you share this output ? tool-eval-bench --spec-bench --base-url "http://127.0.0.1:8000" --api-key "xxxxx" --temperature 0.6 --top-p 0.95 --top-k 20 --context-pressure 0.8 --depth "0 4096 8192 16384 32768 65536" For now I’ve that, at 65k it’s impossible to continue …

Ho it’s almost the same than me without any tweak, I’ve tried to apply pr 40898 but it seems to be worse than before… I’m going to try vllm tune :) [Capture d’écran 2026-04-29 à 22.02.42]

Perhaps you made vllm-node during the build, but in the recipe it’s vllm-node-tf5?

Qwen/Qwen3.6-35B-A3B (and FP8) has landed

Accelerated Computing DGX Spark / GB10 User Forum DGX Spark / GB10

azampatti April 29, 2026, 7:18pm 194

you’re in the same ballpark, the only difference might be the VLLM version and the vLLM tune:

Introducing vLLM-Tune — Kernel tuning CLI for vLLM on DGX Spark

And also with:./build-and-copy.sh -t vllm-node --apply-vllm-pr 40898

from Qwen3.5-122B-A10B on single Spark: up to 51 tok/s (v2.1 — patches + quick-start + benchmark) - #364 by p1140

Those two things improved my speed ever so slightly. My recipe is the exact same I pasted it here

Topic		Replies	Views
Qwen3.6-27B is out! DGX Spark / GB10 agentic-ai	228	17488	May 20, 2026
Qwen3.5-122B-A10B on single Spark: up to 51 tok/s (v2.1 — patches + quick-start + benchmark) DGX Spark / GB10 cuda , performance , docker , performance-tuning , llm	404	16466	May 20, 2026
Qwen/Qwen3.5-122B-A10B - Alibaba/Qwen thought about us... :-D DGX Spark / GB10	340	15912	March 24, 2026
Does Qwen3.5-35B-A3B on GB10 leave a lot of performance on the table? DGX Spark / GB10 agentic-ai	40	5478	March 16, 2026
Qwen3.5 27B optimisation thread starting at 30+ t/s TP=1 DGX Spark / GB10 llama , agentic-ai	23	2398	May 11, 2026
Qwen3.5-122B-A10B NVFP4 Quantized for DGX Spark — 234GB → 75GB, Runs on 128GB DGX Spark / GB10 Projects	44	10244	April 9, 2026
HOW-TO: Run Qwen3-Coder-Next on Spark DGX Spark / GB10 llama	92	9486	March 24, 2026
Bfloat16 Quality = Speed? DGX Spark / GB10	95	4178	May 17, 2026
Qwen3.5-35B-A3B optimizations on single Spark DGX Spark / GB10 Projects	46	2506	May 4, 2026
Qwen3.6-27B-Dflash link DGX Spark / GB10 Projects	22	3402	April 29, 2026

Qwen/Qwen3.6-35B-A3B (and FP8) has landed

Related topics