Qwen/Qwen3.6-35B-A3B (and FP8) has landed

You’re in the same ballpark. The only differences might be the vLLM version and vLLM-Tune:

Introducing vLLM-Tune — Kernel tuning CLI for vLLM on DGX Spark

And also with: ./build-and-copy.sh -t vllm-node --apply-vllm-pr 40898

from Qwen3.5-122B-A10B on single Spark: up to 51 tok/s (v2.1 — patches + quick-start + benchmark) - #364 by p1140

Those two things improved my speed ever so slightly. My recipe is exactly the same; I pasted it here.