Qwen3.5-35B-A3B optimizations on single Spark

I just tried switching lanes from vLLM: I’ve been running llama.cpp + spiritbuun’s DFlash fork on 35B-A3B and getting some wild numbers.

The key was tuning --spec-draft-p-min 0.3 (kills low-confidence drafts early) and --spec-draft-n-max 14. Block-diffusion + MoE is a great combo: only ~3B params activate per verify, so cycles are fast.
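
For anyone wondering what those two knobs do, here is a toy sketch. It assumes the fork behaves like a typical speculative-decoding cutoff: the draft is truncated at the first token whose confidence falls below p-min, and never exceeds n-max tokens. The function name and the confidence list are made up for illustration and are not taken from the DFlash code.

```python
from typing import Sequence


def draft_length(confidences: Sequence[float], p_min: float, n_max: int) -> int:
    """Count the draft tokens kept before the cutoff or the length cap is hit.

    Hypothetical helper: `confidences` stands in for the draft model's
    per-token probabilities, in draft order.
    """
    kept = 0
    for p in confidences:
        # Stop at the length cap or at the first low-confidence token.
        if kept >= n_max or p < p_min:
            break
        kept += 1
    return kept


# A draft that loses confidence at the third token is cut to two tokens.
print(draft_length([0.9, 0.8, 0.2, 0.95], p_min=0.3, n_max=14))
# A uniformly confident draft is capped at n_max.
print(draft_length([0.9] * 20, p_min=0.3, n_max=14))
```

The idea is that a shorter draft means less wasted verify work when the drafter is unsure, while the cap bounds the cost of any single verify cycle.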

35B-A3B results (after quality fix):

  • HTML/JS coding (~600 tok): 92-101 tok/s
  • HTML/JS sustained (~2000 tok): 85-92 tok/s
  • Short chat: ~44 tok/s (DFlash actually hurts here: verify overhead > gain, so stock at 60-66 tok/s is better for chat)
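
To put the chat regression in numbers, a quick ratio of the figures above (~44 tok/s with DFlash against 60-66 tok/s stock). This is just arithmetic on the posted numbers, and the helper name is my own.

```python
def relative_speed(dflash_tok_s: float, stock_tok_s: float) -> float:
    """Throughput with DFlash as a fraction of stock throughput."""
    return dflash_tok_s / stock_tok_s


# Short-chat numbers from the list above.
low = relative_speed(44, 66)   # against the fast end of stock
high = relative_speed(44, 60)  # against the slow end of stock
print(f"DFlash chat speed: {low:.2f}x to {high:.2f}x of stock")
```

So for short chat DFlash lands at roughly two thirds to three quarters of stock speed, which is why it only makes sense for the longer coding-style generations.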

27B dense: 38-40 tok/s coding, 23-25 tok/s chat — works great.

Qwen27B works well too.

Loads in seconds, 256K context. FYI: Full notes + launch command:
👉 phuongncn/qwen3.6-27b-speedhack-gx10-dgx-spark on GitHub: Qwen3.6 27B × DFlash, 30-35 tok/s on NVIDIA DGX Spark (GB10), llama.cpp

I’m pretty much at the same speeds with this model + DFlash + vLLM tune

I might try this one as well for an A-B comparison

Yep, I’m also seeing similar speeds with DFlash on Qwen 3.6 27B with the sliding window attention PR and vLLM 19.2. Man, I hope so badly that they release a 122B 3.6 model. It would be the perfect mix of quality and speed. Right now I feel the 27B compromises on speed, the 35B compromises on quality, and the 3.5 122B is worse in quality than both according to nearly all benchmarks.

Best sauce for a single Spark: AEON-7/Qwen3.6-35B-A3B-heretic-NVFP4 on Hugging Face

Yes, the fast loading time is a great advantage. I’ve set up my DGX as a local AI server for our online team, which helps save a significant amount of power. I created a script that automatically starts the system when triggered, making the AI ready in just 5 to 10 seconds. Then, if it remains idle for [X] minutes, it automatically shuts down.
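
The idle-shutdown part can be sketched roughly like this. It is not the actual script, just a minimal illustration of the timer logic: the class name, the `on_idle` hook, and the injectable clock are all hypothetical, and the timeout is left as a parameter since the real value is whatever you pick.

```python
import time
from typing import Callable


class IdleWatchdog:
    """Fires `on_idle` once when no request has arrived for `idle_seconds`."""

    def __init__(
        self,
        idle_seconds: float,
        on_idle: Callable[[], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.idle_seconds = idle_seconds
        self.on_idle = on_idle      # hypothetical hook, e.g. stop the model server
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = clock()
        self.running = True

    def touch(self) -> None:
        """Record activity; call this for every incoming request."""
        self.last_seen = self.clock()
        self.running = True

    def poll(self) -> bool:
        """Check the timer; returns True if the shutdown hook fired."""
        if self.running and self.clock() - self.last_seen >= self.idle_seconds:
            self.running = False
            self.on_idle()
            return True
        return False
```

In practice `touch()` would be called from whatever proxy sits in front of the server, and `poll()` from a loop or timer. The hook fires only once per idle period, so the shutdown command is not repeated.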

I’ve tried it, but I couldn’t reach the speeds the author claims, and it wasn’t stable either.

I found that the 122B model was the only one that passed the full tool-eval-bench with 100% - the 35B performed a bit worse. Fast too!