Qwen3.6-27B is out! I hope for 122B now.

Qwen/Qwen3.6-27B · Hugging Face

bernisse: I have been trying this for several hours and just can’t replicate this performance. Identical recipe and models, but it’s about the same as when I run the full FP8 without DFlash. I will keep messing with this.

Have you tried running with the launch command I posted earlier?

Yes, I did. I get very similar performance to when I run the recipe. Accuracy is better, which I suspect is due to the chat template change, but the median turn time is slightly higher. My tok/s falls off drastically as the context gets larger. I just don’t see the advantage of DFlash right now, my acceptance…

Yes, I’m seeing the same behavior:

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|---|---|---|---|---|---|---|
| unsloth/Qwen3.6-27B-NVFP4 | pp2048 | 2063.96 ± 48.61 | | 1033.66 ± 22.64 | 993.29 ± 22.64 | 1033.66 ± 22.64 |
| unsloth/Qwen3.6-27B-NVFP4 | tg32 | 36.65 ± 9.84 | 38.17 ± 9.82 | | | |
| unsloth/Qwen3.6-27B-NVFP4 | pp2048 @… | | | | | |
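The pp2048 columns are roughly consistent with each other: 2048 prompt tokens at about 2064 tokens/s is just under one second, close to the reported est_ppt, leaving roughly 40 ms of ttfr not explained by prefill alone. A quick arithmetic sketch, assuming est_ppt is approximately prompt tokens divided by t/s (the benchmark tool's exact formula is not given in the thread, and means of per-run values will not line up exactly):

```python
def prefill_ms(prompt_tokens: int, prefill_tps: float) -> float:
    """Approximate prompt-processing time in ms: tokens divided by tokens/s."""
    return prompt_tokens / prefill_tps * 1000.0


def decode_ms_per_token(decode_tps: float) -> float:
    """Approximate time per generated token in ms at a given decode rate."""
    return 1000.0 / decode_tps


# Mean values copied from the unsloth/Qwen3.6-27B-NVFP4 rows above.
est = prefill_ms(2048, 2063.96)  # pp2048 t/s
ttfr = 1033.66                   # pp2048 ttfr (ms)
print(f"prefill estimate: {est:.1f} ms (table est_ppt: 993.29 ms)")
print(f"ttfr minus prefill estimate: {ttfr - est:.1f} ms")
print(f"tg32 decode: {decode_ms_per_token(36.65):.1f} ms/token")
```

At the tg32 rate, each generated token costs roughly 27 ms, which is the number speculative decoding (DFlash or MTP) is trying to cut.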

It holds up better with the open sliding-window attention PR (the Qwen3.6-27B DFlash drafter was trained with sliding attention windows): I built spark-vllm-docker with PR #40898 and added FlashQLA as a mod. This is using the Unsloth NVFP4 quant (with tg128, which I find more representative): …
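For readers unfamiliar with sliding-window attention: each query position attends only to the most recent `window` key positions (itself included) instead of the whole causal prefix. A minimal pure-Python sketch of such a mask; the function name and the window size of 3 are illustrative assumptions, not the values used by the Qwen3.6-27B DFlash drafter or by PR #40898:

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    Causal (k <= q) and limited to the last `window` positions (q - k < window).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return [[0 <= q - k < window for k in range(seq_len)] for q in range(seq_len)]


mask = sliding_window_causal_mask(seq_len=6, window=3)
for row in mask:
    # "x" = attended, "." = masked out
    print("".join("x" if allowed else "." for allowed in row))
```

A drafter trained under a mask like this sees a different attention pattern when served with full causal attention, which is presumably why serving it with a matching window helps acceptance on longer contexts.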

Definitely. Every DFlash option I tried in my custom code-analysis benchmark failed, stalled, or was REALLY SLOW partway through the process. MTP versions are working well for me. Also, I can’t use fp8 kv-cache-dtype with the current DFlash attention implementation (not sure if a fix for that in vLLM is o…

Yeah, I’m not sure how to add those two things into the Docker build… /build-and-run.sh --tf5 --??? Maybe by adding a PR?

Here is what I did:

Applying FlashQLA-Blackwell PR #3 to spark-vllm-docker

Context: Fixes forward_flashqla breakage caused by upstream vLLM drift. The PR adds a v2 patch tested against vLLM commit 8f89381.

Steps

Pull the PR branch from FlashQLA-Blackwell PR #3:

    cd FlashQLA-Blackwell
    git fetch origi…

To get the specific vLLM PR, you must rebuild the container with something like this (I tag mine differently when building with a PR):

    ./build_and_copy.sh --apply-vllm-pr 40898 --tf5 -t vllm-node-dflash

The above post is then correct for the mod.

Thanks! I’ll try in a couple of hours :)


joshua.dale.warner, May 12, 2026, 8:43pm (post #92)

Try building as I suggested above: a custom build of spark-vllm-docker with the sliding-window PR plus the FlashQLA mod.

I was lukewarm on DFlash (I still feel it won’t leap ahead until DDTree arrives), but on my real workloads with challenging text, DFlash on this build now matches MTP=3 at contexts out to 30k+ for both the Unsloth NVFP4 and PrismaQuant versions.
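Why acceptance rate decides the DFlash vs. MTP=3 comparison can be seen from the standard simplified model of speculative decoding: if each of k drafted tokens is accepted independently with probability a, the expected number of tokens emitted per target-model verification step is (1 - a^(k+1)) / (1 - a). This is an idealized assumption (real acceptance is neither constant nor independent, and it tends to drop on long contexts), it ignores the cost of drafting, and the acceptance values below are illustrative, not measurements from this thread:

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification step, assuming each drafted token
    is accepted independently with probability accept_rate (idealized model).

    Counts accepted draft tokens plus the one token the target model always adds.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if accept_rate == 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


# Illustrative values only: a short draft with high acceptance vs. a longer
# draft with lower acceptance.
for name, a, k in [("short draft, k=3 (like MTP=3)", 0.8, 3), ("longer draft, k=8", 0.6, 8)]:
    print(f"{name}: {expected_tokens_per_step(a, k):.2f} tokens/step")
```

Under these made-up numbers the shorter, better-accepted draft wins (about 2.95 vs. 2.47 tokens per step), so a longer draft only pays off if its acceptance holds up as context grows.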

Related topics

| Topic | Category | Tags | Replies | Views | Activity |
|---|---|---|---|---|---|
| Qwen/Qwen3.6-35B-A3B (and FP8) has landed | DGX Spark / GB10 | agentic-ai | 239 | 20183 | May 11, 2026 |
| Qwen3.5 27B optimisation thread starting at 30+ t/s TP=1 | DGX Spark / GB10 | llama, agentic-ai | 23 | 2348 | May 11, 2026 |
| Qwen3.6-27B-Dflash link | DGX Spark / GB10 Projects | | 22 | 3335 | April 29, 2026 |
| DFlash LLM for DGX Spark - too good to be true? | DGX Spark / GB10 | | 37 | 2851 | April 17, 2026 |
| HOW-TO: Run Qwen3-Coder-Next on Spark | DGX Spark / GB10 | llama | 92 | 9454 | March 24, 2026 |
| Qwen/Qwen3.5-122B-A10B - Alibaba/Qwen thought about us... :-D | DGX Spark / GB10 | | 340 | 15847 | March 24, 2026 |
| Bfloat16 Quality = Speed? | DGX Spark / GB10 | | 95 | 4100 | May 17, 2026 |
| Qwen3.5-122B-A10B on single Spark: up to 51 tok/s (v2.1 — patches + quick-start + benchmark) | DGX Spark / GB10 | cuda, performance, docker, performance-tuning, llm | 402 | 16012 | May 18, 2026 |
| Qwen3.5-122B-A10B NVFP4 Quantized for DGX Spark — 234GB → 75GB, Runs on 128GB | DGX Spark / GB10 Projects | | 44 | 10183 | April 9, 2026 |
| Does Qwen3.5-35B-A3B on GB10 leave a lot of performance on the table? | DGX Spark / GB10 | agentic-ai | 40 | 5436 | March 16, 2026 |