DFlash LLM for DGX Spark - too good to be true?

Same here. The 35b-a3b-fp8 is really nimble: as long as you keep the context length short (~70k) it screams, though it can fall off a cliff past that. This fix has also toned down the thought-loop problem; I haven't really hit one since. The 122b-hybrid-int4fp8 handles larger context (~140k) but is very opinionated at times and needs a grill-me session at the start to guide its thinking. I still turn to Qwen Coder Next AutoRound 16kv for debugging when the other two get stuck. And finally, the qwen3.5-enhanced.jinja chat template has reduced tool-call failures, which makes the whole experience much more confidence-inspiring.
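For anyone who hasn't poked at one: a chat template is just a Jinja file the server renders over the message list to build the raw prompt, so a better template means cleaner tool-call markup for the model to imitate. Here's a minimal sketch with a made-up ChatML-style template (not the actual qwen3.5-enhanced.jinja) just to show the mechanism:

```python
from jinja2 import Template

# Hypothetical minimal chat template (NOT the real qwen3.5-enhanced.jinja).
# Wraps each message in ChatML-style tags, then opens the assistant turn.
TEMPLATE = """{% for m in messages -%}
<|im_start|>{{ m.role }}
{{ m.content }}<|im_end|>
{% endfor -%}
<|im_start|>assistant
"""

messages = [
    {"role": "system", "content": "You are a coding assistant."},
    {"role": "user", "content": "List the files in /tmp"},
]

# The inference server does exactly this step with whatever .jinja you pass it.
prompt = Template(TEMPLATE).render(messages=messages)
print(prompt)
```

Swapping templates at serve time is usually just a flag (e.g. vLLM's `--chat-template`), so trying an alternative template costs nothing but a restart.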

At the end of the day, I feel like there is a lot of subjectivity. One day I think one model is better; then it gets stuck, reward-hacks, chooses its own adventure, and another comes to the rescue. Was it fresh context? Was the jar lid already loosened? Some days I feel certain, then after a bad experience, I don't. My AI coding, prompting, and workflows are all improving at the same time, so it's never a fair test. The benchmarks never reflect the real-life experience of coding with these models. I have to use them exclusively because my clients need air-gapped solutions, so no frontier models allowed.
