DFlash LLM for DGX Spark - too good to be true?

Same here. The 35b-a3b-fp8 is really nimble: as long as you keep the context length short (~70k) it screams, though it can fall off a cliff past that. This fix has also toned down the thought-loop problem; I haven't really hit one since. The 122b-hybrid-int4fp8 handles larger context (~140k) but is very opinionated at times and needs a grill-me session at the start to guide its thinking. I still turn to Qwen Coder Next AutoRound 16kv for debugging when the other two get stuck. And finally, the qwen3.5-enhanced.jinja chat template has reduced tool-call failures, which makes the whole experience much more confidence-inspiring.
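For anyone who hasn't poked at one: a chat template is just a Jinja file the server renders over the message list to build the raw prompt, so a better template means cleaner tool-call markup for the model to imitate. Here's a minimal sketch with a made-up ChatML-style template (not the actual qwen3.5-enhanced.jinja) just to show the mechanism:

```python
from jinja2 import Template

# Hypothetical minimal chat template (NOT the real qwen3.5-enhanced.jinja).
# Wraps each message in ChatML-style tags, then opens the assistant turn.
TEMPLATE = """{% for m in messages -%}
<|im_start|>{{ m.role }}
{{ m.content }}<|im_end|>
{% endfor -%}
<|im_start|>assistant
"""

messages = [
    {"role": "system", "content": "You are a coding assistant."},
    {"role": "user", "content": "List the files in /tmp"},
]

# The inference server does exactly this step with whatever .jinja you pass it.
prompt = Template(TEMPLATE).render(messages=messages)
print(prompt)
```

Swapping templates at serve time is usually just a flag (e.g. vLLM's `--chat-template`), so trying an alternative template costs nothing but a restart.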

At the end of the day, I feel like there is a lot of subjectivity. One day I think one model is better; then it gets stuck, reward-hacks, chooses its own adventure, and another comes to the rescue. Was it fresh context? Was the jar lid already loosened? Some days I feel certain, then after a bad experience, I don't. My AI coding, prompting, and workflows are all improving at the same time, so it's never a fair test. The benchmarks never reflect the real-life experience of coding with these models. I have to use them exclusively because my clients need air-gapped solutions, so no frontier models allowed.
