Qwen3.5-35B-A3B optimizations on single Spark

An update on the qwen35-35b-fp8-mtp vs qwen35-122b-hybrid-int4fp8 question

I have been alternating between the two while doing regular project work with OpenCode.

Tool Call Failures

The first noticeable problem I ran into was with tool calling, so I changed the launch flags:

  --chat-template /models/qwen3.5-enhanced.jinja \
  --tool-call-parser qwen3_coder \

This was after downloading qwen3.5-enhanced.jinja and placing it in ~/models/.

This seems to improve tool-call reliability on both models once the context fills beyond 100k tokens.
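For context, here is a sketch of how those two flags slot into a vLLM-style serve command. The model path, context length, and the `--enable-auto-tool-choice` flag are my assumptions based on typical vLLM tool-calling setups, not the exact command from this post:

```shell
# Hypothetical launch command -- only the --chat-template and
# --tool-call-parser lines come from the post; the rest is assumed.
vllm serve /models/qwen3.5-35b-a3b-fp8 \
  --chat-template /models/qwen3.5-enhanced.jinja \
  --tool-call-parser qwen3_coder \
  --enable-auto-tool-choice \
  --max-model-len 131072
```

Note that in vLLM the tool-call parser only takes effect when automatic tool choice is enabled, which is why `--enable-auto-tool-choice` usually accompanies `--tool-call-parser`.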

Drift

Overall I think the 35B FP8 is amazing for its size. For lots of short, routine coding jobs it's excellent. But the 122B model, despite the hybrid quantisation, handles greater complexity and longer-running tasks more intelligently. I don't have a benchmark to point to, just a day of working between both models and observing what happened. As context rot sets in at about 130k tokens, the 35B FP8 model gets a lot dumber a lot quicker in my opinion: it forgets and ignores instructions, looks for answers in the wrong places, shows inferior judgment about the causes of bugs despite intervention, and gets locked into patterns of thinking it can't break free from.