So I’ve been having running Qwen3.5 122b intel autoround on my spark and tool calling has always been a problem even with the unsloth fix. Short tasks is fine but recently I’ve been running Hermes Agent and with long tasks, tool calling silently fails.
I stumbled upon a chat template fix on reddit that could very fix this. I’m still testing now but it seems promising:
In addition to the chat template, the author suggested using
I have no idea why this would make a change, but I’ve been using it that way for about a week, and no issues. I’ve even had session without a mistake run for multiple hours.
Ok I’m reporting back after a full 12 hours of testing. I was running hermes-agent with llm-wiki skill and had the agent populate my wiki, doing research non-stop for 4 - 6 hours per session.
Using the old --tool-call-parser qwen3_coder with the new chat template resulted in a silent tool call failure after 2 hours. Which is still much better than before where tool calls will fail after a handful of turns.
Using --tool-call-parser qwen3_xml along with the new chat template was the real winner. The session lasted 6 hours and agent finished the task.
I will continue testing this and considered this fixed for the time being.
Sorry for bothering but for opencode Qwen3_XML seemed to make problems with tool calling for me, Qwen3_Coder works better there in my Experience, did anybody else had the same experience? Is Qwen3_XML better than Qwen3_Coder?
I am running Albonds Qwen3.5 122B Hybrid Autoround with Qwen3_coder parser and Qwens default offial tempalte. Does the jinja Template make it better and more reliable overall? I am running it like this:
Yes, like the poster above said, you can just make a mod.
Does it really solve the tool calling issues compared to Unsloth chat template I’m currently using in Qwen 3.5 recipes? If I get enough positive feedback, I may just use this template instead.
It does seem to work better. I started running it today, and haven’t had any failed tool-calls yet, with longer running flows in open code. this is with the qwen3_xml tool parser and the enhanced tool calling on a modified version of your qwen3.5-122b-int4-autoround recipe. With the Unsloth one over the openai-compatable api it felt like it broke quite frequently.
I can confirm from my tests that the new template + XML combo makes tool use much more stable. On a 35B model, I was getting tool call failures every single time without fixes. Switching to XML + Unsloth got me to a 50% success rate, but with the new template + XML, all four initial runs were successful. I did hit one failure during further testing, but that’s roughly a 10% failure rate compared to 50% with Unsloth.
It would be great to add this as a separate mode. Instead of replacing Unsloth, we could just add the new template so people have a choice.
The irony is that just as I stabilized tool use for 35B 3.5, version 3.6 dropped. And it looks like 3.6 is as stable out of the box as 3.5 was with all the fixes — I’ve only had one failure in 8 runs so far.
Just want to confirm 3.6 seems fixed the issue and I see consistent tool calls in long agentic sessions, also new model do a lot of parallel tool calls when needed without any errors. So don’t override default model chat template. Moreover, you should use new ‘preserve_thinking’ kwarg, it helps a lot for agentic workflows: prefix caching works and agent avoids repeatable thinking.
I can attest that this has completely fixed my issues of having tool calls leak into the reasoning block. Before it would happen to me fairly regularly then the model would stop as if it were done doing whatever it was doing.