Qwen3.5-35B-A3B optimizations on single Spark

An update on the qwen35-35b-fp8-mtp vs qwen35-122b-hybrid-int4fp8 question

I have been alternating between the two while doing regular project work with OpenCode.

Tool Call Failures

The first noticeable problem I ran into was with tool calling, so I changed the launch flags:

  --chat-template /models/qwen3.5-enhanced.jinja \
  --tool-call-parser qwen3_coder \

This was after downloading qwen3.5-enhanced.jinja and placing it in ~/models/.

This seems to improve tool-call reliability on both models once the context fills beyond 100k tokens.
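For context, here is a sketch of how those two flags slot into a vLLM-style serve command. The model path, context length, and the `--enable-auto-tool-choice` flag are my assumptions based on typical vLLM tool-calling setups, not the exact command from this post:

```shell
# Hypothetical launch command -- only the --chat-template and
# --tool-call-parser lines come from the post; the rest is assumed.
vllm serve /models/qwen3.5-35b-a3b-fp8 \
  --chat-template /models/qwen3.5-enhanced.jinja \
  --tool-call-parser qwen3_coder \
  --enable-auto-tool-choice \
  --max-model-len 131072
```

Note that in vLLM the tool-call parser only takes effect when automatic tool choice is enabled, which is why `--enable-auto-tool-choice` usually accompanies `--tool-call-parser`.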

Drift

Overall I think the 35B FP8 is amazing for its size. For lots of short, routine coding jobs it's excellent. But the 122B model, despite the hybrid quantisation, handles greater complexity and longer-running tasks more intelligently. I don't have a benchmark to point to, just a day of working between both models and observing what happened. As context rot sets in at about 130k tokens, the 35B FP8 model gets a lot dumber a lot quicker in my opinion: it forgets and ignores instructions, looks for answers in the wrong places, shows inferior judgment about the causes of bugs despite intervention, and gets locked into patterns of thinking it can't break free from.