Hello,
I can’t understand why nobody is pointing out that Qwen3.6 on vLLM is not stable at all. I’ve tried all the vLLM versions and almost all the Qwen3.6 35B variants (FP8, INT4, NVFP4, with and without distillation), and they all have the same problem: endless repetition during thinking. The model starts creating a large file, then once it’s finished it decides, “Oh, actually, no, I don’t like it, I’ll do it differently,” and this can repeat several times. The same thing happens when it edits a file: it will edit it over and over. Sometimes it will even think, say something, call a tool, think again, say the same thing again, call the same tool with the same parameters, and so on, dozens of times or even indefinitely.
I’ve tried every possible setting, starting with the recommended ones. The last one that seemed stable was `{"repetition_penalty": 1.1, "temperature": 0.4}`, but eventually, once the context reaches around 30k tokens, the repetition comes back.
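For context, this is roughly how I’m passing those settings through vLLM’s OpenAI-compatible endpoint (a minimal sketch; the base URL and model name are placeholders for my setup, not anything official):

```python
from openai import OpenAI

# Placeholder endpoint/model for a local vLLM server; adjust to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen3.6-35B",  # placeholder model name
    messages=[{"role": "user", "content": "Refactor this file..."}],
    temperature=0.4,
    # repetition_penalty is not part of the OpenAI schema, so vLLM takes it via extra_body.
    extra_body={"repetition_penalty": 1.1},
)
print(response.choices[0].message.content)
```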
This happens with or without `preserve_thinking`, whether in Claude Code, VS Code, or even custom-built assistants.
My second ongoing issue, whether with vLLM nightly, the latest stable release vLLM 0.20, or vLLM eugr (which is based on the nightly), is tool calls. The tool calls end up as raw XML inside the `thinking_content` output, forcing me to patch them everywhere. Qwen has been releasing chat templates for two years, and nobody has managed to get vLLM working out of the box with them. I don’t understand…
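To give an idea, this is the kind of patch I keep having to bolt on everywhere (a minimal sketch, assuming the leaked calls follow the Hermes/Qwen-style `<tool_call>{"name": ..., "arguments": ...}</tool_call>` convention; the helper name and tag format are my own assumptions, so adjust the regex to whatever actually shows up in your output):

```python
import json
import re

# Assumed tag format: <tool_call>{ JSON payload }</tool_call> leaked into the text.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str):
    """Split leaked XML tool calls out of a thinking/content string.

    Returns (cleaned_text, tool_calls), where tool_calls is a list of dicts
    with "name" and "arguments" keys, mirroring an OpenAI-style tool call.
    """
    tool_calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
            tool_calls.append(
                {"name": payload.get("name"), "arguments": payload.get("arguments", {})}
            )
        except json.JSONDecodeError:
            # Leave malformed blocks alone so nothing gets silently dropped.
            continue
    cleaned = TOOL_CALL_RE.sub("", text).strip()
    return cleaned, tool_calls
```

Having to run every response through something like this, in every client, is exactly what I’d like to stop doing.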
Could someone please create a ZIP file containing all the patches or commands needed to make it work properly? I’m starting to despair :(