Qwen3.6-27B is out! I hope for a 122B now.
This looks awesome. I was already impressed by the 3.5 27B model. By the stats, it's a clean step forward everywhere except a tiny regression in one STEM benchmark (probably too close to call, honestly).
FP8 with MTP on the 3.5 version was running at about 12 tok/s for me. It will be interesting to see whether MTP has improved on 27B like it seems to have on 35B-A3B.
A 27B model holding its own against Opus 4.5: as Jensen already said, the next two decades will be incredible.
I've tested 3.6 on both single and dual Sparks; the speed is the same compared to 3.5: TPOT (290, 137) ms.
I tried the FP8 version on my dual-node cluster. This one would benefit greatly from the typical Intel or cyankiwi treatment:
vllm serve Qwen/Qwen3.6-27B-FP8 \
--host 0.0.0.0 \
--port 8080 \
--gpu-memory-utilization 0.8 \
--max-model-len 262144 \
--max-num-batched-tokens 16384 \
--enable-prefix-caching \
--enable-chunked-prefill \
--max-num-seqs 4 \
--load-format instanttensor \
--attention-backend flashinfer \
--dtype auto \
--kv-cache-dtype fp8 \
--trust-remote-code \
--enable-auto-tool-choice \
--served-model-name Qwen3.6-27B \
--tool-call-parser qwen3_coder \
--reasoning-parser qwen3 \
--override-generation-config "{\"temperature\": 0.6, \"top_p\": 0.95, \"top_k\": 20, \"min_p\": 0.0, \"presence_penalty\": 0.0, \"repetition_penalty\": 1.0}" \
--default-chat-template-kwargs '{"preserve_thinking": true}' \
--tensor-parallel-size 2 \
--distributed-executor-backend ray
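Once the serve command above is up, it exposes the standard OpenAI-compatible API on the configured host/port. A minimal stdlib-only smoke test (the helper names and prompt are mine; the model name matches `--served-model-name` and the sampling values mirror `--override-generation-config`):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # --host / --port from the serve command

def build_chat_request(prompt: str, model: str = "Qwen3.6-27B") -> dict:
    """Payload for /v1/chat/completions; model must match --served-model-name.
    Sampling values mirror the --override-generation-config defaults."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 256,
    }

def chat(prompt: str) -> str:
    """Send one chat turn and return the assistant text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# With the server running:  print(chat("Say hello in one word."))
```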
llama-benchy Results
| Test                 | c  | pp t/s | tg t/s | TTFT (ms) | Total (ms) | Tokens   |
|----------------------|----|--------|--------|-----------|------------|----------|
| pp2048 tg128 @ d0    | c1 | 3,067  | 14.4   | 761       | 9,551      | 2048+128 |
| pp2048 tg128 @ d0    | c2 | 2,007  | 25.8   | 1,584     | 10,904     | 2048+128 |
| pp2048 tg128 @ d0    | c4 | 1,036  | 41.2   | 7,560     | 17,632     | 2048+128 |
| pp2048 tg128 @ d4096 | c1 | 1,628  | 14.4   | 3,868     | 12,666     | 2048+128 |
| pp2048 tg128 @ d4096 | c2 | 920    | 13.7   | 8,619     | 20,701     | 2048+128 |
| pp2048 tg128 @ d4096 | c4 | 895    | 16.5   | 19,063    | 32,263     | 2048+128 |
| pp2048 tg128 @ d8192 | c1 | 1,590  | 14.3   | 6,535     | 15,384     | 2048+128 |
| pp2048 tg128 @ d8192 | c2 | 861    | 9.6    | 15,172    | 28,682     | 2048+128 |
| pp2048 tg128 @ d8192 | c4 | 757    | 10.7   | 44,931    | 57,299     | 2048+128 |
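Worth noting when reading the table: tg t/s is aggregate across concurrent streams, so per-request decode speed actually drops as concurrency rises. A quick sanity check on the d0 rows above:

```python
# Aggregate decode throughput (tg t/s) at depth 0, from the table above
aggregate_tg = {1: 14.4, 2: 25.8, 4: 41.2}

for c, tps in sorted(aggregate_tg.items()):
    per_request = tps / c              # each stream's share of decode speed
    scaling = tps / aggregate_tg[1]    # aggregate speedup vs a single stream
    print(f"c{c}: {per_request:.1f} t/s per request, {scaling:.2f}x aggregate")
# c1: 14.4 / 1.00x, c2: 12.9 / 1.79x, c4: 10.3 / 2.86x
```

So batching to c4 nearly triples aggregate throughput while each individual request only slows from 14.4 to about 10.3 t/s.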
tool-eval-bench Results
Category Breakdown
| Category            | Score | Earned |
|---------------------|-------|--------|
| Tool Selection      | 100%  | 6/6    |
| Parameter Precision | 100%  | 6/6    |
| Multi-Step Chains   | 100%  | 6/6    |
| Restraint & Refusal | 100%  | 6/6    |
| Error Recovery      | 100%  | 6/6    |
Benchmark Complete

Model: Qwen/Qwen3.6-27B-FP8
Score: 100 / 100
Rating: Excellent
15 passed · 0 partial · 0 failed
Points: 30/30

Quality: 100/100
Responsiveness: 17/100 (median turn: 8.7s)
Deployability: 75/100 (α=0.7)

Completed in 398.5s

Token Usage:
Total: 37,754 tokens · Efficiency: 0.8 pts/1K tokens

Throughput:
Single: 3,067 pp t/s · 14.4 tg t/s · TTFT 761ms
c2: 2,007 pp t/s · 25.8 tg t/s
c4: 1,036 pp t/s · 41.2 tg t/s

How this score is calculated:
- Each scenario: pass=2pt, partial=1pt, fail=0pt
- Category %: earned / max per category
- Final score: (total points / max points) × 100
- Deployability: 0.7×quality + 0.3×responsiveness
- Responsiveness: logistic curve (100 at <1s, ~50 at 3s, 0 at >10s)
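The scoring arithmetic from the "How this score is calculated" notes can be checked in a few lines. This is my own sketch of those two formulas, not the benchmark's actual code (I have not tried to reproduce the logistic responsiveness curve, only to plug in the reported responsiveness values):

```python
def scenario_score(passed: int, partial: int, failed: int) -> float:
    """pass=2pt, partial=1pt, fail=0pt, scaled so max points -> 100."""
    max_points = 2 * (passed + partial + failed)
    return 100.0 * (2 * passed + partial) / max_points

def deployability(quality: float, responsiveness: float, alpha: float = 0.7) -> float:
    """Deployability = alpha*quality + (1-alpha)*responsiveness (alpha=0.7 here)."""
    return alpha * quality + (1 - alpha) * responsiveness

print(scenario_score(15, 0, 0))        # 100.0, i.e. 30/30 points
print(round(deployability(100, 17)))   # 75, matching the run above
```

Plugging in the second run's responsiveness of 20/100 gives 0.7×100 + 0.3×20 = 76, which also matches the 76/100 reported further down.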
You'll really want to enable the built-in MTP.
In the end we'll have VRAM left over.
With a single Spark:
docker run -d \
--privileged --name qwen3.6-27B-FP8 \
--gpus all \
--network host --ipc=host \
-v ~/.cache/huggingface:/root/.cache/huggingface \
vllm-node \
vllm serve Qwen/Qwen3.6-27B-FP8 \
--host 0.0.0.0 \
--port 8080 \
--tensor-parallel-size 1 \
--gpu-memory-utilization 0.75 \
--max-model-len 32768 \
--max-num-batched-tokens 16384 \
--enable-prefix-caching \
--enable-chunked-prefill \
--max-num-seqs 4 \
--load-format auto \
--attention-backend flashinfer \
--dtype auto \
--kv-cache-dtype fp8 \
--trust-remote-code \
--enable-auto-tool-choice \
--served-model-name Qwen3.6-27B-FP8 \
--tool-call-parser qwen3_coder \
--reasoning-parser qwen3 \
--override-generation-config '{"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0}' \
--default-chat-template-kwargs '{"preserve_thinking": true}'
What is the maximum usable context with a single unit?
At the moment I am using qwen3-next-coder with 256K of context using 106GB; the 3.6 27B should free up a lot of RAM…
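For a rough sense of how much context fits, the KV cache grows linearly with tokens: 2 (K and V) × layers × KV heads × head dim bytes per token. A sketch with placeholder dimensions (the layer/head numbers below are illustrative only; read the real ones from the model's config.json):

```python
def kv_cache_gib(tokens: int, layers: int, kv_heads: int, head_dim: int,
                 bytes_per_elt: int = 1) -> float:
    """Rough KV-cache size in GiB: 2 (K and V) * layers * kv_heads * head_dim
    bytes per token. bytes_per_elt=1 for fp8 (--kv-cache-dtype fp8), 2 for bf16."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elt * tokens / 1024**3

# Hypothetical dims for illustration only; check the model's config.json
layers, kv_heads, head_dim = 64, 8, 128
for ctx in (32_768, 131_072, 262_144):
    print(f"{ctx:>7} tokens -> {kv_cache_gib(ctx, layers, kv_heads, head_dim):.1f} GiB fp8 KV")
```

With those assumed dims that works out to 128 KiB per token in fp8, so even a 262K context stays in the tens of GiB on top of the ~27 GB of FP8 weights.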
Dual node with MTP. I am still testing this, but it seems like each speculative token adds an allgather across the inter-node link. With num_speculative_tokens=2, that's 2 extra cross-node round trips per decode step on top of the normal allreduce, likely 2-3× the communication overhead, eating any speedup from speculation. @eugr may have smart ideas on how that could be tackled:
vllm serve Qwen/Qwen3.6-27B-FP8 \
--host 0.0.0.0 \
--port 8080 \
--gpu-memory-utilization 0.8 \
--max-model-len 262144 \
--max-num-batched-tokens 16384 \
--enable-prefix-caching \
--enable-chunked-prefill \
--max-num-seqs 4 \
--load-format instanttensor \
--attention-backend flashinfer \
--dtype auto \
--kv-cache-dtype fp8 \
--trust-remote-code \
--enable-auto-tool-choice \
--served-model-name Qwen3.6-27B \
--tool-call-parser qwen3_coder \
--reasoning-parser qwen3 \
--override-generation-config "{\"temperature\": 0.6, \"top_p\": 0.95, \"top_k\": 20, \"min_p\": 0.0, \"presence_penalty\": 0.0, \"repetition_penalty\": 1.0}" \
--default-chat-template-kwargs '{"preserve_thinking": true}' \
--speculative-config.method mtp \
--speculative-config.num_speculative_tokens 2 \
--tensor-parallel-size 2 \
--distributed-executor-backend ray
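A back-of-envelope for the cross-node overhead claim above (this is my own simple model, not measured collective counts):

```python
def per_step_collectives(num_spec_tokens: int) -> int:
    """Cross-node collectives per decode step under the claim above:
    one baseline allreduce plus one extra allgather per speculative token."""
    return 1 + num_spec_tokens

def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Simple model: 1 guaranteed token plus the expected accepted drafts,
    assuming each draft token is accepted with probability alpha and
    acceptance stops at the first rejection."""
    return 1 + sum(alpha ** i for i in range(1, k + 1))

k = 2
print(per_step_collectives(k))                      # 3: triple the link traffic
print(round(expected_tokens_per_step(0.88, k), 2))  # 2.65 tokens/step at alpha=0.88
```

So with k=2 the cross-node collectives triple while the expected tokens per step only reach about 2.65; if decode time is dominated by link latency rather than compute, the speculation gain can be almost entirely consumed, which is consistent with the modest numbers below.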
Results:
Tool-Call Benchmark
Server: http://0.0.0.0:8080
Querying http://0.0.0.0:8080/v1/models … → Qwen/Qwen3.6-27B-FP8 (alias: Qwen3.6-27B)
Warm-up complete (3280 ms)

llama-benchy Throughput Benchmark
Qwen/Qwen3.6-27B-FP8
pp=[2048] tg=[128] depth=[0, 4096, 8192] concurrency=[1, 2, 4] runs=3 latency=generation
Complete 27/27 in 0:09:46

llama-benchy 0.3.5
Estimated latency: 181.1 ms
llama-benchy Results
| Test                 | c  | pp t/s | tg t/s | TTFT (ms) | Total (ms) | Tokens   |
|----------------------|----|--------|--------|-----------|------------|----------|
| pp2048 tg128 @ d0    | c1 | 902    | 7.2    | 2,454     | 20,093     | 2048+128 |
| pp2048 tg128 @ d0    | c2 | 1,111  | 12.7   | 3,870     | 22,416     | 2048+128 |
| pp2048 tg128 @ d0    | c4 | 1,213  | 20.9   | 7,117     | 29,162     | 2048+128 |
| pp2048 tg128 @ d4096 | c1 | 1,364  | 7.8    | 4,688     | 20,977     | 2048+128 |
| pp2048 tg128 @ d4096 | c2 | 920    | 11.1   | 12,133    | 31,592     | 2048+128 |
| pp2048 tg128 @ d4096 | c4 | 808    | 20.2   | 27,813    | 48,387     | 2048+128 |
| pp2048 tg128 @ d8192 | c1 | 1,392  | 7.7    | 7,630     | 24,175     | 2048+128 |
| pp2048 tg128 @ d8192 | c2 | 933    | 7.2    | 17,000    | 40,788     | 2048+128 |
| pp2048 tg128 @ d8192 | c4 | 681    | 4.7    | 47,587    | 78,185     | 2048+128 |
Metrics sourced from llama-benchy; see https://github.com/eugr/llama-benchy for methodology.
Speculative Decoding Benchmark
Qwen/Qwen3.6-27B-FP8
tg=128 depth=[0, 4096, 8192] prompts=['filler', 'code', 'structured'] method=auto
Prometheus /metrics acceptance-rate counters are server-wide aggregates. If other models are serving concurrent traffic on this endpoint, per-request acceptance rate measurements will be inaccurate. For clean measurements: use a single-model server with no concurrent load.
filler @ d0: 17.0 eff t/s, 16.9 stream t/s, α=88.0%, τ=1.8
code @ d0: 19.2 eff t/s, 19.1 stream t/s, α=94.3%, τ=1.9
structured @ d0: 18.1 eff t/s, 17.9 stream t/s, α=85.1%, τ=1.7
filler @ d4096: 11.8 eff t/s, 11.7 stream t/s, α=90.2%, τ=1.8
code @ d4096: 20.4 eff t/s, 20.2 stream t/s, α=94.3%, τ=1.9
structured @ d4096: 18.8 eff t/s, 18.7 stream t/s, α=85.1%, τ=1.7
filler @ d8192: 10.1 eff t/s, 10.0 stream t/s, α=85.1%, τ=1.7
code @ d8192: 20.9 eff t/s, 20.8 stream t/s, α=94.3%, τ=1.9
structured @ d8192: 19.1 eff t/s, 19.0 stream t/s, α=85.1%, τ=1.7
Speculative Decoding Results

| Prompt     | Depth | Eff t/s | α %   | τ len | TTFT | Total ms |
|------------|-------|---------|-------|-------|------|----------|
| filler     | 0     | 17.0    | 88.0% | 1.8   | 10   | 7,530    |
| code       | 0     | 19.2    | 94.3% | 1.9   | 7    | 6,661    |
| structured | 0     | 18.1    | 85.1% | 1.7   | 8    | 7,084    |
| filler     | 4K    | 11.8    | 90.2% | 1.8   | 20   | 10,898   |
| code       | 4K    | 20.4    | 94.3% | 1.9   | 6    | 6,279    |
| structured | 4K    | 18.8    | 85.1% | 1.7   | 8    | 6,815    |
| filler     | 8K    | 10.1    | 85.1% | 1.7   | 19   | 12,680   |
| code       | 8K    | 20.9    | 94.3% | 1.9   | 7    | 6,127    |
| structured | 8K    | 19.1    | 85.1% | 1.7   | 8    | 6,699    |
Highest acceptance: code (94.3%) · Lowest: structured (85.1%)
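The α and τ columns are consistent with each other. Under the usual simplification that each of the k=2 draft tokens is accepted with probability α and acceptance stops at the first rejection, the expected accepted length is the sum of α^i for i = 1..k (my sketch, not the benchmark's definition of τ):

```python
def expected_accepted(alpha: float, k: int) -> float:
    """Expected accepted draft tokens per step with k drafts, assuming each
    draft is accepted with probability alpha and acceptance stops at the
    first rejection: sum of alpha^i for i = 1..k."""
    return sum(alpha ** i for i in range(1, k + 1))

# alpha values from the table above, num_speculative_tokens=2
for name, alpha in [("code", 0.943), ("filler", 0.880), ("structured", 0.851)]:
    print(f"{name}: predicted tau = {expected_accepted(alpha, 2):.2f}")
# code: 1.83, filler: 1.65, structured: 1.58
```

These slightly undershoot the reported τ of 1.9 / 1.8 / 1.7, which suggests acceptances are positively correlated within a step rather than independent, but the ordering matches.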
Report saved to /home/tim/.local/share/uv/tools/tool-eval-bench/lib/python3.12/runs/2026/04/2026-04-22T19-39-05Z_86b657.md
Tool-Call Benchmark
Qwen/Qwen3.6-27B-FP8 via vllm @ http://0.0.0.0:8080
15 scenarios
TC-01 Direct Specialist Match: PASS 2/2, 15.8s, ttft=3,097ms, t2. Used get_weather with Berlin only.
TC-02 Distractor Resistance: PASS 2/2, 11.4s, ttft=3,943ms, t2. Used only get_stock_price for AAPL.
TC-03 Implicit Tool Need: PASS 2/2, 20.0s, ttft=6,678ms, t3. Looked up Sarah before sending the email.
TC-04 Unit Handling: PASS 2/2, 9.0s, ttft=2,993ms, t2. Requested Tokyo weather in Fahrenheit explicitly.
TC-05 Date and Time Parsing: PASS 2/2, 36.3s, ttft=13,289ms, t3. Parsed "next Monday" and included the requested meeting details.
TC-06 Multi-Value Extraction: PASS 2/2, 47.9s, ttft=33,088ms, t3. Issued separate translate_text calls for both languages.
TC-07 Search → Read → Act: PASS 2/2, 37.5s, ttft=6,606ms, t5. Completed the full four-step chain with the right data.
TC-08 Conditional Branching: PASS 2/2, 30.4s, ttft=10,393ms, t3. Checked the weather first, then set the rainy-day reminder.
TC-09 Parallel Independence: PASS 2/2, 23.1s, ttft=5,234ms, t2. Handled both independent tasks.
TC-10 Trivial Knowledge: PASS 2/2, 10.0s, ttft=7,546ms. Answered directly without tool use.
TC-11 Simple Math: PASS 2/2, 24.4s, ttft=23,737ms. Did the math directly.
TC-12 Impossible Request: PASS 2/2, 15.9s, ttft=8,466ms. Refused cleanly because no delete-email tool exists.
TC-13 Empty Results: PASS 2/2, 15.5s, ttft=2,961ms, t3. Retried after the empty result and recovered.
TC-14 Malformed Response: PASS 2/2, 11.5s, ttft=2,998ms, t2. Acknowledged the stock tool failure and handled it gracefully.
TC-15 Conflicting Information: PASS 2/2, 23.3s, ttft=3,870ms, t3. Used the searched population value in the calculator.
Category Breakdown

| Category            | Score | Earned |
|---------------------|-------|--------|
| Tool Selection      | 100%  | 6/6    |
| Parameter Precision | 100%  | 6/6    |
| Multi-Step Chains   | 100%  | 6/6    |
| Restraint & Refusal | 100%  | 6/6    |
| Error Recovery      | 100%  | 6/6    |
Benchmark Complete

Model: Qwen/Qwen3.6-27B-FP8
Score: 100 / 100
Rating: Excellent
15 passed · 0 partial · 0 failed
Points: 30/30

Quality: 100/100
Responsiveness: 20/100 (median turn: 7.4s)
Deployability: 76/100 (α=0.7)

Completed in 332.0s

Token Usage:
Total: 40,561 tokens · Efficiency: 0.7 pts/1K tokens

Throughput:
Single: 1,392 pp t/s · 7.8 tg t/s · TTFT 4,688ms
c2: 1,111 pp t/s · 12.7 tg t/s
c4: 1,213 pp t/s · 20.9 tg t/s

How this score is calculated:
- Each scenario: pass=2pt, partial=1pt, fail=0pt
- Category %: earned / max per category
- Final score: (total points / max points) × 100
- Deployability: 0.7×quality + 0.3×responsiveness
- Responsiveness: logistic curve (100 at <1s, ~50 at 3s, 0 at >10s)
I ran the same subset of AgentBench with the FP8 version, and amazingly it was faster than the MoE (also FP8) version. I can only assume it went in fewer circles or generated fewer errors calling tools.
(I also can't explain why the FP8 MoE beat the bf16 MoE, but I ran them both multiple times, and each run took the mean of 3 epochs; the results were oddly consistent.)
I'll try to kick off the bf16 version soon.
I'm thinking that, although there are improvements over 3.5, overall the model is just thinking too much and getting bogged down without ever reaching an actual result.
Testing with cyankiwi/Qwen3.6-27B-AWQ-INT4 I get decent responses from fairly simple prompts, but when I throw it at a real-world complex coding problem, it fails.
I have quite a complex graphics program running, with an obvious bug that needs fixing. Qwen 3.6 attacked the problem, generating tokens at a good speed, but it became obvious from the endless thinking that the problem was too complex for it to deal with. I gave it a good amount of time to get somewhere, but after 20 minutes or so I gave up.
On the other hand, Minimax M2.7 tackled the problem with a decent amount of thinking time, but came up with a solution, tested it with Playwright, found an error and then finished with a working system with the bug resolved.
The Qwen 3.6 models may be getting great benchmark scores, but I'm not seeing that translate into usefulness on complex coding problems.

