Qwen3.6-27B is out!

ptichalouf · April 22, 2026, 1:24pm

Qwen3.6-27B is out ! i hope for 122B now

jwarner · April 22, 2026, 1:47pm

This looks awesome. I was already impressed by the 3.5 27B model. By the stats, it’s a clean step forward everywhere except a teeny regression in one STEM benchmark (probably too close to call honestly).

FP8 with MTP on the 3.5 version was running about 12 tok/s for me. Will be interesting to see if the MTP has improved on 27B like it seems to have on 35B-A3B.

vedcsolution · April 22, 2026, 5:06pm

Against Opus 4.5, a 27b model, as Jensen already said, the next two decades will be incredible.

dashtotherock · April 22, 2026, 5:28pm

I’ve test 3.6 on both single and duo sparks, the speed is the same compare to 3.5: TPOP (290 , 137) in ms.

serapis · April 22, 2026, 5:33pm

I tried the FP8 version on my Dual Node Cluster. This one will highly benefit from the typical Intel or cyankiwi treatment:

vllm serve Qwen/Qwen3.6-27B-FP8 \
    --host 0.0.0.0 \
    --port 8080 \
    --gpu-memory-utilization 0.8 \
    --max-model-len 262144 \
    --max-num-batched-tokens 16384 \
    --enable-prefix-caching \
    --enable-chunked-prefill \
    --max-num-seqs 4 \
    --load-format instanttensor \
    --attention-backend flashinfer \
    --dtype auto \
    --kv-cache-dtype fp8 \
    --trust-remote-code \
    --enable-auto-tool-choice \
    --served-model-name Qwen3.6-27B \
    --tool-call-parser qwen3_coder \
    --reasoning-parser qwen3 \
    --override-generation-config "{\"temperature\": 0.6, \"top_p\": 0.95, \"top_k\": 20, \"min_p\": 0.0, \"presence_penalty\": 0.0, \"repetition_penalty\": 1.0}" \
    --default-chat-template-kwargs '{"preserve_thinking": true}' \
    --tensor-parallel-size 2 \
    --distributed-executor-backend ray

llama-benchy Results


┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Test                                       ┃     c      ┃               pp t/s ┃               tg t/s ┃              TTFT (ms) ┃             Total (ms) ┃                Tokens ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ pp2048 tg128 @ d0                          │     c1     │                3,067 │                 14.4 │                    761 │                  9,551 │              2048+128 │
│ pp2048 tg128 @ d0                          │     c2     │                2,007 │                 25.8 │                  1,584 │                 10,904 │              2048+128 │
│ pp2048 tg128 @ d0                          │     c4     │                1,036 │                 41.2 │                  7,560 │                 17,632 │              2048+128 │
│ pp2048 tg128 @ d4096                       │     c1     │                1,628 │                 14.4 │                  3,868 │                 12,666 │              2048+128 │
│ pp2048 tg128 @ d4096                       │     c2     │                  920 │                 13.7 │                  8,619 │                 20,701 │              2048+128 │
│ pp2048 tg128 @ d4096                       │     c4     │                  895 │                 16.5 │                 19,063 │                 32,263 │              2048+128 │
│ pp2048 tg128 @ d8192                       │     c1     │                1,590 │                 14.3 │                  6,535 │                 15,384 │              2048+128 │
│ pp2048 tg128 @ d8192                       │     c2     │                  861 │                  9.6 │                 15,172 │                 28,682 │              2048+128 │
│ pp2048 tg128 @ d8192                       │     c4     │                  757 │                 10.7 │                 44,931 │                 57,299 │              2048+128 │
└────────────────────────────────────────────┴────────────┴──────────────────────┴──────────────────────┴────────────────────────┴────────────────────────┴───────────────────────┘

tool-eval-bench Results

                                                                                Category Breakdown
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Category                                                     ┃          Score           ┃ Bar                                                         ┃         Earned          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Tool Selection                                               │           100%           │ ████████████████████                                        │           6/6           │
│ Parameter Precision                                          │           100%           │ ████████████████████                                        │           6/6           │
│ Multi-Step Chains                                            │           100%           │ ████████████████████                                        │           6/6           │
│ Restraint & Refusal                                          │           100%           │ ████████████████████                                        │           6/6           │
│ Error Recovery                                               │           100%           │ ████████████████████                                        │           6/6           │
└──────────────────────────────────────────────────────────────┴──────────────────────────┴─────────────────────────────────────────────────────────────┴─────────────────────────┘

╭───────────────────────────────────────────────────────────────────────────── 🏆 Benchmark Complete ─────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                                                 │
│    Model:  Qwen/Qwen3.6-27B-FP8                                                                                                                                                 │
│    Score:  100 / 100                                                                                                                                                            │
│    Rating: ★★★★★ Excellent                                                                                                                                                      │
│                                                                                                                                                                                 │
│    ✅ 15 passed   ⚠️  0 partial   ❌ 0 failed                                                                                                                                   │
│    Points: 30/30                                                                                                                                                                │
│                                                                                                                                                                                 │
│    Quality:        100/100                                                                                                                                                      │
│    Responsiveness: 17/100  (median turn: 8.7s)                                                                                                                                  │
│    Deployability:  75/100  (α=0.7)                                                                                                                                              │
│                                                                                                                                                                                 │
│    Completed in 398.5s                                                                                                                                                          │
│                                                                                                                                                                                 │
│    📊 Token Usage:                                                                                                                                                              │
│    Total: 37,754 tokens  │  Efficiency: 0.8 pts/1K tokens                                                                                                                       │
│                                                                                                                                                                                 │
│    ⚡ Throughput:                                                                                                                                                               │
│    Single:  3,067 pp t/s  │  14.4 tg t/s  │  TTFT 761ms                                                                                                                         │
│    c2:      2,007 pp t/s  │  25.8 tg t/s                                                                                                                                        │
│    c4:      1,036 pp t/s  │  41.2 tg t/s                                                                                                                                        │
│                                                                                                                                                                                 │
│    ── How this score is calculated ──                                                                                                                                           │
│    • Each scenario: pass=2pt, partial=1pt, fail=0pt                                                                                                                             │
│    • Category %: earned / max per category                                                                                                                                      │
│    • Final score: (total points / max points) × 100                                                                                                                             │
│    • Deployability: 0.7×quality + 0.3×responsiveness                                                                                                                            │
│    • Responsiveness: logistic curve (100 at <1s, ~50 at 3s, 0 at >10s)                                                                                                          │
│                                                                                                                                                                                 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

jwarner · April 22, 2026, 5:43pm

You’ll really want to enable the built in MTP.

vedcsolution · April 22, 2026, 5:46pm

In the end we’ll have VRAM left over.

carlos.albarran.mx · April 22, 2026, 7:55pm

With one single Spark:

docker run -d
–privileged --name qwen3.6-27B-FP8
–gpus all
–network host --ipc=host
-v ~/.cache/huggingface:/root/.cache/huggingface
vllm-node
vllm serve Qwen/Qwen3.6-27B-FP8
–host 0.0.0.0
–port 8080
–tensor-parallel-size 1
–gpu-memory-utilization 0.75
–max-model-len 32768
–max-num-batched-tokens 16384
–enable-prefix-caching
–enable-chunked-prefill
–max-num-seqs 4
–load-format auto
–attention-backend flashinfer
–dtype auto
–kv-cache-dtype fp8
–trust-remote-code
–enable-auto-tool-choice
–served-model-name Qwen3.6-27B-FP8
–tool-call-parser qwen3_coder
–reasoning-parser qwen3
–override-generation-config ‘{“temperature”: 0.6, “top_p”: 0.95, “top_k”: 20, “min_p”: 0.0}’
–default-chat-template-kwargs ‘{“preserve_thinking”: true}’

tool-eval-bench Results

g.marconi · April 22, 2026, 8:17pm

What is the maximum usable context with a single unit?
At the moment I am using qwen3-next-coder with 256K of context using 106GB, the 3.6 27B should free up a lot of RAM…

serapis · April 22, 2026, 9:07pm

Dual Node with MTP. I am still testing this but it seems like each speculative token adds an allgather across the inter-node link. With num_speculative_tokens=2 , that’s 2 extra cross-node round trips per decode step on top of the normal allreduce — likely 2-3× the communication overhead, eating any speedup from speculation – @eugr may have smart ideas how that could be tackled:

vllm serve Qwen/Qwen3.6-27B-FP8 \
    --host 0.0.0.0 \
    --port 8080 \
    --gpu-memory-utilization 0.8 \
    --max-model-len 262144 \
    --max-num-batched-tokens 16384 \
    --enable-prefix-caching \
    --enable-chunked-prefill \
    --max-num-seqs 4 \
    --load-format instanttensor \
    --attention-backend flashinfer \
    --dtype auto \
    --kv-cache-dtype fp8 \
    --trust-remote-code \
    --enable-auto-tool-choice \
    --served-model-name Qwen3.6-27B \
    --tool-call-parser qwen3_coder \
    --reasoning-parser qwen3 \
    --override-generation-config "{\"temperature\": 0.6, \"top_p\": 0.95, \"top_k\": 20, \"min_p\": 0.0, \"presence_penalty\": 0.0, \"repetition_penalty\": 1.0}" \
    --default-chat-template-kwargs '{"preserve_thinking": true}' \
    --speculative-config.method mtp \
    --speculative-config.num_speculative_tokens 2 \
    --tensor-parallel-size 2 \
    --distributed-executor-backend ray

Results:

🔧 Tool-Call Benchmark
  Server: http://0.0.0.0:8080
  Querying http://0.0.0.0:8080/v1/models … ✓ Qwen/Qwen3.6-27B-FP8 (alias: Qwen3.6-27B)

  ✓ Warm-up complete (3280 ms)

╭────────────────────────────── ⚡ llama-benchy Throughput Benchmark ───────────────────────────────╮
│ Qwen/Qwen3.6-27B-FP8                                                                              │
│ pp=[2048]  tg=[128]  depth=[0, 4096, 8192]  concurrency=[1, 2, 4]  runs=3  latency=generation     │
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯

  ✓ Complete ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27/27 0:09:46

  llama-benchy 0.3.5
  Estimated latency: 181.1 ms

                                        llama-benchy Results
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Test                   ┃   c   ┃     pp t/s ┃     tg t/s ┃   TTFT (ms) ┃  Total (ms) ┃     Tokens ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ pp2048 tg128 @ d0      │  c1   │        902 │        7.2 │       2,454 │      20,093 │   2048+128 │
│ pp2048 tg128 @ d0      │  c2   │      1,111 │       12.7 │       3,870 │      22,416 │   2048+128 │
│ pp2048 tg128 @ d0      │  c4   │      1,213 │       20.9 │       7,117 │      29,162 │   2048+128 │
│ pp2048 tg128 @ d4096   │  c1   │      1,364 │        7.8 │       4,688 │      20,977 │   2048+128 │
│ pp2048 tg128 @ d4096   │  c2   │        920 │       11.1 │      12,133 │      31,592 │   2048+128 │
│ pp2048 tg128 @ d4096   │  c4   │        808 │       20.2 │      27,813 │      48,387 │   2048+128 │
│ pp2048 tg128 @ d8192   │  c1   │      1,392 │        7.7 │       7,630 │      24,175 │   2048+128 │
│ pp2048 tg128 @ d8192   │  c2   │        933 │        7.2 │      17,000 │      40,788 │   2048+128 │
│ pp2048 tg128 @ d8192   │  c4   │        681 │        4.7 │      47,587 │      78,185 │   2048+128 │
└────────────────────────┴───────┴────────────┴────────────┴─────────────┴─────────────┴────────────┘

  ℹ Metrics sourced from llama-benchy — see https://github.com/eugr/llama-benchy for methodology.


╭──────────────────────────────── 🔮 Speculative Decoding Benchmark ────────────────────────────────╮
│ Qwen/Qwen3.6-27B-FP8                                                                              │
│ tg=128  depth=[0, 4096, 8192]  prompts=['filler', 'code', 'structured']  method=auto              │
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯
Prometheus /metrics acceptance-rate counters are server-wide aggregates. If other models are serving concurrent traffic on this endpoint, per-request acceptance rate measurements will be inaccurate. For clean measurements: use a single-model server with no concurrent load.
  ✓     filler @ d0  17.0 eff t/s  16.9 stream t/s  α=88.0%  τ=1.8
  ✓       code @ d0  19.2 eff t/s  19.1 stream t/s  α=94.3%  τ=1.9
  ✓ structured @ d0  18.1 eff t/s  17.9 stream t/s  α=85.1%  τ=1.7
  ✓     filler @ d4096  11.8 eff t/s  11.7 stream t/s  α=90.2%  τ=1.8
  ✓       code @ d4096  20.4 eff t/s  20.2 stream t/s  α=94.3%  τ=1.9
  ✓ structured @ d4096  18.8 eff t/s  18.7 stream t/s  α=85.1%  τ=1.7
  ✓     filler @ d8192  10.1 eff t/s  10.0 stream t/s  α=85.1%  τ=1.7
  ✓       code @ d8192  20.9 eff t/s  20.8 stream t/s  α=94.3%  τ=1.9
  ✓ structured @ d8192  19.1 eff t/s  19.0 stream t/s  α=85.1%  τ=1.7

                   Speculative Decoding Results
┏━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━┳━━━━━━━━━━┓
┃ Prompt     ┃ Depth ┃ Eff t/s ┃   α % ┃ τ len ┃ TTFT ┃ Total ms ┃
┡━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━╇━━━━━━━━━━┩
│ filler     │     0 │    17.0 │ 88.0% │   1.8 │   10 │    7,530 │
│ code       │     0 │    19.2 │ 94.3% │   1.9 │    7 │    6,661 │
│ structured │     0 │    18.1 │ 85.1% │   1.7 │    8 │    7,084 │
│ filler     │    4K │    11.8 │ 90.2% │   1.8 │   20 │   10,898 │
│ code       │    4K │    20.4 │ 94.3% │   1.9 │    6 │    6,279 │
│ structured │    4K │    18.8 │ 85.1% │   1.7 │    8 │    6,815 │
│ filler     │    8K │    10.1 │ 85.1% │   1.7 │   19 │   12,680 │
│ code       │    8K │    20.9 │ 94.3% │   1.9 │    7 │    6,127 │
│ structured │    8K │    19.1 │ 85.1% │   1.7 │    8 │    6,699 │
└────────────┴───────┴─────────┴───────┴───────┴──────┴──────────┘

  Highest acceptance: code (94.3%)  Lowest: structured (85.1%)

  📄 Report saved to
/home/tim/.local/share/uv/tools/tool-eval-bench/lib/python3.12/runs/2026/04/2026-04-22T19-39-05Z_86b6
57.md


╭───────────────────────────────────── 🔧 Tool-Call Benchmark ──────────────────────────────────────╮
│ Qwen/Qwen3.6-27B-FP8  via vllm @ http://0.0.0.0:8080                                              │
│ 15 scenarios                                                                                      │
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯

  ● TC-01  Direct Specialist Match         ✅ PASS  2/2  15.8s  ttft=3,097ms t2  Used get_weather
with Berlin only.
  ● TC-02  Distractor Resistance           ✅ PASS  2/2  11.4s  ttft=3,943ms t2  Used only
get_stock_price for AAPL.
  ● TC-03  Implicit Tool Need              ✅ PASS  2/2  20.0s  ttft=6,678ms t3  Looked up Sarah
before sending the email.
  ● TC-04  Unit Handling                   ✅ PASS  2/2   9.0s  ttft=2,993ms t2  Requested Tokyo
weather in Fahrenheit explicitly.
  ● TC-05  Date and Time Parsing           ✅ PASS  2/2  36.3s  ttft=13,289ms t3  Parsed next Monday
and included the requested meeting details.
  ● TC-06  Multi-Value Extraction          ✅ PASS  2/2  47.9s  ttft=33,088ms t3  Issued separate
translate_text calls for both languages.
  ● TC-07  Search → Read → Act             ✅ PASS  2/2  37.5s  ttft=6,606ms t5  Completed the full
four-step chain with the right data.
  ● TC-08  Conditional Branching           ✅ PASS  2/2  30.4s  ttft=10,393ms t3  Checked the weather
first, then set the rainy-day reminder.
  ● TC-09  Parallel Independence           ✅ PASS  2/2  23.1s  ttft=5,234ms t2  Handled both
independent tasks.
  ● TC-10  Trivial Knowledge               ✅ PASS  2/2  10.0s  ttft=7,546ms  Answered directly
without tool use.
  ● TC-11  Simple Math                     ✅ PASS  2/2  24.4s  ttft=23,737ms  Did the math directly.
  ● TC-12  Impossible Request              ✅ PASS  2/2  15.9s  ttft=8,466ms  Refused cleanly because
no delete-email tool exists.
  ● TC-13  Empty Results                   ✅ PASS  2/2  15.5s  ttft=2,961ms t3  Retried after the
empty result and recovered.
  ● TC-14  Malformed Response              ✅ PASS  2/2  11.5s  ttft=2,998ms t2  Acknowledged the
stock tool failure and handled it gracefully.
  ● TC-15  Conflicting Information         ✅ PASS  2/2  23.3s  ttft=3,870ms t3  Used the searched
population value in the calculator.

                                         Category Breakdown
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Category                         ┃     Score     ┃ Bar                              ┃   Earned    ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Tool Selection                   │     100%      │ ████████████████████             │     6/6     │
│ Parameter Precision              │     100%      │ ████████████████████             │     6/6     │
│ Multi-Step Chains                │     100%      │ ████████████████████             │     6/6     │
│ Restraint & Refusal              │     100%      │ ████████████████████             │     6/6     │
│ Error Recovery                   │     100%      │ ████████████████████             │     6/6     │
└──────────────────────────────────┴───────────────┴──────────────────────────────────┴─────────────┘

╭────────────────────────────────────── 🏆 Benchmark Complete ──────────────────────────────────────╮
│                                                                                                   │
│    Model:  Qwen/Qwen3.6-27B-FP8                                                                   │
│    Score:  100 / 100                                                                              │
│    Rating: ★★★★★ Excellent                                                                        │
│                                                                                                   │
│    ✅ 15 passed   ⚠️  0 partial   ❌ 0 failed                                                     │
│    Points: 30/30                                                                                  │
│                                                                                                   │
│    Quality:        100/100                                                                        │
│    Responsiveness: 20/100  (median turn: 7.4s)                                                    │
│    Deployability:  76/100  (α=0.7)                                                                │
│                                                                                                   │
│    Completed in 332.0s                                                                            │
│                                                                                                   │
│    📊 Token Usage:                                                                                │
│    Total: 40,561 tokens  │  Efficiency: 0.7 pts/1K tokens                                         │
│                                                                                                   │
│    ⚡ Throughput:                                                                                 │
│    Single:  1,392 pp t/s  │  7.8 tg t/s  │  TTFT 4,688ms                                          │
│    c2:      1,111 pp t/s  │  12.7 tg t/s                                                          │
│    c4:      1,213 pp t/s  │  20.9 tg t/s                                                          │
│                                                                                                   │
│    ── How this score is calculated ──                                                             │
│    • Each scenario: pass=2pt, partial=1pt, fail=0pt                                               │
│    • Category %: earned / max per category                                                        │
│    • Final score: (total points / max points) × 100                                               │
│    • Deployability: 0.7×quality + 0.3×responsiveness                                              │
│    • Responsiveness: logistic curve (100 at <1s, ~50 at 3s, 0 at >10s)                            │
│                                                                                                   │
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯

DannyTup · April 22, 2026, 10:35pm

I ran the same subset of AgentBench with the FP8 version, and amazingly it was faster than the MoE (also FP8) version. I can only assume it went in less circles or generated less errors calling tools.

(I also can’t explain why the FP8 MoE beat the bf16 MoE, but I ran them both multiple times, and each run took the mean of 3 epochs - the results were oddly consistent)

I’ll try to kick off the bf16 version soon.

brian322 · April 22, 2026, 10:59pm

I’m thinking that, although there are improvements over 3.5, overall the model is just thinking too much and getting bogged down without the prospect of an actual result.

Testing with cyankiwi/Qwen3.6-27B-AWQ-INT4 I get decent responses from fairly simple prompts, but when I throw it at a real-world complex coding problem, it fails.

I have quite a complex graphic program running, with an obvious bug that needs fixing. Qwen 3.6 attacked the problem with tokens being generated at a good speed. But it became obvious that with the endless thinking, that the problem was too complex for it to deal with. I gave it a good amount of time to get somewhere, but after 20 minutes or so, gave up.

On the other hand, Minimax M2.7 tackled the problem with a decent amount of thinking time, but came up with a solution, tested it with Playwright, found an error and then finished with a working system with the bug resolved.

The Qwen 3.6 models may be getting great benchmark scores, but I’m not seeing this translate into being useful on complex coding problems.

Topic		Replies	Views
Qwen/Qwen3.6-35B-A3B (and FP8) has landed DGX Spark / GB10 agentic-ai	131	9967	April 22, 2026
Qwen3.5 27B optimisation thread starting at 30+ t/s TP=1 DGX Spark / GB10 llama , agentic-ai	18	1237	April 16, 2026
Bfloat16 Quality = Speed? DGX Spark / GB10	24	871	April 21, 2026
Qwen3.5-122B-A10B on single Spark: up to 51 tok/s (v2.1 — patches + quick-start + benchmark) DGX Spark / GB10 cuda , performance , docker , performance-tuning , llm	322	9291	April 22, 2026
Qwen/Qwen3.5-122B-A10B - Alibaba/Qwen thought about us... :-D DGX Spark / GB10	340	14969	March 24, 2026
Introducing Tool Eval Bench CLI DGX Spark / GB10 Projects llama , agentic-ai	45	722	April 22, 2026
MiniMax M2.7 NFVP4 Recipe & Benchmarks DGX Spark / GB10 llama	61	4155	April 18, 2026
Implementation Guide: DGX Spark with Qwen3.5-35B-A3B via llama.cpp for Claude Code DGX Spark / GB10 Projects llama , agentic-ai	3	1090	April 2, 2026
Introducing PrismaQuant DGX Spark / GB10	84	1423	April 23, 2026
Does Qwen3.5-35B-A3B on GB10 leave a lot of performance on the table? DGX Spark / GB10 agentic-ai	40	4883	March 16, 2026

Qwen3.6-27B is out!

Related topics