I compared the CUDA and Vulkan backends in llama.cpp, and overall Vulkan doesn’t look that bad.
Yes, Vulkan's pp (prompt processing) is always lower, but tg (token generation) is at roughly the same level in most tests.
So the question is whether the CUDA implementation isn’t that good, or Vulkan actually isn’t that bad.
Qwen3-Coder-Next-UD-Q6_K_XL
| model | size | params | backend | ngl | fa | dev | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 0 | CUDA0 | 0 | pp512 | 1067.77 ± 6.33 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 0 | CUDA0 | 0 | tg128 | 36.48 ± 0.05 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 | 1062.46 ± 8.63 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 | 36.80 ± 0.05 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 0 | Vulkan0 | 0 | pp512 | 931.11 ± 18.88 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 0 | Vulkan0 | 0 | tg128 | 37.43 ± 0.25 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 | 938.06 ± 22.53 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 | 37.71 ± 0.25 |
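To put a number on "pp is lower, tg is about the same", here is a minimal Python sketch that parses llama-bench style markdown rows and computes the Vulkan/CUDA throughput ratio per test. The helper names (`parse_llama_bench`, `vulkan_over_cuda`) are my own and not part of llama.cpp; the sample rows are the `fa = 1` rows of the Qwen3-Coder-Next-UD-Q6_K_XL table.

```python
# Sketch only: helper names are hypothetical, not a llama.cpp API.
SAMPLE = """
| model | size | params | backend | ngl | fa | dev | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 | 1062.46 ± 8.63 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 | 36.80 ± 0.05 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 | 938.06 ± 22.53 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 | 37.71 ± 0.25 |
"""


def split_cells(line):
    """Split one pipe-delimited table line into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_llama_bench(table):
    """Turn a markdown table into a list of dicts keyed by the header names."""
    lines = [l for l in table.strip().splitlines() if l.strip().startswith("|")]
    header = split_cells(lines[0])
    rows = []
    for line in lines[1:]:
        cells = split_cells(line)
        if set(cells[0]) <= set("-: "):  # skip the |---| separator row
            continue
        row = dict(zip(header, cells))
        row["mean"] = float(row["t/s"].split("±")[0])  # drop the ± spread
        rows.append(row)
    return rows


def vulkan_over_cuda(rows):
    """Map (fa, test) -> Vulkan0 mean t/s divided by CUDA0 mean t/s."""
    means = {(r["fa"], r["test"], r["dev"]): r["mean"] for r in rows}
    ratios = {}
    for (fa, test, dev), mean in means.items():
        if dev == "Vulkan0" and (fa, test, "CUDA0") in means:
            ratios[(fa, test)] = mean / means[(fa, test, "CUDA0")]
    return ratios


ratios = vulkan_over_cuda(parse_llama_bench(SAMPLE))
for (fa, test), ratio in sorted(ratios.items()):
    print(f"fa={fa} {test}: Vulkan/CUDA = {ratio:.3f}")
```

For these rows the pp512 ratio comes out below 1 and the tg128 ratio slightly above 1. The ± spread is ignored here, so small differences should not be over-interpreted.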
Qwen3-Coder-Next-GGUF_Qwen3-Coder-Next-MXFP4_MOE
| model | size | params | backend | ngl | fa | dev | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA,Vulkan | 99 | 0 | CUDA0 | 0 | pp512 | 1441.97 ± 14.83 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA,Vulkan | 99 | 0 | CUDA0 | 0 | tg128 | 48.58 ± 0.13 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 | 1461.08 ± 17.49 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 | 48.91 ± 0.10 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA,Vulkan | 99 | 0 | Vulkan0 | 0 | pp512 | 1147.19 ± 4.43 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA,Vulkan | 99 | 0 | Vulkan0 | 0 | tg128 | 44.45 ± 0.37 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 | 1159.12 ± 24.79 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 | 43.07 ± 0.50 |
Qwen3.5-122B-A10B-GGUF_UD-Q5_K_XL
| model | size | params | backend | ngl | fa | dev | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35moe 122B.A10B Q5_K - Medium | 85.60 GiB | 122.11 B | CUDA,Vulkan | 99 | 0 | CUDA0 | 0 | pp512 | 645.58 ± 3.95 |
| qwen35moe 122B.A10B Q5_K - Medium | 85.60 GiB | 122.11 B | CUDA,Vulkan | 99 | 0 | CUDA0 | 0 | tg128 | 20.68 ± 0.02 |
| qwen35moe 122B.A10B Q5_K - Medium | 85.60 GiB | 122.11 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 | 663.31 ± 3.85 |
| qwen35moe 122B.A10B Q5_K - Medium | 85.60 GiB | 122.11 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 | 20.79 ± 0.02 |
| qwen35moe 122B.A10B Q5_K - Medium | 85.60 GiB | 122.11 B | CUDA,Vulkan | 99 | 0 | Vulkan0 | 0 | pp512 | 471.86 ± 2.16 |
| qwen35moe 122B.A10B Q5_K - Medium | 85.60 GiB | 122.11 B | CUDA,Vulkan | 99 | 0 | Vulkan0 | 0 | tg128 | 18.20 ± 0.02 |
| qwen35moe 122B.A10B Q5_K - Medium | 85.60 GiB | 122.11 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 | 473.46 ± 4.02 |
| qwen35moe 122B.A10B Q5_K - Medium | 85.60 GiB | 122.11 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 | 18.54 ± 0.07 |
NVIDIA-Nemotron-3-Super-120B-A12B-GGUF_UD-Q4_K_XL
| model | size | params | backend | ngl | fa | dev | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 78.02 GiB | 120.67 B | CUDA,Vulkan | 99 | 0 | CUDA0 | 0 | pp512 | 524.26 ± 3.20 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 78.02 GiB | 120.67 B | CUDA,Vulkan | 99 | 0 | CUDA0 | 0 | tg128 | 16.13 ± 0.01 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 78.02 GiB | 120.67 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 | 525.09 ± 1.11 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 78.02 GiB | 120.67 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 | 16.13 ± 0.01 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 78.02 GiB | 120.67 B | CUDA,Vulkan | 99 | 0 | Vulkan0 | 0 | pp512 | 359.69 ± 3.03 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 78.02 GiB | 120.67 B | CUDA,Vulkan | 99 | 0 | Vulkan0 | 0 | tg128 | 5.33 ± 0.02 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 78.02 GiB | 120.67 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 | 359.10 ± 5.42 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 78.02 GiB | 120.67 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 | 5.36 ± 0.03 |
GLM-4.7-Flash-GGUF_GLM-4.7-Flash-MXFP4_MOE
| model | size | params | backend | ngl | fa | dev | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| deepseek2 30B.A3B MXFP4 MoE | 15.79 GiB | 29.94 B | CUDA,Vulkan | 99 | 0 | CUDA0 | 0 | pp512 | 2314.64 ± 68.49 |
| deepseek2 30B.A3B MXFP4 MoE | 15.79 GiB | 29.94 B | CUDA,Vulkan | 99 | 0 | CUDA0 | 0 | tg128 | 57.06 ± 0.15 |
| deepseek2 30B.A3B MXFP4 MoE | 15.79 GiB | 29.94 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 | 2565.49 ± 28.21 |
| deepseek2 30B.A3B MXFP4 MoE | 15.79 GiB | 29.94 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 | 59.53 ± 0.14 |
| deepseek2 30B.A3B MXFP4 MoE | 15.79 GiB | 29.94 B | CUDA,Vulkan | 99 | 0 | Vulkan0 | 0 | pp512 | 1656.79 ± 13.19 |
| deepseek2 30B.A3B MXFP4 MoE | 15.79 GiB | 29.94 B | CUDA,Vulkan | 99 | 0 | Vulkan0 | 0 | tg128 | 56.87 ± 2.47 |
| deepseek2 30B.A3B MXFP4 MoE | 15.79 GiB | 29.94 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 | 1658.20 ± 25.39 |
| deepseek2 30B.A3B MXFP4 MoE | 15.79 GiB | 29.94 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 | 55.87 ± 0.24 |
gpt-oss-20b
| model | size | params | backend | ngl | fa | dev | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 0 | CUDA0 | 0 | pp512 | 3088.68 ± 47.77 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 0 | CUDA0 | 0 | tg128 | 66.65 ± 0.19 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 | 4005.98 ± 94.33 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 | 69.11 ± 0.14 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 0 | Vulkan0 | 0 | pp512 | 1778.36 ± 54.59 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 0 | Vulkan0 | 0 | tg128 | 57.64 ± 1.51 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 | 2070.19 ± 19.01 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 | 56.78 ± 0.50 |
Also, setting the environment variable GGML_VK_PREFER_HOST_MEMORY=1 can add 10-15% in tg on Vulkan.
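For anyone who wants to repeat the with/without comparison, here is a rough Python sketch that builds a llama-bench command line plus an environment with the variable set or cleared. The function name, the binary name `llama-bench` on `PATH`, and the model path are placeholders of mine. The flags mirror the table columns (`ngl`, `fa`, `dev`, `mmap`, depth) as I understand them from llama-bench's help output, so check `llama-bench --help` on your build before relying on them.

```python
import os


def bench_invocation(model_path, device, prefer_host_memory, llama_bench="llama-bench"):
    """Return (argv, env) for one llama-bench run; nothing is executed here.

    model_path and llama_bench are placeholders, not paths from the original post.
    """
    env = dict(os.environ)  # copy, so the current process environment is untouched
    if prefer_host_memory:
        env["GGML_VK_PREFER_HOST_MEMORY"] = "1"
    else:
        env.pop("GGML_VK_PREFER_HOST_MEMORY", None)
    argv = [
        llama_bench,
        "-m", model_path,
        "-ngl", "99",        # offload all layers
        "-fa", "1",          # flash attention on
        "-mmp", "0",         # mmap off
        "-dev", device,      # e.g. "CUDA0" or "Vulkan0"
        "-p", "512",         # pp512
        "-n", "128",         # tg128
        "-d", "0,10000,65000,100000",  # context depths used in the tables
    ]
    return argv, env


argv, env = bench_invocation("model.gguf", "Vulkan0", prefer_host_memory=True)
# To actually run it: subprocess.run(argv, env=env, check=True)
```

Building the environment as a copy keeps the setting local to the benchmark process, which makes it easy to run both variants back to back from one script.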
with GGML_VK_PREFER_HOST_MEMORY=1
| model | size | params | backend | ngl | fa | dev | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 | 4110.87 ± 84.29 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 | 71.30 ± 0.09 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 @ d10000 | 3558.36 ± 68.06 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 @ d10000 | 64.76 ± 0.26 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 @ d65000 | 1828.87 ± 26.65 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 @ d65000 | 44.40 ± 0.01 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 @ d100000 | 1381.22 ± 13.35 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 @ d100000 | 37.17 ± 0.04 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 | 2097.93 ± 51.80 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 | 66.88 ± 0.20 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 @ d10000 | 1732.30 ± 24.81 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 @ d10000 | 62.21 ± 0.42 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 @ d65000 | 827.85 ± 7.20 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 @ d65000 | 45.51 ± 0.11 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 @ d100000 | 615.54 ± 2.85 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 @ d100000 | 38.72 ± 0.04 |
without GGML_VK_PREFER_HOST_MEMORY=1
| model | size | params | backend | ngl | fa | dev | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 | 4074.91 ± 93.79 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 | 71.73 ± 0.13 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 @ d10000 | 3585.54 ± 87.08 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 @ d10000 | 65.13 ± 0.11 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 @ d65000 | 1811.16 ± 34.05 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 @ d65000 | 44.49 ± 0.01 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 @ d100000 | 1382.26 ± 16.47 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 @ d100000 | 37.10 ± 0.03 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 | 2058.68 ± 30.76 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 | 60.06 ± 4.24 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 @ d10000 | 1712.20 ± 19.11 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 @ d10000 | 55.21 ± 0.87 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 @ d65000 | 825.18 ± 4.88 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 @ d65000 | 40.92 ± 1.42 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 @ d100000 | 614.77 ± 1.39 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 @ d100000 | 35.07 ± 0.68 |
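The 10-15% figure can be checked directly from the two gpt-oss 20B runs. This short Python sketch uses only the Vulkan0 tg means from the tables with and without GGML_VK_PREFER_HOST_MEMORY=1; the variable and function names are mine, and the ± spread (which is fairly large in the run without the variable) is ignored.

```python
# Vulkan0 tg means (t/s) copied from the two gpt-oss 20B tables.
with_flag = {
    "tg128": 66.88,
    "tg128 @ d10000": 62.21,
    "tg128 @ d65000": 45.51,
    "tg128 @ d100000": 38.72,
}
without_flag = {
    "tg128": 60.06,
    "tg128 @ d10000": 55.21,
    "tg128 @ d65000": 40.92,
    "tg128 @ d100000": 35.07,
}


def percent_gain(new, old):
    """Relative change of new over old, in percent."""
    return (new / old - 1.0) * 100.0


gains = {test: percent_gain(with_flag[test], without_flag[test]) for test in with_flag}
for test, gain in gains.items():
    print(f"{test}: {gain:+.1f}%")
```

All four depths land between roughly 10% and 13%, which matches the claim. The pp numbers in the same tables barely move, so the variable seems to matter mainly for token generation on this machine.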
It’s the latter.
Vulkan is not only a solid compute platform; it also supports “cooperative matrix” operations (e.g. tensor cores / WMMA / etc. on NVIDIA).
By default, with the headers from the repo, I get:
ggml_vulkan: 0 = NVIDIA GB10 (NVIDIA) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 0 | matrix cores: KHR_coopmat
With the Vulkan SDK from source:
ggml_vulkan: 0 = NVIDIA GB10 (NVIDIA) | uma: 1 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
And that adds 10-15% in tg:
| model | size | params | backend | ngl | fa | dev | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 | 4155.66 ± 54.19 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 | 73.20 ± 0.10 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | pp512 @ d10000 | 3646.81 ± 55.16 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | CUDA0 | 0 | tg128 @ d10000 | 66.57 ± 0.13 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 | 2764.43 ± 28.25 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 | 68.00 ± 0.26 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | pp512 @ d10000 | 2533.45 ± 13.34 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 1 | Vulkan0 | 0 | tg128 @ d10000 | 63.71 ± 0.28 |
With the latest llama.cpp build, Vulkan can outperform CUDA for Qwen3 models.
unsloth/Qwen3.5-35B-A3B-GGUF:Q6_K
| model | size | params | backend | ngl | n_batch | fa | dev | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | pp2048 | 1834.98 ± 5.66 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | tg256 | 59.77 ± 0.07 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | pp2048 @ d10000 | 1706.84 ± 2.01 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | tg256 @ d10000 | 55.63 ± 0.07 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | pp2048 @ d64000 | 1350.14 ± 4.41 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | tg256 @ d64000 | 42.87 ± 0.04 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | pp2048 | 1915.37 ± 19.92 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | tg256 | 59.89 ± 0.01 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | pp2048 @ d10000 | 1744.88 ± 6.58 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | tg256 @ d10000 | 56.47 ± 0.06 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | pp2048 @ d64000 | 1269.11 ± 1.66 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | tg256 @ d64000 | 44.73 ± 0.01 |
unsloth/Qwen3-Coder-Next-GGUF:UD-Q6_K_XL
| model | size | params | backend | ngl | n_batch | fa | dev | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | pp2048 | 1087.56 ± 2.79 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | tg256 | 37.58 ± 0.02 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | pp2048 @ d10000 | 1052.41 ± 3.55 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | tg256 @ d10000 | 35.46 ± 0.03 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | pp2048 @ d64000 | 899.92 ± 1.39 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | tg256 @ d64000 | 28.56 ± 0.30 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | pp2048 | 1221.80 ± 11.76 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | tg256 | 44.29 ± 0.06 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | pp2048 @ d10000 | 1130.06 ± 2.44 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | tg256 @ d10000 | 42.06 ± 0.04 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | pp2048 @ d64000 | 887.88 ± 3.36 |
| qwen3next 80B.A3B Q6_K | 68.09 GiB | 79.67 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | tg256 @ d64000 | 33.82 ± 0.01 |
unsloth/Qwen3.5-35B-A3B-GGUF:Q6_K
| model | size | params | backend | ngl | n_batch | fa | dev | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | pp2048 | 1819.41 ± 4.44 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | tg256 | 59.67 ± 0.07 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | pp2048 @ d10000 | 1683.18 ± 7.23 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | tg256 @ d10000 | 55.35 ± 0.04 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | pp2048 @ d64000 | 1363.08 ± 1.90 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | tg256 @ d64000 | 42.83 ± 0.04 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | pp2048 | 1931.55 ± 18.43 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | tg256 | 59.35 ± 0.02 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | pp2048 @ d10000 | 1736.51 ± 5.05 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | tg256 @ d10000 | 55.95 ± 0.04 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | pp2048 @ d64000 | 1277.84 ± 4.16 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | tg256 @ d64000 | 44.40 ± 0.03 |
But for gpt-oss 20b it is the same as before:
| model | size | params | backend | ngl | n_batch | fa | dev | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | pp2048 | 4180.38 ± 14.68 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | tg256 | 72.29 ± 0.11 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | pp2048 @ d10000 | 3592.74 ± 9.69 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | tg256 @ d10000 | 65.74 ± 0.22 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | pp2048 @ d64000 | 2009.12 ± 3.31 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 512 | 1 | CUDA0 | 0 | tg256 @ d64000 | 46.61 ± 0.03 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | pp2048 | 2779.13 ± 15.28 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | tg256 | 67.07 ± 0.19 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | pp2048 @ d10000 | 2538.70 ± 11.19 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | tg256 @ d10000 | 62.31 ± 0.10 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | pp2048 @ d64000 | 1705.03 ± 1.16 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA,Vulkan | 99 | 512 | 1 | Vulkan0 | 0 | tg256 @ d64000 | 46.79 ± 0.06 |
There is also a difference in tool-eval-bench with the same model and the same settings.
CUDA score 95
Vulkan score 96
uvx tool-eval-bench --compare 2026-04-23T11-25-05Z_e58296 2026-04-23T11-09-34Z_e58296
╭─────────────────────────────────────────────────────────────────────────────────── 📊 Run Comparison ────────────────────────────────────────────────────────────────────────────────────╮
│ A (baseline): 2026-04-23T11-25-05Z_e58296 model=unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q6_K_XL │
│ tool-eval-bench: v1.4.0 │
│ Engine: llama.cpp b8901-8635e221c │
│ Quantization: Q6_K_X │
│ Temperature: 0.2 │
│ Seed: 42 │
│ Thinking: enabled │
│ Host: gx10-8990 │
│ B (current): 2026-04-23T11-09-34Z_e58296 model=unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q6_K_XL │
│ tool-eval-bench: v1.4.0 │
│ Engine: llama.cpp b8901-8635e221c │
│ Quantization: Q6_K_X │
│ Temperature: 0.2 │
│ Seed: 42 │
│ Thinking: enabled │
│ Host: gx10-8990 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
┏━━━━━━━━┳━━━━━━━━━━┳━━━━━┳━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ ID ┃ A ┃ → ┃ B ┃ Δ ┃ Time Δ ┃ Note ┃
┡━━━━━━━━╇━━━━━━━━━━╇━━━━━╇━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ TC-01 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.8s │ │
│ TC-02 │ ✅ 2 │ → │ ✅ 2 │ = │ +4.6s │ │
│ TC-03 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.8s │ │
│ TC-04 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.3s │ │
│ TC-05 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.2s │ │
│ TC-06 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.6s │ │
│ TC-07 │ ✅ 2 │ → │ ✅ 2 │ = │ +2.0s │ │
│ TC-08 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.2s │ │
│ TC-09 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.2s │ │
│ TC-10 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.2s │ │
│ TC-11 │ ✅ 2 │ → │ ✅ 2 │ = │ +3.0s │ │
│ TC-12 │ ✅ 2 │ → │ ✅ 2 │ = │ -0.4s │ │
│ TC-13 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.7s │ │
│ TC-14 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.4s │ │
│ TC-15 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.4s │ │
│ TC-16 │ ✅ 2 │ → │ ✅ 2 │ = │ -3.7s │ │
│ TC-17 │ ✅ 2 │ → │ ✅ 2 │ = │ +15.5s │ │
│ TC-18 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.5s │ │
│ TC-19 │ ✅ 2 │ → │ ✅ 2 │ = │ +2.3s │ │
│ TC-20 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.1s │ │
│ TC-21 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.4s │ │
│ TC-22 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.4s │ │
│ TC-23 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.9s │ │
│ TC-24 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.7s │ │
│ TC-25 │ ✅ 2 │ → │ ✅ 2 │ = │ -0.2s │ │
│ TC-26 │ ✅ 2 │ → │ ✅ 2 │ = │ +3.0s │ │
│ TC-27 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.5s │ │
│ TC-28 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.9s │ │
│ TC-29 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.7s │ │
│ TC-30 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.0s │ │
│ TC-31 │ ✅ 2 │ → │ ✅ 2 │ = │ +2.4s │ │
│ TC-32 │ ✅ 2 │ → │ ✅ 2 │ = │ +2.0s │ │
│ TC-33 │ ✅ 2 │ → │ ✅ 2 │ = │ -3.2s │ │
│ TC-34 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.7s │ │
│ TC-35 │ ⚠ 1 │ → │ ⚠ 1 │ = │ +13.5s │ │
│ TC-36 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.5s │ │
│ TC-37 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.9s │ │
│ TC-38 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.7s │ │
│ TC-39 │ ⚠ 1 │ → │ ⚠ 1 │ = │ +0.4s │ │
│ TC-40 │ ✅ 2 │ → │ ✅ 2 │ = │ +3.5s │ │
│ TC-41 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.9s │ │
│ TC-42 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.3s │ │
│ TC-43 │ ✅ 2 │ → │ ✅ 2 │ = │ -1.3s │ │
│ TC-44 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.2s │ │
│ TC-45 │ ✅ 2 │ → │ ✅ 2 │ = │ +16.2s │ │
│ TC-46 │ ⚠ 1 │ → │ ⚠ 1 │ = │ +4.7s │ │
│ TC-47 │ ⚠ 1 │ → │ ⚠ 1 │ = │ +14.1s │ │
│ TC-48 │ ✅ 2 │ → │ ✅ 2 │ = │ +13.9s │ │
│ TC-49 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.3s │ │
│ TC-50 │ ✅ 2 │ → │ ✅ 2 │ = │ +12.5s │ │
│ TC-51 │ ⚠ 1 │ → │ ✅ 2 │ +1 │ +2.9s │ improved │
│ TC-52 │ ✅ 2 │ → │ ✅ 2 │ = │ -1.6s │ │
│ TC-53 │ ✅ 2 │ → │ ✅ 2 │ = │ -0.4s │ │
│ TC-54 │ ✅ 2 │ → │ ✅ 2 │ = │ +2.9s │ │
│ TC-55 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.4s │ │
│ TC-56 │ ⚠ 1 │ → │ ⚠ 1 │ = │ +1.8s │ │
│ TC-57 │ ✅ 2 │ → │ ✅ 2 │ = │ +3.1s │ │
│ TC-58 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.2s │ │
│ TC-59 │ ✅ 2 │ → │ ✅ 2 │ = │ +3.4s │ │
│ TC-60 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.4s │ │
│ TC-61 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.8s │ │
│ TC-62 │ ⚠ 1 │ → │ ✅ 2 │ +1 │ +10.2s │ improved │
│ TC-63 │ ✅ 2 │ → │ ✅ 2 │ = │ -0.3s │ │
│ TC-64 │ ✅ 2 │ → │ ✅ 2 │ = │ +2.7s │ │
│ TC-65 │ ✅ 2 │ → │ ✅ 2 │ = │ +0.8s │ │
│ TC-66 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.1s │ │
│ TC-67 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.3s │ │
│ TC-68 │ ✅ 2 │ → │ ✅ 2 │ = │ +3.2s │ │
│ TC-69 │ ✅ 2 │ → │ ✅ 2 │ = │ +1.8s │ │
└────────┴──────────┴─────┴──────────┴────────┴──────────┴──────────┘
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ↑ 2 improved ↓ 0 regressed = 67 unchanged │
│ Points: 131 → 133 (+2) │
│ Score: 95 → 96 │
╰───────────────────────────────────────────────────────────
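For context on how small the 95 vs 96 gap is: the comparison lists 69 test cases worth up to 2 points each, and the totals are 131 and 133 points. The sketch below assumes the score is simply points as a rounded percentage of the maximum. That formula is my guess, not something the tool's output states, and the function name is mine too.

```python
def score_from_points(points, n_tests, max_per_test=2):
    """Assumed formula: points as a rounded percentage of the maximum possible."""
    return round(100 * points / (n_tests * max_per_test))


N_TESTS = 69  # TC-01 .. TC-69 in the comparison table

cuda_score = score_from_points(131, N_TESTS)    # run A (baseline)
vulkan_score = score_from_points(133, N_TESTS)  # run B (current)
print(cuda_score, vulkan_score)
```

Under that assumption the two totals reproduce the reported scores, and the whole difference comes from two test cases (TC-51 and TC-62) moving from 1 to 2 points. That is a small enough change that it may be within normal run-to-run variation.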