Using CUDA 13.2 on the DGX Spark

On the DGX Spark, you can install a newer CUDA toolkit package (e.g. cuda-toolkit-13-2) along with the matching cuda-compat-13-2 forward-compatibility package, and then:

export LD_LIBRARY_PATH=/usr/local/cuda-13.2/compat

and you'll have access to the newer CUDA user-mode driver without updating the whole system/driver stack.
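For reference, the full sequence looks roughly like this (a sketch, assuming NVIDIA's apt repository for your DGX OS release is already configured; prepending to any existing LD_LIBRARY_PATH is my addition):

```shell
# Install the newer toolkit plus the matching forward-compatibility package
sudo apt-get install -y cuda-toolkit-13-2 cuda-compat-13-2

# Point the dynamic loader at the compat user-mode driver ahead of the system one
export LD_LIBRARY_PATH=/usr/local/cuda-13.2/compat:${LD_LIBRARY_PATH}
```

To make this persistent, the export line can go in your shell profile or a systemd drop-in for the services that need it.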

Note that OpenGL/Vulkan interop is broken in that setup, as described in the Forward Compatibility section of the CUDA Compatibility guide, but CUDA itself works great.

Alternatively, if you don't rely on runtime PTX compilation, Minor Version Compatibility (also covered in the CUDA Compatibility guide) works too, without needing the compat package at all.
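One way to check whether a binary would depend on runtime PTX compilation is to inspect what it embeds with cuobjdump from the CUDA binary utilities (the binary path here is illustrative):

```shell
# List what's embedded in a CUDA binary. If a SASS cubin exists for your
# GPU's architecture, no PTX JIT is needed at load time and minor version
# compatibility is enough; if only PTX is present, JIT (and thus the newer
# user-mode driver / compat package) is required.
cuobjdump --list-elf ./my_app   # embedded SASS (per-architecture cubins)
cuobjdump --list-ptx ./my_app   # embedded PTX
```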


Now there is an NVIDIA image with CUDA 13.2.

I built llama.cpp with 13.1.1, so I can now compare the same llama.cpp build under 13.1.1 and 13.2:

docker run -it -v /home/pont/.cache/llama.cpp:/root/.cache/llama.cpp --gpus=all ghcr.io/pontostroy/llama.cpp:full-cuda13 --bench --model /root/.cache/llama.cpp/unsloth_Qwen3-Coder-Next-GGUF_Qwen3-Coder-Next-MXFP4_MOE.gguf -fa 1 --mmap 0 -d 0,16000,64000

13.1.1

| model | size | params | backend | ngl | fa | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA | 99 | 1 | 0 | pp512 | 4106.71 ± 43.47 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA | 99 | 1 | 0 | tg128 | 70.80 ± 0.10 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA | 99 | 1 | 0 | pp512 @ d16000 | 3154.20 ± 4.91 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA | 99 | 1 | 0 | tg128 @ d16000 | 61.56 ± 0.14 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA | 99 | 1 | 0 | pp512 @ d64000 | 1764.61 ± 10.73 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA | 99 | 1 | 0 | tg128 @ d64000 | 45.20 ± 0.04 |

| model | size | params | backend | ngl | fa | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | 120.67 B | CUDA | 99 | 1 | 0 | pp512 | 597.03 ± 2.77 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | 120.67 B | CUDA | 99 | 1 | 0 | tg128 | 20.42 ± 0.01 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | 120.67 B | CUDA | 99 | 1 | 0 | pp512 @ d16000 | 587.79 ± 1.99 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | 120.67 B | CUDA | 99 | 1 | 0 | tg128 @ d16000 | 20.11 ± 0.03 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | 120.67 B | CUDA | 99 | 1 | 0 | pp512 @ d64000 | 546.69 ± 7.41 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | 120.67 B | CUDA | 99 | 1 | 0 | tg128 @ d64000 | 19.20 ± 0.03 |

| model | size | params | backend | ngl | fa | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA | 99 | 1 | 0 | pp512 | 1488.48 ± 5.17 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA | 99 | 1 | 0 | tg128 | 50.16 ± 0.09 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA | 99 | 1 | 0 | pp512 @ d16000 | 1373.83 ± 4.78 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA | 99 | 1 | 0 | tg128 @ d16000 | 45.05 ± 0.30 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA | 99 | 1 | 0 | pp512 @ d64000 | 1142.44 ± 9.48 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA | 99 | 1 | 0 | tg128 @ d64000 | 35.38 ± 0.19 |

13.2

| model | size | params | backend | ngl | fa | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA | 99 | 1 | 0 | pp512 | 4223.89 ± 50.50 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA | 99 | 1 | 0 | tg128 | 72.01 ± 0.05 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA | 99 | 1 | 0 | pp512 @ d16000 | 3207.67 ± 18.94 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA | 99 | 1 | 0 | tg128 @ d16000 | 61.52 ± 0.19 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA | 99 | 1 | 0 | pp512 @ d64000 | 1953.58 ± 18.24 |
| gpt-oss 20B MXFP4 MoE | 11.77 GiB | 20.91 B | CUDA | 99 | 1 | 0 | tg128 @ d64000 | 45.75 ± 0.03 |

| model | size | params | backend | ngl | fa | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | 120.67 B | CUDA | 99 | 1 | 0 | pp512 | 600.68 ± 2.50 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | 120.67 B | CUDA | 99 | 1 | 0 | tg128 | 20.84 ± 0.02 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | 120.67 B | CUDA | 99 | 1 | 0 | pp512 @ d16000 | 588.74 ± 1.05 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | 120.67 B | CUDA | 99 | 1 | 0 | tg128 @ d16000 | 20.20 ± 0.05 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | 120.67 B | CUDA | 99 | 1 | 0 | pp512 @ d64000 | 551.32 ± 2.63 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | 120.67 B | CUDA | 99 | 1 | 0 | tg128 @ d64000 | 19.50 ± 0.03 |

| model | size | params | backend | ngl | fa | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA | 99 | 1 | 0 | pp512 | 1480.93 ± 7.48 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA | 99 | 1 | 0 | tg128 | 52.38 ± 0.16 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA | 99 | 1 | 0 | pp512 @ d16000 | 1382.59 ± 11.23 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA | 99 | 1 | 0 | tg128 @ d16000 | 45.19 ± 0.28 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA | 99 | 1 | 0 | pp512 @ d64000 | 1149.42 ± 11.09 |
| qwen3next 80B.A3B MXFP4 MoE | 40.73 GiB | 79.67 B | CUDA | 99 | 1 | 0 | tg128 @ d64000 | 36.61 ± 0.04 |
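The largest delta is in long-context prefill: gpt-oss pp512 @ d64000 goes from 1764.61 t/s on 13.1.1 to 1953.58 t/s on 13.2. A quick way to compute the relative change from any pair of rows:

```shell
# Percent speedup of CUDA 13.2 over 13.1.1 for gpt-oss pp512 @ d64000
old=1764.61
new=1953.58
pct=$(awk -v o="$old" -v n="$new" 'BEGIN { printf "%.1f", (n - o) / o * 100 }')
echo "${pct}% faster"   # about a 10.7% faster prefill at 64k depth
```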