Well, well, well, have a look at that:
eugr@spark:~$ fastfetch
.',;::::;,'. eugr@spark
.';:cccccccccccc:;,. ----------
.;cccccccccccccccccccccc;. OS: Fedora Linux 43 (KDE Plasma Desktop Edition) aarch64
.:cccccccccccccccccccccccccc:. Host: NVIDIA_DGX_Spark (A.7)
.;ccccccccccccc;.:dddl:.;ccccccc;. Kernel: Linux 6.17.1-300.fc43.aarch64
.:ccccccccccccc;OWMKOOXMWd;ccccccc:. Uptime: 22 mins
.:ccccccccccccc;KMMc;cc;xMMc;ccccccc:. Packages: 2421 (rpm)
,cccccccccccccc;MMM.;cc;;WW:;cccccccc, Shell: bash 5.3.0
:cccccccccccccc;MMM.;cccccccccccccccc: Display (Unknown-1): 800x600 @ 60 Hz in 10"
:ccccccc;oxOOOo;MMM000k.;cccccccccccc: DE: KDE Plasma 6.4.5
cccccc;0MMKxdd:;MMMkddc.;cccccccccccc; WM: KWin (Wayland)
ccccc;XMO';cccc;MMM.;cccccccccccccccc' WM Theme: Breeze
ccccc;MMo;ccccc;MMW.;ccccccccccccccc; Theme: Breeze (Light) [Qt], Breeze [GTK2/3]
ccccc;0MNc.ccc.xMMd;ccccccccccccccc; Icons: Breeze [Qt], breeze [GTK2/3/4]
cccccc;dNMWXXXWM0:;cccccccccccccc:, Font: Noto Sans (10pt) [Qt], Noto Sans (10pt) [GTK2/3/4]
cccccccc;.:odl:.;cccccccccccccc:,. Cursor: Breeze (24px)
ccccccccccccccccccccccccccccc:'. Terminal: /dev/pts/4
:ccccccccccccccccccccccc:;,.. CPU: Cortex-A725*5 + Cortex-X925*5 + Cortex-A725*5 + Cortex-X925*5 (20) @ 3.90 GHz
':cccccccccccccccc::;,. GPU: NVIDIA Device 2E12 (VGA compatible)
Memory: 4.37 GiB / 119.69 GiB (4%)
Swap: 0 B / 8.00 GiB (0%)
Disk (/): 20.17 GiB / 538.30 GiB (4%) - btrfs
Local IP (enP7s7): 192.168.24.104/24
Locale: en_US.UTF-8
eugr@spark:~$ nvidia-smi
Thu Oct 23 13:07:12 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GB10 Off | 0000000F:01:00.0 Off | N/A |
| N/A 38C P8 3W / N/A | Not Supported | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
So it looks like it’s halfway there, but the GUI won’t go above 800x600 resolution.
If you are not going to use it as a desktop, that doesn’t matter.
The most important test is this:
eugr@spark:~/llama.cpp$ build/bin/llama-cli --list-devices
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GB10, compute capability 12.1, VMM: yes
Available devices:
CUDA0: NVIDIA GB10 (122558 MiB, 117541 MiB free)
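As a quick sanity check, the MiB figures on that last line can be converted to GiB and compared with what fastfetch printed earlier. This is only a sketch: the regular expression assumes the line has exactly the shape shown above, and the variable names are made up for illustration. The total works out to the same 119.69 GiB that fastfetch reports as system memory, which would be consistent with the GPU seeing the whole shared memory pool.

```python
import re

# Device line copied verbatim from the --list-devices output above.
line = "CUDA0: NVIDIA GB10 (122558 MiB, 117541 MiB free)"

# Assumed format: "(<total> MiB, <free> MiB free)".
match = re.search(r"\((\d+) MiB, (\d+) MiB free\)", line)
total_mib, free_mib = (int(group) for group in match.groups())

# 1 GiB = 1024 MiB.
total_gib = total_mib / 1024
free_gib = free_mib / 1024

print(f"total: {total_gib:.2f} GiB, free: {free_gib:.2f} GiB")
```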
I’m getting worse token generation performance than on stock DGX OS, but model loading time improved significantly:
eugr@spark:~/llama.cpp$ build/bin/llama-bench -m /run/media/eugr/root/home/eugr/.cache/llama.cpp/ggml-org_gpt-oss-120b-GGUF_gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 -d 0,4096,8192,16384,32768 -p 2048 -n 32 -ub 2048 -mmp 0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GB10, compute capability 12.1, VMM: yes
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
|---|---|---|---|---|---|---|---|---|---|
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | CUDA | 99 | 2048 | 1 | 0 | pp2048 | 1864.44 ± 3.08 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | CUDA | 99 | 2048 | 1 | 0 | tg32 | 41.79 ± 0.13 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | CUDA | 99 | 2048 | 1 | 0 | pp2048 @ d4096 | 1730.84 ± 4.07 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | CUDA | 99 | 2048 | 1 | 0 | tg32 @ d4096 | 37.90 ± 0.04 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | CUDA | 99 | 2048 | 1 | 0 | pp2048 @ d8192 | 1628.49 ± 7.19 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | CUDA | 99 | 2048 | 1 | 0 | tg32 @ d8192 | 36.38 ± 0.10 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | CUDA | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 1395.37 ± 8.78 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | CUDA | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 34.23 ± 0.01 |
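To put the depth scaling in the rows above into perspective, here is a small sketch that computes the slowdown at each context depth relative to the depth-0 run. The numbers are copied from the table (mean values only, ignoring the ± spread); the helper name `relative_drop` is just for illustration. At d16384 this comes out to roughly a 25% drop for prompt processing and about 18% for token generation.

```python
# Mean throughput (tokens/s) copied from the llama-bench rows above,
# keyed by context depth in tokens.
pp2048 = {0: 1864.44, 4096: 1730.84, 8192: 1628.49, 16384: 1395.37}
tg32 = {0: 41.79, 4096: 37.90, 8192: 36.38, 16384: 34.23}


def relative_drop(results, depth):
    """Percentage slowdown at `depth` relative to the depth-0 run."""
    baseline = results[0]
    return (baseline - results[depth]) / baseline * 100.0


for depth in (4096, 8192, 16384):
    print(
        f"d{depth}: pp2048 -{relative_drop(pp2048, depth):.1f}%, "
        f"tg32 -{relative_drop(tg32, depth):.1f}%"
    )
```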