What to run on 8 Sparks / GB10?

ma.bu · May 14, 2026, 2:31pm

Weekend incoming. 8 Sparks (GB10 - 4x Lenovo, 4x Asus) lined up as a cluster. Looking for ideas on what to run. Any specific models you want to see benchmarks for?

PS: Yes, I know it's a little bit messy ;) But I promise it will look good soon. Still waiting for my rack mounts.

ciprianveg · May 14, 2026, 9:00pm

mimo 2.5 pro and glm5.1

elsaco · May 15, 2026, 12:27am

@ma.bu before you launch any model you might want to space those Sparks. Under load the mighty Spark gets very hot. There have been many overheat shutdown cases reported already. Search the forum.

Or get back to us when your Sparks die on you unexpectedly!

mashie · May 15, 2026, 1:55pm

Could you run latency tests between two of the nodes in that setup?

Node 1:

ib_write_lat -d rocep1s0f0 -i 1 -p 13000 -F

Node 2:

ib_write_lat -d rocep1s0f0 -i 1 -p 13000 -F <Node-1_IP_Address>

Node 1:

ib_read_lat -d rocep1s0f0 -i 1 -p 13001 -F

Node 2:

ib_read_lat -d rocep1s0f0 -i 1 -p 13001 -F <Node-1_IP_Address>

raphael.amorim · May 15, 2026, 2:25pm

Cool, one idea is to contribute benchmarks here: https://spark-arena.com
We don't have benchmarks for DeepSeek v4 and nvidia/Kimi-K2.6-NVFP4 yet.

raphael.amorim · May 15, 2026, 7:21pm

https://x.com/spark_arena/status/2055367735463538717?s=20 @ma.bu

Alexlocal · May 15, 2026, 10:58pm

Can you run NVIDIA-Nemotron-3-Super-120B-A12B-BF16 with 1M tokens? Something like this:


docker exec -it $VLLM_CONTAINER /bin/bash -c "\
CUDA_VISIBLE_DEVICES=0 \
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \
vllm serve nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 \
--tensor-parallel-size 8 \
--max-model-len 1048576 \
--dtype bfloat16 \
--distributed-executor-backend ray \
--enforce-eager \
--enable-auto-tool-choice \
--tool-call-parser mistral \
--host 0.0.0.0 \
--port 8000 \
--swap-space 0 \
--trust-remote-code"
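
For a rough sense of how the weights alone would spread across the cluster with the command above, here is a back-of-the-envelope sketch. It assumes 120B parameters (taken from the model name), 2 bytes per parameter for BF16, and an even split across the 8 tensor-parallel ranks, which is only approximately true in practice. It says nothing about the KV cache or activations at a 1M-token context, which would come on top.

```shell
# Rough weight-memory estimate; ignores KV cache, activations, and runtime overhead.
params_b=120        # billions of parameters, assumed from the model name (120B)
bytes_per_param=2   # BF16 stores each parameter in 2 bytes
nodes=8             # matches --tensor-parallel-size 8

weights_gb=$((params_b * bytes_per_param))   # total weights in GB
per_node_gb=$((weights_gb / nodes))          # approximate share per node

echo "weights: ${weights_gb} GB total, ${per_node_gb} GB per node"
```

That works out to about 30 GB of weights per node against the 128 GB of unified memory on each Spark, so whether 1M tokens fits depends on how much the rest needs.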

ma.bu · May 16, 2026, 11:32am

We ran the RDMA latency tests between ai-002 (10.0.0.3) and ai-003 (10.0.0.5) on rocep1s0f0.

ib_write_lat: avg 2.49 us, 99% 2.59 us, 99.9% 3.47 us
ib_read_lat: avg 4.86 us, 99% 5.02 us, 99.9% 6.37 us

Both tests completed successfully.
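
For a quick comparison of the two averages, a small sketch using the numbers copied from the results above (the helper name `lat_ratio` is just for illustration):

```shell
# Average latencies in microseconds, copied from the results above.
write_avg=2.49
read_avg=4.86

# Ratio of read latency to write latency, rounded to two decimals.
lat_ratio() {
  awk -v r="$1" -v w="$2" 'BEGIN { printf "%.2f\n", r / w }'
}

lat_ratio "$read_avg" "$write_avg"
```

So reads come out at roughly twice the write latency here. As far as I know that is the usual pattern, since ib_write_lat reports half of a ping-pong round trip while a read has to wait for the full round trip.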

ma.bu · May 16, 2026, 11:33am

Thank you :) I have no idea why people think there could be a fire soon. It's quite safe here in Austria, even with my hardware mess.

ma.bu · May 16, 2026, 11:34am

Kimi K2.6 done, Deepseek v4 will follow soon.

ma.bu · May 16, 2026, 11:34am

Actually, no issues so far, and the Sparks reach 90 °C at most. But yes, they will be spaced out soon.
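
For anyone who wants to keep an eye on temperatures while a benchmark runs, a minimal sketch. The 85 °C limit and the function names are made up for illustration, and it assumes nvidia-smi reports `temperature.gpu` on your Sparks; adjust to whatever your nodes actually expose.

```shell
# Hypothetical alert threshold in degrees C; pick one that suits your setup.
TEMP_LIMIT_C=85

# Classify one temperature reading (integer degrees C) against the limit.
check_temp() {
  temp="$1"
  limit="${2:-$TEMP_LIMIT_C}"
  if [ "$temp" -ge "$limit" ]; then
    echo "HOT ${temp}C (limit ${limit}C)"
  else
    echo "ok ${temp}C"
  fi
}

# Read the current GPU temperature; only call this on a node with nvidia-smi.
gpu_temp() {
  nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits
}

# Example with the 90 C peak mentioned above:
check_temp 90
```

On a node you would run something like `check_temp "$(gpu_temp)"` in a loop or from cron during the load test.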

mashie · May 16, 2026, 11:34am

Thank you, perfect.

ma.bu · May 16, 2026, 11:35am

GLM 5.1 is already on Spark Arena: zai-org/GLM-5.1-FP8 - Spark Arena Benchmark

mimo 2.5 will follow soon.
