Adding node performance

paul.aviles · December 14, 2025, 7:18am

Hello there, I got a DGX and am considering adding a second unit, but I'm wondering what the performance improvement could be.

I am sure nothing here is standard and it varies by model, etc., but if, for instance, I am using gpt-oss:20b and getting, say, 10 tokens per second, will adding a second unit lower that to 6 sec perhaps? (I'm not expecting a linear gain, actually.)

Are there any performance gains from using 2 cables vs. 1 between 2 units?

Regards,

Paul

raphael.amorim · December 14, 2025, 3:13pm

There is definitely relevant improvement. Lots of benchmarks across the forum. I suggest you start by reading:

paul.aviles · December 14, 2025, 8:47pm

Hmm.. thanks for taking the time to reply. I think you just convinced me not to buy the second unit and return the first one :-). I still have a few more days to decide.

Thanks again!

Paul

alan.dang · December 14, 2025, 10:18pm

If you are just running 20b and don’t need any of the other features of the DGX Spark, then the RTX5090 is the way to go.

That said, I did find gpt-oss-20b with maximum thinking on the 5090 to be slower than gpt-oss-120b with minimal thinking on the DGX Spark. Even though the tokens/sec were way faster on the RTX5090, the model spent a lot more tokens thinking when you put it in high thinking mode.
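
To make that trade-off concrete, here is a rough sketch of the arithmetic: wall-clock time is roughly the number of generated tokens (thinking plus answer) divided by tokens/sec. The numbers below are made-up placeholders, not measurements from either machine, `response_time` is just a hypothetical helper name, and prompt processing time is ignored.

```python
def response_time(total_tokens: int, tokens_per_second: float) -> float:
    """Approximate generation time in seconds (ignores prompt processing)."""
    return total_tokens / tokens_per_second


# Placeholder values for illustration only, NOT benchmark results.
fast_gpu_high_thinking = response_time(total_tokens=10_000, tokens_per_second=200.0)
slow_box_low_thinking = response_time(total_tokens=1_500, tokens_per_second=50.0)

print(f"fast t/s, many thinking tokens: {fast_gpu_high_thinking:.0f} s")
print(f"slow t/s, few thinking tokens:  {slow_box_low_thinking:.0f} s")
```

With these placeholders, the setup that is 4x faster per token still takes longer, because it generates more than 4x as many tokens.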

It’s not consistent whether 20b/high beats or loses to 120b/low on accuracy; it depends on the nature of the questions you are asking.

I think we all want an RTX Pro 6000 MaxQ with 128GB for $4000, but alas, that’s why the 96GB RTX Pro 6000 is more expensive than two DGX Sparks.

eugr · December 15, 2025, 2:49am

With a properly set up dual-Spark cluster, you can expect an almost 2x performance gain for dense models (the slow ones) and a smaller gain for sparse ones. Prompt processing performance scales better than inference.

Here is a compilation of my results. Some of these need retesting, as I was running with an old config, but you can get an idea. That’s using vLLM and two Sparks connected via a single QSFP112 cable.

Model name	Cluster (t/s)	Single (t/s)	Comment
Qwen/Qwen3-VL-32B-Instruct-FP8	12.00	7.00
cpatonn/Qwen3-VL-32B-Instruct-AWQ-4bit	21.00	12.00
GPT-OSS-120B	55.00	36.00	SGLang gives 75/53
RedHatAI/Qwen3-VL-235B-A22B-Instruct-NVFP4	21.00	N/A
QuantTrio/Qwen3-VL-235B-A22B-Instruct-AWQ	26.00	N/A	old setup, needs retest
QuantTrio/Qwen3-VL-30B-A3B-Instruct-AWQ	97.00	82.00
RedHatAI/Qwen3-30B-A3B-NVFP4	75.00	64.00
QuantTrio/MiniMax-M2-AWQ	41.00	N/A
QuantTrio/GLM-4.6-AWQ	17.00	N/A
zai-org/GLM-4.6V-FP8	24.00	N/A
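
For a quick read on the scaling, the cluster/single ratio can be computed for the rows that have both numbers. This is only a sketch over the figures in the table above; the `results` and `speedup` names are mine, and rows with N/A for single-node are left out.

```python
# Tokens/s copied from the table above as (cluster, single).
# Rows without a single-node result (N/A) are omitted.
results = {
    "Qwen/Qwen3-VL-32B-Instruct-FP8": (12.0, 7.0),
    "cpatonn/Qwen3-VL-32B-Instruct-AWQ-4bit": (21.0, 12.0),
    "GPT-OSS-120B": (55.0, 36.0),
    "QuantTrio/Qwen3-VL-30B-A3B-Instruct-AWQ": (97.0, 82.0),
    "RedHatAI/Qwen3-30B-A3B-NVFP4": (75.0, 64.0),
}


def speedup(cluster_tps: float, single_tps: float) -> float:
    """Ratio of dual-node to single-node throughput."""
    return cluster_tps / single_tps


speedups = {name: speedup(c, s) for name, (c, s) in results.items()}

# Print from best to worst scaling.
for name, ratio in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:42s} {ratio:.2f}x")
```

By this arithmetic the two dense 32B models land at roughly 1.7x, GPT-OSS-120B at about 1.5x, and the 30B-A3B mixture-of-experts models at about 1.2x.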

paul.aviles · December 15, 2025, 2:06pm

Awesome, thanks for sharing! Very interesting.
