Adding node performance

Hello there, I got a DGX and am considering adding a second unit, but I'm wondering what the performance improvement could be.

I'm sure nothing here is standard and it varies by model, etc., but if, for instance, I'm using gpt-oss:20b and getting, say, 10 tokens per second, would adding a second unit lower that to perhaps 6 sec? (I'm not actually expecting a linear gain.)

Are there any performance gains from using two cables vs. one between the two units?

Regards,

Paul

There is definitely relevant improvement. Lots of benchmarks across the forum. I suggest you start by reading:

Hmm.. thanks for taking the time to reply. I think you just convinced me not to buy the second unit and return the first one :-). I still have a few more days to decide.

Thanks again!

Paul

If you are just running 20b and don’t need any of the other features of the DGX Spark, then the RTX5090 is the way to go.

That said, I did find gpt-oss-20b with maximum thinking on the 5090 to be slower than gpt-oss-120b with minimal thinking on the DGX Spark. Even though the tokens/sec were way faster on the RTX5090, the model spent a lot more tokens thinking when you put it in high thinking mode.
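A rough sketch of why that happens: wall-clock time is total generated tokens (thinking plus answer) divided by tokens/sec, so a faster card can still finish later if the model thinks much longer. The token counts and speeds below are hypothetical placeholders for illustration only, not measurements from either machine.

```python
def response_time_s(total_tokens: int, tokens_per_s: float) -> float:
    """Seconds to generate a full response, thinking tokens included."""
    return total_tokens / tokens_per_s

# HYPOTHETICAL numbers, chosen only to illustrate the effect; not benchmarks.
small_model_high_thinking = response_time_s(total_tokens=6000, tokens_per_s=200.0)
large_model_low_thinking = response_time_s(total_tokens=800, tokens_per_s=40.0)

print(f"fast card, long thinking:  {small_model_high_thinking:.1f} s")
print(f"slow box, short thinking:  {large_model_low_thinking:.1f} s")
```

With these made-up inputs the setup that is 5x faster in tokens/sec still takes longer overall, because it generates 7.5x as many tokens.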

Whether 20b/high beats or loses to 120b/low on accuracy isn't consistent; it depends on the nature of the questions you are asking.

I think we all want an RTX Pro 6000 MaxQ with 128GB for $4000, but alas, that's why the 96GB RTX Pro 6000 is more expensive than two DGX Sparks.

With a properly set up dual-Spark cluster you can expect almost a 2x performance gain for dense models (the slow ones) and a smaller gain for sparse ones. Prompt processing performance scales better than inference.

Here is a compilation of my results. Some of these need retesting because I was running with an old config, but you can get an idea. These were measured using vLLM on two Sparks connected via a single QSFP112 cable.

| Model name | Cluster (t/s) | Single (t/s) | Comment |
|---|---|---|---|
| Qwen/Qwen3-VL-32B-Instruct-FP8 | 12.00 | 7.00 | |
| cpatonn/Qwen3-VL-32B-Instruct-AWQ-4bit | 21.00 | 12.00 | |
| GPT-OSS-120B | 55.00 | 36.00 | SGLang gives 75/53 |
| RedHatAI/Qwen3-VL-235B-A22B-Instruct-NVFP4 | 21.00 | N/A | |
| QuantTrio/Qwen3-VL-235B-A22B-Instruct-AWQ | 26.00 | N/A | old setup, needs retest |
| QuantTrio/Qwen3-VL-30B-A3B-Instruct-AWQ | 97.00 | 82.00 | |
| RedHatAI/Qwen3-30B-A3B-NVFP4 | 75.00 | 64.00 | |
| QuantTrio/MiniMax-M2-AWQ | 41.00 | N/A | |
| QuantTrio/GLM-4.6-AWQ | 17.00 | N/A | |
| zai-org/GLM-4.6V-FP8 | 24.00 | N/A | |
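
For a quick read of the scaling, here is a minimal sketch that divides cluster t/s by single t/s for the rows above that report both numbers. The dictionary simply copies those figures; the `speedup` helper name is just for illustration.

```python
# (cluster t/s, single t/s) copied from the rows above that list both values
results = {
    "Qwen/Qwen3-VL-32B-Instruct-FP8": (12.0, 7.0),
    "cpatonn/Qwen3-VL-32B-Instruct-AWQ-4bit": (21.0, 12.0),
    "GPT-OSS-120B": (55.0, 36.0),
    "QuantTrio/Qwen3-VL-30B-A3B-Instruct-AWQ": (97.0, 82.0),
    "RedHatAI/Qwen3-30B-A3B-NVFP4": (75.0, 64.0),
}

def speedup(cluster_tps: float, single_tps: float) -> float:
    """Ratio of dual-Spark throughput to single-Spark throughput."""
    return cluster_tps / single_tps

speedups = {name: round(speedup(c, s), 2) for name, (c, s) in results.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

On these figures the dense 32B models land at roughly 1.7x, GPT-OSS-120B at about 1.5x, and the 30B-A3B sparse models at about 1.2x.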

Awesome, thanks for sharing! Very interesting.