Three GB10 Ray Cluster and Inference

Donald_and_Gao_at_zhiding · December 30, 2025, 6:57am

Our lab followed the official instructions for connecting two Sparks and running inference with vLLM. We then wanted to verify whether the setup could be extended to three Sparks.

Fortunately, we managed to use three GB10 devices as a cluster for running inference. The Ray cluster started correctly, and the ray status output is shown in the figure below (3 GPUs / 328G memory):
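For anyone who wants to run the same check from Python, here is a minimal sketch, assuming Ray is installed on the head node and the cluster is already up. ray.init(address="auto"), ray.nodes() and ray.cluster_resources() are standard Ray calls; the helper names summarize_cluster and main are hypothetical, and the expectation of three nodes and three GPUs simply mirrors our triangle setup.

```python
def summarize_cluster(nodes, resources):
    """Condense ray.nodes() and ray.cluster_resources() output into a few numbers.

    Hypothetical helper: it takes plain Python data so it can be tested without a cluster.
    """
    alive = [node for node in nodes if node.get("Alive")]
    gpus = resources.get("GPU", 0.0)
    # Ray reports the "memory" resource in bytes; convert to GiB for readability.
    memory_gib = resources.get("memory", 0.0) / 1024**3
    return {"alive_nodes": len(alive), "gpus": gpus, "memory_gib": round(memory_gib, 1)}


def main(expected_nodes=3):
    """Attach to the running cluster and check it has the expected nodes and GPUs."""
    import ray  # imported lazily so summarize_cluster works without Ray installed

    ray.init(address="auto")  # connect to the existing cluster rather than starting one
    summary = summarize_cluster(ray.nodes(), ray.cluster_resources())
    print(summary)
    if summary["alive_nodes"] != expected_nodes or summary["gpus"] < expected_nodes:
        raise RuntimeError(f"expected {expected_nodes} nodes with one GPU each, got {summary}")
    return summary
```

Calling main() on the head node after every node has joined should give numbers in line with what ray status shows.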

Here are some key points that you might want to know:

1. Use three QSFP cables and connect the 3 machines in a triangle: A ↔ B, B ↔ C, C ↔ A.

2. Configure multiple subnets within your LAN. In our setup, the three machines communicated across a total of three subnets.

3. Make sure to assign the correct IP addresses to the Ray head node and worker nodes when initializing the cluster.
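The three points above amount to an addressing plan. Below is a minimal Python sketch of one, assuming one /24 subnet per cable; the subnets (192.168.10.0/24 and so on), the node names A/B/C and the helper functions are hypothetical. It gives each node one address per cable and builds ray start lines using Ray's --head, --address, --node-ip-address and --port options. A worker that is not on the same subnet as the head's advertised address needs a route to it, so the sketch also reports those nodes.

```python
import ipaddress

# Hypothetical per-cable subnets; substitute the ranges you actually use.
LINK_SUBNETS = {
    ("A", "B"): "192.168.10.0/24",
    ("B", "C"): "192.168.20.0/24",
    ("A", "C"): "192.168.30.0/24",
}


def address_plan(link_subnets):
    """Give each node one address on every cable it is attached to.

    Returns {node: {peer: (own_ip, peer_ip)}} for each directly connected pair.
    """
    plan = {}
    for (left, right), cidr in link_subnets.items():
        hosts = ipaddress.ip_network(cidr).hosts()
        left_ip, right_ip = str(next(hosts)), str(next(hosts))
        plan.setdefault(left, {})[right] = (left_ip, right_ip)
        plan.setdefault(right, {})[left] = (right_ip, left_ip)
    return plan


def ray_start_commands(plan, head, head_peer, port=6379):
    """Build ray start lines: the head advertises its address on the (head, head_peer) cable,
    and each worker advertises its own address on its direct cable to the head."""
    head_ip = plan[head][head_peer][0]
    commands = {head: f"ray start --head --node-ip-address={head_ip} --port={port}"}
    needs_route = []
    for worker in sorted(plan[head]):
        own_ip = plan[worker][head][0]
        commands[worker] = f"ray start --address={head_ip}:{port} --node-ip-address={own_ip}"
        if worker != head_peer:
            # head_ip sits on a subnet this worker is not attached to.
            needs_route.append(worker)
    return commands, needs_route


commands, needs_route = ray_start_commands(address_plan(LINK_SUBNETS), head="A", head_peer="B")
for node, command in commands.items():
    print(f"{node}: {command}")
print("nodes that need a route to the head address:", needs_route)
```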

Feel free to leave a message if you encounter issues while building a triangular Ray cluster.

gao.shubao · December 30, 2025, 7:22am

Yes. In theory, more machines can be daisy-chained through the network topology, but once you go beyond three, a switch would be required.
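The arithmetic behind the three-node limit, as a small Python sketch: a full mesh of n nodes needs n(n-1)/2 cables and n-1 ports on every node. Assuming two QSFP ports per Spark (the constant below is that assumption), three nodes is the largest full mesh that fits; beyond that, nodes either relay traffic for each other or need a switch.

```python
QSFP_PORTS_PER_SPARK = 2  # assumption: two QSFP ports per Spark


def full_mesh_requirements(n):
    """Return (cables, ports per node) for a full mesh of n nodes."""
    cables = n * (n - 1) // 2  # one cable per pair of nodes
    ports_per_node = n - 1     # each node connects to every other node
    return cables, ports_per_node


for n in range(2, 6):
    cables, ports = full_mesh_requirements(n)
    fits = ports <= QSFP_PORTS_PER_SPARK
    print(f"{n} nodes: {cables} cables, {ports} ports per node, switchless full mesh: {fits}")
```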

tflhayes · January 5, 2026, 10:00pm

I have 4 Sparks in a daisy chain, no switch needed: out on port 1, in on port 2. The 200 Gb/s networking is very reliable. The 10 Gb/s LAN is also very good, although I did have to provide switches for it to communicate with my database server and Windows desktop. I found the Wi-Fi unreliable; it kept dropping out.

siertum · January 6, 2026, 6:41pm

Hi there,

May I ask what kind of workloads you’re using your 4-Spark setup for, and how well it’s performing in practice?

It’s not often that you see four Sparks in one setup, so I’m genuinely curious. If you’re running inference (e.g., with vLLM/Ray), do you have any screenshots or numbers for throughput (tokens/sec), latency, and/or GPU utilization?

Thanks in advance.

tflhayes · January 6, 2026, 8:49pm

Hi Siertum,

Thanks for making contact.

I am primarily working on the development of fine-tuning. The Sparks are obviously slower than cloud machines, but they can run four parallel workloads 24/7. I set up the cluster to run larger models but have not yet had the opportunity to turn my attention to them.

I am also testing LLM inference as part of the training process.

Unfortunately, I am not able to disclose specific details. I have been genuinely pleased with the Spark. I am not a sophisticated Linux user, so I take a fairly middle-of-the-road approach.

Throughput varies, but for some tasks I am amazed. For instance, reading, updating and saving a JSON file with 10,000 large text fields is instantaneous. Broadly, full fine-tuning (not LoRA) of Llama 3 8B at FP16 on 10,000 text-heavy samples, including reams of console output and database access, takes 8 or so hours. Inference is variable; again, I can quote very text- and reasoning-heavy inference runs of 3.5 minutes for 10 samples, with a context window of up to 2,000 tokens. Comparable work on an H200 in the cloud at Lambda is, I would guess, about 4 times faster than on the Spark.
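Converting these figures into per-sample times makes them easier to compare. A quick Python sketch; the inputs are the approximate numbers quoted above, and the factor of 4 for the H200 is the rough estimate given, not a measurement.

```python
def per_sample_seconds(total_seconds, samples):
    """Average wall-clock seconds per sample."""
    return total_seconds / samples


fine_tune = per_sample_seconds(8 * 3600, 10_000)  # ~8 h of full fine-tuning for 10,000 samples
inference = per_sample_seconds(3.5 * 60, 10)      # 3.5 min of inference for 10 samples
H200_SPEEDUP = 4  # rough estimate from the post, not a benchmark

print(f"fine-tuning: {fine_tune:.2f} s/sample, inference: {inference:.1f} s/sample")
print(f"same fine-tuning run on an H200 at ~{H200_SPEEDUP}x: about {8 / H200_SPEEDUP:.1f} h")
```

That works out to roughly 2.9 s per sample for fine-tuning and 21 s per sample for inference on the Spark.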

I hope this helps you.

Kind regards

Tim

Donald_and_Gao_at_zhiding · January 19, 2026, 6:55am

Hi Tim,
Yes, I believe you can do that. However, model loading and inference could be much slower without a switch.
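One way to see why a switchless chain can be slower: without a switch, traffic between non-adjacent Sparks has to be relayed by the Sparks in between (assuming IP forwarding or a similar relay is set up), so some pairs are several hops apart. This Python sketch counts hops with a breadth-first search for a 4-node chain and a 4-node ring; whether the chain is closed into a ring is an assumption, so both are shown. With a switch, every pair is one hop apart.

```python
from collections import deque


def hop_counts(n, ring):
    """Return (average, worst-case) hop count over all ordered node pairs
    in a chain of n nodes, optionally closed into a ring."""
    adjacency = {i: set() for i in range(n)}
    for i in range(n - 1):
        adjacency[i].add(i + 1)
        adjacency[i + 1].add(i)
    if ring and n > 2:
        adjacency[0].add(n - 1)
        adjacency[n - 1].add(0)

    distances = []
    for src in range(n):
        # Breadth-first search gives the shortest hop count from src to every node.
        seen = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        distances.extend(d for dst, d in seen.items() if dst != src)
    return sum(distances) / len(distances), max(distances)


for ring in (False, True):
    avg, worst = hop_counts(4, ring)
    label = "ring" if ring else "chain"
    print(f"4-node {label}: average {avg:.2f} hops, worst case {worst} hops")
```

For 4 nodes this gives an average of 1.67 hops (worst case 3) for the chain and 1.33 hops (worst case 2) for the ring.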

Donald_and_Gao_at_zhiding · January 19, 2026, 7:24am

We made a video about this, in case you are interested: https://www.youtube.com/watch?v=IF6Z2B5TR7k

Some other problems we ran into; hopefully these will save you some trouble:
- When we tried to run Qwen3-235B-A22B-FP4 on two stacked Sparks, it failed. NVIDIA's vLLM image (version 2511) does not seem to support FP4 MoE models very well.
- For a triple stack, you might need some IP routing configuration.
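A minimal sketch of what that routing can look like in a triangle, in Python, assuming one /24 per cable (the subnets and node names are hypothetical). Each node is directly attached to two of the three subnets; to reach a peer's address on the third subnet, it adds a /32 host route via that peer's address on the shared cable. The output uses ordinary iproute2 syntax (ip route add <dest>/32 via <gateway>); routes added this way do not persist across reboots.

```python
import ipaddress

# Hypothetical per-cable subnets for the triangle A-B, B-C, A-C.
LINKS = {
    ("A", "B"): "192.168.10.0/24",
    ("B", "C"): "192.168.20.0/24",
    ("A", "C"): "192.168.30.0/24",
}


def link_addresses(links):
    """Assign the first two host addresses of each cable's subnet to its two ends."""
    addrs = {}
    for (left, right), cidr in links.items():
        hosts = ipaddress.ip_network(cidr).hosts()
        addrs[(left, right)] = {left: str(next(hosts)), right: str(next(hosts))}
    return addrs


def host_routes(links):
    """For each node, build host routes to peer addresses on subnets it is not attached to."""
    addrs = link_addresses(links)
    routes = {}
    for node in sorted({n for pair in links for n in pair}):
        commands = []
        for pair, ends in addrs.items():
            if node in pair:
                continue  # directly attached subnet: the kernel already has a connected route
            for peer, peer_ip in sorted(ends.items()):
                # Send traffic via the peer's address on the cable this node shares with it.
                shared = next(p for p in addrs if node in p and peer in p)
                gateway = addrs[shared][peer]
                commands.append(f"ip route add {peer_ip}/32 via {gateway}")
        routes[node] = commands
    return routes


for node, commands in host_routes(LINKS).items():
    print(f"# on {node}")
    for command in commands:
        print(command)
```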

siertum · January 19, 2026, 2:55pm

Hi! Thanks for sharing your 3-Spark triangle setup - super helpful.
Could you please share the exact link (or part number) for the QSFP cables you used? I’m a bit worried I’ll order the wrong type/variant (e.g., QSFP56 vs QSFP28, DAC vs AOC, length, etc.).
A direct product link would be perfect. Thanks!

raphael.amorim · January 19, 2026, 5:36pm

Please check this thread: 6x Spark setup - #47 by raphael.amorim

raphael.amorim · January 19, 2026, 5:41pm

You should be able to run Qwen3-235B-A22B-FP4 very easily. If you're having problems running it yourself, just use @eugr's spark-vllm-docker (GitHub: eugr/spark-vllm-docker, a Docker configuration for running vLLM on dual DGX Sparks). It works consistently, and he constantly updates it as improvements land in the vLLM community.

Donald_and_Gao_at_zhiding · January 21, 2026, 7:12am

Hi Siertum!
Product Name: QSFP112 CABLE 400G 0.4M ETHERNET
You could search by product name, this is the product link I found on google just now:
https://www.neobits.com/asus_gx10_qsfp_cable_asus_cb_gx10_qsfp_cable_ascent_p28655391.html?srsltid=AfmBOopE3K7lqecYyfzh_uINdxUfXdA_ENz7u3o3rXTH6TZG2pCGeU61
