Three GB10 Ray Cluster and Inference

Our lab followed the official instructions for connecting two Sparks and running inference with vLLM. We then wanted to verify whether the setup could be extended to three Sparks.

It worked: we managed to use three GB10 devices as a cluster for running inference. The Ray cluster started correctly, and the ray status output is shown in the figure below (3 GPUs, 328G memory):

Here are some key points that you might want to know:

1. Use three QSFP cables and connect the three machines in a triangle: A ↔ B, B ↔ C, C ↔ A.

2. Configure multiple subnets within your LAN. In our setup, the three machines communicated across a total of three subnets.

3. Make sure to assign the correct IP addresses to the Ray head node and worker nodes when initializing the cluster.
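To make the three-subnet idea concrete, here is a minimal sketch of one way to plan the addressing, with one small subnet per cable. The node names, the 192.168.100.0/24 range, and the /30 subnet size are all assumptions chosen for illustration, not the addresses used in the setup above.

```python
import ipaddress
from itertools import combinations

NODES = ["A", "B", "C"]  # hypothetical node names
BASE = ipaddress.ip_network("192.168.100.0/24")  # hypothetical private range


def plan_links(nodes, base):
    """Give every pair of nodes its own /30 subnet, one address per cable end."""
    subnets = base.subnets(new_prefix=30)
    plan = {}
    for (left, right), net in zip(combinations(nodes, 2), subnets):
        first, second = list(net.hosts())  # a /30 has exactly two usable hosts
        plan[(left, right)] = {"subnet": net, left: first, right: second}
    return plan


plan = plan_links(NODES, BASE)
for (left, right), link in plan.items():
    print(f"{left} <-> {right}: {link['subnet']}  "
          f"{left}={link[left]}  {right}={link[right]}")
```

Three nodes give three pairs, so the plan has three point-to-point subnets, and each machine ends up with two addresses (one per QSFP port).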

Feel free to leave a message if you encounter issues while building a triangular Ray cluster.

Yes, in theory more machines can be daisy-chained through the network topology, but once you go beyond three, a switch would be required.

I have 4 Sparks in a daisy chain with no switch needed: out on port 1, in on port 2. The 200 Gb/s networking is very reliable. The 10 Gb/s LAN is very good, although I did have to provide switches for it to communicate with my database server and Windows desktop. I found the WiFi unreliable; it kept dropping out.

Hi there,

May I ask what kind of workloads you’re using your 4-Spark setup for, and how well it’s performing in practice?

It’s not often that you see four Sparks in one setup, so I’m genuinely curious. If you’re running inference (e.g., with vLLM/Ray), do you have any screenshots or numbers for throughput (tokens/sec), latency, and/or GPU utilization?

Thanks in advance.

Hi Siertum,

Thanks for making contact.

I am primarily working on the development of fine-tuning. The Sparks are obviously slower than cloud machines but can run four parallel workloads 24/7. I set up the cluster to run larger models, but have not yet had the opportunity to turn my attention to them.

I am also testing LLM inference as part of the training process.

Unfortunately, I am not able to disclose specific details. I have been genuinely pleased with Spark. I am not a sophisticated Linux user, so I take a fairly middle-of-the-road approach.

Throughput varies, but for some tasks I am amazed. For instance, reading, updating and saving a JSON file with 10,000 large text fields is instantaneous. Broadly, a full fine-tuning run (not LoRA) on 10,000 text-heavy samples at FP16 with Llama 3 8B, including reams of console output and database access, takes 8 or so hours. Inference is variable: again, I can quote very text-heavy and reasoning-heavy inference runs of 3.5 minutes for 10 samples, with a context window of up to 2000 tokens. Comparative work on an H200 in the cloud at Lambda is, I guess, about 4 times faster than on Spark.
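For a rough sense of scale, the figures above can be turned into per-sample numbers. This is only back-of-the-envelope arithmetic on the quoted values; the variable names are illustrative, the number of training epochs is not stated (so the training figure is wall-clock time per dataset sample over the whole run), and the H200 line simply applies the "about 4 times faster" guess.

```python
# Quoted figures from the post above.
inference_minutes = 3.5
inference_samples = 10
training_hours = 8
training_samples = 10_000
h200_speedup = 4  # "about 4 times faster", a guess rather than a measurement

# Inference: seconds of wall-clock time per sample on Spark.
inference_s_per_sample = inference_minutes * 60 / inference_samples

# Same workload scaled by the guessed H200 speedup.
h200_s_per_sample = inference_s_per_sample / h200_speedup

# Training: wall-clock seconds per dataset sample over the whole run.
training_s_per_sample = training_hours * 3600 / training_samples

print(f"Spark inference: {inference_s_per_sample:.2f} s/sample")
print(f"H200 estimate:   {h200_s_per_sample:.2f} s/sample")
print(f"Spark training:  {training_s_per_sample:.2f} s/sample")
```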

I hope this helps you.

Kind regards

Tim

Hi Tim,
Yes, I believe you can do that. However, model loading and inference could be much slower without a switch.

We made a video about this, in case you are interested: https://www.youtube.com/watch?v=IF6Z2B5TR7k

Some other problems we ran into, which will hopefully save you some trouble:
- When we tried to run Qwen3-235B-A22B-FP4 on double-stacked Sparks, it failed. The NVIDIA vLLM image (version 2511) does not seem to support FP4 MoE models very well.
- For a triple stack, you might need to do some IP routing configuration.
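As an illustration of what that routing step might involve, here is a hedged sketch, not our exact configuration. It assumes each machine has one address per cable (hypothetical 192.168.100.x values) and advertises a single address to Ray. Any peer that is not on the same subnet as that advertised address then needs a host route over its direct cable. The script only prints candidate `ip route add` and `ray start` commands; it does not run them.

```python
# Hypothetical plan: (node, peer) -> node's own address on the cable to peer.
LINK_ADDR = {
    ("A", "B"): "192.168.100.1", ("B", "A"): "192.168.100.2",
    ("A", "C"): "192.168.100.5", ("C", "A"): "192.168.100.6",
    ("B", "C"): "192.168.100.9", ("C", "B"): "192.168.100.10",
}
# Hypothetical choice of the one address each node advertises to Ray.
ADVERTISED = {"A": "192.168.100.1", "B": "192.168.100.2", "C": "192.168.100.6"}


def host_routes(node):
    """Commands so `node` reaches each peer's advertised address directly."""
    cmds = []
    for peer, target in ADVERTISED.items():
        if peer == node:
            continue
        next_hop = LINK_ADDR[(peer, node)]  # peer's address on the shared cable
        if target != next_hop:  # otherwise the target is already on-link
            cmds.append(f"ip route add {target}/32 via {next_hop}")
    return cmds


def ray_commands(head="A", port=6379):
    """Build `ray start` command strings for the head and each worker."""
    head_ip = ADVERTISED[head]
    cmds = {head: f"ray start --head --node-ip-address={head_ip} --port={port}"}
    for node, addr in ADVERTISED.items():
        if node != head:
            cmds[node] = (f"ray start --address={head_ip}:{port} "
                          f"--node-ip-address={addr}")
    return cmds


for node in ADVERTISED:
    print(node, host_routes(node))
for node, cmd in ray_commands().items():
    print(node, cmd)
```

With these assumed addresses, A needs no extra routes, while B and C each need a host route for the peer addresses that live on a different cable. Whether this is sufficient on your machines depends on your interface and firewall settings.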

Hi! Thanks for sharing your 3-Spark triangle setup - super helpful.
Could you please share the exact link (or part number) for the QSFP cables you used? I’m a bit worried I’ll order the wrong type/variant (e.g., QSFP56 vs QSFP28, DAC vs AOC, length, etc.).
A direct product link would be perfect. Thanks!

Please check this thread: 6x Spark setup - #47 by raphael.amorim

You should be able to run Qwen3-235B-A22B-FP4 very easily. So, if you’re having problems running it by yourself, just use @eugr’s eugr/spark-vllm-docker on GitHub, a Docker configuration for running vLLM on dual DGX Sparks. It works consistently, and he’s constantly updating it as improvements land in the vLLM community.

Hi Siertum!
Product Name: QSFP112 CABLE 400G 0.4M ETHERNET
You can search by product name. This is the product link I just found on Google:
https://www.neobits.com/asus_gx10_qsfp_cable_asus_cb_gx10_qsfp_cable_ascent_p28655391.html?srsltid=AfmBOopE3K7lqecYyfzh_uINdxUfXdA_ENz7u3o3rXTH6TZG2pCGeU61