vLLM on dual Sparks

Has anyone managed to get the example from Install and Use vLLM for Inference | DGX Spark to work on dual Sparks?

I have followed the tutorial, but when I try to load the 70B model it says there is not enough memory. It seems as if it can't see the worker Spark. Here is the output of ray list nodes; 10.168.1.128 is the IP address of the head node's 10 GbE port.

======== List: 2025-11-14 11:31:17.340771 ========
Stats:
Total: 2

Table:

 #  NODE_ID                                                    NODE_IP         IS_HEAD_NODE  STATE  NODE_NAME
 0  43eb5919fef04013b0f0eff0e04f31781c31073cf70aa693b0d29c15   10.168.1.128    True          ALIVE  10.168.1.128
    RESOURCES_TOTAL:  CPU: 20.0, GPU: 1.0, accelerator_type:GB10: 1.0, memory: 108.718 GiB,
                      node:10.168.1.128: 1.0, node:internal_head: 1.0, object_store_memory: 9.728 GiB
    LABELS:           ray.io/accelerator-type: GB10,
                      ray.io/node-id: 43eb5919fef04013b0f0eff0e04f31781c31073cf70aa693b0d29c15
 1  5a299ad3678429ac998ca9c1d7ea58527dd8e57ffc877fbd7d286e0d   192.168.100.11  False         ALIVE  192.168.100.11
    RESOURCES_TOTAL:  CPU: 20.0, GPU: 1.0, accelerator_type:GB10: 1.0, memory: 109.849 GiB,
                      node:192.168.100.11: 1.0, object_store_memory: 9.728 GiB
    LABELS:           ray.io/accelerator-type: GB10,
                      ray.io/node-id: 5a299ad3678429ac998ca9c1d7ea58527dd8e57ffc877fbd7d286e0d

I have managed to get the 8B model to work, but obviously that fits on a single device.
I am not sure why it lists the head node as 10.168.1.128 rather than 192.168.100.10.
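For anyone checking the same thing, a quick way to see which physical port each address is bound to (just a sketch; interface names vary between systems):

# Brief per-interface IPv4 listing; the 10 GbE port should show 10.168.1.128
# and the 200 Gb/s ConnectX link should show 192.168.100.10.
ip -br -4 addr show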

The problem seems to be that the head node was defaulting to the address of the 10 GbE port rather than the 200 Gb/s link.

To fix it, I kludged the run_cluster.sh file to explicitly specify the Ray head node IP address (in my case 192.168.100.10):

if [ "${NODE_TYPE}" == "--head" ]; then
    RAY_START_CMD+=" --head --node-ip-address=192.168.100.10 --port=6379"
else
    RAY_START_CMD+=" --address=${HEAD_NODE_ADDRESS}:6379"
fi

I still have to figure out why the worker keeps re-downloading the model weights from Hugging Face each time it runs.
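My guess (not verified) is that the Hugging Face cache inside the container is not persisted or shared, so every run starts from an empty cache. A possible workaround, sketched below with a placeholder model id and the default cache paths: pre-download the weights into the host cache on each node and mount that cache into the container.

# Pre-fetch the weights once per node so later runs hit the local cache
# (model id is a placeholder; use whichever 70B checkpoint you are serving).
huggingface-cli download meta-llama/Llama-3.1-70B-Instruct

# Extra docker arguments to reuse the host cache inside the container;
# append these wherever the cluster script builds its docker run command.
HF_CACHE_MOUNT="-v ${HOME}/.cache/huggingface:/root/.cache/huggingface"

With the cache mounted, vLLM should find the files already on disk and skip the download.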

This article was very useful: Connecting Two DGX Spark Systems via 200Gb/s RoCE Network for Multi-Node GPU Training | by Doran Gao | Oct, 2025 | Medium


Look into this thread. There's a ton of information, insights, and code about using vLLM and Ray on stacked Sparks, including some improvements on loading times:
