Has anyone managed to get the example Install and Use vLLM for Inference | DGX Spark
to work on dual Sparks?
I have followed the tutorial, but when I try to load the 70B model it says there is not enough memory. It seems as if it can't see the worker Spark. Here is the output of `ray list nodes`; 10.168.1.128 is the IP address of the head node's 10 GbE port.
```
======== List: 2025-11-14 11:31:17.340771 ========
Stats:
Total: 2

Table:
 0  NODE_ID:          43eb5919fef04013b0f0eff0e04f31781c31073cf70aa693b0d29c15
    NODE_IP:          10.168.1.128
    IS_HEAD_NODE:     True
    STATE:            ALIVE
    NODE_NAME:        10.168.1.128
    RESOURCES_TOTAL:  CPU: 20.0
                      GPU: 1.0
                      accelerator_type:GB10: 1.0
                      memory: 108.718 GiB
                      node:10.168.1.128: 1.0
                      node:internal_head: 1.0
                      object_store_memory: 9.728 GiB
    LABELS:           ray.io/accelerator-type: GB10
                      ray.io/node-id: 43eb5919fef04013b0f0eff0e04f31781c31073cf70aa693b0d29c15

 1  NODE_ID:          5a299ad3678429ac998ca9c1d7ea58527dd8e57ffc877fbd7d286e0d
    NODE_IP:          192.168.100.11
    IS_HEAD_NODE:     False
    STATE:            ALIVE
    NODE_NAME:        192.168.100.11
    RESOURCES_TOTAL:  CPU: 20.0
                      GPU: 1.0
                      accelerator_type:GB10: 1.0
                      memory: 109.849 GiB
                      node:192.168.100.11: 1.0
                      object_store_memory: 9.728 GiB
    LABELS:           ray.io/accelerator-type: GB10
                      ray.io/node-id: 5a299ad3678429ac998ca9c1d7ea58527dd8e57ffc877fbd7d286e0d
```
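For context, this is roughly how I brought the cluster up, following the tutorial. Treat it as a sketch of my setup rather than a verbatim transcript; the port number is the Ray default, and the model name is a placeholder for the 70B model the tutorial uses:

```bash
# On the head Spark (ConnectX/cluster address 192.168.100.10)
ray start --head --port=6379

# On the worker Spark, join over the cluster network
ray start --address=192.168.100.10:6379

# Back on the head: serve the 70B model across both nodes
# (one GB10 GPU per Spark, so a parallel size of 2)
vllm serve <70B-model-from-the-tutorial> --tensor-parallel-size 2
```

It is at the `vllm serve` step that the out-of-memory error appears.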
I have managed to get the 8B model to work, but that obviously fits on a single device; the 70B model's weights alone are roughly 140 GB at FP16, so it can only load if vLLM can actually use both Sparks. I am not sure why the head node is listed as 10.168.1.128 rather than 192.168.100.10, the address of its cluster-network port.
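Is the fix simply to make Ray advertise the ConnectX (cluster network) interface instead of the 10 GbE one? Something like the following, which I have not yet tested; `--node-ip-address` is a standard `ray start` flag, and the 192.168.100.x addresses are the ConnectX ports on my two machines:

```bash
# Stop Ray on both Sparks first
ray stop

# Head: explicitly advertise the cluster-network address
ray start --head --port=6379 --node-ip-address=192.168.100.10

# Worker: join the head over the same network and advertise its own address
ray start --address=192.168.100.10:6379 --node-ip-address=192.168.100.11
```

If anyone has the 70B example working across two Sparks, I'd appreciate confirmation that this is the right approach.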