Failed to run tp or pp on two-nodes ray cluster using docker vllm:25.11

terryzhv83 · November 27, 2025, 1:45am

Two DGX Spark units connected by a QSFP56 cable.

Docker image pulled from NVIDIA, vLLM 25.11.

vllm serve fails to run with either tp=2 or pp=2. Logs attached.
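
For readers following along, here is a minimal sketch of what a launch like this might look like, with extra NCCL logging switched on so the logs say more about where multi-node startup fails. It is not the exact command from this thread: the model name and network interface name are hypothetical placeholders, and the helper functions `build_vllm_command` and `nccl_debug_env` are made up for illustration. The `vllm serve` flags and the `NCCL_DEBUG` / `NCCL_SOCKET_IFNAME` environment variables are the standard ones, but whether they are sufficient on DGX Spark is an assumption.

```python
import os
import shlex


def build_vllm_command(model, tp=1, pp=1):
    """Return the argv list for a `vllm serve` launch that uses the Ray backend."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tp),
        "--pipeline-parallel-size", str(pp),
        "--distributed-executor-backend", "ray",
    ]


def nccl_debug_env(iface, base_env=None):
    """Return a copy of the environment with verbose NCCL logging enabled.

    `iface` is the network interface NCCL should use for its socket traffic;
    the value passed below is a placeholder, not taken from the thread.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_DEBUG"] = "INFO"          # print NCCL setup and transport details
    env["NCCL_SOCKET_IFNAME"] = iface   # pin NCCL to one interface
    return env


if __name__ == "__main__":
    # Placeholder model and interface names; substitute your own.
    cmd = build_vllm_command("my-org/my-model", tp=2)
    env = nccl_debug_env("my-iface0")
    print("NCCL_DEBUG=" + env["NCCL_DEBUG"],
          "NCCL_SOCKET_IFNAME=" + env["NCCL_SOCKET_IFNAME"],
          shlex.join(cmd))
    # To actually launch: subprocess.run(cmd, env=env, check=True)
```

The script only prints the command it would run; the same environment variables generally need to be set on both nodes before the Ray cluster is started, so that the workers inherit them.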

raphael.amorim · December 1, 2025, 3:23pm

It looks like an NCCL problem. I would ignore the NVIDIA instructions on this for now; they’re not functional. Please go through these threads and save yourself some time. A lot of discussion has already happened there, with plenty of insights you don’t need to rediscover yourself:

This repo has a simple but functional setup for a vLLM cluster with Ray support:

