two DGX spark connected by qsfp56 cable,
docker pull from nvidia, vllm 25.11
failed to run vllm serve on either tp=2 or pp=2, logs attached.
failed-on-vllm-25.11.log (43.6 KB)
two DGX spark connected by qsfp56 cable,
docker pull from nvidia, vllm 25.11
failed to run vllm serve on either tp=2 or pp=2, logs attached.
failed-on-vllm-25.11.log (43.6 KB)
It looks like nccl problem. I would just ignore the NVIDIA instructions on this for now. They’re not functional. Please go through these threads and save yourself some time. Lots of discussions already happened and lots of other insights you don’t need to find out by yourself:
This repo has a simple, but functional setup for a vLLM cluster with Ray support: