"Install and Use vLLM for Inference" playbook does not work across two Sparks

I followed the "Install and Use vLLM for Inference" playbook, but I am not able to get it to work on two Sparks. I can run vLLM on the main Spark node, but as soon as I attempt to run across both nodes over NCCL it fails. I keep getting messages that my versions of PyTorch and Triton do not support SM 12.1.

I am using the Docker container specified in the playbook, and I have also tried the latest one. Both fail in the same way.

The NCCL link between the two Sparks is working fine.

I have turned the GPU memory utilization down, as it does not seem to recognize all of the memory across both Sparks.

Here is the log I get when trying to run even a small model across the two Sparks:

docker exec -it node vllm serve ibm-granite/granite-4.0-h-350m --tensor-parallel-size 2 --max_model_len 2048 --gpu-memory-utilization 0.4

(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 699, in run_engine_core
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 190, in _initialize_kv_caches
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py”, line 85, in determine_available_memory
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return self.collective_rpc(“determine_available_memory”)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py”, line 312, in collective_rpc
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return self._run_workers(method, *args, **(kwargs or {}))
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py”, line 505, in _run_workers
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ray_worker_outputs = ray.get(ray_worker_outputs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py”, line 22, in auto_init_wrapper
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return fn(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py”, line 104, in wrapper
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py”, line 2882, in get
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py”, line 968, in get_objects
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] raise value.as_instanceof_cause()
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ray.exceptions.RayTaskError: ray::RayWorkerWrapper.execute_method() (pid=504, ip=192.168.6.64, actor_id=3929d272544a9428368520e502000000, repr=<vllm.executor.ray_utils.RayWorkerWrapper object at 0xec8811f83f20>)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py”, line 276, in execute_method
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] raise e
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py”, line 267, in execute_method
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return run_method(self, method, args, kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py”, line 120, in decorate_context
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py”, line 263, in determine_available_memory
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] self.model_runner.profile_run()
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py”, line 3392, in profile_run
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] = self._dummy_run(self.max_num_tokens, is_profile=True)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py”, line 120, in decorate_context
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py”, line 3152, in _dummy_run
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] outputs = self.model(
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py”, line 1773, in _wrapped_call_impl
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py”, line 1784, in _call_impl
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/granitemoehybrid.py”, line 599, in forward
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] hidden_states = self.model(input_ids, positions, intermediate_tensors,
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 310, in __call__
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] output = self.compiled_callable(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py”, line 749, in compile_wrapper
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py”, line 923, in _compile_fx_inner
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] raise InductorError(e, currentframe()).with_traceback(
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py”, line 907, in _compile_fx_inner
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] mb_compiled_graph = fx_codegen_and_compile(
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py”, line 1578, in fx_codegen_and_compile
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py”, line 1456, in codegen_and_compile
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] compiled_module = graph.compile_to_module()
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py”, line 2293, in compile_to_module
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return self._compile_to_module()
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py”, line 2303, in _compile_to_module
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] mod = self._compile_to_module_lines(wrapper_code)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py”, line 2371, in _compile_to_module_lines
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] mod = PyCodeCache.load_by_key_path(
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/codecache.py”, line 3296, in load_by_key_path
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/compile_tasks.py”, line 31, in _reload_python_module
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] exec(code, mod.__dict__, mod.__dict__)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/root/.cache/vllm/torch_compile_cache/040a8cdd5f/rank_0_0/inductor_cache/gh/cghg6cq6mdxkky2ahfca5bglg5bb6yk4xotw4fm5r4xfjmmkhuj7.py", line 67, in <module>
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', '''
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/async_compile.py”, line 404, in triton
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] kernel.precompile(
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/triton_heuristics.py”, line 408, in precompile
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] self._precompile_worker()
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/triton_heuristics.py”, line 434, in _precompile_worker
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] raise NoTritonConfigsError(
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] torch._inductor.exc.InductorError: NoTritonConfigsError: No valid triton configs. PTXASError: PTXAS error: Internal Triton PTX codegen error
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ptxas stderr:
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ptxas fatal : Value ‘sm_121a’ is not defined for option ‘gpu-name’
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708]
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] Repro command: /usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_121a /tmp/tmpo34r9kip.ptx -o /tmp/tmpo34r9kip.ptx.o
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708]
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708]
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you’re reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS=“+dynamo”
(EngineCore_DP0 pid=1727) Process EngineCore_DP0:
(EngineCore_DP0 pid=1727) Traceback (most recent call last):
(EngineCore_DP0 pid=1727) File “/usr/lib/python3.12/multiprocessing/process.py”, line 314, in _bootstrap
(EngineCore_DP0 pid=1727) self.run()
(EngineCore_DP0 pid=1727) File “/usr/lib/python3.12/multiprocessing/process.py”, line 108, in run
(EngineCore_DP0 pid=1727) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 712, in run_engine_core
(EngineCore_DP0 pid=1727) raise e
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 699, in run_engine_core
(EngineCore_DP0 pid=1727) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=1727) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=1727) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=1727) self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 190, in _initialize_kv_caches
(EngineCore_DP0 pid=1727) self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py”, line 85, in determine_available_memory
(EngineCore_DP0 pid=1727) return self.collective_rpc(“determine_available_memory”)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py”, line 312, in collective_rpc
(EngineCore_DP0 pid=1727) return self._run_workers(method, *args, **(kwargs or {}))
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py”, line 505, in _run_workers
(EngineCore_DP0 pid=1727) ray_worker_outputs = ray.get(ray_worker_outputs)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py”, line 22, in auto_init_wrapper
(EngineCore_DP0 pid=1727) return fn(*args, **kwargs)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py”, line 104, in wrapper
(EngineCore_DP0 pid=1727) return func(*args, **kwargs)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py”, line 2882, in get
(EngineCore_DP0 pid=1727) values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py”, line 968, in get_objects
(EngineCore_DP0 pid=1727) raise value.as_instanceof_cause()
(EngineCore_DP0 pid=1727) ray.exceptions.RayTaskError: ray::RayWorkerWrapper.execute_method() (pid=504, ip=192.168.6.64, actor_id=3929d272544a9428368520e502000000, repr=<vllm.executor.ray_utils.RayWorkerWrapper object at 0xec8811f83f20>)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py”, line 276, in execute_method
(EngineCore_DP0 pid=1727) raise e
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py”, line 267, in execute_method
(EngineCore_DP0 pid=1727) return run_method(self, method, args, kwargs)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=1727) return func(*args, **kwargs)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py”, line 120, in decorate_context
(EngineCore_DP0 pid=1727) return func(*args, **kwargs)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py”, line 263, in determine_available_memory
(EngineCore_DP0 pid=1727) self.model_runner.profile_run()
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py”, line 3392, in profile_run
(EngineCore_DP0 pid=1727) = self._dummy_run(self.max_num_tokens, is_profile=True)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py”, line 120, in decorate_context
(EngineCore_DP0 pid=1727) return func(*args, **kwargs)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py”, line 3152, in _dummy_run
(EngineCore_DP0 pid=1727) outputs = self.model(
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__
(EngineCore_DP0 pid=1727) return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py”, line 1773, in _wrapped_call_impl
(EngineCore_DP0 pid=1727) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py”, line 1784, in _call_impl
(EngineCore_DP0 pid=1727) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/granitemoehybrid.py”, line 599, in forward
(EngineCore_DP0 pid=1727) hidden_states = self.model(input_ids, positions, intermediate_tensors,
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 310, in __call__
(EngineCore_DP0 pid=1727) output = self.compiled_callable(*args, **kwargs)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py”, line 749, in compile_wrapper
(EngineCore_DP0 pid=1727) raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py”, line 923, in _compile_fx_inner
(EngineCore_DP0 pid=1727) raise InductorError(e, currentframe()).with_traceback(
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py”, line 907, in _compile_fx_inner
(EngineCore_DP0 pid=1727) mb_compiled_graph = fx_codegen_and_compile(
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py”, line 1578, in fx_codegen_and_compile
(EngineCore_DP0 pid=1727) return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py”, line 1456, in codegen_and_compile
(EngineCore_DP0 pid=1727) compiled_module = graph.compile_to_module()
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py”, line 2293, in compile_to_module
(EngineCore_DP0 pid=1727) return self._compile_to_module()
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py”, line 2303, in _compile_to_module
(EngineCore_DP0 pid=1727) mod = self._compile_to_module_lines(wrapper_code)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py”, line 2371, in _compile_to_module_lines
(EngineCore_DP0 pid=1727) mod = PyCodeCache.load_by_key_path(
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/codecache.py”, line 3296, in load_by_key_path
(EngineCore_DP0 pid=1727) mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/compile_tasks.py”, line 31, in _reload_python_module
(EngineCore_DP0 pid=1727) exec(code, mod.__dict__, mod.__dict__)
(EngineCore_DP0 pid=1727) File "/root/.cache/vllm/torch_compile_cache/040a8cdd5f/rank_0_0/inductor_cache/gh/cghg6cq6mdxkky2ahfca5bglg5bb6yk4xotw4fm5r4xfjmmkhuj7.py", line 67, in <module>
(EngineCore_DP0 pid=1727) triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', '''
(EngineCore_DP0 pid=1727) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/async_compile.py”, line 404, in triton
(EngineCore_DP0 pid=1727) kernel.precompile(
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/triton_heuristics.py”, line 408, in precompile
(EngineCore_DP0 pid=1727) self._precompile_worker()
(EngineCore_DP0 pid=1727) File “/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/triton_heuristics.py”, line 434, in _precompile_worker
(EngineCore_DP0 pid=1727) raise NoTritonConfigsError(
(EngineCore_DP0 pid=1727) torch._inductor.exc.InductorError: NoTritonConfigsError: No valid triton configs. PTXASError: PTXAS error: Internal Triton PTX codegen error
(EngineCore_DP0 pid=1727) ptxas stderr:
(EngineCore_DP0 pid=1727) ptxas fatal : Value ‘sm_121a’ is not defined for option ‘gpu-name’
(EngineCore_DP0 pid=1727)
(EngineCore_DP0 pid=1727) Repro command: /usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_121a /tmp/tmpo34r9kip.ptx -o /tmp/tmpo34r9kip.ptx.o
(EngineCore_DP0 pid=1727)
(EngineCore_DP0 pid=1727)
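For what it's worth, the failing step can be reproduced outside of vLLM by running the Triton-bundled ptxas directly (using the paths from the repro command in the log). If that ptxas does not know --gpu-name=sm_121a, the CUDA toolchain bundled with Triton is older than the GB10's compute capability 12.1, which would explain the "does not support SM 12.1" messages. This is just a sanity check, not a fix:

    # Check which CUDA release Triton's bundled ptxas comes from (path taken from the log above)
    /usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/bin/ptxas --version
    # Compare against the CUDA toolkit ptxas in the container, if one is on the PATH
    which ptxas && ptxas --version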

Does this occur with just this model, or with others as well? For testing purposes we recommend the Llama 3.3 70B model.

It happens with every model that I try to run across the two nodes over NCCL. I ended up needing to rebuild a few of the components before it would work.

Can you provide some guidance on what you rebuilt and how? Each of my two Spark setups has two valid IPs. I'm using the QSFP interface IP and interface name as described in the playbook, but `ray list nodes --detail` shows the regular network (non-QSFP) IP for the head/main node. Also, I have tested NCCL comms and they are working properly.
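In the meantime, my working theory is that Ray simply advertises whatever IP it was started with, so one thing I plan to try is pinning the QSFP address explicitly when starting Ray (rough sketch based on Ray's standard CLI flags; the IPs are placeholders for my setup, and the repo scripts may already handle this differently):

    # head node: bind and advertise the QSFP interface address
    ray start --head --port=6379 --node-ip-address=<head QSFP IP>
    # worker node: join via the head's QSFP address and advertise its own QSFP IP
    ray start --address=<head QSFP IP>:6379 --node-ip-address=<worker QSFP IP>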

Any help is very much appreciated. Thanks!

Is your goal to get vLLM running on the two Sparks so you can do inference across both, or are you just trying to get the QSFP link working between them? I have mine running Docker containers on the head and worker, with vLLM working over QSFP to share models across the two.

Yes. Using Docker containers with vLLM and Ray, just like the playbook, except actually working. :-)

I just uploaded what we created that works on our two Sparks. Try it and let me know if the docs and scripts make sense or need to be revised: https://github.com/mark-ramsey-ri/vllm-dgx-spark (a guide for running vLLM on two DGX Spark units).


This is awesome, mark440. Great contribution to the community.


Thank you!

I'm getting closer: both nodes in the Ray cluster now show the correct IB adapter IP address. So the Ray cluster is up, but I hit the next snag.

I didn't see a `vllm serve` anywhere; maybe I missed it, or it's not needed. I still have a lot to learn. The test_vllm_cluster script fails to connect to port 8000 (inference). I then shelled into the head node container and ran `vllm serve meta-llama/Llama-3.3-70B-Instruct --tensor-parallel-size 2 --max_model_len 2048`, but hit an NCCL error during initialization and an "EngineCore failed to start" problem.

Full log attached
ibstat attached

vllm-serv-out.txt (61.1 KB)

ibstat-out.txt (2.1 KB)

Can you run some benchmarks? I'm specifically interested in the following (rough commands I have in mind are sketched after the list):

  1. Running gpt-oss-120b in tensor parallel.
  2. Running qwen3-next-80b (fp8) in tensor parallel.
  3. Running qwen3-235b in AWQ 4-bit across two nodes.
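Something along these lines is the kind of run I'm after once the cluster is serving (rough sketch; it assumes a vLLM build recent enough to ship the `vllm bench` subcommand, otherwise the benchmark_serving.py script from the vLLM repo does the same job, and the model ID is a placeholder):

    # point the benchmark client at the head node's OpenAI-compatible endpoint
    vllm bench serve --host <head IP> --port 8000 --model <model ID> --dataset-name random --num-prompts 100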

I suggest you stop and remove the Docker containers on each Spark, then run start_head_vllm.sh on the head node and post the output; that script contains the `vllm serve` call. Then run start_worker_vllm.sh on the worker and post the output. Before you run the scripts, make sure you stop and remove all Docker containers on both Spark units and kill any extra processes running on both, so that memory is freed up.
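Roughly this sequence, using the container names from the repo scripts (sketch; adjust the worker container name, ray-worker-<hostname>, to whatever the script created on your unit):

    # head Spark: remove the old container, then rerun the head script
    docker stop ray-head && docker rm ray-head
    bash start_head_vllm.sh

    # worker Spark: remove the old container, then rerun the worker script
    docker stop ray-worker-spark-053c && docker rm ray-worker-spark-053c
    bash start_worker_vllm.sh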

Yes, we are planning to run a few benchmarks and can post the results. It will be early next week as we are wrapping up a non-Spark project the rest of this week :-)


Head Node:
bash start_head_vllm.sh
[2025-11-05 18:39:28] Starting DGX Spark vLLM Head Node Setup
[2025-11-05 18:39:28] Configuration:
[2025-11-05 18:39:28] Image: nvcr.io/nvidia/vllm:25.10-py3
[2025-11-05 18:39:28] Head IP: 192.168.100.10
[2025-11-05 18:39:28] Model: meta-llama/Llama-3.3-70B-Instruct
[2025-11-05 18:39:28] Tensor Parallel: 2
[2025-11-05 18:39:28] Ray Version: 2.51.0
[2025-11-05 18:39:28] HF Auth: ✅ Token provided
[2025-11-05 18:39:28]
[2025-11-05 18:39:28] Step 1/8: Verifying HuggingFace cache directory
[2025-11-05 18:39:28] Cache directory exists: /home/labuser/.cache/huggingface
[2025-11-05 18:39:28] Step 2/8: Pulling Docker image
25.10-py3: Pulling from nvidia/vllm
Digest: sha256:a7dcc96460541c2a132434f31002d23c9991eb8bf64f8c302fc40b5a4bda0ef9
Status: Image is up to date for nvcr.io/nvidia/vllm:25.10-py3
[2025-11-05 18:39:29] Step 3/8: Cleaning old container
[2025-11-05 18:39:29] Step 4/8: Starting head container
79a53372f43405379d3078b0abf98be829a8f3ac2ab5d7588117f31dd0307802
[2025-11-05 18:39:30] Container started successfully
[2025-11-05 18:39:30] Step 5/8: Installing Ray 2.51.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
[2025-11-05 18:39:38] Ray 2.51.0 installed
[2025-11-05 18:39:38] Step 6/8: Starting Ray head
[2025-11-05 18:39:44] Ray head started, waiting for readiness…
[2025-11-05 18:39:45] ✅ Ray head is ready (1s)
[2025-11-05 18:39:45] Step 7/8: Pre-downloading model
[2025-11-05 18:39:45] This may take a while for large models…
[2025-11-05 18:39:46] Model download complete (or already cached)
[2025-11-05 18:39:46] Step 8/8: Starting vLLM server
labuser@spark-fda4:$

-----
Worker Node:
[2025-11-05 18:40:01] Starting DGX Spark vLLM Worker Setup
[2025-11-05 18:40:01] Configuration:
[2025-11-05 18:40:01] Image: nvcr.io/nvidia/vllm:25.10-py3
[2025-11-05 18:40:01] Worker Name: ray-worker-spark-053c
[2025-11-05 18:40:01] Head IP: 192.168.100.10
[2025-11-05 18:40:01] Ray Version: 2.51.0
[2025-11-05 18:40:01]
[2025-11-05 18:40:01] Step 1/7: Verifying HuggingFace cache directory
[2025-11-05 18:40:01] Cache directory exists: /home/labuser/.cache/huggingface
[2025-11-05 18:40:01] Step 2/7: Testing connectivity to head
[2025-11-05 18:40:01] ✅ Head is reachable
[2025-11-05 18:40:01] Step 3/7: Pulling Docker image
25.10-py3: Pulling from nvidia/vllm
Digest: sha256:a7dcc96460541c2a132434f31002d23c9991eb8bf64f8c302fc40b5a4bda0ef9
Status: Image is up to date for nvcr.io/nvidia/vllm:25.10-py3
[2025-11-05 18:40:02] Step 4/7: Cleaning old container
[2025-11-05 18:40:02] Step 5/7: Starting worker container
3503bd4fb4d6647a84d1686ab3880f1eed65aa46e2184e8702e2fdac7f5a4bf9
[2025-11-05 18:40:03] Container started successfully
[2025-11-05 18:40:03] Step 6/7: Installing Ray 2.51.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
[2025-11-05 18:40:12] Ray 2.51.0 installed
[2025-11-05 18:40:12] Step 7/7: Joining Ray cluster
[2025-11-05 18:40:15] Worker started, waiting for cluster registration…
[2025-11-05 18:41:28] ⚠️ Worker may not be connected after 30s
[2025-11-05 18:41:28] Check cluster status from head:
[2025-11-05 18:41:28] docker exec ray-head ray status --address=127.0.0.1:6379

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Worker ray-worker-spark-053c is ready!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔍 Verify from head node:
docker exec ray-head ray status --address=127.0.0.1:6379

📊 Expected output should show multiple ‘Healthy’ nodes

🌐 Ray Dashboard: http://192.168.100.10:8265
(Check ‘Cluster’ tab to see all nodes)

⚙️ To increase parallelism, update head vLLM with:
--tensor-parallel-size <num_total_gpus>

🔧 Worker logs:
docker logs -f ray-worker-spark-053c

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Looking forward to it. I find it puzzling that after the wave of initial DGX Spark reviews there is complete silence on clustered setups. I could not find a single benchmark in the wild, only some AI-generated "reviews" written from the spec sheet.

I would test myself, but I don’t want to spend money on the second device just yet.

If I get the cluster going, I’m open to benchmarking on mine.

A couple of observations: you are running over standard Ethernet and NOT the InfiniBand connection, since the nodes are showing 192.x.x.x addresses instead of 169.254.x.x, and the model load appears to be causing the script to stop. I have updated the script to handle longer load times better and added a troubleshooting document to the repo. The doc shows how to find your InfiniBand IP address and how to check the actual status of vLLM inside Docker if you need to.

You need to change your environment variables so they no longer use the 192.x.x.x addresses, stop and remove the Docker containers on the head and worker, grab the new head script, and give it another go. You are very close!
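For reference, a quick way to see which interface actually carries the 169.254.x.x link-local address and to confirm the IB link is up (sketch; interface names differ between systems):

    ip -br addr show | grep 169.254    # which interface holds the link-local QSFP address
    ibstat                             # confirm the InfiniBand/QSFP port state is Active

The scripts should then be pointed at that interface (for example via NCCL_SOCKET_IFNAME) rather than at the 192.x.x.x Ethernet interface.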

Yes, I manually set the IP addresses for my IB interfaces rather than using netplan's link-local 169.254.x.x assignment; that's why you see 192.168.100.x.

The vLLM process was not running in the container, so I exec'ed into the head container and ran the same command the script would have run. It crashed while loading the model weights.

  1. I will grab your updates.
  2. Look to see if there is a path problem with the Hugging Face cache folder (quick check sketched below).
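For the cache check, I have in mind something like this, using the host path the script reported and the default cache location inside the container when running as root (sketch; the actual mount target may differ):

    ls /home/labuser/.cache/huggingface/hub                  # on the host
    docker exec ray-head ls /root/.cache/huggingface/hub     # inside the head container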

Hi! I have the same problem.
I followed the guide in the repo, got the dashboard running, the NCCL connection tests pass, and I'm using the IB IPs (169.254.x.x), but the head node fails to start the vLLM server.

Output of `docker exec ray-head ray status --address=127.0.0.1:6379`:
======== Autoscaler status: 2025-11-07 08:00:20.347240 ========
Node status

Active:
1 node_12064465f72f7a2b7805fd825e47af8c09c77443c886a3daf93112e7
1 node_31b7a3ebfd7bccbbc0210341efbd700366487afadd56980e40a5ee98
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources

Total Usage:
0.0/40.0 CPU
0.0/2.0 GPU
0B/207.54GiB memory
0B/30.40GiB object_store_memory

From request_resources:
(none)
Pending Demands:
(no resource demands)

When I try to start vLLM manually:

vLLM startup logs.txt (61.1 KB)

Your issue is a bit different:

The error you’re encountering is an NCCL (NVIDIA Collective Communications Library) communication failure when trying to initialize distributed GPU workers across two nodes. The key issues I can identify from the logs are:

  1. NCCL Internal Error: RuntimeError: NCCL error: internal error - please check again that the NCCL setup is working.

  2. Network Device Mismatch: On BPN-DGX-02 (the second node), NCCL cannot find the InfiniBand devices:

    • BPN-DGX-02:2987:2987 [0] NCCL INFO NET/IB : No device found.
    • It falls back to using Socket communication instead of RDMA
  3. Here are a few commands to gather info on the system:

    docker exec -it ray-head bash -c "ray status"
    docker exec -it ray-head bash -c "nvidia-smi --list-gpus"
    docker exec -it ray-head bash -c 'echo $NCCL_SOCKET_IFNAME && ip addr show enp1s0f1np1'
    docker exec ray-head bash -c "ray status"
    docker exec ray-head bash -c "nvidia-smi --list-gpus"
    docker exec ray-head bash -c "env | grep -E '(NCCL|IB)'"

    If you run the commands and send the output we can see what is up.
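If the "NET/IB : No device found" message shows up again, it can also help to restart the containers with NCCL debugging turned on and the interface pinned, then watch the logs (sketch; the exact interface name depends on your system):

    # add these to the docker run options when starting each container:
    #   -e NCCL_DEBUG=INFO -e NCCL_SOCKET_IFNAME=enp1s0f1np1
    docker logs -f ray-head 2>&1 | grep "NCCL INFO NET"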

Any luck running with the new script?