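When I launch the cluster with the --no-ray option, startup gets as far as initializing the engine core and then fails with a ZMQ bind error ("Cannot assign requested address"). Here is the command, followed by the full output: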
./launch-cluster.sh \
-t vllm-node-tf5 \
--name vllm_node \
--non-privileged \
--nodes "192.168.200.12,192.168.200.13" \
--eth-if enp1s0f1np1 \
--ib-if rocep1s0f1,roceP2p1s0f1 \
--localhost-port 8888 \
--apply-mod mods/fix-qwen3-coder-next \
--apply-mod mods/fix-qwen3.5-chat-template \
-e VLLM_MARLIN_USE_ATOMIC_ADD=1 \
--no-ray \
exec vllm serve qwen/qwen3.5-35b-a3b-fp8 \
--host 0.0.0.0 \
--port 8888 \
--max-model-len 262144 \
--max_num_batched_tokens 16384 \
--gpu-memory-utilization 0.7 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--kv-cache-dtype fp8 \
--load-format fastsafetensors \
--attention-backend flashinfer \
--enable-prefix-caching \
--chat-template unsloth.jinja \
--tensor-parallel-size 2
Detected Local IP: 192.168.200.12 (192.168.200.12/24)
Head Node: 192.168.200.12
Worker Nodes: 192.168.200.13
Container Name: vllm_node
Image Name: vllm-node-tf5
Action: exec
Checking SSH connectivity to worker nodes...
SSH to 192.168.200.13: OK
Running in non-privileged mode...
Starting Head Node on 192.168.200.12...
4efdf734e9d2d695499330f1d0725b19c4de92cb25abd1e3d111cae6e7c812ce
Starting Worker Node on 192.168.200.13...
d08fe7c566189b1c170d11ca9daa9755feb3090ff8560007372783b8d41c03be
Applying modifications to cluster nodes...
Applying mod 'fix-qwen3-coder-next' to 192.168.200.12...
Copying directory content to container...
Successfully copied 9.73kB to vllm_node:/workspace/mods/fix-qwen3-coder-next/
Running patch script on 192.168.200.12...
Patching Qwen3-Coder-Next crashing on start
patching file vllm/v1/core/single_type_kv_cache_manager.py
Hunk #1 FAILED at 1000.
1 out of 1 hunk FAILED -- saving rejects to file vllm/v1/core/single_type_kv_cache_manager.py.rej
Patch is not applicable, skipping
Reverting PR #34279 that causes slowness
patching file vllm/model_executor/layers/fused_moe/fused_moe.py
Unreversed patch detected! Ignore -R? [n]
Apply anyway? [n]
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file vllm/model_executor/layers/fused_moe/fused_moe.py.rej
Can't revert PR #34279, skipping as it was reverted in recent commits
Fixing Triton allocator bug
Applying mod 'fix-qwen3.5-chat-template' to 192.168.200.12...
Copying directory content to container...
Successfully copied 11.3kB to vllm_node:/workspace/mods/fix-qwen3.5-chat-template/
Running patch script on 192.168.200.12...
=======> to apply chat template, use --chat-template unsloth.jinja
Applying mod 'fix-qwen3-coder-next' to 192.168.200.13...
Copying mod package to 192.168.200.13:/tmp/vllm_mod_pkg_1774026108_31933...
fix_crash.diff 100% 712 1.0MB/s 00:00
fix_slowness.diff 100% 2129 2.4MB/s 00:00
run.sh 100% 959 1.3MB/s 00:00
_triton_alloc_setup.pth 100% 27 47.5KB/s 00:00
_triton_alloc_setup.py 100% 257 350.5KB/s 00:00
Copying directory content to container...
Running patch script on 192.168.200.13...
Patching Qwen3-Coder-Next crashing on start
patching file vllm/v1/core/single_type_kv_cache_manager.py
Hunk #1 FAILED at 1000.
1 out of 1 hunk FAILED -- saving rejects to file vllm/v1/core/single_type_kv_cache_manager.py.rej
Patch is not applicable, skipping
Reverting PR #34279 that causes slowness
patching file vllm/model_executor/layers/fused_moe/fused_moe.py
Unreversed patch detected! Ignore -R? [n]
Apply anyway? [n]
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file vllm/model_executor/layers/fused_moe/fused_moe.py.rej
Can't revert PR #34279, skipping as it was reverted in recent commits
Fixing Triton allocator bug
Applying mod 'fix-qwen3.5-chat-template' to 192.168.200.13...
Copying mod package to 192.168.200.13:/tmp/vllm_mod_pkg_1774026110_2731...
chat_template.jinja 100% 7817 8.3MB/s 00:00
run.sh 100% 144 245.4KB/s 00:00
Copying directory content to container...
Running patch script on 192.168.200.13...
=======> to apply chat template, use --chat-template unsloth.jinja
Executing command: vllm serve qwen/qwen3.5-35b-a3b-fp8 --host 0.0.0.0 --port 8888 --max-model-len 262144 --max_num_batched_tokens 16384 --gpu-memory-utilization 0.7 --enable-auto-tool-choice --tool-call-parser qwen3_coder --kv-cache-dtype fp8 --load-format fastsafetensors --attention-backend flashinfer --enable-prefix-caching --chat-template unsloth.jinja --tensor-parallel-size 2
Launching worker (rank 1) on 192.168.200.13...
Executing command on head node (rank 0): vllm serve qwen/qwen3.5-35b-a3b-fp8 --host 0.0.0.0 --port 8888 --max-model-len 262144 --max_num_batched_tokens 16384 --gpu-memory-utilization 0.7 --enable-auto-tool-choice --tool-call-parser qwen3_coder --kv-cache-dtype fp8 --load-format fastsafetensors --attention-backend flashinfer --enable-prefix-caching --chat-template unsloth.jinja --tensor-parallel-size 2 --nnodes 2 --node-rank 0 --master-addr 192.168.200.12 --master-port 29501
(APIServer pid=246) INFO 03-20 17:02:02 [utils.py:297]
(APIServer pid=246) INFO 03-20 17:02:02 [utils.py:297] █ █ █▄ ▄█
(APIServer pid=246) INFO 03-20 17:02:02 [utils.py:297] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.17.2rc1.dev7+g9c7cab5eb.d20260317
(APIServer pid=246) INFO 03-20 17:02:02 [utils.py:297] █▄█▀ █ █ █ █ model qwen/qwen3.5-35b-a3b-fp8
(APIServer pid=246) INFO 03-20 17:02:02 [utils.py:297] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=246) INFO 03-20 17:02:02 [utils.py:297]
(APIServer pid=246) INFO 03-20 17:02:02 [utils.py:233] non-default args: {'model_tag': 'qwen/qwen3.5-35b-a3b-fp8', 'chat_template': 'unsloth.jinja', 'enable_auto_tool_choice': True, 'tool_call_parser': 'qwen3_coder', 'host': '0.0.0.0', 'port': 8888, 'model': 'qwen/qwen3.5-35b-a3b-fp8', 'max_model_len': 262144, 'load_format': 'fastsafetensors', 'attention_backend': 'flashinfer', 'master_addr': '192.168.200.12', 'nnodes': 2, 'tensor_parallel_size': 2, 'gpu_memory_utilization': 0.7, 'kv_cache_dtype': 'fp8', 'enable_prefix_caching': True, 'max_num_batched_tokens': 16384}
(APIServer pid=246) WARNING 03-20 17:02:02 [envs.py:1724] Unknown vLLM environment variable detected: VLLM_BASE_DIR
(APIServer pid=246) Unrecognized keys in `rope_parameters` for 'rope_type'='default': {'mrope_section', 'mrope_interleaved'}
(APIServer pid=246) Unrecognized keys in `rope_parameters` for 'rope_type'='default': {'mrope_section', 'mrope_interleaved'}
(APIServer pid=246) Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
(APIServer pid=246) INFO 03-20 17:02:12 [model.py:533] Resolved architecture: Qwen3_5MoeForConditionalGeneration
(APIServer pid=246) INFO 03-20 17:02:12 [model.py:1582] Using max model len 262144
(APIServer pid=246) INFO 03-20 17:02:12 [cache.py:212] Using fp8 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. Meanwhile, it may cause accuracy drop without a proper scaling factor.
(APIServer pid=246) INFO 03-20 17:02:13 [arg_utils.py:1659] Inferred data_parallel_rank 0 from node_rank 0
(APIServer pid=246) INFO 03-20 17:02:13 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=16384.
(APIServer pid=246) WARNING 03-20 17:02:13 [config.py:372] Mamba cache mode is set to 'align' for Qwen3_5MoeForConditionalGeneration by default when prefix caching is enabled
(APIServer pid=246) INFO 03-20 17:02:13 [config.py:392] Warning: Prefix caching in Mamba cache 'align' mode is currently enabled. Its support for Mamba layers is experimental. Please report any issues you may observe.
(APIServer pid=246) INFO 03-20 17:02:13 [config.py:212] Setting attention block size to 2096 tokens to ensure that attention page size is >= mamba page size.
(APIServer pid=246) INFO 03-20 17:02:13 [vllm.py:754] Asynchronous scheduling is enabled.
(APIServer pid=246) INFO 03-20 17:02:13 [compilation.py:289] Enabled custom fusions: norm_quant, act_quant
(EngineCore pid=453) INFO 03-20 17:02:54 [core.py:103] Initializing a V1 LLM engine (v0.17.2rc1.dev7+g9c7cab5eb.d20260317) with config: model='qwen/qwen3.5-35b-a3b-fp8', speculative_config=None, tokenizer='qwen/qwen3.5-35b-a3b-fp8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=262144, download_dir=None, load_format=fastsafetensors, tensor_parallel_size=2, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=True, quantization=fp8, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=fp8, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=qwen/qwen3.5-35b-a3b-fp8, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['+quant_fp8', 'none', '+quant_fp8'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::olmo_hybrid_gdn_full_forward', 'vllm::kda_attention', 
'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_endpoints': [16384], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': True, 'fuse_act_quant': True, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []}
(EngineCore pid=453) WARNING 03-20 17:02:54 [multiproc_executor.py:997] Reducing Torch parallelism from 20 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(EngineCore pid=453) INFO 03-20 17:02:54 [multiproc_executor.py:134] DP group leader: node_rank=0, node_rank_within_dp=0, master_addr=192.168.200.12, mq_connect_ip=192.168.200.12 (local), world_size=2, local_world_size=1
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] EngineCore failed to start.
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] Traceback (most recent call last):
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1073, in run_engine_core
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] return func(*args, **kwargs)
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 839, in __init__
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] super().__init__(
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] self.model_executor = executor_class(vllm_config)
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 101, in __init__
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] super().__init__(vllm_config)
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] return func(*args, **kwargs)
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] self._init_executor()
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 145, in _init_executor
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] self.rpc_broadcast_mq = MessageQueue(
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] ^^^^^^^^^^^^^
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 422, in __init__
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] self.remote_socket.bind(socket_addr)
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/zmq/sugar/socket.py", line 320, in bind
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] super().bind(addr)
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] File "zmq/backend/cython/_zmq.py", line 1009, in zmq.backend.cython._zmq.Socket.bind
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] _check_rc(rc)
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] ^^^^^^^^^^^
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] File "zmq/backend/cython/_zmq.py", line 190, in zmq.backend.cython._zmq._check_rc
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] raise ZMQError(errno)
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] ^^^^^^^^^^^
(EngineCore pid=453) ERROR 03-20 17:02:54 [core.py:1099] zmq.error.ZMQError: Cannot assign requested address (addr='tcp://192.168.200.12:57087')
(EngineCore pid=453) Process EngineCore:
(EngineCore pid=453) Traceback (most recent call last):
(EngineCore pid=453) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=453) self.run()
(EngineCore pid=453) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=453) self._target(*self._args, **self._kwargs)
(EngineCore pid=453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1103, in run_engine_core
(EngineCore pid=453) raise e
(EngineCore pid=453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1073, in run_engine_core
(EngineCore pid=453) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=453) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=453) return func(*args, **kwargs)
(EngineCore pid=453) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 839, in __init__
(EngineCore pid=453) super().__init__(
(EngineCore pid=453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore pid=453) self.model_executor = executor_class(vllm_config)
(EngineCore pid=453) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 101, in __init__
(EngineCore pid=453) super().__init__(vllm_config)
(EngineCore pid=453) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=453) return func(*args, **kwargs)
(EngineCore pid=453) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore pid=453) self._init_executor()
(EngineCore pid=453) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 145, in _init_executor
(EngineCore pid=453) self.rpc_broadcast_mq = MessageQueue(
(EngineCore pid=453) ^^^^^^^^^^^^^
(EngineCore pid=453) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 422, in __init__
(EngineCore pid=453) self.remote_socket.bind(socket_addr)
(EngineCore pid=453) File "/usr/local/lib/python3.12/dist-packages/zmq/sugar/socket.py", line 320, in bind
(EngineCore pid=453) super().bind(addr)
(EngineCore pid=453) File "zmq/backend/cython/_zmq.py", line 1009, in zmq.backend.cython._zmq.Socket.bind
(EngineCore pid=453) _check_rc(rc)
(EngineCore pid=453) ^^^^^^^^^^^
(EngineCore pid=453) File "zmq/backend/cython/_zmq.py", line 190, in zmq.backend.cython._zmq._check_rc
(EngineCore pid=453) raise ZMQError(errno)
(EngineCore pid=453) ^^^^^^^^^^^
(EngineCore pid=453) zmq.error.ZMQError: Cannot assign requested address (addr='tcp://192.168.200.12:57087')
(APIServer pid=246) Traceback (most recent call last):
(APIServer pid=246) File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=246) sys.exit(main())
(APIServer pid=246) ^^^^^^
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=246) args.dispatch_function(args)
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 118, in cmd
(APIServer pid=246) uvloop.run(run_server(args))
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=246) return __asyncio.run(
(APIServer pid=246) ^^^^^^^^^^^^^^
(APIServer pid=246) File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
(APIServer pid=246) return runner.run(main)
(APIServer pid=246) ^^^^^^^^^^^^^^^^
(APIServer pid=246) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=246) return self._loop.run_until_complete(task)
(APIServer pid=246) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=246) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=246) return await main
(APIServer pid=246) ^^^^^^^^^^
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 656, in run_server
(APIServer pid=246) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 670, in run_server_worker
(APIServer pid=246) async with build_async_engine_client(
(APIServer pid=246) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=246) return await anext(self.gen)
(APIServer pid=246) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 103, in build_async_engine_client
(APIServer pid=246) async with build_async_engine_client_from_engine_args(
(APIServer pid=246) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=246) return await anext(self.gen)
(APIServer pid=246) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 144, in build_async_engine_client_from_engine_args
(APIServer pid=246) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=246) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=246) return cls(
(APIServer pid=246) ^^^^
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__
(APIServer pid=246) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=246) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=246) return func(*args, **kwargs)
(APIServer pid=246) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 128, in make_async_mp_client
(APIServer pid=246) return AsyncMPClient(*client_args)
(APIServer pid=246) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=246) return func(*args, **kwargs)
(APIServer pid=246) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 924, in __init__
(APIServer pid=246) super().__init__(
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 583, in __init__
(APIServer pid=246) with launch_core_engines(
(APIServer pid=246) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=246) next(self.gen)
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 972, in launch_core_engines
(APIServer pid=246) wait_for_engine_startup(
(APIServer pid=246) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1031, in wait_for_engine_startup
(APIServer pid=246) raise RuntimeError(
(APIServer pid=246) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
Stopping cluster...
Stopping head node (192.168.200.12)...
Stopping worker node (192.168.200.13)...
Cluster stopped.
Before trying vLLM again, I’ll troubleshoot this by launching Ray manually and first getting it to communicate between two Docker containers with user namespace remapping enabled.
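As a quick first check, something like the sketch below could be run inside the head container. "Cannot assign requested address" is errno EADDRNOTAVAIL, which usually means the IP is not assigned to any interface visible to the process. My guess (an assumption, not confirmed) is that in non-privileged mode the container does not see 192.168.200.12 in its network namespace. The helper name `can_bind` is made up for illustration, and it uses a plain TCP socket rather than ZMQ, so it only tests whether the address is bindable at all.

```python
import errno
import socket
import sys


def can_bind(ip: str, port: int = 0) -> tuple[bool, str]:
    """Try to bind a TCP socket to ip:port.

    Port 0 lets the OS pick any free port, so a failure here points at
    the address itself rather than at a port conflict.
    Hypothetical helper, not part of vLLM or the launch script.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((ip, port))
        except OSError as exc:
            # Translate the numeric errno into its symbolic name,
            # e.g. EADDRNOTAVAIL for "Cannot assign requested address".
            return False, errno.errorcode.get(exc.errno, str(exc.errno))
    return True, "ok"


if __name__ == "__main__":
    # Defaults to the head node IP from the log above.
    ip = sys.argv[1] if len(sys.argv) > 1 else "192.168.200.12"
    ok, detail = can_bind(ip)
    print(f"bind {ip}: {'OK' if ok else 'FAILED'} ({detail})")
```

If this reports EADDRNOTAVAIL inside the container but succeeds on the host, the problem is probably in the container's network setup rather than in vLLM itself.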