Good morning @eugr! I've been doing some testing with GPT-OSS-120b, and your build of vLLM works perfectly for single node for me. Huge thank you for this!!
I almost hate to ask, but if you have the time, I would really appreciate your insight on a multi-node error I'm seeing. Basically, when running an inference test on a 2-node cluster, it looks like vLLM starts processing: GPU activity on the master spikes to 100%, and on the worker it spikes to 100% as well, but only for a short period. After the worker's GPU activity drops to zero, the master GPU stays at 100% for about five minutes. I see this in the vLLM logs (notice that generation throughput goes to 0):
(APIServer pid=1024) INFO 01-30 12:44:54 [loggers.py:257] Engine 000: Avg prompt throughput: 15.8 tokens/s, Avg generation throughput: 23.8 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=1024) INFO 01-30 12:45:04 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 7.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=1024) INFO 01-30 12:45:14 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
After about five minutes, the vLLM process crashes with this trace:
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [dump_input.py:72] Dumping input data for V1 LLM engine (v0.15.0rc2.dev80+ga5aa4d5c0.d20260129) with config: model='/models/gpt-oss-120b', speculative_config=None, tokenizer='/models/gpt-oss-120b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=fastsafetensors, tensor_parallel_size=2, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=mxfp4, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='openai_gptoss', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=/models/gpt-oss-120b, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer'], 
'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [2048], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512, 528, 544, 560, 576, 592, 608, 624, 640, 656, 672, 688, 704, 720, 736, 752, 768, 784, 800, 816, 832, 848, 864, 880, 896, 912, 928, 944, 960, 976, 992, 1008, 1024], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False, 'fuse_act_padding': False}, 'max_cudagraph_capture_size': 1024, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'static_all_moe_layers': []},
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [dump_input.py:79] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[], scheduled_cached_reqs=CachedRequestData(req_ids=['chatcmpl-b1786ea3f5e06d3c-b237d30b'],resumed_req_ids=set(),new_token_ids_lens=[],all_token_ids_lens={},new_block_ids=[None],num_computed_tokens=[466],num_output_tokens=[309]), num_scheduled_tokens={chatcmpl-b1786ea3f5e06d3c-b237d30b: 1}, total_num_scheduled_tokens=1, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={}, num_common_prefix_blocks=[0, 30], finished_req_ids=[], free_encoder_mm_hashes=[], preempted_req_ids=[], has_structured_output_requests=false, pending_structured_output_tokens=false, num_invalid_spec_tokens=null, kv_connector_metadata=null, ec_connector_metadata=null)
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [dump_input.py:81] Dumping scheduler stats: SchedulerStats(num_running_reqs=1, num_waiting_reqs=0, step_counter=0, current_wave=0, kv_cache_usage=0.0002396225023961751, prefix_cache_stats=PrefixCacheStats(reset=False, requests=0, queries=0, hits=0, preempted_requests=0, preempted_queries=0, preempted_hits=0), connector_prefix_cache_stats=None, kv_cache_eviction_events=[], spec_decoding_stats=None, kv_connector_stats=None, waiting_lora_adapters={}, running_lora_adapters={}, cudagraph_stats=None, perf_stats=None)
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] EngineCore encountered a fatal error.
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] Traceback (most recent call last):
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/ray/dag/compiled_dag_node.py", line 2525, in _execute_until
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] result = self._dag_output_fetcher.read(timeout)
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/ray/experimental/channel/common.py", line 312, in read
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] outputs = self._read_list(timeout)
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/ray/experimental/channel/common.py", line 403, in _read_list
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] raise e
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/ray/experimental/channel/common.py", line 385, in _read_list
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] result = c.read(min(remaining_timeout, iteration_timeout))
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/ray/experimental/channel/shared_memory_channel.py", line 776, in read
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] return self._channel_dict[self._resolve_actor_id()].read(timeout)
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/ray/experimental/channel/shared_memory_channel.py", line 612, in read
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] output = self._buffers[self._next_read_index].read(timeout)
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/ray/experimental/channel/shared_memory_channel.py", line 480, in read
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] ret = self._worker.get_objects(
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 976, in get_objects
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] ] = self.core_worker.get_objects(
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "python/ray/_raylet.pyx", line 2875, in ray._raylet.CoreWorker.get_objects
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "python/ray/includes/common.pxi", line 124, in ray._raylet.check_status
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] ray.exceptions.RayChannelTimeoutError: System error: Timed out waiting for object available to read. ObjectID: 00d6d592397561444345a99b5f2ce2efa7534890010000000be1f505
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948]
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] The above exception was the direct cause of the following exception:
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948]
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] Traceback (most recent call last):
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 939, in run_engine_core
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] engine_core.run_busy_loop()
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 966, in run_busy_loop
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] self._process_engine_step()
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 999, in _process_engine_step
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 389, in step
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] model_output = self.model_executor.sample_tokens(grammar_output)
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/ray_executor.py", line 431, in sample_tokens
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] return self._execute_dag(scheduler_output, grammar_output, non_block)
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/ray_executor.py", line 449, in _execute_dag
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] return refs[0].get()
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] ^^^^^^^^^^^^^
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/ray/experimental/compiled_dag_ref.py", line 115, in get
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] self._dag._execute_until(
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] File "/usr/local/lib/python3.12/dist-packages/ray/dag/compiled_dag_node.py", line 2535, in _execute_until
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] raise RayChannelTimeoutError(
(EngineCore_DP0 pid=1084) ERROR 01-30 12:49:56 [core.py:948] ray.exceptions.RayChannelTimeoutError: System error: If the execution is expected to take a long time, increase RAY_CGRAPH_get_timeout which is currently 300 seconds. Otherwise, this may indicate that the execution is hanging.
(EngineCore_DP0 pid=1084) INFO 01-30 12:49:56 [ray_executor.py:120] Shutting down Ray distributed executor. If you see error log from logging.cc regarding SIGTERM received, please ignore because this is the expected termination process in Ray.
(EngineCore_DP0 pid=1084) 2026-01-30 12:49:56,207 INFO compiled_dag_node.py:2167 -- Tearing down compiled DAG
(EngineCore_DP0 pid=1084) 2026-01-30 12:49:56,208 INFO compiled_dag_node.py:2172 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 4345a99b5f2ce2efa753489001000000)
(EngineCore_DP0 pid=1084) 2026-01-30 12:49:56,208 INFO compiled_dag_node.py:2172 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, d5c932163e63f041a849d9e001000000)
(APIServer pid=1024) ERROR 01-30 12:49:56 [async_llm.py:693] AsyncLLM output_handler failed.
(APIServer pid=1024) ERROR 01-30 12:49:56 [async_llm.py:693] Traceback (most recent call last):
(APIServer pid=1024) ERROR 01-30 12:49:56 [async_llm.py:693] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 649, in output_handler
(APIServer pid=1024) ERROR 01-30 12:49:56 [async_llm.py:693] outputs = await engine_core.get_output_async()
(APIServer pid=1024) ERROR 01-30 12:49:56 [async_llm.py:693] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1024) ERROR 01-30 12:49:56 [async_llm.py:693] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 894, in get_output_async
(APIServer pid=1024) ERROR 01-30 12:49:56 [async_llm.py:693] raise self._format_exception(outputs) from None
(APIServer pid=1024) ERROR 01-30 12:49:56 [async_llm.py:693] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(APIServer pid=1024) INFO: 127.0.0.1:43390 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(EngineCore_DP0 pid=1084) 2026-01-30 12:49:56,211 INFO compiled_dag_node.py:2194 -- Waiting for worker tasks to exit
(APIServer pid=1024) INFO: Shutting down
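Since the final exception explicitly mentions `RAY_CGRAPH_get_timeout`, one thing I'm planning to try is raising it before launching, just to distinguish a genuinely hung cross-node step from one that is merely very slow. This is only a sketch; the 1800-second value is my own guess, not something from the docs:

```shell
# Hypothetical experiment: raise Ray's compiled-graph read timeout from the
# default 300 s so a slow (but still progressing) cross-node step isn't
# killed prematurely. Export before starting the vllm serve command.
export RAY_CGRAPH_get_timeout=1800
echo "RAY_CGRAPH_get_timeout=${RAY_CGRAPH_get_timeout}"
```

If the crash then just happens at the 30-minute mark instead, that would confirm the execution is truly hanging rather than slow.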
I'm using this command to start the vLLM cluster:
./launch-cluster.sh \
--name eugr-vllm-cluster \
-t harbor.k8s.wm.k8slab/dgx/eugr-vllm:latest \
exec \
vllm serve \
/models/gpt-oss-120b \
--port=8000 \
--host=0.0.0.0 \
--gpu-memory-utilization=0.7 \
-tp 2 \
--distributed-executor-backend ray \
--load-format fastsafetensors
On startup, everything looks okay to me:
Auto-detecting interfaces...
Detected IB_IF: rocep1s0f0,roceP2p1s0f0
Detected ETH_IF: enp1s0f0np0
Detected Local IP: 192.168.100.10 (192.168.100.10/31)
Auto-detecting nodes...
Scanning for SSH peers on 192.168.100.10/31...
Found peer: 192.168.100.11
Cluster Nodes: 192.168.100.10,192.168.100.11
Head Node: 192.168.100.10
Worker Nodes: 192.168.100.11
Container Name: eugr-vllm-cluster
Image Name: harbor.k8s.wm.k8slab/dgx/eugr-vllm:latest
Action: exec
Checking SSH connectivity to worker nodes...
SSH to 192.168.100.11: OK
Starting Head Node on 192.168.100.10...
3ce2e35970e368294e6865b051ed4b6cb9764918407cf8d8b50f28eb82f87a2b
Starting Worker Node on 192.168.100.11...
725b948d49d4bcf53c66397d03c002b2e44ac6b7f7d855ad7ac0e1816098302f
I do see a startup error regarding the Triton kernels, but I feel like this isn't really related to my problem; please shout back if my intuition is wrong:
ERROR 01-30 12:28:32 [gpt_oss_triton_kernels_moe.py:34] Failed to import Triton kernels. Please make sure your triton version is compatible. Error: No module named 'triton_kernels.routing'
Regarding vLLM/Ray, I do see this as well. I've never run a multi-node configuration before, so to be honest I'm not sure whether this is expected behavior or an indication of a problem. Like I said above, the model loads correctly on both nodes:
(EngineCore_DP0 pid=1083) 2026-01-30 12:28:32,449 INFO worker.py:1821 -- Connecting to existing Ray cluster at address: 192.168.100.10:6379...
(EngineCore_DP0 pid=1083) 2026-01-30 12:28:32,457 INFO worker.py:1998 -- Connected to Ray cluster. View the dashboard at http://192.168.100.10:8265
(EngineCore_DP0 pid=1083) /usr/local/lib/python3.12/dist-packages/ray/_private/worker.py:2046: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
(EngineCore_DP0 pid=1083) warnings.warn(
(EngineCore_DP0 pid=1083) INFO 01-30 12:28:32 [ray_utils.py:402] No current placement group found. Creating a new placement group.
(EngineCore_DP0 pid=1083) WARNING 01-30 12:28:32 [ray_utils.py:213] tensor_parallel_size=2 is bigger than a reserved number of GPUs (1 GPUs) in a node fc107f9265520cec8ddfda10b91a8e1d174d9d3e43a38a9f6bd64b24. Tensor parallel workers can be spread out to 2+ nodes which can degrade the performance unless you have fast interconnect across nodes, like Infiniband. To resolve this issue, make sure you have more than 2 GPUs available at each node.
(EngineCore_DP0 pid=1083) WARNING 01-30 12:28:32 [ray_utils.py:213] tensor_parallel_size=2 is bigger than a reserved number of GPUs (1 GPUs) in a node 34518905dfd0291243ce95d4b2a9c01d855a413700a8a57d2caafc8a. Tensor parallel workers can be spread out to 2+ nodes which can degrade the performance unless you have fast interconnect across nodes, like Infiniband. To resolve this issue, make sure you have more than 2 GPUs available at each node.
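Given that warning about TP workers being spread across nodes, I've been wondering whether I should pin NCCL to the RoCE links explicitly so cross-node traffic doesn't fall back to the 10.0.1.0/24 management network. This is a sketch of what I'd export on both nodes before launch; the interface and device names are taken from my listings below, and the GID index is a common RoCEv2 value that I would still need to verify on my hardware (e.g. with `show_gids`):

```shell
# Sketch: steer NCCL onto the QSFP/RoCE links rather than the mgmt network.
export NCCL_DEBUG=INFO                      # log which transport/interface NCCL actually picks
export NCCL_SOCKET_IFNAME=enp1s0f0np0       # bootstrap/socket traffic over the RoCE NIC
export NCCL_IB_HCA=rocep1s0f0,roceP2p1s0f0  # RDMA devices NCCL is allowed to use
export NCCL_IB_GID_INDEX=3                  # typical RoCEv2 GID index; needs verifying per host
```

With `NCCL_DEBUG=INFO` set, the startup logs should at least show which interface NCCL selected, which would tell me whether traffic is going over the RoCE links at all.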
Part of me feels like this is related to my QSFP setup. I did follow the quick-start guide exactly, with the exception of putting each RoCE link in its own subnet (in case it was a routing issue). Here is my current net config:
Master
# QSFP
rocep1s0f0 port 1 ==> enp1s0f0np0 (Up)
rocep1s0f1 port 1 ==> enp1s0f1np1 (Down)
roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Up)
roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Down)
# IPs
lo UNKNOWN 127.0.0.1/8 ::1/128
enP7s7 UP 10.0.1.118/24 fe80::a076:3ac2:4aaf:2728/64
enp1s0f0np0 UP 192.168.100.10/31
enp1s0f1np1 DOWN
enP2p1s0f0np0 UP 192.168.101.0/31
enP2p1s0f1np1 DOWN
wlP9s9 UP 10.0.1.105/24 fe80::8b8b:f189:233e:1a4a/64
docker0 UP 172.17.0.1/16 fe80::7c37:b9ff:fe69:51b3/64
veth147f954@if2 UP fe80::4c68:48ff:fe4e:2c77/64
# Routes
default via 10.0.1.1 dev enP7s7 proto dhcp src 10.0.1.118 metric 100
default via 10.0.1.1 dev wlP9s9 proto dhcp src 10.0.1.105 metric 600
10.0.1.0/24 dev enP7s7 proto kernel scope link src 10.0.1.118 metric 100
10.0.1.0/24 dev wlP9s9 proto kernel scope link src 10.0.1.105 metric 600
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.100.10/31 dev enp1s0f0np0 proto kernel scope link src 192.168.100.10
192.168.101.0/31 dev enP2p1s0f0np0 proto kernel scope link src 192.168.101.0
Worker
# QSFP
rocep1s0f0 port 1 ==> enp1s0f0np0 (Up)
rocep1s0f1 port 1 ==> enp1s0f1np1 (Down)
roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Up)
roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Down)
# IPs
lo UNKNOWN 127.0.0.1/8 ::1/128
enP7s7 UP 10.0.1.188/24 fe80::3844:a617:8197:a32d/64
enp1s0f0np0 UP 192.168.100.11/31
enp1s0f1np1 DOWN
enP2p1s0f0np0 UP 192.168.101.1/31
enP2p1s0f1np1 DOWN
wlP9s9 UP 10.0.1.150/24 fe80::c776:4132:665d:e059/64
docker0 UP 172.17.0.1/16 fe80::24b7:83ff:fe82:e147/64
vethb385fc3@if2 UP fe80::44c2:4fff:febb:ad2e/64
# Routes
default via 10.0.1.1 dev enP7s7 proto dhcp src 10.0.1.188 metric 100
default via 10.0.1.1 dev wlP9s9 proto dhcp src 10.0.1.150 metric 600
10.0.1.0/24 dev enP7s7 proto kernel scope link src 10.0.1.188 metric 100
10.0.1.0/24 dev wlP9s9 proto kernel scope link src 10.0.1.150 metric 600
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.100.10/31 dev enp1s0f0np0 proto kernel scope link src 192.168.100.11
192.168.101.0/31 dev enP2p1s0f0np0 proto kernel scope link src 192.168.101.1
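To sanity-check the fabric itself, independent of Ray and vLLM, I was also planning to run a raw RDMA bandwidth test between the nodes. This is a diagnostic sketch using the perftest tools (which would need to be installed on both hosts); device and interface names come from my listings above:

```shell
# Plain reachability over the first RoCE link (run from the worker):
ping -c 3 -I enp1s0f0np0 192.168.100.10

# Raw RDMA bandwidth test with perftest:
#   on the master (192.168.100.10):  ib_write_bw -d rocep1s0f0
#   on the worker:                   ib_write_bw -d rocep1s0f0 192.168.100.10
```

If the bandwidth test stalls or fails, that would point at the RoCE/QSFP setup rather than anything in vLLM or Ray.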
My apologies in advance for so much information here. Again, I appreciate the help and insight, as well as the container that you've created. Like I said, it's already helped me out quite a bit and saved me a bunch of time. So thank you again!