I would like to enable communication between two servers using RDMA (RoCEv2). Both servers have MLNX_OFED_LINUX installed, and we are using Mellanox ConnectX-6 Dx devices.
However, on one of the servers, opensmd.service never becomes active when I start it; it stays in the “activating” state.
Running the start or restart commands again does not change this. (On the other server, the service shows as active.)
$ sudo systemctl status opensmd.service
● opensmd.service - OpenSM
     Loaded: loaded (/lib/systemd/system/opensmd.service; disabled; vendor preset: enabled)
     Active: activating (auto-restart) since Fri 2025-02-28 08:44:01 UTC; 4s ago
    Process: 3066 ExecStart=/usr/sbin/opensm (code=exited, status=0/SUCCESS)
   Main PID: 3066 (code=exited, status=0/SUCCESS)
The logs for opensmd.service, retrieved with the command below, show that the service is repeatedly stopping and restarting. Specifically, the message “No local ports detected!” appears.
$ sudo journalctl -u opensmd.service
I have installed the following MLNX_OFED version, and the Mellanox kernel modules such as mlx5_core are loaded successfully:
Driver: MLNX_OFED_LINUX-24.10-1.1.4.0-ubuntu20.04-x86_64.iso
OS: Ubuntu 20.04
$ lsmod | grep mlx
mlx5_ib 417792 0
ib_uverbs 147456 2 rdma_ucm,mlx5_ib
ib_core 335872 8 rdma_cm,ib_ipoib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
mlx5_core 1867776 1 mlx5_ib
auxiliary 16384 2 mlx5_ib,mlx5_core
pci_hyperv_intf 16384 1 mlx5_core
mlxdevm 172032 1 mlx5_core
tls 73728 1 mlx5_core
mlxfw 32768 1 mlx5_core
psample 20480 1 mlx5_core
mlx_compat 65536 13 rdma_cm,ib_ipoib,mlxdevm,mlxfw,iw_cm,auxiliary,ib_umad,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core
Additionally, I am unable to use the ibstat command, although ibv_devinfo and ibv_devices work fine.
This ibstat problem occurs on both servers, which show the error below:
$ ibstat
ibpanic: [3830] main: stat of IB device 'mlx5_0' failed: No such file or directory
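Since the goal here is RoCEv2, one check that may help narrow this down is the link layer each RDMA port reports. The sketch below is only a diagnostic under stated assumptions, not a confirmed fix: the function name `list_link_layers` is hypothetical, and it assumes the kernel exposes the usual sysfs attribute `/sys/class/infiniband/<device>/ports/<port>/link_layer`. If the ports report `Ethernet` rather than `InfiniBand`, OpenSM (a subnet manager for InfiniBand fabrics) would have no InfiniBand port to bind to, which might be consistent with “No local ports detected!”.

```shell
#!/bin/sh
# Print the link layer of every RDMA port found under sysfs.
# The sysfs root is an optional first argument (an assumption for
# illustration) so the function can also be pointed at a test directory.
list_link_layers() {
    root="${1:-/sys/class/infiniband}"
    for f in "$root"/*/ports/*/link_layer; do
        # Skip the unexpanded glob pattern when no RDMA devices exist.
        [ -r "$f" ] || continue
        port_dir=$(dirname "$f")                   # .../<device>/ports/<port>
        port=$(basename "$port_dir")               # <port>
        dev=$(basename "$(dirname "$(dirname "$port_dir")")")   # <device>
        printf '%s port %s: %s\n' "$dev" "$port" "$(cat "$f")"
    done
}

list_link_layers "$@"
```

Run without arguments, it reads the live sysfs tree and prints one line per port. The same value should also appear in the `link_layer` field of the `ibv_devinfo` output.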
Could you please help me with a solution?