My setup is a ConnectX-4 Lx EN card with MOFED 4.7 on CentOS 7.7 (kernel 3.10.0-1062.1.2.el7.x86_64). Updating the firmware and restarting openibd both succeed, but starting the subnet manager fails (a screenshot is also attached):
sudo /etc/init.d/opensmd start
Starting opensmd (via systemctl): Job for opensmd.service failed because the control process exited with error code. See "systemctl status opensmd.service" and "journalctl -xe" for details.
[root@ MLNX_OFED_LINUX-4.7-184.108.40.206-rhel7.7-x86_64]# systemctl status opensmd.service
● opensmd.service - LSB: Activates/Deactivates InfiniBand Subnet Manager
Loaded: loaded (/etc/rc.d/init.d/opensmd; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2019-10-07 17:10:03 UTC; 32s ago
Process: 200714 ExecStart=/etc/rc.d/init.d/opensmd start (code=exited, status=1/FAILURE)
Oct 07 17:09:56 systemd: Starting LSB: Activates/Deactivates InfiniBand Subnet Manager...
Oct 07 17:09:56 OpenSM: /var/log/opensm.log log file opened
Oct 07 17:09:56 OpenSM: OpenSM 5.5.0.MLNX20190923.1c78385
Oct 07 17:10:03 opensmd: Starting IB Subnet Manager...[FAILED]
Oct 07 17:10:03 systemd: opensmd.service: control process exited, code=exited status=1
Oct 07 17:10:03 systemd: Failed to start LSB: Activates/Deactivates InfiniBand Subnet Manager.
Oct 07 17:10:03 systemd: Unit opensmd.service entered failed state.
Oct 07 17:10:03 systemd: opensmd.service failed.
I don't understand whether something is missing or why the subnet manager fails to start, and whether this failure would affect the performance of CUDA/MPI jobs, which is the ultimate goal here.
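In case it helps with diagnosis, here is a minimal sketch for collecting two more pieces of information: the link layer reported for each RDMA port, and the tail of the OpenSM log named in the output above. OpenSM needs a port whose link layer is InfiniBand, so the first check seems worth ruling out. The sysfs location is the standard one for RDMA devices; the function name `list_link_layers` and its optional root argument are only illustrative and not part of MOFED.

```shell
#!/bin/sh
# Print "<device> port <n>: <link layer>" for every RDMA port in sysfs.
# The optional first argument overrides the sysfs root (illustrative only).
list_link_layers() {
    root="${1:-/sys/class/infiniband}"
    for f in "$root"/*/ports/*/link_layer; do
        # Skip the unexpanded glob when no RDMA devices are present.
        [ -r "$f" ] || continue
        dev=${f#"$root"/}       # strip the root prefix
        dev=${dev%%/*}          # keep only the device name, e.g. mlx5_0
        port=${f%/link_layer}   # drop the file name
        port=${port##*/}        # keep only the port number
        printf '%s port %s: %s\n' "$dev" "$port" "$(cat "$f")"
    done
}

list_link_layers

# Show the last lines of the OpenSM log, if it exists and is readable.
if [ -r /var/log/opensm.log ]; then
    tail -n 20 /var/log/opensm.log
fi
```

If every port reports `Ethernet` rather than `InfiniBand`, that would be consistent with OpenSM finding no port it can manage, but I have not confirmed that this is the cause here.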