Hello,
My setup uses a ConnectX-4 Lx EN card with MOFED 4.7 on CentOS 7.7 (kernel-3.10.0-1062.1.2.el7.x86_64). Updating the firmware and restarting openibd both succeed, but starting the subnet manager fails (a screenshot is also attached):
sudo /etc/init.d/opensmd start
Starting opensmd (via systemctl): Job for opensmd.service failed because the control process exited with error code. See "systemctl status opensmd.service" and "journalctl -xe" for details.
[FAILED]
[root@ MLNX_OFED_LINUX-4.7-1.0.0.1-rhel7.7-x86_64]# systemctl status opensmd.service
● opensmd.service - LSB: Activates/Deactivates InfiniBand Subnet Manager
Loaded: loaded (/etc/rc.d/init.d/opensmd; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2019-10-07 17:10:03 UTC; 32s ago
Docs: man:systemd-sysv-generator(8)
Process: 200714 ExecStart=/etc/rc.d/init.d/opensmd start (code=exited, status=1/FAILURE)
Oct 07 17:09:56 systemd[1]: Starting LSB: Activates/Deactivates InfiniBand Subnet Manager...
Oct 07 17:09:56 OpenSM[200722]: /var/log/opensm.log log file opened
Oct 07 17:09:56 OpenSM[200722]: OpenSM 5.5.0.MLNX20190923.1c78385
Oct 07 17:10:03 opensmd[200714]: Starting IB Subnet Manager...[FAILED]
Oct 07 17:10:03 systemd[1]: opensmd.service: control process exited, code=exited status=1
Oct 07 17:10:03 systemd[1]: Failed to start LSB: Activates/Deactivates InfiniBand Subnet Manager.
Oct 07 17:10:03 systemd[1]: Unit opensmd.service entered failed state.
Oct 07 17:10:03 systemd[1]: opensmd.service failed.
I don’t understand whether something is missing, why the service is failing, or whether this would affect the performance of CUDA/MPI jobs, which is the ultimate goal here.
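One thing that may be worth checking: OpenSM manages InfiniBand subnets, so it needs at least one port whose link layer is InfiniBand, and the EN variant of this card is an Ethernet adapter. Below is a minimal sketch for listing the link layer each port reports. It assumes the kernel exposes ports under /sys/class/infiniband/<device>/ports/<port>/link_layer; the function name and its optional base-directory argument are hypothetical, added only for illustration. I have not confirmed that this is the cause of the failure above.

```shell
#!/bin/sh
# Print the link layer reported by every RDMA port.
# Assumption: ports appear as <base>/<device>/ports/<port>/link_layer,
# where <base> defaults to /sys/class/infiniband.
# list_link_layers and its argument are illustrative names, not part of MOFED.
list_link_layers() {
    base="${1:-/sys/class/infiniband}"
    for f in "$base"/*/ports/*/link_layer; do
        # Skip the unexpanded glob pattern when no device is present.
        [ -r "$f" ] || continue
        port_dir=$(dirname "$f")                      # <base>/<device>/ports/<port>
        port=$(basename "$port_dir")                  # <port>
        dev=$(basename "$(dirname "$(dirname "$port_dir")")")   # <device>
        printf '%s port %s: %s\n' "$dev" "$port" "$(cat "$f")"
    done
}
```

Calling `list_link_layers` with no argument reads the live sysfs tree. If every port reports Ethernet, there would be no InfiniBand port for the subnet manager to bind to, which might explain why opensmd exits with an error.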
Thanks,
Arturo