MLNX_OFED installation failing for rdma-core

When I try to install MLNX_OFED (MLNX_OFED_LINUX-24.01-0.3.3.1-ubuntu22.04-x86_64), the installation fails while setting up rdma-core. The error message is below.

Setting up rdma-core (2307mlnx47-1.2401033) ...
Removing obsolete conffile /etc/udev/rules.d/70-persistent-ipoib.rules ...
Job for iwpmd.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status iwpmd.service" and "journalctl -xeu iwpmd.service" for details.
invoke-rc.d: initscript iwpmd, action "restart" failed.
× iwpmd.service - LSB: iWarp Port Mapper Daemon
     Loaded: loaded (/etc/init.d/iwpmd; generated)
     Active: failed (Result: protocol) since Sun 2024-02-11 17:29:44 UTC; 5ms ago
       Docs: man:systemd-sysv-generator(8)
    Process: 11959 ExecStart=/etc/init.d/iwpmd start (code=exited, status=5)
        CPU: 13ms

Feb 11 17:29:44 maven systemd[1]: Starting LSB: iWarp Port Mapper Daemon...
Feb 11 17:29:44 maven iwpmd[11965]: Couldn't find /usr/sbin/iwpmd
Feb 11 17:29:44 maven systemd[1]: iwpmd.service: Can't open PID file /run/iwpmd.pid (yet?) after start: Operation not permitted
Feb 11 17:29:44 maven systemd[1]: iwpmd.service: Failed with result 'protocol'.
Feb 11 17:29:44 maven systemd[1]: Failed to start LSB: iWarp Port Mapper Daemon.
dpkg: error processing package rdma-core (--configure):
 installed rdma-core package post-installation script subprocess returned error exit status 1

Operating system used – Ubuntu 22.04
Linux version – 6.5.0-17-generic
Architecture – x86

With the default apt-get repository I am able to install rdma-core, but with the MLNX_OFED package I run into this issue.

A second issue: we purchased a ConnectX-5 VPI adapter card (MCX556A-ECA_Ax). As a VPI card, it should support both InfiniBand and Ethernet. The card works fine in InfiniBand mode, but when I switch it to Ethernet, the physical link status is always disabled. I have attached a screenshot for your reference. Please help me with this matter.

Hello

It seems that the iwpmd binary (/usr/sbin/iwpmd) does not exist on your system, so the iwpmd service cannot start.
Two things you can try:

  1. Download the latest LTS version (23.10-1.1.9.0-LTS) from: Linux InfiniBand Drivers
    Install it and see whether the installation succeeds.
  2. If the above fails, I would try to install iwpmd manually with apt-get, and then run the MLNX_OFED installation again.
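
Before retrying the installer, it may help to confirm what the error log suggests, namely that the binary the init script tries to start is absent. A minimal shell sketch; `check_iwpmd` is a made-up helper name, and the default path is simply the one quoted in the log:

```shell
# Report whether the binary that /etc/init.d/iwpmd tries to start exists.
# check_iwpmd is a hypothetical helper name; the default path comes from the log.
check_iwpmd() {
    bin="${1:-/usr/sbin/iwpmd}"
    if [ -x "$bin" ]; then
        echo "present: $bin"
    else
        echo "missing: $bin"
        return 1
    fi
}

# Usage:
#   check_iwpmd                 # checks /usr/sbin/iwpmd
#   dpkg -S /usr/sbin/iwpmd     # if present, shows which package owns the file
```

If the file is reported missing after the MLNX_OFED packages are unpacked, the problem is in the package contents rather than in the service configuration.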

Regarding the second question: this could be related to the cable.
If the cable connected to the device port is an IB cable, it will not bring the link up for Ethernet.
If MST is installed on the system, you can run:
mlxlink -d /dev/mst/xxxxx_pciconf0
to see the link information. Under "speed" you should see whether the link is IB or ETH.
You can add the "-m" flag to get information on the specific module connected, including its part number, which you can search for online to get more details.
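
The two mlxlink calls described above can be wrapped in a small helper. This is only a sketch: `show_link` is a hypothetical name, and the device argument must be replaced with the actual entry under /dev/mst (listed by `mst status` once MST has been started with `mst start`).

```shell
# Print link state, then module (cable/transceiver) details, for one MST device.
# show_link is a hypothetical helper; it only runs the two mlxlink commands
# mentioned in the reply above.
show_link() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: show_link /dev/mst/<device>_pciconf0" >&2
        return 2
    fi
    mlxlink -d "$dev"       # link information, including speed/protocol
    mlxlink -d "$dev" -m    # adds module info such as the part number
}

# Usage:
#   show_link /dev/mst/xxxxx_pciconf0
```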

I hope this helps.
If you still have questions, please open a case at: enterprisesupport@nvidia.com, and it will be handled according to entitlement.

Best Regards,
Jonathan

I have run into this same bug while trying to install doca-ofed on Debian 12. It appears that /usr/sbin/iwpmd is missing both in doca-ofed and in MLNX_OFED_LINUX-24.04-0.6.6.0-debian12.1-x86_64. In stock Debian 12, /usr/sbin/iwpmd is included in the rdma-core package, but the rdma-core package shipped in doca-ofed and MLNX_OFED_LINUX-24.04-0.6.6.0-debian12.1-x86_64 does not contain it. This is a bug, because the postinst script of that rdma-core package still tries to run it. The file MLNX_OFED_LINUX-24.04-0.6.6.0-debian12.1-x86_64/src/MLNX_OFED_SRC-24.04-0.6.6.0/SOURCES/CMakeLists.txt contains

rdma_sbin_executable(iwpmd
  iwarp_pm_common.c
  iwarp_pm_helper.c
  iwarp_pm_server.c
  )

and

rdma_subst_install(FILES "iwpmd_init.in"
  DESTINATION "${CMAKE_INSTALL_INITDDIR}"
  RENAME "iwpmd"
  PERMISSIONS OWNER_WRITE OWNER_READ GROUP_READ WORLD_READ OWNER_EXECUTE GROUP_EXECUTE WORLD_EXECUTE)

but for some reason the iwpmd binary is not included in

DEBS/rdma-core_2404mlnx51-1.2404066_amd64.deb
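
One way to confirm that the binary is absent from the shipped package is to list the .deb contents with `dpkg-deb -c` and look for the path. A rough sketch, assuming the usual tar-style listing in which the file name is the sixth field; `listing_has_path` is a hypothetical helper name:

```shell
# Read a `dpkg-deb -c` listing on stdin and succeed only if it contains
# exactly the given path. listing_has_path is a hypothetical helper name.
listing_has_path() {
    want="${1#/}"    # drop the leading slash so it compares with ./usr/...
    awk -v p="$want" '
        { f = $6; sub(/^\.\//, "", f); if (f == p) found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage (package file name taken from the post above):
#   dpkg-deb -c rdma-core_2404mlnx51-1.2404066_amd64.deb \
#       | listing_has_path /usr/sbin/iwpmd \
#       && echo "iwpmd is packaged" || echo "iwpmd is NOT packaged"
```

Running the same check against the stock Debian 12 rdma-core package would give a direct comparison.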