I updated the RHEL 7 kernel, and now openibd.service fails to start. Do I have to reload the driver? Is there a way to just update it rather than completely reinstall? Or is there some other issue?

uname -r
3.10.0-1127.8.2.el7.x86_64

ofed_info -n
5.0-2.1.8.0

systemctl status openibd.service
● openibd.service - openibd - configure Mellanox devices
   Loaded: loaded (/usr/lib/systemd/system/openibd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2020-05-20 08:58:32 EDT; 51min ago
     Docs: file:/etc/infiniband/openib.conf
  Process: 1681 ExecStart=/etc/init.d/openibd start bootid=%b (code=exited, status=2)
 Main PID: 1681 (code=exited, status=2)

May 20 08:58:29 localhost.localdomain systemd[1]: Starting openibd - configure Mellanox devices…
May 20 08:58:31 hpcnode2 openibd[1681]: Module mlx4_en belong to kernel which is not a part of MLNX_OFED, skipping…[FAILED]
May 20 08:58:31 hpcnode2 openibd[1681]: Module mlx5_core belong to kernel which is not a part of MLNX_OFED, skipping…[FAILED]
May 20 08:58:31 hpcnode2 openibd[1681]: Module mlx5_ib belong to kernel which is not a part of MLNX_OFED, skipping…[FAILED]
May 20 08:58:31 hpcnode2 openibd[1681]: Module mlx5_fpga_tools does not exist, skipping…[FAILED]
May 20 08:58:32 hpcnode2 openibd[1681]: Loading HCA driver and Access Layer:[ OK ]
May 20 08:58:32 hpcnode2 systemd[1]: openibd.service: main process exited, code=exited, status=2/INVALIDARGUMENT
May 20 08:58:32 hpcnode2 systemd[1]: Failed to start openibd - configure Mellanox devices.
May 20 08:58:32 hpcnode2 systemd[1]: Unit openibd.service entered failed state.
May 20 08:58:32 hpcnode2 systemd[1]: openibd.service failed.

Hello Jeffery,

Thank you for posting your inquiry on the Mellanox Community.

Yes, you will need to run the ./mlnxofedinstall script with the --add-kernel-support option in order to rebuild the modules for your new kernel version.
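As a rough sketch of that procedure, assuming the MLNX_OFED 5.0-2.1.8.0 bundle has already been downloaded and extracted, the rebuild and restart could look like the following. The directory in OFED_DIR is a hypothetical placeholder, and by default the script only prints the commands (DRY_RUN=1) so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Sketch: rebuild MLNX_OFED against the running kernel and restart openibd.
# OFED_DIR is a hypothetical path to the extracted MLNX_OFED bundle; adjust it.
# By default (DRY_RUN=1) the commands are only printed, not executed.
OFED_DIR=${OFED_DIR:-/root/MLNX_OFED_LINUX-5.0-2.1.8.0}
DRY_RUN=${DRY_RUN:-1}

# run prints the command in dry-run mode; otherwise it executes the command
# and aborts the script if it fails.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

echo "Rebuilding MLNX_OFED for kernel $(uname -r)"
run cd "$OFED_DIR"
# --add-kernel-support rebuilds the kernel-module packages for the running kernel
run ./mlnxofedinstall --add-kernel-support
# Load the freshly built modules and check the service state
run /etc/init.d/openibd restart
run systemctl status openibd.service
```

Once the printed commands look right, it can be run as root with DRY_RUN=0 (for example `DRY_RUN=0 sh rebuild-ofed.sh`, where the file name is hypothetical).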

The MLNX_OFED install package consists of several source RPMs. The script builds binary RPMs from them and installs the result; the kernel module binaries are placed in /lib/modules//updates/kernel/drivers/net/, i.e. under the module directory of the kernel they were built against. The modules compiled and installed for your old kernel are therefore not picked up by your new kernel, so openibd finds only the kernel's own (inbox) mlx4/mlx5 modules; hence the ‘Module X belong to kernel which is not a part of MLNX_OFED’ messages you're seeing.
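To confirm this diagnosis on a given node, one can ask modinfo which file modprobe would load for each module named in the log and which kernel release is recorded in its vermagic string. The sketch below does that; the rule it uses to tell out-of-tree (MLNX_OFED) modules from inbox ones, based on whether the path lies under updates/, extra/ or weak-updates/, is a heuristic assumption rather than something MLNX_OFED guarantees.

```shell
#!/bin/sh
# Sketch: show which mlx module file modprobe would load for the running
# kernel and the kernel release recorded in its vermagic string.
# Note: a module reached through weak-updates legitimately reports the
# older kernel it was originally built for.

# kernel_of_vermagic prints the first field of a vermagic string, i.e. the
# kernel release the module was compiled against.
kernel_of_vermagic() {
    printf '%s\n' "$1" | awk '{print $1}'
}

# classify_path guesses the module's origin from its location (heuristic).
classify_path() {
    case $1 in
        */updates/*|*/extra/*|*/weak-updates/*) echo "out-of-tree (e.g. MLNX_OFED)" ;;
        *) echo "inbox (shipped with the kernel)" ;;
    esac
}

check_module() {
    mod=$1
    running=$(uname -r)
    path=$(modinfo -n "$mod" 2>/dev/null) || path=""
    if [ -z "$path" ]; then
        echo "$mod: not found for kernel $running"
        return 0
    fi
    built=$(kernel_of_vermagic "$(modinfo -F vermagic "$mod" 2>/dev/null)")
    echo "$mod: $path"
    echo "  built for: $built, running: $running, origin: $(classify_path "$path")"
}

for m in mlx4_en mlx5_core mlx5_ib; do
    check_module "$m"
done
```

If every module reports an inbox origin right after the kernel update, that matches the log above and points to the rebuild rather than to some other issue.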

Best regards,

Mellanox Technical Support

Hi all,

We are also seeing this on the new RHEL 7.8 kernel and have opened a case for it. From the documentation (https://docs.mellanox.com/m/view-rendered-page.action?abstractPageId=25146673): ‘On Redhat and SLES distributions with errata kernel installed there is no need to use the mlnx_add_kernel_support.sh script. The regular installation can be performed and weak-updates mechanism will create symbolic links to the MLNX_OFED kernel modules.’
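To check whether the weak-updates links described in that passage were actually created for the new kernel, a sketch like the following counts the mlx* symlinks under the running kernel's weak-updates directory. The /lib/modules/<release>/weak-updates location is the usual RHEL convention; the mlx*.ko* name pattern is an assumption about which modules matter here.

```shell
#!/bin/sh
# Sketch: count weak-updates symlinks for Mellanox (mlx*) modules.
# On RHEL, weak-modules links kABI-compatible out-of-tree modules into
# /lib/modules/<release>/weak-updates/; zero links for the running kernel
# suggests that step did not happen after the kernel update.

# count_mlx_weak_links DIR prints the number of mlx*.ko* symlinks below DIR
# (0 if DIR does not exist).
count_mlx_weak_links() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo 0
        return 0
    fi
    find "$dir" -type l -name 'mlx*.ko*' | wc -l | tr -d ' '
}

weak_dir=/lib/modules/$(uname -r)/weak-updates
echo "mlx* symlinks under $weak_dir: $(count_mlx_weak_links "$weak_dir")"
# List the links together with their targets, if any exist
if [ -d "$weak_dir" ]; then
    find "$weak_dir" -type l -name 'mlx*.ko*' -exec ls -l {} +
fi
```

Running the same check on a node that still boots an earlier kernel can show whether the links are missing only for the new one.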

This has always worked for us with earlier releases and kernels. There seems to be a bug in the weak-updates part of the installation.

Thanks,

Kenneth