mlnx-ofed-kernel-modules installation for additional kernels and re-installation fails on Ubuntu 20.04.3

The first installation of MLNX_OFED_LINUX-5.5-1.0.3.2-ubuntu20.04-x86_64.iso on Ubuntu Server 20.04.3 with GA kernel 5.4.0-91-generic succeeded (log in file: mlnxofedinstall_ok.log).

Installation for kernel 5.4.0-40 then failed on the same system, and installation for kernel 5.4.0-94 failed as well. Finally, reinstalling for 5.4.0-91 failed in exactly the same manner. Now I cannot install the driver, and I cannot revert to a kernel with a working driver.

What can be done to mitigate the problem (aside from a system reinstall)?

mlnxofedinstall.log (2.89 KB)

MLNX_OFED_LINUX.471488.logs_mlnx-ofed-kernel-modules.debinstall.log (122 KB)

MLNX_OFED_LINUX.471488.logs_general.log (46.6 KB)

mlnxofedinstall_ok.log (5.92 KB)

Hello,

With the MLNX_OFED_LINUX-5.5-1.0.3.2-ubuntu20.04-x86_64.iso mounted, we were able to install successfully on Ubuntu 20.04.3 with all three of the mentioned kernels, using the install options from your attached text files. Before each install, we rebooted the system into the target kernel and ran the uninstall.sh script contained in the ISO image. The results for each kernel:

Installation passed successfully
To load the new driver, run:
/etc/init.d/openibd restart
root@ubuntu20:/mnt# uname -a
Linux ubuntu20 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu20:/mnt# /etc/init.d/openibd restart
Unloading HCA driver: [ OK ]
Loading HCA driver and Access Layer: [ OK ]
root@ubuntu20:/mnt#
root@ubuntu20:/mnt# ofed_info -s
MLNX_OFED_LINUX-5.5-1.0.3.2:

Installation passed successfully
To load the new driver, run:
/etc/init.d/openibd restart
root@ubuntu20:/mnt#
root@ubuntu20:/mnt# uname -a
Linux ubuntu20 5.4.0-40-generic #44-Ubuntu SMP Tue Jun 23 00:01:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu20:/mnt#
root@ubuntu20:/mnt# /etc/init.d/openibd restart
Unloading HCA driver: [ OK ]
Loading HCA driver and Access Layer: [ OK ]
root@ubuntu20:/mnt#
root@ubuntu20:/mnt# ofed_info -s
MLNX_OFED_LINUX-5.5-1.0.3.2:

Installation passed successfully
To load the new driver, run:
/etc/init.d/openibd restart
root@ubuntu20:/mnt#
root@ubuntu20:/mnt# uname -a
Linux ubuntu20 5.4.0-94-generic #106-Ubuntu SMP Thu Jan 6 23:58:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu20:/mnt#
root@ubuntu20:/mnt# /etc/init.d/openibd restart
Unloading HCA driver: [ OK ]
Loading HCA driver and Access Layer: [ OK ]
root@ubuntu20:/mnt#
root@ubuntu20:/mnt# ofed_info -s
MLNX_OFED_LINUX-5.5-1.0.3.2:

When we ran the installation script without first running uninstall.sh against the previous installation, the installation likewise succeeded on the 5.4.0-91-generic kernel:

Installation passed successfully
To load the new driver, run:
/etc/init.d/openibd restart
root@ubuntu20:/mnt#
root@ubuntu20:/mnt# uname -a
Linux ubuntu20 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu20:/mnt#
root@ubuntu20:/mnt# /etc/init.d/openibd restart
Unloading HCA driver: [ OK ]
Loading HCA driver and Access Layer: [ OK ]
root@ubuntu20:/mnt#
root@ubuntu20:/mnt# ofed_info -s
MLNX_OFED_LINUX-5.5-1.0.3.2:
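
As an additional sanity check beyond `uname -a` and `ofed_info -s`, it may help to confirm that the installed driver module was actually built for the kernel that is running. The sketch below is only an illustration: the module name `mlx5_core` is an assumption (it depends on the adapter in use and is not taken from the logs in this thread), and `vermagic_matches` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: compare a module's vermagic with the running kernel release.
# MODULE is an assumption; substitute the driver module for your adapter.
MODULE="${MODULE:-mlx5_core}"

# Succeeds when the first field of a vermagic string equals the given
# kernel release. A module's vermagic begins with the release it was built for.
vermagic_matches() {
    built_for="${1%% *}"   # keep everything before the first space
    [ "$built_for" = "$2" ]
}

running="$(uname -r)"
# modinfo prints nothing here if the module is not available for this kernel.
vm="$(modinfo -F vermagic "$MODULE" 2>/dev/null || true)"

if [ -z "$vm" ]; then
    echo "$MODULE: no module found for kernel $running"
elif vermagic_matches "$vm" "$running"; then
    echo "$MODULE: built for the running kernel ($running)"
else
    echo "$MODULE: built for '${vm%% *}', but running $running"
fi
```

A mismatch or a missing module here would suggest that the kernel modules package was not built for the kernel currently booted.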

Please run the included uninstall.sh script, reboot into the target kernel, and retry the installation. For more information on the installation process, please review the MLNX_OFED User Manual:

https://docs.nvidia.com/networking/display/MLNXOFEDv551032/Installation
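
The recommended sequence can be sketched as a dry-run script that only prints the steps rather than executing them. This is a minimal sketch under stated assumptions: the mount point `/mnt` matches the prompts shown in this thread, the target kernel default is just one of the three kernels discussed, `plan_reinstall` is a hypothetical helper name, and the installer options are left as a placeholder because the ones used originally are only in the attached logs.

```shell
#!/bin/sh
# Dry run: prints the recovery steps; it does not uninstall, reboot, or install.
# ISO_MOUNT and TARGET_KERNEL are illustrative defaults, adjust as needed.
ISO_MOUNT="${ISO_MOUNT:-/mnt}"
TARGET_KERNEL="${TARGET_KERNEL:-5.4.0-91-generic}"

# Print the ordered steps for a running kernel, a target kernel, and a mount point.
plan_reinstall() {
    running="$1"
    target="$2"
    mount_point="$3"
    echo "1. ${mount_point}/uninstall.sh"
    if [ "$running" = "$target" ]; then
        echo "2. reboot (already on ${target})"
    else
        echo "2. reboot into ${target} (currently on ${running})"
    fi
    # The installer options are deliberately not filled in here.
    echo "3. ${mount_point}/mlnxofedinstall <same options as before>"
    echo "4. /etc/init.d/openibd restart"
    echo "5. ofed_info -s"
}

plan_reinstall "$(uname -r)" "$TARGET_KERNEL" "$ISO_MOUNT"
```

Running it with `TARGET_KERNEL=5.4.0-94-generic` set in the environment prints the same checklist for that kernel; each printed step is then carried out by hand as root.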

If you are still running into errors with the installation or need assistance in debugging the issue and have a valid support contract, please open a support case. If you do not have a current/valid support contract, please reach out to the team at Networking-contracts@nvidia.com for assistance in obtaining a contract.

Thank you,

Nvidia Networking Support