I’m attempting to install the Mellanox driver certified for use with a ConnectX-3 card on Red Hat 8.6, for an HPC cluster running a Bright 9.2 image. The resulting driver does not load into the kernel and I cannot see the InfiniBand device. The error messages are below. The driver version is 4.9-5.1.0.
I have verified my kernel is: 4.18.0-372.19.1.el8_6.x86_64
I tried the instructions in the link provided, but I’m still not able to get this to work.
The first step in the instructions says to import the public key module, which I have done. I was able to reboot, it asked for the password for the public key, and I provided that. The kernel is still tainted. I then tried to re-install the driver. Still tainted.
I then tried to follow the instructions to strip the signature from the kernel modules, but that doesn’t work because the command given doesn’t return any results.
Have you checked whether the driver module (e.g. mlx5_core.ko) actually fails to load?
Usually, with Secure Boot enabled, the kernel only logs a "tainted" WARNING and does not prevent the driver from loading, unless Secure Boot is combined with kernel lockdown.
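To see what the taint actually refers to, the taint bitmask can be decoded. The following is only a minimal sketch: `decode_taint` is a hypothetical helper name, and it covers just the three module-related bits documented for the kernel taint mask (1 = proprietary module, 4096 = out-of-tree module, 8192 = unsigned module).

```shell
# Hypothetical helper: decode the module-related bits of a kernel taint value.
# Only bits 0, 12 and 13 are handled; other taint bits are ignored.
decode_taint() {
    mask=$1
    flags=""
    if [ $(( mask & 1 )) -ne 0 ]; then
        flags="$flags proprietary(P)"
    fi
    if [ $(( mask & 4096 )) -ne 0 ]; then
        flags="$flags out-of-tree(O)"
    fi
    if [ $(( mask & 8192 )) -ne 0 ]; then
        flags="$flags unsigned(E)"
    fi
    if [ -z "$flags" ]; then
        echo "none"
    else
        # Drop the leading space left by the first appended flag.
        echo "${flags# }"
    fi
}
```

On the affected node this could be called as `decode_taint "$(cat /proc/sys/kernel/tainted)"`. An out-of-tree or unsigned flag by itself is only informational and would not explain a module that fails to load.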
I have verified in the BIOS that secure boot is NOT enabled. I also verified this in Linux with /usr/bin/mokutil --sb-state which comes back with SecureBoot disabled.
After installing the driver, I tried running hca_self_test.ofed, which comes back with errors:
---- Performing Adapter Device Self Test ----
Number of CAs Detected … 1
PCI Device Check … PASS
Host Driver RPM Check … FAIL
REASON: no RPMs found for currently booted kernel 4.18.0-372.19.1.el8_6.x86_64
Kernel Arch … x86_64
Host Driver Version … NA
Firmware Check on CA #0 (VPI) … NA
Host Driver Initialization … NA
Number of CA Ports Active … NA
Error Counter Check … NA
Kernel Syslog Check … NA
Node GUID on CA #0 (VPI) … NA
------------------ DONE ---------------------
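For longer self-test reports it can help to pull out only the failing checks. This is a rough sketch, assuming the report has the layout shown above (a result at the end of each line, optionally followed by a REASON line); `failed_checks` is a hypothetical helper name, not part of the Mellanox tooling.

```shell
# Hypothetical helper: read a self-test report on stdin and print each
# check whose result is FAIL, plus the REASON line directly after it.
failed_checks() {
    awk '
        /FAIL[ \t]*$/     { print; want = 1; next }  # failing check
        want && /REASON:/ { print }                  # its explanation
                          { want = 0 }               # anything else resets
    '
}
```

Assuming the tool writes its report to standard output, usage would look like `hca_self_test.ofed | failed_checks`. In the run above, the only failing check is the Host Driver RPM Check, which reports that no RPMs were found for the booted kernel.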
I then attempted to reinstall the driver with --add-kernel-support, which fails:
Note: This program will create mlnx-en TGZ for rhel8.6 under /tmp/mlnx-en-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64 directory.
See log file /tmp/mlnx-en-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.7685_logs/mlnx_ofed_iso.7685.log
Checking if all needed packages are installed…
Building mlnx-en RPMS . Please wait…
ERROR: Failed executing “MLNX_EN_SRC-4.9-5.1.0.0/install.pl --tmpdir /tmp/mlnx-en-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.7685_logs --kernel-only --kernel 4.18.0-372.19.1.el8_6.x86_64 --kernel-sources /lib/modules/4.18.0-372.19.1.el8_6.x86_64/build --builddir /tmp/mlnx-en-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.7685 --disable-kmp --build-only --distro rhel8.6”
ERROR: See /tmp/mlnx-en-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.7685_logs/mlnx_ofed_iso.7685.log
Failed to build mlnx-en for 4.18.0-372.19.1.el8_6.x86_64
Numerous errors in the debug log state that kvmalloc is being redefined in all sorts of source files, causing make to fail.
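To get an overview of which symbols the build complains about, the log can be summarized. This is only a sketch: it assumes the compiler reports these errors with the usual GCC wording "redefinition of", which may not match the log exactly, and `summarize_redefinitions` is a hypothetical helper name.

```shell
# Hypothetical helper: read a build log on stdin and print how many times
# each symbol appears in a "redefinition of <symbol>" message.
# Assumes GCC-style wording; adjust the pattern if the log differs.
summarize_redefinitions() {
    grep -o "redefinition of [^[:space:]]*" \
        | sed -e 's/^redefinition of //' -e 's/[^A-Za-z0-9_]//g' \
        | sort | uniq -c \
        | awk '{ print $2 ": " $1 }'
}
```

It would be run against the log file named in the error output above, e.g. `summarize_redefinitions < mlnx_ofed_iso.7685.log` from inside the logs directory. If only a handful of symbols such as kvmalloc show up, that points at the driver's compatibility layer clashing with definitions already provided by the kernel headers, rather than at many unrelated problems.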