I'm attempting to install the Mellanox drivers certified for use with a ConnectX-3 card on Red Hat 8.6, for an HPC cluster running a Bright 9.2 image. The resulting driver does not load into the kernel and I cannot see the InfiniBand device. The driver is 4.9-5.1.0. The error messages are below.
mlx_compat: loading out-of-tree module taints kernel.
mlx_compat: module verification failed: signature and/or required key missing - tainting kernel
Make sure you are using kernel version 4.18.0-372.9.1.el8.x86_64.
Also, either do not enable UEFI Secure Boot, or try signing the kernel module.
I have verified my kernel is: 4.18.0-372.19.1.el8_6.x86_64
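Note that the two kernel strings differ: the running kernel is 372.19.1.el8_6, while the reply names 372.9.1.el8. A minimal sketch of that comparison follows; `check_kernel` is a hypothetical helper, not part of any Mellanox tooling, and the expected string is simply the one quoted in the reply.

```shell
#!/usr/bin/env bash
# Compare a kernel release string against the one the driver was built for.
# check_kernel is an illustrative helper, not part of any Mellanox tooling.
check_kernel() {
    local running="$1" expected="$2"
    if [ "$running" = "$expected" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

expected="4.18.0-372.9.1.el8.x86_64"   # kernel named in the reply above
running="$(uname -r)"                  # kernel currently booted
echo "running=$running expected=$expected: $(check_kernel "$running" "$expected")"
```

A mismatch here would be consistent with prebuilt modules not being found for the booted kernel, though that is an inference rather than something the logs state outright.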
I tried the instructions in the link provided, but I’m still not able to get this to work.
The first step in the instructions says to import the public key module, which I have done. I was able to reboot, it asked for the password for the public key, and I provided that. The kernel is still tainted. I then tried to re-install the driver. Still tainted.
I then tried to follow the instruction to strip the signature from the kernel modules, but it’s unable to work because the command given doesn’t return any results.
rpm -qa | grep -E "kernel-ib|mlnx-ofa_kernel|iser|srp|knem|mlnx-rds|mlnx-nfsrdma|mlnx-nvme|mlnx-rdma-rxe" | xargs rpm -ql | grep ".ko$"
Results in nothing returned, therefore the strip command doesn’t have anything to work with.
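An empty result may simply mean the installed packages are named differently from the patterns in that grep. One way to approach it from the other direction is to list the Mellanox module files on disk and ask rpm which package owns each one. This is only a sketch: `list_mlx_modules` is a hypothetical helper, and the `mlx*.ko*` pattern is an assumption based on the module names in the error messages.

```shell
#!/usr/bin/env bash
# List Mellanox kernel modules found under a module tree.
# list_mlx_modules is an illustrative helper; the tree defaults to the
# running kernel's module directory.
list_mlx_modules() {
    local tree="${1:-/lib/modules/$(uname -r)}"
    [ -d "$tree" ] || return 0
    # Match plain and compressed modules (.ko, .ko.xz, ...).
    find "$tree" -type f -name 'mlx*.ko*' | sort
}

# For each module file, print the owning RPM when rpm is available.
list_mlx_modules | while read -r ko; do
    if command -v rpm >/dev/null 2>&1; then
        owner="$(rpm -qf "$ko" 2>/dev/null)" || owner="not owned by any package"
        echo "$ko -> $owner"
    else
        echo "$ko"
    fi
done
```

The package names printed on the right could then be substituted into the original strip command in place of the grep pattern.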
I’m not sure where I need to go from here. I don’t see how to “resign” the module from these instructions, if that’s an option?
Try disabling UEFI Secure Boot in the BIOS. Otherwise you need to reconfigure the kernel to disable the Secure Boot kernel lockdown.
Have you checked whether a driver module, e.g. mlx5_core.ko, fails to load?
Usually, with Secure Boot enabled, the kernel just prints a tainted WARNING and does not prevent the driver from loading, unless Secure Boot is combined with kernel lockdown.
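To see which taint flags are actually set, the value in /proc/sys/kernel/tainted can be decoded. The sketch below only looks at the two bits that correspond to the messages in the first post (bit 12, out-of-tree module, and bit 13, unsigned module); `decode_taint` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Decode the two taint bits relevant to the mlx_compat messages.
# decode_taint is an illustrative helper; it ignores all other taint bits.
decode_taint() {
    local value="$1"
    (( value & 4096 )) && echo "out-of-tree module loaded (O)"
    (( value & 8192 )) && echo "unsigned module loaded (E)"
    return 0
}

# Read the live taint value when the file is present.
if [ -r /proc/sys/kernel/tainted ]; then
    decode_taint "$(cat /proc/sys/kernel/tainted)"
fi
```

Both flags are informational, so seeing them is consistent with the point above that tainting by itself should not stop the module from loading.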
I have verified in the BIOS that secure boot is NOT enabled. I also verified this in Linux with /usr/bin/mokutil --sb-state which comes back with SecureBoot disabled.
After installing the driver, I tried running hca_self_test.ofed, which comes back with errors:
---- Performing Adapter Device Self Test ----
Number of CAs Detected … 1
PCI Device Check … PASS
Host Driver RPM Check … FAIL
REASON: no RPMs found for currently booted kernel 4.18.0-372.19.1.el8_6.x86_64
Kernel Arch … x86_64
Host Driver Version … NA
Firmware Check on CA #0 (VPI) … NA
Host Driver Initialization … NA
Number of CA Ports Active … NA
Error Counter Check … NA
Kernel Syslog Check … NA
Node GUID on CA #0 (VPI) … NA
------------------ DONE ---------------------
lsmod | grep mlx
mlx5_core 1417216 0
mlxfw 24576 1 mlx5_core
tls 102400 1 mlx5_core
mlx4_en 159744 0
mlx4_ib 16384 0
mlx4_core 413696 1 mlx4_en
mlx_compat 16384 4 mlx4_core,mlx4_ib,mlx4_en,mlx5_core
I attempted to re-install the driver with --add-kernel-support, which fails:
Note: This program will create mlnx-en TGZ for rhel8.6 under /tmp/mlnx-en-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64 directory.
See log file /tmp/mlnx-en-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.7685_logs/mlnx_ofed_iso.7685.log
Checking if all needed packages are installed…
Building mlnx-en RPMS . Please wait…
ERROR: Failed executing "MLNX_EN_SRC-4.9-5.1.0.0/install.pl --tmpdir /tmp/mlnx-en-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.7685_logs --kernel-only --kernel 4.18.0-372.19.1.el8_6.x86_64 --kernel-sources /lib/modules/4.18.0-372.19.1.el8_6.x86_64/build --builddir /tmp/mlnx-en-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.7685 --disable-kmp --build-only --distro rhel8.6"
ERROR: See /tmp/mlnx-en-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.7685_logs/mlnx_ofed_iso.7685.log
Failed to build mlnx-en for 4.18.0-372.19.1.el8_6.x86_64
Numerous errors in the debug log state that kvmalloc is being redefined in many source files, causing make to fail:
:21: error: redefinition of ‘kvmalloc’
static inline void *kvmalloc(size_t size, gfp_t flags)
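One plausible reading of this error (an assumption, not something the log confirms) is that the driver source supplies its own kvmalloc while the headers for this kernel already define one. A quick way to check the kernel side is to search the headers under the build directory passed as --kernel-sources. `find_kvmalloc` is a hypothetical helper, and the include/linux subdirectory is an assumption about where the definition would live.

```shell
#!/usr/bin/env bash
# Search a kernel header tree for whole-word occurrences of kvmalloc.
# find_kvmalloc is an illustrative helper; -w keeps kvmalloc_node and
# similar names out of the results.
find_kvmalloc() {
    local headers="$1"
    [ -d "$headers" ] || return 0
    grep -rnw 'kvmalloc' "$headers" 2>/dev/null || true
}

# Header tree for the running kernel, matching the --kernel-sources path
# in the failed build command.
find_kvmalloc "/lib/modules/$(uname -r)/build/include/linux"
```

If this prints a definition matching the signature in the error, the kernel already provides kvmalloc and the duplicate is coming from the driver source being built.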