Hi,
I’m trying to install Mellanox OFED 4.9-5.1.0.0 LTS on the latest kernel of RHEL/CentOS 8.6 (strictly speaking, Rocky Linux). The release notes list 4.18.0-372.9.1.el8.x86_64 as the last supported kernel, but my latest update (8.5 → 8.6) seems to have skipped 4.18.0-372.9.1.el8_6.x86_64 entirely in favor of 4.18.0-372.19.1.el8_6.x86_64. Mellanox OFED was not previously installed.
From what I can gather, the problem seems to be in some changes made to the kernel interface that conflict with some (now repeated) definitions in the drivers.
./mlnxofedinstall --distro RHEL8.6 --upstream-libs --add-kernel-support
ERROR: Failed executing "MLNX_OFED_SRC-4.9-5.1.0.0/install.pl --tmpdir /tmp/MLNX_OFED_LINUX-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.66001_logs --kernel-only --kernel 4.18.0-372.19.1.el8_6.x86_64 --kernel-sources /lib/modules/4.18.0-372.19.1.el8_6.x86_64/build --builddir /tmp/MLNX_OFED_LINUX-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.66001 --disable-kmp --build-only --distro rhel8.6"
ERROR: See /tmp/MLNX_OFED_LINUX-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.66001_logs/mlnx_ofed_iso.66001.log
Failed to build MLNX_OFED_LINUX for 4.18.0-372.19.1.el8_6.x86_64
Then, in /tmp/MLNX_OFED_LINUX-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.66001_logs/mlnx_ofed_iso.66001.log:
Build ofed-scripts 4.9 RPM
Running rpmbuild --rebuild --define '_topdir /tmp/MLNX_OFED_LINUX-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.66001/OFED_topdir' --define '_sourcedir %{_topdir}/SOURCES' --define '_specdir %{_topdir}/SPECS' --define '_srcrpmdir %{_topdir}/SRPMS' --define '_rpmdir %{_topdir}/RPMS' --define 'dist %{nil}' --target x86_64 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' /tmp/MLNX_OFED_LINUX-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.66001/MLNX_OFED_SRC-4.9-5.1.0.0/SRPMS/ofed-scripts-4.9-OFED.4.9.5.1.0.src.rpm
Build mlnx-ofa_kernel 4.9 RPM
-W- --with-mlx5-ipsec is enabled
Running rpmbuild --rebuild --define '_topdir /tmp/MLNX_OFED_LINUX-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.66001/OFED_topdir' --define '_sourcedir %{_topdir}/SOURCES' --define '_specdir %{_topdir}/SPECS' --define '_srcrpmdir %{_topdir}/SRPMS' --define '_rpmdir %{_topdir}/RPMS' --nodeps --define '_dist .rhel8u6' --define 'configure_options --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mlxfw-mod --with-mlx4-mod --with-mlx4_en-mod --with-mlx5-mod --with-mlx5-ipsec --with-ipoib-mod --with-innova-flex --with-innova-ipsec --with-mdev-mod --with-srp-mod --with-iser-mod --with-isert-mod' --define 'KVERSION 4.18.0-372.19.1.el8_6.x86_64' --define 'K_SRC /lib/modules/4.18.0-372.19.1.el8_6.x86_64/build' --define '_prefix /usr' /tmp/MLNX_OFED_LINUX-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.66001/MLNX_OFED_SRC-4.9-5.1.0.0/SRPMS/mlnx-ofa_kernel-4.9-OFED.4.9.5.1.0.1.src.rpm
Failed to build mlnx-ofa_kernel 4.9 RPM
Collecting debug info...
See /tmp/MLNX_OFED_LINUX-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.66001_logs/OFED.66369.logs/mlnx-ofa_kernel-4.9.rpmbuild.log
The rpmbuild.log contains some relevant errors:
/tmp/MLNX_OFED_LINUX-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.66001/OFED_topdir/BUILD/mlnx-ofa_kernel-4.9/obj/default/include/linux/mm.h:15:21: error: conflicting types for 'kvzalloc'
15 | static inline void *kvzalloc(unsigned long size,...) {
| ^~~~~~~~
In file included from /tmp/MLNX_OFED_LINUX-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.66001/OFED_topdir/BUILD/mlnx-ofa_kernel-4.9/obj/default/include/linux/slab.h:6,
from include/linux/crypto.h:24,
from include/crypto/hash.h:16,
from include/linux/uio.h:16,
from include/linux/socket.h:8,
from /tmp/MLNX_OFED_LINUX-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.66001/OFED_topdir/BUILD/mlnx-ofa_kernel-4.9/obj/default/include/linux/socket.h:4,
from ./include/uapi/linux/if.h:25,
from /tmp/MLNX_OFED_LINUX-4.9-5.1.0.0-4.18.0-372.19.1.el8_6.x86_64/mlnx_iso.66001/OFED_topdir/BUILD/mlnx-ofa_kernel-4.9/obj/default/include/linux/compat-2.6.h:12,
from <command-line>:
mlnx-ofa_kernel-4.9.rpmbuild.log (880.1 KB)
(I’ve uploaded the full log since it’s too long to properly abbreviate)
In the kernel-devel file /usr/src/kernels/4.18.0-372.19.1.el8_6.x86_64/include/linux/slab.h, the following definitions seem to be new compared with other kernels (I looked at the same file on a machine with an older kernel):
static inline void *kvzalloc_node(size_t size, gfp_t flags, int node)
{
        return kvmalloc_node(size, flags | __GFP_ZERO, node);
}
static inline void *kvzalloc(size_t size, gfp_t flags)
{
        return kvmalloc(size, flags | __GFP_ZERO);
}
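The comparison between kernels can be repeated with a small helper. This is only a sketch: `headers_defining` is a name I made up, and the pattern assumes the return type and function name sit on the same line as `static inline`, as in the slab.h lines quoted here.

```shell
#!/bin/sh
# headers_defining SYMBOL KERNEL_BUILD_DIR
# Hypothetical helper: print every header directly under include/linux
# that defines SYMBOL as a "static inline" function on a single line.
headers_defining() {
    symbol=$1
    kdir=$2
    # The character before SYMBOL must not be part of an identifier, and
    # SYMBOL must be followed by "(", so kvzalloc does not match kvzalloc_node.
    grep -lE "^static inline .*[^[:alnum:]_]${symbol}\(" "$kdir"/include/linux/*.h
}

# Example call, using the kernel source path from the failing build command:
# headers_defining kvzalloc /lib/modules/4.18.0-372.19.1.el8_6.x86_64/build
```

Running it against the build directory of an older kernel and of the new one should show whether the definition appeared in, or moved between, headers.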
Finally, $build_dir/OFED_topdir/BUILD/mlnx-ofa_kernel-4.9/source/include/linux/mm.h contains a conflicting definition:
10
11 #ifndef HAVE_KVZALLOC
12 #include <linux/vmalloc.h>
13 #include <linux/slab.h>
14
15 static inline void *kvzalloc(unsigned long size,...) {
16 void *rtn;
17
18 rtn = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
19 if (!rtn)
20 rtn = vzalloc(size);
21 return rtn;
22 }
23 #endif
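Since the compat definition is guarded by HAVE_KVZALLOC, one thing worth checking is whether the build ever defined that macro. The sketch below assumes the configure step records its result in some header under the build tree (an assumption about the OFED build, not something I have confirmed), and `macro_state` is a made-up helper name. It relies on GNU grep's `-r` and `--include` options.

```shell
#!/bin/sh
# macro_state MACRO BUILD_DIR
# Hypothetical helper: recursively list header lines that define or
# undefine MACRO, e.g. "#define HAVE_X 1" or "/* #undef HAVE_X */".
macro_state() {
    # MACRO must be followed by whitespace or end of line, so that
    # HAVE_KVZALLOC does not also match HAVE_KVZALLOC_NODE.
    grep -rnE --include='*.h' \
        "#[[:space:]]*(define|undef)[[:space:]]+$1([[:space:]]|\$)" "$2"
}

# Example call, using the build path from this post:
# macro_state HAVE_KVZALLOC "$build_dir/OFED_topdir/BUILD/mlnx-ofa_kernel-4.9"
```

If the only hit is an `#undef` line, or there are no hits at all, the guard in mm.h stays open and the compat definition gets compiled alongside the kernel's.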
Looking at the full rpmbuild.log, the same thing seems to be happening with kvcalloc, kvmalloc_array, kvmalloc_node, etc., always in default/include/linux/slab.h and default/include/linux/mm.h.
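To get the complete list of clashing symbols out of the log rather than reading it by eye, something like the following could work. It is a rough sketch: `list_conflicts` is a made-up name, and the pattern is based only on the single "conflicting types for" line quoted above, so other kinds of errors in the log would be missed.

```shell
#!/bin/sh
# list_conflicts LOGFILE
# Hypothetical helper: print each symbol that gcc reports with
# "conflicting types for", one per line, without duplicates.
list_conflicts() {
    # First grep keeps the message up to the end of the identifier,
    # skipping whatever quote character gcc prints before the name.
    # Second grep keeps only the trailing identifier.
    grep -oE "conflicting types for [^[:alnum:]_]*[[:alnum:]_]+" "$1" \
        | grep -oE "[[:alnum:]_]+\$" \
        | sort -u
}

# Example call:
# list_conflicts mlnx-ofa_kernel-4.9.rpmbuild.log
```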
I’m guessing it might be possible to adjust the compilation options to ignore local definitions and use the kernel’s, but:
- I can’t tell from the documentation whether this is possible or how to do it (would I have to unpack the source, modify the makefile and pack it again?)
- I’m not familiar enough with the code to know if that would actually work or instead break more things.
I would really appreciate it if someone could point me in the right direction. I’m guessing this will be addressed in the next release, but it would be cleaner to get the build working than to downgrade everything to 8.5 while waiting for that release.
Regards,
Joaquín Torres.
HPC System Administrator.
Centro Atómico Constituyentes.
Comisión Nacional de Energía Atómica.
Villa Maipú. Buenos Aires, Argentina.