el5.10 ofed build problem.

Hi,

I updated to RHEL 5.10 and I’m trying to build ofed modules against the latest 2.6.18-371.1.2.el5 kernel.

./mlnx_add_kernel_support.sh -k 2.6.18-371.1.2.el5 -m /mnt/ofed --make-tgz

Below is the list of OFED packages that you have chosen

(some may have been added by the installer due to package dependencies):

ofed-scripts

kernel-ib

kernel-ib-devel

Uninstalling the previous version of OFED

Build ofed-scripts RPM

Running rpmbuild --rebuild --define '_topdir /tmp/mlnx_iso.14232/OFED_topdir' --define 'dist %{nil}' --target x86_64 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' /tmp/mlnx_iso.14232/MLNX_OFED_SRC-1.5.3-4.0.42/SRPMS/ofed-scripts-1.5.3-OFED.1.5.3.4.0.42.src.rpm

Install ofed-scripts RPM:

Running rpm -iv /tmp/mlnx_iso.14232/MLNX_OFED_SRC-1.5.3-4.0.42/RPMS/redhat-release-5Server-5.10.0.4/x86_64/ofed-scripts-1.5.3-OFED.1.5.3.4.0.42.x86_64.rpm

Build ofa_kernel RPM

Running rpmbuild --rebuild --define '_topdir /tmp/mlnx_iso.14232/OFED_topdir' --nodeps --define '_dist .rhel5u10' --define 'configure_options --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mthca-mod --with-mlx4-mod --with-nes-mod --with-qib-mod --with-ipoib-mod --with-sdp-mod --with-srp-mod' --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1' --define 'KVERSION 2.6.18-371.1.2.el5' --define 'K_SRC /lib/modules/2.6.18-371.1.2.el5/build/' --define 'network_dir /etc/sysconfig/network-scripts' --define '_prefix /usr' --define '__arch_install_post %{nil}' /tmp/mlnx_iso.14232/MLNX_OFED_SRC-1.5.3-4.0.42/SRPMS/ofa_kernel-1.5.3-OFED.1.5.3.4.0.42.g3cb72fe.src.rpm

Failed to build ofa_kernel RPM

See /tmp/OFED.14279.logs/ofa_kernel.rpmbuild.log

And the error is:

In file included from include/linux/inetdevice.h:7,

from /tmp/mlnx_iso.7969/OFED_topdir/BUILD/ofa_kernel-1.5.3/kernel_addons/backport/2.6.18-EL5.7/include/linux/inetdevice.h:4,

from /tmp/mlnx_iso.7969/OFED_topdir/BUILD/ofa_kernel-1.5.3/drivers/infiniband/core/addr.c:37:

/tmp/mlnx_iso.7969/OFED_topdir/BUILD/ofa_kernel-1.5.3/kernel_addons/backport/2.6.18-EL5.7/include/linux/netdevice.h:25: error: conflicting types for ‘netif_is_bond_slave’

include/linux/netdevice.h:884: error: previous definition of ‘netif_is_bond_slave’ was here

make[4]: *** [/tmp/mlnx_iso.7969/OFED_topdir/BUILD/ofa_kernel-1.5.3/drivers/infiniband/core/addr.o] Error 1

make[3]: *** [/tmp/mlnx_iso.7969/OFED_topdir/BUILD/ofa_kernel-1.5.3/drivers/infiniband/core] Error 2

make[2]: *** [/tmp/mlnx_iso.7969/OFED_topdir/BUILD/ofa_kernel-1.5.3/drivers/infiniband] Error 2

make[1]: *** [_module_/tmp/mlnx_iso.7969/OFED_topdir/BUILD/ofa_kernel-1.5.3] Error 2

make[1]: Leaving directory `/usr/src/kernels/2.6.18-371.1.2.el5-x86_64'

make: *** [kernel] Error 2

error: Bad exit status from /var/tmp/rpm-tmp.35818 (%build)

What am I doing wrong?

Regards,

Tommi

I’m having the same problem. Did you fix it yet?

Thanks,

Koji

I only have kernel-devel and kernel-headers for 2.6.18-371.1.2.el5 installed, so I don’t think there’s a conflict between different kernels I have installed.

The problem appears to be that the 2.6.18-371.1.2.el5 kernel headers now include their own definition of netif_is_bond_slave, which returns bool:

static inline bool netif_is_bond_slave(struct net_device *dev)
{
        return dev->flags & IFF_SLAVE && dev->priv_flags & IFF_BONDING;
}

It doesn’t match the “backported” version in ofa_kernel-1.5.3, which returns int:

/tmp/mlnx_iso.7969/OFED_topdir/BUILD/ofa_kernel-1.5.3/kernel_addons/backport/2.6.18-EL5.7/include/linux/netdevice.h

static inline int netif_is_bond_slave(struct net_device *dev)
{
        return dev->flags & IFF_SLAVE && dev->priv_flags & IFF_BONDING;
}
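For anyone who wants to confirm the mismatch on their own machine, grepping both headers for the declaration shows the differing return types side by side. This is only a minimal sketch: show_decl is a made-up helper name, not part of OFED, and the paths in the usage note come from the build log earlier in this thread.

```shell
#!/bin/sh
# Print every line in a header that declares the given inline function,
# prefixed with its line number, so the return types can be compared.
# show_decl is a hypothetical helper, not part of OFED.
show_decl() {
    header=$1
    func=$2
    grep -n "static inline .*${func}[[:space:]]*(" "$header"
}
```

Example use: run show_decl /lib/modules/2.6.18-371.1.2.el5/build/include/linux/netdevice.h netif_is_bond_slave, then run it again on the netdevice.h under kernel_addons/backport/2.6.18-EL5.7/include/linux/ in the unpacked ofa_kernel tree.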

Same problem here, on RHEL 5.10 x86_64. Looks like the 2.6.18-371 kernel brought some header changes which MLNX_OFED_LINUX-1.5.3-4.0.42-rhel5.10-x86_64 was not prepared for.

Your logs point to conflicts in headers between different kernels you have on the system. Please make sure you install all RPMs (including kernel-devel and kernel-headers) corresponding to your target kernel and resolve any conflicts before using add_kernel_support (use rpm with “--replacefiles”). Also, if you are running add_kernel_support under the target kernel (which is what I’d do), you can skip “-k 2.6.18-371.1.2.el5” on the command line; it will take the “uname -r” value itself.
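A quick sanity check along these lines is to confirm that the build tree for the target kernel actually contains the header in question. The following is a rough sketch only: check_kernel_tree and its optional root argument are hypothetical, and the /lib/modules/<version>/build layout is taken from the K_SRC value in the rpmbuild command above.

```shell
#!/bin/sh
# Verify that the kernel build tree for a given version has netdevice.h.
# check_kernel_tree is a hypothetical helper; the second argument is an
# optional alternate root directory (assumed here, useful for testing).
check_kernel_tree() {
    kver=${1:-$(uname -r)}   # default to the running kernel
    root=${2:-}
    build="$root/lib/modules/$kver/build"
    if [ -f "$build/include/linux/netdevice.h" ]; then
        echo "ok: $build"
    else
        echo "missing headers for $kver under $build" >&2
        return 1
    fi
}
```

Running check_kernel_tree 2.6.18-371.1.2.el5 should print the build path if kernel-devel for that kernel is installed. It does not detect leftover files from other kernel versions, so it complements rather than replaces checking the installed RPMs.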

If you remove the conflicting definition of netif_is_bond_slave from ofa_kernel-1.5.3, it will build OK. (I actually created a kernel_addons/backport/2.6.18-EL5.10 directory and modified config.mk.)
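One possible way to strip the definition from a copy of the backport header is a small awk filter. This is a sketch under assumptions, not the exact procedure used here: strip_inline_func is a hypothetical helper, and it assumes the function's closing brace sits alone at the start of a line, as in the snippet quoted above.

```shell
#!/bin/sh
# Print a header with one "static inline ... NAME(...) { ... }" definition removed.
# strip_inline_func is a hypothetical helper. It assumes the closing brace
# of the function is at the start of its own line.
strip_inline_func() {
    func=$1
    header=$2
    awk -v fn="$func" '
        # Start skipping at the line that opens the definition.
        skip == 0 && $0 ~ ("static inline .*" fn "[ \t]*\\(") { skip = 1; next }
        # Stop skipping once the closing brace has been consumed.
        skip == 1 && /^}/ { skip = 0; next }
        # Everything outside the definition passes through unchanged.
        skip == 0 { print }
    ' "$header"
}
```

The result goes to stdout, so it can be redirected to a new file and diffed against the original before anything in the source tree is replaced.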

Going to run this in test for a while to see how it goes…

Hi Jubilex,

Can you explain what you did in a bit more detail? I tried to do the same, but config.mk keeps changing back, so I’m not able to remove the conflicting definition.

Thanks in advance,

Sreedhar.

It’s a bit involved… you end up having to repackage the SRPMS in MLNX_OFED_SRC so that you can run mlnx_add_kernel_support.sh. I cannot remember all the steps.

Here is a tarball of the modified MLNX_OFED_LINUX with added support for 2.6.18-371.1.2.el5. Hope this helps.

http://www.uvm.edu/~jtl/MLNX_OFED_LINUX-1.5.3-4.0.42-uvm1-rhel5.10-x86_64.tgz

Thanks so much for doing this work. With a slight addition to mlnx_add_kernel_support, your distribution works for CentOS (and, I assume, RHEL) 5.11.

The patch to the file is this:

  1. Find the lines that look for release 5.10:

redhat-release*-5.10*|centos-release-5-10*|enterprise-release-5-10*)
        distro=rhel5.10
        ;;

  2. Add the following lines below them:

redhat-release*-5.11*|centos-release-5-11*|enterprise-release-5-11*)
        distro=rhel5.11
        ;;
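To see how the two patterns behave together, here is a small standalone sketch that wraps them in a function. detect_distro and the "unknown" fallback are assumptions for illustration only; the surrounding code in the real mlnx_add_kernel_support script is not shown in this thread.

```shell
#!/bin/sh
# Map a release package name to a distro tag using the case patterns above.
# detect_distro is a hypothetical wrapper; the "unknown" fallback is assumed.
detect_distro() {
    case "$1" in
        redhat-release*-5.10*|centos-release-5-10*|enterprise-release-5-10*)
            distro=rhel5.10
            ;;
        redhat-release*-5.11*|centos-release-5-11*|enterprise-release-5-11*)
            distro=rhel5.11
            ;;
        *)
            distro=unknown
            ;;
    esac
    echo "$distro"
}
```

For example, detect_distro redhat-release-5Server-5.10.0.4 (the package name that appears in the build log above) prints rhel5.10, while a name beginning with centos-release-5-11 prints rhel5.11.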