Unable to load Mellanox OFED Drivers on Ubuntu 24.04 AWS EKS Optimized Instance

Hi,

I’m wondering if anyone can help me troubleshoot the installation of Mellanox drivers on an Ubuntu 24.04 EKS-optimized instance.

The driver package I’m attempting to install is MLNX_OFED_LINUX-24.10.3.2.5.0-ubuntu24.04-x86_64.

If I try to install the drivers, I get a kernel package error for mlnx-ofed-kernel-dkms.

I’ve tried installing the drivers without that package. The installation appears to succeed, but restarting the init.d/openibd service then fails.

I can see that it is failing on:

No HCA kernel modules loaded

Loading HCA driver and Access Layer
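To confirm what openibd is complaining about, it may help to check directly whether the mlx5 modules are present in the running kernel. The following is a minimal sketch, assuming the relevant modules are mlx5_core, mlx5_ib and ib_core; the function name check_modules and the SYSFS_MODULE_DIR override are hypothetical and exist only for illustration.

```shell
# Report which of the given kernel modules are present in the running kernel.
# A loaded (or built-in, parameterised) module has a directory under /sys/module.
# SYSFS_MODULE_DIR is a hypothetical override so the check can be tried
# against a scratch directory instead of the real sysfs tree.
check_modules() {
    local dir="${SYSFS_MODULE_DIR:-/sys/module}"
    local missing=0 mod
    for mod in "$@"; do
        if [ -d "$dir/$mod" ]; then
            echo "$mod: loaded"
        else
            echo "$mod: not loaded"
            missing=1
        fi
    done
    # Non-zero exit status if at least one module was missing.
    return "$missing"
}
```

Calling check_modules mlx5_core mlx5_ib ib_core on the instance would be one way to use it; running dkms status alongside shows whether the mlnx-ofed-kernel module was ever built for the running kernel.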

Do I need the ofed-kernel package installed? If I try to install it individually, it fails as well.

The kernel version I’m using is 6.14.0-1016

I’m getting the error below when trying to install the ofed-kernel package directly:

ERROR: Cannot create report: [Errno 17] File exists: ‘/var/crash/mlnx-ofed-kernel-dkms.0.crash’

Error! Bad return status for module build on kernel: 6.14.0-1016-aws (x86_64)

Consult /var/lib/dkms/mlnx-ofed-kernel/24.10.OFED.24.10.3.2.5.1/build/make.log for more information.
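Since the DKMS message points at make.log, a quick way to find the actual failure is to pull out only the compiler errors and the failing make targets. This is a rough sketch; the function name show_build_errors is made up for illustration, and the default path is the one from the message above.

```shell
# Print the first few compiler errors and failing make targets from a DKMS build log.
# Usage: show_build_errors [path-to-make.log] [max-matches]
show_build_errors() {
    local log="${1:-/var/lib/dkms/mlnx-ofed-kernel/24.10.OFED.24.10.3.2.5.1/build/make.log}"
    local max="${2:-5}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # -n prefixes each match with its line number in the log;
    # -m stops after $max matching lines.
    grep -n -m "$max" -E 'error:|\*\*\* .*Error [0-9]+' "$log"
}
```

Running show_build_errors with no arguments reads the default log path. The first error: line is usually the one that matters, since the later make lines only report that the build stopped.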

dpkg: error processing package mlnx-ofed-kernel-dkms (--configure):

installed mlnx-ofed-kernel-dkms package post-installation script subprocess returned error exit status 10

Processing triggers for initramfs-tools (0.142ubuntu25.5) …

update-initramfs: Generating /boot/initrd.img-6.14.0-1016-aws

Errors were encountered while processing:

mlnx-ofed-kernel-dkms

E: Sub-process /usr/bin/dpkg returned an error code (1)

Any help would be greatly appreciated.

drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c:1325:35: error: initialization of ‘int (*)(struct net_device *, struct xfrm_state *, struct netlink_ext_ack *)’ from incompatible pointer type ‘int (*)(struct xfrm_state *)’ [-Werror=incompatible-pointer-types]
1325 | .xdo_dev_state_add = mlx5e_xfrm_add_state,
| ^~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c:1325:35: note: (near initialization for ‘mlx5e_ipsec_xfrmdev_ops.xdo_dev_state_add’)
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c:1326:35: error: initialization of ‘void (*)(struct net_device *, struct xfrm_state *)’ from incompatible pointer type ‘void (*)(struct xfrm_state *)’ [-Werror=incompatible-pointer-types]
1326 | .xdo_dev_state_delete = mlx5e_xfrm_del_state,
| ^~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c:1326:35: note: (near initialization for ‘mlx5e_ipsec_xfrmdev_ops.xdo_dev_state_delete’)
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c:1327:35: error: initialization of ‘void (*)(struct net_device *, struct xfrm_state *)’ from incompatible pointer type ‘void (*)(struct xfrm_state *)’ [-Werror=incompatible-pointer-types]
1327 | .xdo_dev_state_free = mlx5e_xfrm_free_state,
| ^~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c:1327:35: note: (near initialization for ‘mlx5e_ipsec_xfrmdev_ops.xdo_dev_state_free’)
CC [M] drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.o
cc1: some warnings being treated as errors
make[5]: *** [/usr/src/linux-headers-6.14.0-1016-aws/scripts/Makefile.build:207: drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.o] Error 1
make[5]: *** Waiting for unfinished jobs…
make[4]: *** [/usr/src/linux-headers-6.14.0-1016-aws/scripts/Makefile.build:465: drivers/net/ethernet/mellanox/mlx5/core] Error 2
make[3]: *** [/usr/src/linux-headers-6.14.0-1016-aws/Makefile:1997: .] Error 2
make[2]: *** [/usr/src/linux-headers-6.14.0-1016-aws/Makefile:251: __sub-make] Error 2
make[2]: Leaving directory ‘/var/lib/dkms/mlnx-ofed-kernel/24.10.OFED.24.10.3.2.5.1/build’
make[1]: *** [Makefile:251: __sub-make] Error 2
make[1]: Leaving directory ‘/usr/src/linux-headers-6.14.0-1016-aws’

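It looks like an IPsec incompatibility: the mlx5 IPsec offload callbacks in this OFED release (mlx5e_xfrm_add_state, mlx5e_xfrm_del_state, mlx5e_xfrm_free_state) don’t match the xfrmdev_ops signatures in the 6.14.0-1016-aws headers, which expect an additional struct net_device * argument.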
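One way to see which callback signatures the installed kernel headers expect is to print the struct xfrmdev_ops definition from include/linux/netdevice.h and compare it with the types named in the compiler errors. The sketch below assumes the headers live under /usr/src/linux-headers-$(uname -r), as in the build log; the function name show_xfrmdev_ops is hypothetical.

```shell
# Print the struct xfrmdev_ops definition from a kernel header file.
# Usage: show_xfrmdev_ops [path-to-netdevice.h]
show_xfrmdev_ops() {
    local hdr="${1:-/usr/src/linux-headers-$(uname -r)/include/linux/netdevice.h}"
    if [ ! -r "$hdr" ]; then
        echo "cannot read $hdr" >&2
        return 1
    fi
    # Print from the opening line of the struct to its closing "};".
    sed -n '/^struct xfrmdev_ops {/,/^};/p' "$hdr"
}
```

If the xdo_dev_state_add, xdo_dev_state_delete and xdo_dev_state_free entries printed here take a struct net_device * first argument, that matches the expected types in the errors above, and the OFED source being built was written against the older signatures.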