Cannot install Mellanox OFED driver with 4.15.0-1041-azure kernel

An Azure VM running the 4.15.0-1041-azure kernel cannot install the Mellanox OFED driver (the same failure occurs with MLNX_OFED 4.3-*, 4.4-* and 4.5-*).

Here’s part of the output from running ./mlnxofedinstall --force --kernel-only --without-dkms --without-fw-update --with-infiniband-diags --package-install-options -D2 -vv (after first running mlnx_add_kernel_support.sh to add support for this kernel):

Below is the list of MLNX_OFED_LINUX packages that you have chosen

(some may have been added by the installer due to package dependencies):

libibumad

libopensm

libibmad

infiniband-diags

ofed-scripts

mlnx-ofed-kernel-utils

mlnx-ofed-kernel-modules

iser-modules

isert-modules

srp-modules

mlnx-nfsrdma-modules

mlnx-rdma-rxe-modules

kernel-mft-modules

knem-modules

This program will install the MLNX_OFED_LINUX package on your machine.

Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.

Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.

Checking SW Requirements…

Running: dpkg --configure -a --force-all

Running: apt-get install -f

Removing old packages…

Installing new packages

Installing libibumad-43.1.1.MLNX20171122.0eb0969…

Running /usr/bin/dpkg -i --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/libibumad_43.1.1.MLNX20171122.0eb0969-0.1.43101_amd64.deb

Installing libopensm-5.0.0.MLNX20180219.c610c42…

Running /usr/bin/dpkg -i --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/libopensm_5.0.0.MLNX20180219.c610c42-0.1.43101_amd64.deb

Installing libibmad-1.3.13.MLNX20170511.267a441…

Running /usr/bin/dpkg -i --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/libibmad_1.3.13.MLNX20170511.267a441-0.1.43101_amd64.deb

Installing infiniband-diags-5.0.0.MLNX20180124.dfd2235…

Running /usr/bin/dpkg -i --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/infiniband-diags_5.0.0.MLNX20180124.dfd2235-0.1.43101_amd64.deb

Installing ofed-scripts-4.3…

Running /usr/bin/dpkg -i --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/ofed-scripts_4.3-OFED.4.3.1.0.1_amd64.deb

Installing mlnx-ofed-kernel-utils-4.3…

Running /usr/bin/dpkg -i --force-confnew --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/mlnx-ofed-kernel-utils_4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure_amd64.deb

Installing mlnx-ofed-kernel-modules-4.3…

Running /usr/bin/dpkg -i --force-confnew --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/mlnx-ofed-kernel-modules_4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure_all.deb

Error: mlnx-ofed-kernel-modules installation failed!

Collecting debug info…

See:

/tmp/MLNX_OFED_LINUX.31695.logs/mlnx-ofed-kernel-modules.debinstall.log

Removing newly installed packages…

Running: /usr/sbin/ofed_uninstall.sh --force --keep-mft

Here’s part of the log file:

/usr/bin/dpkg -i --force-confnew --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/mlnx-ofed-kernel-modules_4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure_all.deb

Selecting previously unselected package mlnx-ofed-kernel-modules.

(Reading database … 33122 files and directories currently installed.)

Preparing to unpack …/mlnx-ofed-kernel-modules_4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure_all.deb …

D000002: maintscript_new nonexistent preinst '/var/lib/dpkg/tmp.ci/preinst

Unpacking mlnx-ofed-kernel-modules (4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure) …

D000002: process_archive tmp.ci script/file '.' contains dot

D000002: process_archive tmp.ci script/file '/var/lib/dpkg/tmp.ci/postinst' installed as '/var/lib/dpkg/info/mlnx-ofed-kernel-modules.postinst'

D000002: process_archive tmp.ci script/file '…' contains dot

D000002: process_archive tmp.ci script/file '/var/lib/dpkg/tmp.ci/control' is control

D000002: process_archive tmp.ci script/file '/var/lib/dpkg/tmp.ci/postrm' installed as '/var/lib/dpkg/info/mlnx-ofed-kernel-modules.postrm'

D000002: process_archive tmp.ci script/file '/var/lib/dpkg/tmp.ci/md5sums' installed as '/var/lib/dpkg/info/mlnx-ofed-kernel-modules.md5sums'

Setting up mlnx-ofed-kernel-modules (4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure) …

D000002: fork/exec /var/lib/dpkg/info/mlnx-ofed-kernel-modules.postinst ( configure )

---------------- START OF DEBUG INFO -------------------

Install command: ./mlnxofedinstall --force --kernel-only --without-dkms --without-fw-update --with-infiniband-diags --package-install-options -D2 -vv

Vars dump:

- ofedlogs: /tmp/MLNX_OFED_LINUX.9852.logs

- MLNX_OFED_LINUX_VERSION: 4.3-1.0.1.0

- MLNX_OFED_ARCH: x86_64

- MLNX_OFED_DISTRO: ubuntu16.04

- distro: ubuntu16.04

- arch: x86_64

- kernel: 4.15.0-1041-azure

- config: /tmp/ofed.conf

- update_firmware: 0

Setup info:

- uname -r: 4.15.0-1041-azure

- uname -m: x86_64

- lsb_release -a: No LSB modules are available.

Distributor ID: Ubuntu

Description: Ubuntu 16.04.6 LTS

Release: 16.04

Codename: xenial

- cat /etc/issue: Ubuntu 16.04.6 LTS \n \l

- cat /proc/version: Linux version 4.15.0-1041-azure (buildd@lcy01-amd64-013) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #45-Ubuntu SMP Fri Mar 15 14:41:00 UTC 2019

- gcc --version: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609

When run by hand, the command /usr/bin/dpkg -i --force-confnew --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/mlnx-ofed-kernel-modules_4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure_all.deb completes successfully, but the modules from mlnx-ofed-kernel-modules are still not available afterwards. The following commands produce no output:

$ depmod -a

$ lsmod | grep mlnx

Do you have any suggestions on how to solve this issue? Thanks!

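The Azure vmlinux image has the Mellanox InfiniBand drivers built in rather than built as loadable modules. The modules from the OFED package cannot replace built-in drivers, so the Mellanox OFED installation does not work.

The corresponding settings in /boot/config-4.15.0-1041-azure: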

CONFIG_MLX4_CORE=y

CONFIG_MLX5_CORE=y
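
To check the same thing on another kernel, the options can be looked up in the config file of the running kernel. This is only a minimal sketch: the helper name kconfig_state is made up for illustration, and it assumes the config file is at /boot/config-$(uname -r), as it is on Ubuntu.

```shell
#!/bin/sh
# Report how a kernel config option is set:
#   built-in (=y), module (=m), or unset (missing or "is not set").
# kconfig_state is a hypothetical helper name, not part of any Mellanox tool.
kconfig_state() {
    _opt=$1   # option name, e.g. CONFIG_MLX5_CORE
    _cfg=$2   # path to the kernel config file
    _line=$(grep -E "^${_opt}=" "$_cfg" 2>/dev/null | head -n 1 || true)
    case "$_line" in
        *=y) echo "built-in" ;;
        *=m) echo "module" ;;
        *)   echo "unset" ;;
    esac
}

# Assumption: the distro installs the config next to the kernel in /boot.
cfg="/boot/config-$(uname -r)"
for opt in CONFIG_MLX4_CORE CONFIG_MLX5_CORE; do
    echo "$opt: $(kconfig_state "$opt" "$cfg")"
done
```

With the config lines quoted above, both options would be reported as built-in. Built-in drivers do not appear in lsmod output; they are listed in /lib/modules/$(uname -r)/modules.builtin instead.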

Then why are OFED downloads available for Ubuntu on Azure?

http://www.mellanox.com/page/firmware_table_Microsoft?mtag=oem_firmware_download

I am trying to install MLNX_OFED_LINUX-4.2-1.2.2.0-ubuntu16.04-x86_64 on Ubuntu on Azure, but the installation fails.