NVMe module fails to compile on Azure Ubuntu 22.04 instances

I am trying to install OFED drivers (23.10-0.5.5.0) from source in a MS Azure instance created from the Canonical Ubuntu 22.04 Server image (gen2; x86-64; latest version, 22.04.202312060). However, the installation fails because the NVMe module does not compile successfully.

I am using the following install command:

sudo mlnxofedinstall --add-kernel-support --skip-unsupported-devices-check --without-ucx-cuda --without-fw-update --force

The main error log points to mlnx-nvme.debbuild.log, which contains the following errors (similar lines are omitted; the full file is attached):

.
.
.
  CC [M]  /tmp/MLNX_OFED_LINUX-23.10-0.5.5.0-6.2.0-1018-azure/mlnx_iso.23926/mlnx-nvme/mlnx-nvme-23.10.OFED.23.10.0.5.3.1/lpfc/lpfc_dummy.o
  LD [M]  /tmp/MLNX_OFED_LINUX-23.10-0.5.5.0-6.2.0-1018-azure/mlnx_iso.23926/mlnx-nvme/mlnx-nvme-23.10.OFED.23.10.0.5.3.1/lpfc/lpfc.o
  MODPOST /tmp/MLNX_OFED_LINUX-23.10-0.5.5.0-6.2.0-1018-azure/mlnx_iso.23926/mlnx-nvme/mlnx-nvme-23.10.OFED.23.10.0.5.3.1/Module.symvers
ERROR: modpost: /tmp/MLNX_OFED_LINUX-23.10-0.5.5.0-6.2.0-1018-azure/mlnx_iso.23926/mlnx-nvme/mlnx-nvme-23.10.OFED.23.10.0.5.3.1/host/nvme-core: 'admin_timeout' exported twice. Previous export was in vmlinux
ERROR: modpost: /tmp/MLNX_OFED_LINUX-23.10-0.5.5.0-6.2.0-1018-azure/mlnx_iso.23926/mlnx-nvme/mlnx-nvme-23.10.OFED.23.10.0.5.3.1/host/nvme-core: 'nvme_io_timeout' exported twice. Previous export was in vmlinux
ERROR: modpost: /tmp/MLNX_OFED_LINUX-23.10-0.5.5.0-6.2.0-1018-azure/mlnx_iso.23926/mlnx-nvme/mlnx-nvme-23.10.OFED.23.10.0.5.3.1/host/nvme-core: 'nvme_wq' exported twice. Previous export was in vmlinux
.
.
.
ERROR: modpost: /tmp/MLNX_OFED_LINUX-23.10-0.5.5.0-6.2.0-1018-azure/mlnx_iso.23926/mlnx-nvme/mlnx-nvme-23.10.OFED.23.10.0.5.3.1/host/nvme-core: 'nvme_auth_init_ctrl' exported twice. Previous export was in vmlinux
ERROR: modpost: /tmp/MLNX_OFED_LINUX-23.10-0.5.5.0-6.2.0-1018-azure/mlnx_iso.23926/mlnx-nvme/mlnx-nvme-23.10.OFED.23.10.0.5.3.1/host/nvme-core: 'nvme_auth_stop' exported twice. Previous export was in vmlinux
ERROR: modpost: /tmp/MLNX_OFED_LINUX-23.10-0.5.5.0-6.2.0-1018-azure/mlnx_iso.23926/mlnx-nvme/mlnx-nvme-23.10.OFED.23.10.0.5.3.1/host/nvme-core: 'nvme_auth_free' exported twice. Previous export was in vmlinux
make[4]: *** [scripts/Makefile.modpost:138: /tmp/MLNX_OFED_LINUX-23.10-0.5.5.0-6.2.0-1018-azure/mlnx_iso.23926/mlnx-nvme/mlnx-nvme-23.10.OFED.23.10.0.5.3.1/Module.symvers] Error 1
make[3]: *** [Makefile:1978: modpost] Error 2
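
Since the log repeats the same modpost error for many symbols, a short helper can condense it into a list of the affected symbol names. This is only a sketch: `list_duplicate_exports` is a hypothetical name, and the pattern assumes the exact error wording shown above. "Previous export was in vmlinux" suggests the symbols are already exported by the kernel image itself, so the list gives a sense of how much of the NVMe core the out-of-tree module collides with.

```shell
# Hypothetical helper (not part of MLNX_OFED): list the symbols that
# modpost reports as "exported twice" in a build log.
list_duplicate_exports() {
    # $1: path to a build log such as mlnx-nvme.debbuild.log
    # Keep only the quoted symbol name from each matching error line,
    # then sort and de-duplicate.
    sed -n "s/^ERROR: modpost: .*: '\([^']*\)' exported twice\..*/\1/p" "$1" | sort -u
}

# Example: list_duplicate_exports mlnx-nvme.debbuild.log
```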

This error happens on a brand new, direct-from-image instance, and it seems to happen whether SecureBoot/TPM features are enabled or disabled.

I have also double-checked that the kernel version the installer detects (6.2.0-1018-azure) is the running kernel and matches the version for which kernel sources are installed.
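
For anyone wanting to repeat that check, here is a minimal sketch. It assumes the usual Ubuntu package naming `linux-headers-<release>`. The function names are hypothetical, and this is not necessarily how the installer itself performs the check.

```shell
# Hypothetical helpers, not part of MLNX_OFED.

# Print the Ubuntu header package name for a kernel release string
# (assumes the linux-headers-<release> naming convention).
headers_pkg_for() {
    printf 'linux-headers-%s\n' "$1"
}

# Succeed only if dpkg reports the headers for the given release
# (default: the running kernel) as installed.
headers_installed() {
    release="${1:-$(uname -r)}"
    dpkg-query -W -f='${Status}' "$(headers_pkg_for "$release")" 2>/dev/null \
        | grep -q 'install ok installed'
}

# Example: headers_installed && echo "headers present for $(uname -r)"
```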

I don’t really need NVMe support for my purposes, but I don’t see any obvious way to disable it without editing the install script (it seems to be included in the “basic” modules). And I don’t know what, if any, consequences that would have.

My guess is that an Azure-specific kernel patch is causing problems, but I’m at a loss to debug this further on my own. I have the full debugging output from a failed build if that would help.

Thank you in advance for any help in resolving this issue!
mlnx-nvme.debbuild.log (147.9 KB)

Oops. I misread the documentation: kernel 6.2 is supported for Ubuntu 23.04, not for Ubuntu 22.04. My mistake!

I installed the GA kernel using the command:

apt-get install linux-azure-lts-22.04

and rebooted into the version 5.15 kernel.

After that, everything worked.
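
Before re-running the installer, it may be worth confirming which kernel series the instance actually booted, since installing a package does not by itself guarantee that the bootloader selects it. A small sketch follows; `kernel_series` is a hypothetical helper that reduces a release string to its major.minor part.

```shell
# Hypothetical helper: reduce a kernel release string to "major.minor",
# e.g. "6.2.0-1018-azure" becomes "6.2".
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Example: kernel_series "$(uname -r)"
```

Comparing that value against the kernel listed for the distribution in the release notes is a quick sanity check before starting a long build.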

Yes, kernel 6.2 is not supported on Ubuntu 22.04.

FYI, see the release notes:

https://docs.nvidia.com/networking/display/mlnxofedv23100550/general+support
