Installing OFED with NVMe-oF on a System with OS Installed on NVMe

Hi OFED Driver Team,

I am installing OFED on a server with a ConnectX-7 adapter in order to enable NVMe-oF. The complication is that the server’s operating system is itself installed on an NVMe SSD. The OFED installation completes successfully, but it ends with the following note, and the reload it asks for fails:

Note: In order to load the new nvme-rdma and nvmet-rdma modules, the nvme module must be reloaded.

$ sudo modprobe -r nvme
modprobe: FATAL: Module nvme is in use.

The problem appears to be that modprobe -r nvme fails because the NVMe SSD is in use as the system drive, so the nvme module cannot be unloaded while the OS is running.
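
For anyone reproducing this, here is a minimal sketch of how the root device can be checked. It assumes findmnt from util-linux is available. The helper name is_nvme_device is made up for illustration, and it only recognizes a root that sits directly on /dev/nvme*, so an LVM or dm-crypt root on top of NVMe would not be detected.

```shell
#!/bin/sh
# is_nvme_device DEV: succeed if DEV looks like an NVMe block device node,
# e.g. /dev/nvme0n1p2. Hypothetical helper; it does not resolve
# /dev/mapper/* devices back to their underlying disk.
is_nvme_device() {
    case "$1" in
        /dev/nvme[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask findmnt which device backs the root filesystem.
root_dev=$(findmnt -n -o SOURCE / 2>/dev/null || true)

if is_nvme_device "$root_dev"; then
    echo "root filesystem is on $root_dev, so nvme stays in use while the OS runs"
else
    echo "root filesystem source: ${root_dev:-unknown}"
fi
```

On this server the first branch is the expected one, which would be consistent with the "Module nvme is in use" error.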

My question is: How can I safely reload the nvme module and load nvme-rdma and nvmet-rdma when the OS is also installed on an NVMe SSD?

I would appreciate any guidance on how to properly reload these modules without disrupting the OS on the NVMe SSD. Thank you for your assistance!


Environment Details:

$ uname -r
6.8.0-41-generic

$ lsb_release -a
No LSB modules are available.
Distributor ID:    Ubuntu
Description:    Ubuntu 24.04 LTS
Release:    24.04
Codename:    noble

$ ibstat
CA 'mlx5_0'
    CA type: MT4129
    Number of ports: 1
    Firmware version: 28.41.1000
    Hardware version: 0
    Node GUID: ----
    System image GUID: ----
    Port 1:
        State: Down
        Physical state: Disabled
        Rate: 400
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x00010000
        Port GUID: ----
        Link layer: Ethernet

OFED Installation Command Used:

wget "https://content.mellanox.com/ofed/MLNX_OFED-24.04-0.6.6.0/MLNX_OFED_LINUX-24.04-0.6.6.0-ubuntu24.04-x86_64.iso"
sudo mlnxofedinstall -vvv --with-nvmf --add-kernel-support --basic --force --force-fw-update

End of mlnxofedinstall Log:

Device (c1:00.0):
    c1:00.0 Ethernet controller: Mellanox Technologies MT2910 Family [ConnectX-7]
    Link Width: x16
    PCI Link Speed: Unknown

Installation passed successfully
To load the new driver, run:
/etc/init.d/openibd restart
Note: In order to load the new nvme-rdma and nvmet-rdma modules, the nvme module must be reloaded.

dmesg Errors Encountered when Loading Modules (nvme-fabrics, nvme-rdma, nvmet, nvmet-rdma):

[ 1711.360216] nvme_fabrics: disagrees about version of symbol __nvme_submit_sync_cmd
[ 1711.360222] nvme_fabrics: Unknown symbol __nvme_submit_sync_cmd (err -22)

[ 1724.339810] nvmet: disagrees about version of symbol nvme_command_effects
[ 1724.339816] nvmet: Unknown symbol nvme_command_effects (err -22)
[ 1724.339857] nvmet: disagrees about version of symbol nvme_passthru_end
[ 1724.339859] nvmet: Unknown symbol nvme_passthru_end (err -22)
[ 1724.340038] nvmet: disagrees about version of symbol nvme_find_get_ns
[ 1724.340040] nvmet: Unknown symbol nvme_find_get_ns (err -22)
[ 1724.340063] nvmet: Unknown symbol nvme_find_noiob_from_bdev (err -2)
[ 1724.340141] nvmet: disagrees about version of symbol nvme_passthru_start
[ 1724.340144] nvmet: Unknown symbol nvme_passthru_start (err -22)
[ 1724.340219] nvmet: Unknown symbol nvme_find_pdev_from_bdev (err -2)
[ 1724.340254] nvmet: disagrees about version of symbol nvme_ctrl_from_file
[ 1724.340257] nvmet: Unknown symbol nvme_ctrl_from_file (err -22)
[ 1724.340299] nvmet: disagrees about version of symbol nvme_put_ns
[ 1724.340301] nvmet: Unknown symbol nvme_put_ns (err -22)
[ 1724.340345] nvmet: disagrees about version of symbol nvme_get_features
[ 1724.340347] nvmet: Unknown symbol nvme_get_features (err -22)
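
My reading of these messages (an assumption, not confirmed) is that the nvme and nvme-core modules currently loaded are still the original in-kernel ones, while nvme-fabrics and nvmet were built by OFED against its own nvme-core. A rough way to check is to compare the srcversion of the loaded module with the one modprobe would pick up from disk. This sketch assumes the kernel exposes /sys/module/<name>/srcversion; same_build and report_module are illustrative names.

```shell
#!/bin/sh
# same_build LOADED ONDISK: succeed only if both values are non-empty and equal.
same_build() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# report_module NAME: print the srcversion of the loaded module and of the
# module file that modprobe would load, and say whether they agree.
report_module() {
    mod=$1
    loaded=$(cat "/sys/module/$mod/srcversion" 2>/dev/null || true)
    ondisk=$(modinfo -F srcversion "$mod" 2>/dev/null || true)
    path=$(modinfo -n "$mod" 2>/dev/null || true)
    echo "$mod: loaded=${loaded:-unknown} on-disk=${ondisk:-unknown} file=${path:-not found}"
    if same_build "$loaded" "$ondisk"; then
        echo "  loaded module matches the file on disk"
    else
        echo "  mismatch or unknown: the running module may predate the OFED install"
    fi
}

for mod in nvme_core nvme; do
    report_module "$mod"
done
```

If nvme_core reports a mismatch, that would fit the "disagrees about version of symbol" errors: the new nvme_fabrics and nvmet expect symbols from a different nvme-core build than the one that is loaded.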

I also checked similar threads, but I think in those cases sudo modprobe -r nvme was possible:

Hi

  1. You can reboot the kernel, and all the drivers will load
  2. Don’t try to remove the nvme module (modprobe -r nvme); only run modprobe nvme-rdma
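
One thing that may matter for the reboot suggestion above: when the root disk is NVMe, the nvme modules are normally loaded from the initramfs, so a reboot might bring back the old modules unless the initramfs contains the OFED-built ones. The sketch below only inspects the initramfs. It assumes Ubuntu's initramfs-tools (lsinitramfs, update-initramfs) and the default /boot/initrd.img-<kernel> naming; the function names are illustrative. I have not verified that this resolves the issue.

```shell
#!/bin/sh
# initrd_path KVER: default initramfs location on Ubuntu for kernel KVER.
initrd_path() {
    echo "/boot/initrd.img-$1"
}

# nvme_modules_in_listing: read a file listing on stdin and keep only
# NVMe kernel module files (nvme.ko, nvme-core.ko.zst, nvmet-rdma.ko, ...).
nvme_modules_in_listing() {
    grep -E '/nvmet?[-_a-z]*\.ko(\.zst|\.xz|\.gz)?$'
}

# Run with --check to list the NVMe modules packed into the current initramfs.
if [ "${1:-}" = "--check" ]; then
    lsinitramfs "$(initrd_path "$(uname -r)")" | nvme_modules_in_listing
fi

# If NVMe modules are listed, a possible next step (run manually, not done here)
# is to rebuild the initramfs so it picks up the newly installed modules,
# then reboot:
#   sudo update-initramfs -u -k "$(uname -r)"
#   sudo reboot
```

After rebooting, modprobe nvme-rdma can be retried as in point 2, without unloading nvme.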