ConnectX-3 on VMware ESXi 6.5 not fully visible in the UI or command line

I just installed a fresh copy of ESXi 6.5 (build 15256549) on a Dell R710. However, the ConnectX-3 card is not fully visible in the vSphere UI: the vmnic doesn’t show up in the storage or network configuration sections.

The card and driver seem to be installed, according to the ESXi server:

[root@esxi04:/opt/mellanox/bin] esxcli software vib list | grep mlx

nmlx4-core 3.16.11.6-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-07

nmlx4-en 3.16.11.6-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-07

nmlx4-rdma 3.16.11.6-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-07

nmlx5-core 4.16.10.3-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-07

nmlx5-rdma 4.16.10.3-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-07

net-mlx4-core 1.9.7.0-1vmw.650.2.50.8294253 VMW VMwareCertified 2020-01-07

net-mlx4-en 1.9.7.0-1vmw.650.2.50.8294253 VMW VMwareCertified 2020-01-07

[root@esxi04:/opt/mellanox/bin] lspci |grep -i mel

0000:06:00.0 Infiniband controller: Mellanox Technologies MT27500 Family [ConnectX-3] [vmnic5]

What’s odd is when I run the firmware manager, I hit this error:

[root@esxi04:/opt/mellanox/bin] ./mlxfwmanager

-E- Failed to initialize management interface (rc=0x13), make sure that “nmst” driver is loaded first

This system was previously running RHEL 7.7, so I know the card and driver are working fine.

[root@rhev1 ~]# mlxfwmanager

Querying Mellanox devices firmware …

Device #1:


Device Type: ConnectX3

Part Number: MCX354A-FCB_A2-A5

Description: ConnectX-3 VPI adapter card; dual-port QSFP; FDR IB (56Gb/s) and 40GigE; PCIe3.0 x8 8GT/s; RoHS R6

PSID: MT_1090120019

PCI Device Name: 0000:03:00.0

Port1 GUID: 24be05ffff81e6e1

Port2 GUID: 24be05ffff81e6e2

Versions: Current Available

FW 2.42.5000 N/A

PXE 3.4.0752 N/A

Status: No matching image found

Just a note: this card was originally an HP 544QSFP card with HP firmware. I flashed it with the latest ConnectX-3 firmware using this method [1], and everything worked flawlessly with RHEL at 56 Gbit.

Any suggestions are greatly appreciated. Thanks!

[1] https://forums.servethehome.com/index.php?threads/mellanox-connectx-3-vpi-mcx354a-fcbt-hp-oem-but-with-mellanox-oem-firmware-40-usd-each.23947/#post-222930

Not sure if this matters, but here is the dmesg output.

[root@esxi04:/opt/mellanox/bin] dmesg |grep mlx

VMB: 323: name: /nmlx4_co.v00

VMB: 323: name: /nmlx4_en.v00

VMB: 323: name: /nmlx4_rd.v00

VMB: 323: name: /nmlx5_co.v00

VMB: 323: name: /nmlx5_rd.v00

VMB: 323: name: /net_mlx4.v00

VMB: 323: name: /net_mlx4.v01

2020-01-14T19:45:20.902Z cpu0:65536)VisorFSTar: 1982: nmlx4_co.v00 for 0xb5980 bytes

2020-01-14T19:45:20.904Z cpu0:65536)VisorFSTar: 1982: nmlx4_en.v00 for 0xbcd08 bytes

2020-01-14T19:45:20.906Z cpu0:65536)VisorFSTar: 1982: nmlx4_rd.v00 for 0x51ca0 bytes

2020-01-14T19:45:20.907Z cpu0:65536)VisorFSTar: 1982: nmlx5_co.v00 for 0x16e920 bytes

2020-01-14T19:45:20.911Z cpu0:65536)VisorFSTar: 1982: nmlx5_rd.v00 for 0x3f928 bytes

2020-01-14T19:45:21.307Z cpu0:65536)VisorFSTar: 1982: net_mlx4.v00 for 0x57e88 bytes

2020-01-14T19:45:21.308Z cpu0:65536)VisorFSTar: 1982: net_mlx4.v01 for 0x3b10a bytes

2020-01-14T19:45:32.360Z cpu2:66047)Loading module nmlx4_core …

2020-01-14T19:45:32.362Z cpu2:66047)Elf: 2043: module nmlx4_core has license BSD

2020-01-14T19:45:32.376Z cpu2:66047)<NMLX_INF> nmlx4_core: init_module called

2020-01-14T19:45:32.376Z cpu2:66047)Device: 191: Registered driver ‘nmlx4_core’ from 23

2020-01-14T19:45:32.376Z cpu2:66047)Mod: 4972: Initialization of nmlx4_core succeeded with module ID 23.

2020-01-14T19:45:32.376Z cpu2:66047)nmlx4_core loaded successfully.

2020-01-14T19:45:32.385Z cpu0:65975)<NMLX_INF> nmlx4_core: 0000:06:00.0: nmlx4_core_Attach - (partners/mlnx/nmlx4/nmlx4_core/nmlx4_core_main.c:2476) running

2020-01-14T19:45:32.385Z cpu0:65975)DMA: 646: DMA Engine ‘nmlx4_core’ created using mapper ‘DMANull’.

2020-01-14T19:45:32.385Z cpu0:65975)DMA: 646: DMA Engine ‘nmlx4_core’ created using mapper ‘DMANull’.

2020-01-14T19:45:32.385Z cpu0:65975)DMA: 646: DMA Engine ‘nmlx4_core’ created using mapper ‘DMANull’.

2020-01-14T19:45:41.235Z cpu0:65975)<NMLX_INF> nmlx4_core: 0000:06:00.0: nmlx4_CmdQueryDevCap - (partners/mlnx/nmlx4/nmlx4_core/nmlx4_core_fw.c:391) Device supports DMFS

2020-01-14T19:45:41.235Z cpu0:65975)<NMLX_ERR> nmlx4_core: 0000:06:00.0: nmlx4_FillDevCap - (partners/mlnx/nmlx4/nmlx4_core/nmlx4_core_main.c:954) HCA’s port number 1 has not supported port type IB, aborting.

2020-01-14T19:45:41.235Z cpu0:65975)<NMLX_ERR> nmlx4_core: 0000:06:00.0: nmlx4_HcaInit - (partners/mlnx/nmlx4/nmlx4_core/nmlx4_core_main.c:1541) nmlx4_FillDevCap failed: Not supported

2020-01-14T19:45:41.235Z cpu0:65975)<NMLX_ERR> nmlx4_core: 0000:06:00.0: nmlx4_core_Attach - (partners/mlnx/nmlx4/nmlx4_core/nmlx4_core_main.c:2580) nmlx4_HcaInit failed: Not supported

2020-01-14T19:45:42.235Z cpu0:65975)DMA: 691: DMA Engine ‘nmlx4_core’ destroyed.

2020-01-14T19:45:42.235Z cpu0:65975)DMA: 691: DMA Engine ‘nmlx4_core’ destroyed.

2020-01-14T19:45:42.235Z cpu0:65975)DMA: 691: DMA Engine ‘nmlx4_core’ destroyed.

2020-01-14T19:45:45.766Z cpu3:66127)Loading module mlx4_core …

2020-01-14T19:45:45.775Z cpu3:66127)Elf: 2043: module mlx4_core has license GPL

2020-01-14T19:45:45.781Z cpu3:66127)module heap vmklnx_mlx4_core: Initial heap size = 16384, max heap size = 539607040

2020-01-14T19:45:45.781Z cpu3:66127)module mempool vmklnx_mlx4_core_skb: creation succeeded. initial size = 524288, max size = 23068672

2020-01-14T19:45:45.781Z cpu3:66127)module heap vmklnx_mlx4_core: using memType 2

2020-01-14T19:45:45.781Z cpu3:66127)module heap vmklnx_mlx4_core: creation succeeded. id = 0x43061488c000

2020-01-14T19:45:45.781Z cpu3:66127)PCI: driver mlx4_core is looking for devices

<6>mlx4_core: Mellanox ConnectX core driver v1.9.7.0 (Dec-03-2012)

2020-01-14T19:45:45.781Z cpu3:66127)<6>mlx4_core: Initializing 0000:06:00.0

2020-01-14T19:45:51.989Z cpu0:66127)<6>mlx4_core 0000:06:00.0: port=1 IB MTU=2048

2020-01-14T19:45:51.989Z cpu0:66127)<6>mlx4_core 0000:06:00.0: port=2 IB MTU=2048

2020-01-14T19:45:51.989Z cpu0:66127)PCI: driver mlx4_core claimed device 0000:06:00.0

2020-01-14T19:45:51.989Z cpu0:66127)PCI: driver mlx4_core claimed 1 device

2020-01-14T19:45:51.989Z cpu0:66127)Mod: 4972: Initialization of mlx4_core succeeded with module ID 4129.

2020-01-14T19:45:51.989Z cpu0:66127)mlx4_core loaded successfully.

2020-01-14T19:45:54.536Z cpu0:65938)Activating Jumpstart plugin mlx4_en.

2020-01-14T19:45:54.559Z cpu12:66272)Loading module mlx4_en …

2020-01-14T19:45:54.561Z cpu12:66272)Elf: 2043: module mlx4_en has license GPL

2020-01-14T19:45:54.565Z cpu12:66272)module heap vmklnx_mlx4_en: Initial heap size = 16384, max heap size = 4833280

2020-01-14T19:45:54.565Z cpu12:66272)module mempool vmklnx_mlx4_en_skb: creation succeeded. initial size = 524288, max size = 23068672

2020-01-14T19:45:54.565Z cpu12:66272)module heap vmklnx_mlx4_en: using memType 2

2020-01-14T19:45:54.565Z cpu12:66272)module heap vmklnx_mlx4_en: creation succeeded. id = 0x4305e5f09000

<6>mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.9.7.0 (Dec-03-2012)

2020-01-14T19:45:54.565Z cpu12:66272)Mod: 4972: Initialization of mlx4_en succeeded with module ID 4154.

2020-01-14T19:45:54.565Z cpu12:66272)mlx4_en loaded successfully.

2020-01-14T19:45:54.571Z cpu4:65938)Jumpstart plugin mlx4_en activated.
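In a dump this long, the lines that matter are the ones tagged <NMLX_ERR> by the native driver. A minimal sketch, assuming the log format shown above, filters those lines and trims each one down to the reporting function and its message. The function name nmlx_error_summary is hypothetical and is not part of any Mellanox or VMware tool.

```shell
#!/bin/sh
# Summarize native-driver errors from a kernel log read on stdin.
# Assumes the line layout seen in the dmesg output above:
#   <timestamp> <cpu>)<NMLX_ERR> <module>: <pci-addr>: <function> - (<file>:<line>) <message>
nmlx_error_summary() {
    # Keep only error lines, then drop everything up to and including the
    # PCI address, and replace the " - (<file>:<line>)" part with a colon.
    grep -F '<NMLX_ERR>' |
        sed -e 's/^.*<NMLX_ERR> [^ ]* [^ ]* //' \
            -e 's/ - ([^)]*)/:/'
}
```

On the host this could be used as: dmesg | nmlx_error_summary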

I installed mst correctly, but it still doesn’t see the card, which is odd given that lspci does show it (see the original post).

[root@esxi04:/opt/mellanox/bin] ./mst start

Module mst is already loaded

[root@esxi04:/opt/mellanox/bin] ./mst status -v

No MST devices were found or MST modules are not loaded.

You may need to run ‘mst start’ to load MST modules.

[root@esxi04:/opt/mellanox/bin] ./mlxfwmanager

-E- No devices found or specified, mst might be stopped, run ‘mst start’ to load MST modules

The kernel module for the Ethernet driver is loaded, so I would at least expect to see the card in the UI.

[root@esxi04:/opt/mellanox/bin] vmkload_mod --list |grep mlx

nmlx4_core 0 360

mlx4_core 1 208

mlx4_en 0 132
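The listing above mixes two driver generations. A minimal sketch follows, assuming the three columns are module name, use count and size, that names starting with "nmlx" are the native drivers, and that names starting with "mlx4_" are the legacy vmklinux drivers (the dmesg output refers to them as vmklnx_mlx4_*). It labels each module accordingly. The function name and the labels are hypothetical.

```shell
#!/bin/sh
# Label Mellanox modules from "vmkload_mod --list" output read on stdin.
# Assumption: column 1 is the module name and column 2 is the use count.
classify_mlx_modules() {
    awk '
        $1 ~ /^nmlx/  { printf "%s native used=%s\n", $1, $2 }
        $1 ~ /^mlx4_/ { printf "%s legacy used=%s\n", $1, $2 }
    '
}
```

On the host this could be used as: vmkload_mod --list | classify_mlx_modules. For the listing above, only mlx4_core has a non-zero second column, which is consistent with the dmesg line in which mlx4_core claimed device 0000:06:00.0.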

Hi Sam,

First, please clarify which link type you are working with: IB or ETH?

If it’s IB, you need to install the following driver, 2.4.0, which supports ESXi 6.0:

https://www.mellanox.com/downloads/Software/MLNX-OFED-ESX-2.4.0.0-10EM-600.0.0.2494585.zip

The driver you are using is for ESXi 6.5/6.7 and supports Ethernet only, as ESXi (VMware) moved away from InfiniBand after ESXi 6.0, so there will be no InfiniBand driver available for ESXi 6.5/6.7, only Ethernet.

If it’s ETH, you need to install the following driver, 3.16.11.10:

https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI65-MELLANOX-NMLX4-EN-3161110&productId=614

Important notes:

  1. Please uninstall any previous Mellanox driver packages prior to installing the new version, such as 1.9.7.0, 3.16.11.6-1 and 4.16.10.3-1.

To remove all drivers, refer to:

http://www.mellanox.com/related-docs/prod_software/Mellanox_Native_ESX_Driver_for_VMware_vSphere_6.5_User_Manual_v3.16.11.10.pdf

#> esxcli software vib remove -n nmlx4-rdma

#> esxcli software vib remove -n nmlx4-en

#> esxcli software vib remove -n nmlx4-core

#> esxcli software vib remove -n nmlx5-core

#> esxcli software vib remove -n nmlx5-rdma

#> esxcli software vib remove -n net-mlx4-core

#> esxcli software vib remove -n net-mlx4-en

Then apply a clean installation of 3.16.11.10.
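The seven removal commands above can be sketched as a loop. This is only an illustration: the function name remove_mlx_vibs and the DRY_RUN variable are hypothetical, and the VIB names and their order are copied from the list above. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# VIB names copied, in the same order, from the removal list above.
MLX_VIBS="nmlx4-rdma nmlx4-en nmlx4-core nmlx5-core nmlx5-rdma net-mlx4-core net-mlx4-en"

# With DRY_RUN=1 (the default) print one removal command per VIB.
# With DRY_RUN=0 run the commands, reporting any failure and continuing.
remove_mlx_vibs() {
    for vib in $MLX_VIBS; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "esxcli software vib remove -n $vib"
        else
            esxcli software vib remove -n "$vib" ||
                echo "failed to remove $vib" >&2
        fi
    done
}
```

Calling remove_mlx_vibs prints the commands. Calling it as DRY_RUN=0 remove_mlx_vibs would execute them on the host.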

  2. Install MFT in 2 steps

esxcli software vib install -v http://www.mellanox.com/downloads/MFT/vmware_6.5_native/nmst-4.13.3.6-1OEM.650.0.0.4598673.x86_64.vib

i.e.

esxcli software vib install -v nmst-4.13.3.6-1OEM.650.0.0.4598673.x86_64.vib

esxcli software vib install -v http://www.mellanox.com/downloads/MFT/vmware_6.5_native/mft-4.13.3.6-10EM-650.0.0.4598673.x86_64.vib

i.e.

esxcli software vib install -v mft-4.13.3.6-10EM-650.0.0.4598673.x86_64.vib
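When the two VIB files are installed from local copies rather than from the URLs, it is my understanding (an assumption, not stated in this thread) that the -v option expects a URL or an absolute file path, so a bare file name may be rejected. A minimal sketch with hypothetical helper names prints the two install commands in the order given above, with the file names expanded to absolute paths.

```shell
#!/bin/sh
# Expand a VIB file name to an absolute path (hypothetical helper).
vib_abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;               # already absolute
        *)  printf '%s/%s\n' "$(pwd)" "$1" ;;   # relative to current directory
    esac
}

# Print the install commands for the two MFT packages named above,
# nmst first and mft second, as in the reply.
mft_install_cmds() {
    for vib in nmst-4.13.3.6-1OEM.650.0.0.4598673.x86_64.vib \
               mft-4.13.3.6-10EM-650.0.0.4598673.x86_64.vib; do
        echo "esxcli software vib install -v $(vib_abs_path "$vib")"
    done
}
```

Running mft_install_cmds from the directory that holds the downloaded files prints the commands for review. Nothing is installed by the sketch itself.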

Thanks,

Samer

Samer, thanks for your response.

It’s IB.

I find it odd that the RDMA driver is not included, as you stated, and that I would have to install the older OFED driver. The end goal here is to set up iSER [1], which requires RDMA, as I have this working in RHEL 7.

For instance, the VIB list shows that the RDMA driver is installed along with the Ethernet driver.

[root@esxi04:/opt/mellanox/bin] esxcli software vib list |grep -i mel

iser 1.0.0.2-1OEM.650.0.0.4598673 MEL PartnerSupported 2020-01-11

nmlx4-core 3.16.11.6-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-07

nmlx4-en 3.16.11.6-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-07

nmlx4-rdma 3.16.11.6-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-07

nmlx5-core 4.16.10.3-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-07

nmlx5-rdma 4.16.10.3-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-07

nmst 4.13.3.6-1OEM.650.0.0.4598673 MEL PartnerSupported 2020-01-14

mft 4.13.3.6-0 Mellanox PartnerSupported 2020-01-14

However, the module listing shows only the Ethernet driver currently loaded.

[root@esxi04:/opt/mellanox/bin] vmkload_mod --list |grep mlx

nmlx4_core 0 360

mlx4_core 1 208

mlx4_en 0 132

MST is installed, but doesn’t recognize the card properly (even though we see it as vmnic5 in the lspci output).

[root@esxi04:/opt/mellanox/bin] ./mst start

Module mst is already loaded

[root@esxi04:/opt/mellanox/bin] ./mlxfwmanager

-E- No devices found or specified, mst might be stopped, run ‘mst start’ to load MST modules

[root@esxi04:/opt/mellanox/bin] lspci |grep -i mel

0000:06:00.0 Infiniband controller: Mellanox Technologies MT27500 Family [ConnectX-3] [vmnic5]

So I’m a little lost reviewing the documentation here: it states that the mode is Ethernet, but don’t we need RDMA (IB) for iSER? (This is what I have set up and working today with RHEL.)

I’m also going to swap the ConnectX-3 card in this ESXi server with a working one from my RHEL server, to rule out any hardware issues and to confirm that the card from this ESXi server works in RHEL.

Thanks!

[1] http://www.mellanox.com/related-docs/prod_software/Mellanox_MLNX-NATIVE-ESX-iSER_Driver_for_VMware_ESXi_6.5_Quick_Start_Guide_v1.0.pdf