ConnectX-3 on VMware ESXi 6.7 with inbox driver: link not coming up

Hello,

I installed ESXi 6.7 U3 and applied all available updates. It ships with the inbox VMware driver.

The driver is installed. The NICs are created and I can see them in the console. I can also create virtual switches and assign the NICs as uplinks.

BUT… the uplinks remain down. They don’t come up.

Is there any trick I need to do?

Any answer welcome.

Hi Arne,

Are they connected to a switch, or back to back?

If to a switch, does the switch use the same MTU?

Are you encountering this issue on only one server?

Thanks,

Samer

Hi Arne,

Which type of adapter are you using: InfiniBand or Ethernet?

Which driver did you install?

The latest driver supported by Mellanox (not inbox) for ConnectX-3 is 3.16.11.10 for ESXi 6.5:

https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI65-MELLANOX-NMLX4-EN-3161110&productId=614

Thanks,

Samer

The driver was installed automatically during the ESXi 6.7 installation. I used the ISO I downloaded from VMware.

This is what is installed:

[root@h7esx1:/] esxcli software vib list | grep mlx

net-mlx4-core 1.9.7.0-1vmw.670.0.0.8169922 VMW VMwareCertified 2020-01-12

net-mlx4-en 1.9.7.0-1vmw.670.0.0.8169922 VMW VMwareCertified 2020-01-12

nmlx4-core 3.17.13.1-1vmw.670.2.48.13006603 VMW VMwareCertified 2020-01-12

nmlx4-en 3.17.13.1-1vmw.670.2.48.13006603 VMW VMwareCertified 2020-01-12

nmlx4-rdma 3.17.13.1-1vmw.670.2.48.13006603 VMW VMwareCertified 2020-01-12

nmlx5-core 4.17.13.1-1vmw.670.3.73.14320388 VMW VMwareCertified 2020-01-12

nmlx5-rdma 4.17.13.1-1vmw.670.2.48.13006603 VMW VMwareCertified 2020-01-12

[root@h7esx1:/]

[root@h7esx1:/] lspci | grep -i Mel

0000:05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3] [vmnic2]

[root@h7esx1:/]

[root@h7esx1:~] uname -a

VMkernel h7esx1 6.7.0 #1 SMP Release build-15160138 Nov 22 2019 20:49:31 x86_64 x86_64 x86_64 ESXi

[root@h7esx1:~] esxcli network nic list | grep -i Mell

vmnic1000202 0000:05:00.0 nmlx4_en Up Down 0 Half 00:02:c9:21:a2:92 1500 Mellanox Technologies MT27500 Family [ConnectX-3]

vmnic2 0000:05:00.0 nmlx4_en Up Down 0 Half 00:02:c9:21:a2:91 1500 Mellanox Technologies MT27500 Family [ConnectX-3]

[root@h7esx1:~]
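
The listing above is easier to read once the admin and link columns are pulled apart: both ports are administratively Up but report link Down at speed 0. A minimal sketch of such a filter follows. The helper name `mlx_link_report` is hypothetical, and the field positions are an assumption based on the column order seen in this thread (name, PCI address, driver, admin status, link status, speed).

```shell
# Sketch: summarise driver, admin state and link state for Mellanox uplinks.
# Assumed field order (taken from the listing in this thread):
#   $1 name  $2 PCI  $3 driver  $4 admin status  $5 link status  $6 speed
# mlx_link_report is a hypothetical helper, not part of esxcli.
mlx_link_report() {
    awk 'tolower($0) ~ /mellanox/ {
        printf "%s driver=%s admin=%s link=%s speed=%s\n", $1, $3, $4, $5, $6
    }'
}

# On the host this would be: esxcli network nic list | mlx_link_report
# Here it is fed one line copied from the thread:
printf '%s\n' 'vmnic2 0000:05:00.0 nmlx4_en Up Down 0 Half 00:02:c9:21:a2:91 1500 Mellanox Technologies MT27500 Family [ConnectX-3]' | mlx_link_report
```

A port showing `admin=Up link=Down` means the driver has claimed the device but no link has been negotiated with the far end.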

Hi Arne,

To use your ConnectX-3 on ESXi 6.7 Update 3, it is recommended that you use the nmlx4_en v3.16.11.10 driver, which can be downloaded here: https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI65-MELLANOX-NMLX4-EN-3161110&productId=614

While this version was originally made for ESXi 6.5, it should also work on ESXi 6.7 U3.

You can find installation instructions here

http://www.mellanox.com/related-docs/prod_software/Mellanox_Native_ESX_Driver_for_VMware_vSphere_6.5_User_Manual_v3.16.11.10.pdf

Let me know if it’s working now.

Thanks,

Samer

Hi Samer,

I have done the installation.

This is the result:

[root@h7esx1:~] esxcli software vib list | grep -i mlx

nmlx4-core 3.16.11.10-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-13

nmlx4-en 3.16.11.10-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-13

nmlx4-rdma 3.16.11.10-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-13

[root@h7esx1:~]

The ports are still down:

[root@h7esx1:~] esxcli network nic list | grep -i mel

vmnic1000202 0000:05:00.0 nmlx4_en Up Down 0 Half 00:02:c9:a2:7b:52 1500 Mellanox Technologies MT27500 Family [ConnectX-3]

vmnic2 0000:05:00.0 nmlx4_en Up Down 0 Half 00:02:c9:a2:7b:51 1500 Mellanox Technologies MT27500 Family [ConnectX-3]

[root@h7esx1:~]

Hi Arne,

OK, now please install MFT 4.13.3:

  1. Install the packages. Run:

esxcli software vib install -v http://www.mellanox.com/downloads/MFT/vmware_6.5_native/nmst-4.13.3.6-1OEM.650.0.0.4598673.x86_64.vib

i.e.

esxcli software vib install -v nmst-4.13.3.6-1OEM.650.0.0.4598673.x86_64.vib

esxcli software vib install -v http://www.mellanox.com/downloads/MFT/vmware_6.5_native/mft-4.13.3.6-10EM-650.0.0.4598673.x86_64.vib

i.e.

esxcli software vib install -v mft-4.13.3.6-10EM-650.0.0.4598673.x86_64.vib

  2. Reboot the system.

  3. Start the mst driver. Run:

/opt/mellanox/bin/mst start

For further information refer to the user manual

https://docs.mellanox.com/display/MFTV4133/VMware+ESXi

If needed, unload and reload the driver:

  1. Unload the driver:

esxcfg-module -u nmlx4_en

esxcfg-module -u nmlx4_core

  2. Load the driver:

esxcfg-module nmlx4_core

esxcfg-module nmlx4_en

Thanks,

Samer
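
The install steps above can be collected into one small script. This is only a sketch: the function name `install_mft` and the `RUN` dry-run switch are hypothetical, and the two VIB file names are copied exactly as they appear in the post. By default the script only prints the commands rather than executing them.

```shell
# Sketch of the MFT install sequence from the post above.
# RUN defaults to "echo", so the commands are only printed (dry run).
# On the ESXi host, run with RUN set to an empty value to execute them.
RUN=${RUN-echo}
BASE=http://www.mellanox.com/downloads/MFT/vmware_6.5_native

# install_mft is a hypothetical helper name.
install_mft() {
    $RUN esxcli software vib install -v "$BASE/nmst-4.13.3.6-1OEM.650.0.0.4598673.x86_64.vib"
    $RUN esxcli software vib install -v "$BASE/mft-4.13.3.6-10EM-650.0.0.4598673.x86_64.vib"
    echo "Reboot the host, then run: /opt/mellanox/bin/mst start"
}

install_mft
```

Keeping the reboot as a printed reminder rather than a command is deliberate, since a reboot should not happen as a side effect of a dry run.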

Unfortunately, still no luck: the ports remain down. I checked in the datacentre, and according to the LEDs on the HCA the physical links themselves do come up.

[root@h7esx1:~] esxcli software vib list | grep -i MEL

nmlx4-core 3.16.11.10-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-13

nmlx4-en 3.16.11.10-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-13

nmlx4-rdma 3.16.11.10-1OEM.650.0.0.4598673 MEL VMwareCertified 2020-01-13

nmst 4.13.3.6-1OEM.650.0.0.4598673 MEL PartnerSupported 2020-01-13

mft 4.13.3.6-0 Mellanox PartnerSupported 2020-01-13

[root@h7esx1:~] /opt/mellanox/bin/mst start

Module mst is already loaded

[root@h7esx1:~] esxcli network nic list | grep -i mel

vmnic1000202 0000:05:00.0 nmlx4_en Up Down 0 Half 00:02:c9:a2:7b:52 1500 Mellanox Technologies MT27500 Family [ConnectX-3]

vmnic2 0000:05:00.0 nmlx4_en Up Down 0 Half 00:02:c9:a2:7b:51 1500 Mellanox Technologies MT27500 Family [ConnectX-3]

[root@h7esx1:~] esxcfg-module -u nmlx4_en

Module nmlx4_en unloaded successfully

[root@h7esx1:~] esxcfg-module -u nmlx4_core

Module nmlx4_core unloaded successfully

[root@h7esx1:~] esxcfg-module nmlx4_core

Module nmlx4_core loaded successfully

[root@h7esx1:~] esxcfg-module nmlx4_en

Module nmlx4_en loaded successfully

[root@h7esx1:~] esxcli network nic list | grep -i mel

[root@h7esx1:~] esxcli network nic list | grep -i mel

[root@h7esx1:~] /opt/mellanox/bin/mst start

Module mst is already loaded

[root@h7esx1:~] esxcli network nic list | grep -i mel

[root@h7esx1:~] esxcli network nic list | grep -i mel

vmnic1000202 0000:05:00.0 nmlx4_en Up Down 0 Half 00:02:c9:a2:7b:52 1500 Mellanox Technologies MT27500 Family [ConnectX-3]

vmnic2 0000:05:00.0 nmlx4_en Up Down 0 Half 00:02:c9:a2:7b:51 1500 Mellanox Technologies MT27500 Family [ConnectX-3]

[root@h7esx1:~]

Hi Samer,

They are connected to a switch, and I encounter the same problem on all upgraded hosts. With ESXi 5.5 I had no issues; everything ran fine with these cards.

I was already wondering whether I need a different firmware, or whether I have to configure the cards manually somehow. The subnet manager is not showing any alarms.

BTW: MTU should only play a role once the interfaces are logically up. They are only physically up; the logical link remains down.

Thanks

Arne

Hi Arne,

Subnet manager?

Are you using InfiniBand adapters?

Thanks,

Samer

It is a CX354A-QCBT. When it was running under ESXi 5.5 U3, it was using InfiniBand. I understand that these cards can also run as Ethernet cards when the Ethernet driver is loaded. So that's what we do when we load nmlx4_en, isn't it?

Hi Arne,

If it’s InfiniBand, then you are using the wrong driver.

The driver you are using is for ESXi 6.5/6.7 and supports Ethernet only, as ESXi (VMware) moved away from InfiniBand after version 6.0,

so there will be no InfiniBand driver available for ESXi 6.5/6.7, only Ethernet.

Note: if you want to use our ESXi InfiniBand driver, you would need to downgrade to ESXi 6.0 and use the following driver:

https://www.mellanox.com/page/products_dyn?&product_family=36&mtag=vmware_drivers

For ESXi 6.0

https://www.mellanox.com/downloads/Software/MLNX-OFED-ESX-2.4.0.0-10EM-600.0.0.2494585.zip

For ESXi 5.5

https://www.mellanox.com/downloads/Software/MLNX-OFED-ESX-2.4.0.0-10EM-550.0.0.1331820.zip

Thanks,

Samer

If you would like to work in Ethernet mode, refer to section 3.1.1.1, Port Type Management, in the user manual

https://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_ESXi_User_Manual_v2.4.0.pdf

and set the VPI adapter to Ethernet using the following command:

/opt/mellanox/bin/mlxconfig -d /dev/mt4099_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2

Thanks,

Samer
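
For readers unfamiliar with the numeric values in the `mlxconfig` command above, here is a minimal sketch that maps them to names. The helper `link_type_name` is hypothetical. The values 2 (ETH) and 3 (VPI) both occur in this thread; 1 (IB) is the third value mlxconfig accepts for this parameter.

```shell
# Sketch: decode the LINK_TYPE_P1 / LINK_TYPE_P2 values used by mlxconfig
# on a VPI adapter. link_type_name is a hypothetical helper.
link_type_name() {
    case "$1" in
        1) echo IB ;;    # InfiniBand only
        2) echo ETH ;;   # Ethernet only
        3) echo VPI ;;   # auto-sense between the two
        *) echo "unknown($1)" ;;
    esac
}

for v in 1 2 3; do
    printf 'LINK_TYPE=%s -> %s\n' "$v" "$(link_type_name "$v")"
done
```

The stored setting can be inspected with `mlxconfig -d <device> query`; a changed value only takes effect after the host is rebooted.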

Hi Samer,

No luck there: the ports remain down even after setting them to Ethernet:

[root@h7esx1:/opt/mellanox/bin] /opt/mellanox/bin/mlxconfig -d mt4099_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2

Device #1:


Device type: ConnectX3

Device: mt4099_pciconf0

Configurations: Next Boot New

LINK_TYPE_P1 VPI(3) ETH(2)

LINK_TYPE_P2 VPI(3) ETH(2)

Apply new Configuration? (y/n) [n] : y

Applying… Done!

-I- Please reboot machine to load new configurations.

[root@h7esx1:/opt/mellanox/bin]

[root@h7esx1:~] /opt/mellanox/bin/mlxconfig -d mt4099_pci_cr0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2

Device #1:


Device type: ConnectX3

Device: mt4099_pci_cr0

Configurations: Next Boot New

LINK_TYPE_P1 ETH(2) ETH(2)

LINK_TYPE_P2 ETH(2) ETH(2)

Apply new Configuration? (y/n) [n] : y

Applying… Done!

-I- Please reboot machine to load new configurations.

[root@h7esx1:~] reboot

[root@h7esx1:~] esxcli network nic list | grep Mel

vmnic1000202 0000:05:00.0 nmlx4_en Up Down 0 Half 00:02:c9:a2:7b:52 1500 Mellanox Technologies MT27500 Family [ConnectX-3]

vmnic2 0000:05:00.0 nmlx4_en Up Down 0 Half 00:02:c9:a2:7b:51 1500 Mellanox Technologies MT27500 Family [ConnectX-3]

[root@h7esx1:~]
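
The two `mlxconfig` runs above differ in their "Next Boot" column: the first shows VPI(3) being replaced by ETH(2), the second shows ETH(2) on both sides, which suggests the first run had already stored the setting. A minimal sketch of a filter that makes this explicit follows. The helper name `link_type_changes` is hypothetical, and it assumes the three-field layout printed in this thread (parameter name, next-boot value, new value).

```shell
# Sketch: compare the "Next Boot" and "New" columns of "mlxconfig ... set" output.
# Assumed layout per line (as printed in this thread): NAME NEXT_BOOT NEW
# link_type_changes is a hypothetical helper.
link_type_changes() {
    awk '$1 ~ /^LINK_TYPE_P[0-9]+$/ {
        status = ($2 == $3) ? "already stored" : "will change"
        printf "%s: %s -> %s (%s)\n", $1, $2, $3, status
    }'
}

# Fed with one line from each of the two runs in the thread:
printf '%s\n' 'LINK_TYPE_P1 VPI(3) ETH(2)' 'LINK_TYPE_P1 ETH(2) ETH(2)' | link_type_changes
```

If the second run reports "already stored" and the ports are still down after the reboot, the port type setting itself is unlikely to be what is holding the link down.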

One note here: the “/dev/” prefix in your command and in the documentation has to be removed, as the device isn’t available under /dev for whatever reason. The command works without it.

So what can I do to get the ports up?

Hi Arne,

Since you have set all interfaces as expected, I suggest that you now open a support case with Mellanox at

support@mellanox.com

I see that you already have a contract with our support.

The next step would be to debug this issue live.

Thanks,

Samer

Hi Samer,

A quick update from my side:

  1. I managed to get the connection working. The trick was to connect the ESXi hosts directly instead of going through the IB switches. It seems the switches are not able to carry the Ethernet protocol. A direct connection, however, works. The strange thing was that the switches showed the ports coming up…

  2. The connection speed is poor. Even though ESXi reports 40 Gbit/s, I am only getting about 13-14 Gbit/s in practice:

esxcli network nic list | grep -i mel

vmnic1000202 0000:05:00.0 nmlx4_en Up Up 40000 Full 00:02:c9:21:a2:92 9000 Mellanox Technologies MT27500 Family [ConnectX-3]

vmnic2 0000:05:00.0 nmlx4_en Up Up 40000 Full 00:02:c9:21:a2:91 9000 Mellanox Technologies MT27500 Family [ConnectX-3]

and testing it with iperf3:

Accepted connection from 192.168.112.102, port 53966

[ 5] local 192.168.112.101 port 5201 connected to 192.168.112.102 port 40802

iperf3: getsockopt - Function not implemented

[ ID] Interval Transfer Bandwidth

[ 5] 0.00-1.00 sec 1.33 GBytes 11.5 Gbits/sec

iperf3: getsockopt - Function not implemented

[ 5] 1.00-2.00 sec 1.60 GBytes 13.7 Gbits/sec

iperf3: getsockopt - Function not implemented

[ 5] 2.00-3.00 sec 1.32 GBytes 11.3 Gbits/sec

iperf3: getsockopt - Function not implemented

[ 5] 3.00-4.00 sec 1.61 GBytes 13.8 Gbits/sec

iperf3: getsockopt - Function not implemented

[ 5] 4.00-5.00 sec 1.69 GBytes 14.5 Gbits/sec

iperf3: getsockopt - Function not implemented

[ 5] 5.00-6.00 sec 1.64 GBytes 14.1 Gbits/sec

iperf3: getsockopt - Function not implemented

[ 5] 6.00-7.00 sec 1.48 GBytes 12.7 Gbits/sec

iperf3: getsockopt - Function not implemented

[ 5] 7.00-8.00 sec 1.49 GBytes 12.8 Gbits/sec

iperf3: getsockopt - Function not implemented

[ 5] 8.00-9.00 sec 1.50 GBytes 12.9 Gbits/sec

iperf3: getsockopt - Function not implemented

[ 5] 9.00-10.00 sec 1.67 GBytes 14.4 Gbits/sec

iperf3: getsockopt - Function not implemented

[ 5] 10.00-10.10 sec 176 MBytes 14.5 Gbits/sec


[ ID] Interval Transfer Bandwidth

[ 5] 0.00-10.10 sec 0.00 Bytes 0.00 bits/sec sender

[ 5] 0.00-10.10 sec 15.5 GBytes 13.2 Gbits/sec receiver


Server listening on 5201
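
As a cross-check of the iperf3 summary above, the receiver line (15.5 GBytes in 10.10 s) can be converted by hand. The sketch below assumes that iperf3 counts GBytes in units of 2^30 bytes and Gbits/sec in units of 10^9 bits; the function names are hypothetical.

```shell
# Sketch: recompute the iperf3 receiver throughput and compare it with the link speed.
# Assumption: iperf3 GBytes are 2^30 bytes, Gbits/sec are 10^9 bits per second.
gbits_per_sec() {
    awk -v gbytes="$1" -v secs="$2" 'BEGIN {
        printf "%.1f\n", gbytes * 1024 * 1024 * 1024 * 8 / secs / 1e9
    }'
}

# Share (in percent) of the negotiated link speed, given in Mbit/s as esxcli shows it.
link_utilisation_pct() {
    awk -v gbps="$1" -v link_mbps="$2" 'BEGIN {
        printf "%.0f\n", gbps * 1000 / link_mbps * 100
    }'
}

gbits_per_sec 15.5 10.10          # receiver line of the run above
link_utilisation_pct 13.2 40000   # 40000 Mbit/s as reported for vmnic2
```

This reproduces the 13.2 Gbits/sec that iperf3 reports and shows that the single TCP stream uses roughly a third of the 40 Gbit/s link.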


Any idea how to sort this out?

I mean, when I was using the cards under ESXi 5.5 in native InfiniBand mode, they ran at 56 Gbit/s like a charm. It would be great if you could come up with something close to that, or even better, with native InfiniBand supported again, which would reduce round-trip times and increase speed.

Bye for now

The ESXi 6.7 inbox drivers only support Ethernet.

If you are an InfiniBand user, you must uninstall all of the pvrdma and Ethernet drivers, then install Mellanox OFED 1.8.2.5 for ESXi 6.0 with the force option.

Regards,

Jaehoon Choi

P.S

I’m also disappointed that VMware can’t support InfiniBand anymore… :(