When I install a ConnectX-4 416-BCAT 40G fiber card into my ESXi 6.7U1 server, the server disconnects from vCenter.

I am using a Supermicro AS-1023US-TR4 server. All hardware is on VMware's HCL. I have updated the card's firmware and drivers. I can log into the server fine via SSH and it seems to be running OK, but vCenter times out as if the services are not running. I have restarted the services and rebooted the host several times with no luck. I am working with VMware support, but they have no idea what is going on. Any ideas on where to look to debug this would be really appreciated.


You have not indicated which firmware and driver versions you updated to, but assuming you are now using the GA release of the Mellanox driver for ESXi 6.7 (v4.17.14.2) and firmware v12.23.1020 (both can be retrieved from the mellanox.com website), I suggest that you do the following:

  1. “Remove” the ESXi 6.7 server from vCenter.
  2. Check on the ESXi 6.7 server that the CX-4 adapter is properly detected, both in the GUI and via SSH, by running: # esxcli network nic list
  3. Check that the proper driver and firmware are installed: # esxcli software vib list | grep nmlx
  4. Run # ethtool -i vmnicX to check that the driver version, firmware version, and device number are all correctly detected.
  5. “Add” the ESXi 6.7 server back to vCenter. If the server disconnects from vCenter again, then the issue is definitely related to a VMware malfunction, and the case should be escalated to them for troubleshooting.
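
The checks in steps 2 to 4 can be collected in one pass over SSH. The following is only a minimal sketch: the `run_check` helper and the `VMNIC` variable are hypothetical names introduced here, and `vmnicX` is a placeholder that must be replaced with the uplink name reported by `esxcli network nic list`. The `esxcli network nic get` call is not part of the steps above; it is included on the assumption that `ethtool` may be missing or unsupported for the native driver on a given build.

```shell
#!/bin/sh
# Sketch only: gathers the output of the checks from steps 2-4.
# VMNIC is a placeholder (assumption); set it to the real uplink name.
VMNIC="${VMNIC:-vmnicX}"

# Hypothetical helper: print a heading, then run the command if it exists.
# $1 = description, remaining arguments = command and its arguments.
run_check() {
    desc="$1"; shift
    echo "== $desc"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" || echo "   (command exited with status $?)"
    else
        echo "   (skipped: $1 not found on this system)"
    fi
}

# Step 2: is the adapter detected?
run_check "NIC list" esxcli network nic list

# Step 3: which Mellanox (nmlx) driver VIBs are installed?
run_check "Installed nmlx VIBs" sh -c 'esxcli software vib list | grep nmlx'

# Step 4: driver and firmware details for the uplink.
run_check "ethtool driver info" ethtool -i "$VMNIC"

# Assumed alternative to step 4 (not in the original steps).
run_check "esxcli driver info" esxcli network nic get -n "$VMNIC"
```

Saving the output before removing the host from vCenter and again after re-adding it gives both support teams the same data to compare.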

If VMware still insists it is a Mellanox-related issue, then I suggest you ask them to provide a detailed RCA (root-cause analysis), open a support request with support@mellanox.com, and present the RCA.