I have a Mellanox Dual-Port ConnectX-3 Pro EN 40/56 GbE QSFP+ NIC (MCX314A-BCCT).
I have been struggling for days now to get this card working.
To explain the topology, I have two other Hosts, with an InfiniBand Switch between them.
Both of the other Hosts are plugged into the InfiniBand Switch via Passive QSFP Copper Cables.
The other two Hosts and Mellanox Cards in them are working fine, as can be seen below:
I am now trying to add a third host, with this NIC in it.
The NIC is VMware-certified, and I am using the very-latest Mellanox-written, VMware-provided Driver (Version 2.4.0) specifically designated for use with this card.
But the NIC Ports will not link up when plugged into the Switch.
I tried replacing the Card. I tried replacing the Cables with known, tested cables.
I have tried changing the port_type_array in the Driver to both Ethernet and InfiniBand.
When the card’s two ports are directly plugged into one another (while the NIC is in Ethernet Mode, no Subnet Manager needed), I get an active Link, so it’s not the Cables or the card.
Neither mode (Ethernet or InfiniBand) provides a link when plugged into the Switch.
I was planning for the card to be in InfiniBand mode because it is an InfiniBand Switch (with an integrated hardware Subnet Manager) on the other end, and the other two Hosts are running in InfiniBand mode.
I notice that the new NIC is using a different Driver, “mlx4_en”:
The existing, working NICs are using the “ib_ipoib” driver, and the new Host’s NIC is using the “mlx4_en” driver. Maybe that’s the new name of the Driver… although “net-ib-ipoib” came along with the Mellanox Drivers and is loaded and running.
I had to wonder: should it show “mlx4_ib” as the driver for the NIC, and not “mlx4_en”? Is it still using the Ethernet Driver and not the InfiniBand one? It’s not really clear… after all, mlx4_ib and net-ib-ipoib were both installed, and are sitting there, loaded and running as well.
I have set the NIC to use InfiniBand mode in its boot parameters, i.e., using a port_type_array of 1 (InfiniBand) in mlx4_core. The specific boot parameters I have set for mlx4_core are “port_type_array=1 enable_sys_tune=1 num_vfs=8 set_4k_mtu=1 enable_64b_cqe_eqe=1 probe_vf=3”.
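To make that parameter string easier to check, here is a rough Python sketch that splits it into key/value pairs, decodes port_type_array, and rebuilds the esxcli command that would set it. The helper names are made up for illustration. The value 1 for InfiniBand comes from the parameters above; the value 2 for Ethernet is an assumption based on the usual mlx4_core encoding, so treat this as a sanity check rather than a statement of what the driver does.

```python
# Parameter string exactly as set for mlx4_core in this post.
PARAMS = ("port_type_array=1 enable_sys_tune=1 num_vfs=8 "
          "set_4k_mtu=1 enable_64b_cqe_eqe=1 probe_vf=3")

# 1 = InfiniBand is stated in the post; 2 = Ethernet is an ASSUMPTION
# based on the usual mlx4_core encoding.
PORT_TYPE_NAMES = {"1": "InfiniBand", "2": "Ethernet"}


def parse_module_params(param_string):
    """Split 'key=value key=value ...' into a dict, keeping the order."""
    params = {}
    for token in param_string.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def describe_port_types(value):
    """Decode a port_type_array value, one entry per comma-separated port."""
    return [PORT_TYPE_NAMES.get(code, "unknown (" + code + ")")
            for code in value.split(",")]


def esxcli_set_command(module, params):
    """Rebuild the esxcli command that would apply these module parameters."""
    joined = " ".join(f"{key}={value}" for key, value in params.items())
    return f'esxcli system module parameters set -m {module} -p "{joined}"'


if __name__ == "__main__":
    params = parse_module_params(PARAMS)
    # A single value is what is set in this post; whether the driver applies
    # it to both ports of a dual-port card is not confirmed here.
    print(describe_port_types(params["port_type_array"]))
    print(esxcli_set_command("mlx4_core", params))
```

One thing this highlights is that port_type_array carries a single value here even though the card has two ports, so it may be worth trying a per-port value such as "1,1" as well.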
I’ve tried installing older drivers, going all the way back to the ESXi 5.0 Drivers, plus also OpenSM, in case the Subnet Manager on the Switch was somehow acting up, but nothing I did got those NIC ports to energize.
The InfiniBand card’s ports remain in “Down” state:
In addition to using the latest Driver (2.4.0) from Mellanox for use with the NIC, the card is using the very latest Firmware version available from Mellanox (2.42.5):
The InfiniBand switch is a 36-Port 40 Gb/s Mellanox Voltaire Grid Director 4036 (VLT-30015-IBM) with an integrated Hardware Subnet Manager:
I am at my wits' end at this point, and so I am reaching out to the community here.
Does anyone have any suggestions on how I can get this card working?