I updated several Dell C6420 clients, each with a Mellanox Technologies MT28800 Family [ConnectX-5 Ex] card, from CentOS 7.4.1708 to CentOS 7.9.2009, and many of them no longer negotiate a 100GbE connection, so the link doesn’t come up.
Some still do, with the same cable, connected to the same switch. More confusingly, on the failing systems, booting back into the older kernel, or even booting the older CentOS 7 installer image, also does not bring up the link.
lspci -v | grep Connect
5e:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
5e:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
On a happy system, with CentOS 7.8 / 3.10.0-1127.el7.x86_64:
mlxlink -d /dev/mst/mt4121_pciconf0
Operational Info
State : Active
Physical state : LinkUp
Speed : 100GbE
Width : 4x
FEC : Standard RS-FEC - RS(528,514)
Loopback Mode : No Loopback
Auto Negotiation : ON
Supported Info
Enabled Link Speed : 0xf8f1f0d3 (100G,50G,40G,25G,10G,1G)
Supported Cable Speed : 0x2024a101 (100G,56G,50G,40G,25G,10G,1G)
Troubleshooting Info
Status Opcode : 0
Group Opcode : N/A
Recommendation : No issue was observed
Tool Information
Firmware Version : 16.32.2004
MFT Version : mft 4.22.1-11
mlxcables
Querying Cables …
Cable #1:
Cable name : mt4121_pciconf0_cable_0
No FW data to show
-------- Cable EEPROM --------
Identifier : QSFP28 (11h)
Technology : 850 nm VCSEL (00h)
Compliance : 100GBASE-SR4 or 25GBASE-SR
Wavelength : 850 nm
OUI : 0xac4afe
Vendor : DELL EMC
Serial number : CN04HG0017E4063
Part number : 14NV5
Revision : A1
Temperature [c] : 46 [-10..80]
Digital Diagnostic Monitoring : YES
Length [m] : 50 m
On another, identical system that was upgraded to CentOS 7.8:
Supported Info
Enabled Link Speed : 0x0801f0d3 (40G,25G,10G,1G)
Supported Cable Speed : 0x2024a101 (100G,56G,50G,40G,25G,10G,1G)
Physical state: ETH_AN_FSM_ABILITY_DETECT,
State: Polling
Troubleshooting info:
Status Opcode: 2
Group Opcode: PHY FW
Recommendation: Negotiation failure …
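To make the difference between the two Enabled Link Speed values easier to see, here is a minimal Python sketch that diffs the masks. The two hex values are copied from the outputs above; I am not assuming any particular bit-to-speed mapping, so this only shows which bit positions are set on the good system and cleared on the failing one.

```python
# Enabled Link Speed masks copied from the two mlxlink outputs in this post.
GOOD_MASK = 0xF8F1F0D3  # system that links up at 100GbE
BAD_MASK = 0x0801F0D3   # system stuck in Polling


def mask_diff(good, bad):
    """Return (xor of the masks, bit positions set in `good` but not in `bad`)."""
    missing = good & ~bad
    bits = [i for i in range(32) if (missing >> i) & 1]
    return good ^ bad, bits


xor_value, missing_bits = mask_diff(GOOD_MASK, BAD_MASK)
print(f"xor: 0x{xor_value:08x}")
print(f"bits enabled only on the good system: {missing_bits}")
```

The failing mask is a strict subset of the good one, so nothing extra is enabled on the bad system; it has only lost bits.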
The cable info reported by mlxcables is the same on this system.
Both systems are connected to a Z9264F-ON switch, OS version 10.5.0.6C1.
An mlxconfig reset did not resolve the issue.
So far 7 systems have failed after the upgrade, and I have many more left to upgrade, so any tips would be very much appreciated!
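Since there are many hosts left to check, a small helper for triaging saved mlxlink output might be useful. This is only a sketch: `parse_mlxlink` and `link_is_healthy` are hypothetical names of my own, the sample strings are abbreviated from the outputs in this post, and "healthy" is assumed to mean State is Active and Physical state is LinkUp.

```python
def parse_mlxlink(text):
    """Parse 'Key : Value' lines of mlxlink text output into a dict.

    Keys are lower-cased; section headings (no value) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip().lower()] = value.strip().rstrip(",")
    return fields


def link_is_healthy(fields):
    """Assumption: a good link reports State Active and Physical state LinkUp."""
    return (fields.get("state") == "Active"
            and fields.get("physical state") == "LinkUp")


# Abbreviated samples taken from the outputs in this post.
GOOD = """Operational Info
State : Active
Physical state : LinkUp
Speed : 100GbE
Enabled Link Speed : 0xf8f1f0d3 (100G,50G,40G,25G,10G,1G)
"""

BAD = """Enabled Link Speed : 0x0801f0d3 (40G,25G,10G,1G)
Physical state: ETH_AN_FSM_ABILITY_DETECT,
State: Polling
Troubleshooting info:
Status Opcode: 2
"""

for name, text in (("good", GOOD), ("bad", BAD)):
    fields = parse_mlxlink(text)
    print(name, link_is_healthy(fields), fields.get("enabled link speed"))
```

Feeding it the captured mlxlink output from each host would give a quick list of which machines are affected and what speed mask each one reports.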