I updated several Dell C6420 clients, each with a Mellanox Technologies MT28800 Family [ConnectX-5 Ex] card, from CentOS 7.4.1708 to CentOS 7.9.2009. Many of them no longer negotiate a 100GbE connection, and the link doesn't come up.
Some still do, with the same cable, connected to the same switch. More confusingly, booting back into the older kernel, or even booting the older CentOS 7 installer image, does not bring up the link either.
lspci -v | grep Connect
5e:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
5e:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
On a happy system, with CentOS 7.8 / 3.10.0-1127.el7.x86_64:
mlxlink -d /dev/mst/mt4121_pciconf0
Operational Info
State : Active
Physical state : LinkUp
Speed : 100GbE
Width : 4x
FEC : Standard RS-FEC - RS(528,514)
Loopback Mode : No Loopback
Auto Negotiation : ON
Supported Info
Enabled Link Speed : 0xf8f1f0d3 (100G,50G,40G,25G,10G,1G)
Supported Cable Speed : 0x2024a101 (100G,56G,50G,40G,25G,10G,1G)
Troubleshooting Info
Status Opcode : 0
Group Opcode : N/A
Recommendation : No issue was observed
Tool Information
Firmware Version : 16.32.2004
MFT Version : mft 4.22.1-11
mlxcables
Querying Cables …
Cable #1:
Cable name : mt4121_pciconf0_cable_0
No FW data to show
-------- Cable EEPROM --------
Identifier : QSFP28 (11h)
Technology : 850 nm VCSEL (00h)
Compliance : 100GBASE-SR4 or 25GBASE-SR
Wavelength : 850 nm
OUI : 0xac4afe
Vendor : DELL EMC
Serial number : CN04HG0017E4063
Part number : 14NV5
Revision : A1
Temperature [c] : 46 [-10..80]
Digital Diagnostic Monitoring : YES
Length [m] : 50 m
On another, identical system that was upgraded to CentOS 7.8:
Supported Info
Enabled Link Speed : 0x0801f0d3 (40G,25G,10G,1G)
Supported Cable Speed : 0x2024a101 (100G,56G,50G,40G,25G,10G,1G)
Physical state: ETH_AN_FSM_ABILITY_DETECT,
State: Polling
Troubleshooting info:
Status Opcode: 2
Group Opcode: PHY FW
Recommendation: Negotiation failure ..
The cable info from mlxcables is the same on this system.
Both systems are connected to a Z9264F-ON switch, OS Version: 10.5.0.6C1.
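To make the difference between the two Enabled Link Speed masks easier to see, here is a minimal Python sketch that compares the two hex values from the mlxlink outputs. The variable and function names are my own. I have not mapped individual bit positions to speeds, so it only reports which bits differ.

```python
# Enabled Link Speed masks copied from the two mlxlink outputs.
happy = 0xF8F1F0D3    # system that links at 100GbE
failing = 0x0801F0D3  # system stuck in Polling

diff = happy ^ failing            # bits that differ between the two masks
only_happy = happy & ~failing     # bits enabled only on the working system
only_failing = failing & ~happy   # bits enabled only on the failing system


def set_bits(mask):
    """Return the positions of the bits set in mask, lowest first."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


print(f"differing bits:       {diff:#010x} -> {set_bits(diff)}")
print(f"only on happy system: {only_happy:#010x}")
print(f"only on failing one:  {only_failing:#010x}")
```

The failing mask turns out to be a strict subset of the working one: nothing is enabled on the failing system that is not also enabled on the happy one.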
An mlxconfig reset did not resolve the issue.
So far 7 systems have failed after the upgrade, and I have many more left to upgrade, so any tips would be very much appreciated!
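Since I have many more machines to check, this is a rough Python sketch of how the mlxlink text output could be reduced to a pass/fail check. The function names and the "Active at 100GbE" criterion are my own assumptions, and the sample strings are just lines copied from the outputs in this post.

```python
def parse_mlxlink(text):
    """Collect 'key : value' lines from mlxlink text output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip().lower()] = value.strip()
    return fields


def link_is_up(fields):
    """Assumed criterion: State is Active and Speed is 100GbE."""
    return fields.get("state") == "Active" and fields.get("speed") == "100GbE"


# Lines copied from the working system's output.
sample_good = """\
State : Active
Physical state : LinkUp
Speed : 100GbE
"""

# Lines copied from the failing system's output.
sample_bad = """\
Physical state: ETH_AN_FSM_ABILITY_DETECT,
State: Polling
Status Opcode: 2
"""

good = parse_mlxlink(sample_good)
bad = parse_mlxlink(sample_bad)
print("good link up:", link_is_up(good))
print("bad link up: ", link_is_up(bad), "| state:", bad.get("state"))
```

The text passed to parse_mlxlink would be whatever mlxlink -d /dev/mst/mt4121_pciconf0 prints on each host; how that output gets collected is left out here.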