Link flapping after firmware update

We have HPE DL380 Gen10+ servers used as VMware ESXi hosts. This week I started updating one cluster. The first 5 hosts were updated without firmware updates and all went well. Then I included the latest HPE firmware packages, which include an update to firmware 26.34.1002 for the MCX adapter in the PCI slot; the firmware for the OCP adapter was already on the latest HPE version, 26.34.1002. The driver is the latest from the HPE web page; I also tried the latest from VMware.

Firmware component             Host Version    Image Version
Nvidia Network Adapter         26.33.1048      26.34.1002

Two dual-port adapters are used, with only one port of each connected to a Cisco ACI switch.

Mellanox MCX631102AS-ADAT Ethernet 10/25Gb 2-port SFP28 (PCI)
Mellanox MCX631432AS-ADAI Ethernet 10/25Gb 2-port SFP28 OCP3

OCP 3.0 Slot 10	Mellanox ConnectX-6 LX OCP3.0	A5	26.34.1002	  Enabled
PCI-E Slot 1	Mellanox Network Adapter - B8:3F:D2:2D:A7:0A		26.34.1002	  Enabled

After the update, link flapping started on both hosts. I contacted VMware and HPE, as well as our network team. There is no clear response; all point to the firmware update and/or the adapters. The problem is that HPE as a vendor is not very helpful when it comes to issues other than completely failing hardware.

What I tried:

I’m out of ideas here.

Sometimes I see a Status Opcode 14 in mlxlink, but not always. I first thought only the port of the PCI adapter had the issue, but after hours the flapping suddenly moved to the port of the OCP adapter. Both ports were never affected at the same time!

One thing that is still a mystery to me is the FEC mode. There is no way to configure it directly in ESXi, only with the mlxlink tool. I see in the output that it is set to Firecode FEC; the network team told me that on the switch side it is set to “inherit”, which is a kind of auto mode. I read a while ago that it should be RS-FEC, depending on the SFP. But whatever I try to set with mlxlink, I don’t see a difference in the mlxlink output. This may be totally unrelated, but FEC mode is something that I feel nobody really takes care of in normal operations (and there is no obvious way to do so in ESXi).

# /opt/mellanox/bin/mlxlink -d mt4127_pciconf0 --show_fec

Operational Info
----------------
State                           : Active
Physical state                  : ETH_AN_FSM_ENABLE
Speed                           : 25G
Width                           : 1x
FEC                             : Firecode FEC
Loopback Mode                   : No Loopback
Auto Negotiation                : ON

Supported Info
--------------
Enabled Link Speed (Ext.)       : 0x00000052 (25G,10G,1G)
Supported Cable Speed (Ext.)    : 0x00000052 (25G,10G,1G)

Troubleshooting Info
--------------------
Status Opcode                   : 0
Group Opcode                    : N/A
Recommendation                  : No issue was observed

Tool Information
----------------
Firmware Version                : 26.34.1002
amBER Version                   : 2.08
MFT Version                     : mft 4.22.1.11

FEC Capability Info
-------------------
FEC Capability 25G              : 0x7 (No-FEC, Firecode_FEC, RS-FEC (528,514))
FEC Capability 10G              : 0x1 (No-FEC)


# /opt/mellanox/bin/mlxlink -d mt4127_pciconf1 --show_fec

Operational Info
----------------
State                           : Physical LinkUp
Physical state                  : ETH_AN_FSM_ENABLE
Speed                           : N/A
Width                           : N/A
FEC                             : N/A
Loopback Mode                   : No Loopback
Auto Negotiation                : ON

Supported Info
--------------
Enabled Link Speed (Ext.)       : 0x00000052 (25G,10G,1G)
Supported Cable Speed (Ext.)    : 0x00000052 (25G,10G,1G)

Troubleshooting Info
--------------------
Status Opcode                   : 14
Group Opcode                    : PHY FW
Recommendation                  : Remote faults detected

Tool Information
----------------
Firmware Version                : 26.34.1002
amBER Version                   : 2.08
MFT Version                     : mft 4.22.1.11

FEC Capability Info
-------------------
FEC Capability 25G              : 0x7 (No-FEC, Firecode_FEC, RS-FEC (528,514))
FEC Capability 10G              : 0x1 (No-FEC)
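
When many ports have to be compared, the "Troubleshooting Info" fields shown above can be pulled out of the mlxlink text with a few lines of Python. This is only a sketch and not part of the original post: the function names `parse_mlxlink` and `describe_fault` are made up for illustration, and the sample is a subset of the mt4127_pciconf1 output above.

```python
# Sketch: extract "Key : Value" pairs from mlxlink text output and
# report a non-zero Status Opcode. Names here are illustrative only.

SAMPLE = """\
State                           : Physical LinkUp
Speed                           : N/A
FEC                             : N/A
Status Opcode                   : 14
Group Opcode                    : PHY FW
Recommendation                  : Remote faults detected
"""


def parse_mlxlink(text):
    """Return a dict of the 'Key : Value' lines; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def describe_fault(fields):
    """Return a one-line summary if Status Opcode is non-zero, else None."""
    opcode = fields.get("Status Opcode", "0")
    if opcode == "0":
        return None
    return "opcode %s (%s): %s" % (
        opcode,
        fields.get("Group Opcode", "N/A"),
        fields.get("Recommendation", "N/A"),
    )


if __name__ == "__main__":
    print(describe_fault(parse_mlxlink(SAMPLE)))
```

Because opcode 14 shows up only intermittently in this thread, a check like this would have to be run repeatedly to catch it.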


# esxcli network nic stats get -n vmnic0
NIC statistics for vmnic0
   Packets received: 74462
   Packets sent: 82059
   Bytes received: 6566741
   Bytes sent: 48312792
   Receive packets dropped: 0
   Transmit packets dropped: 0
   Multicast packets received: 16572
   Broadcast packets received: 48724
   Multicast packets sent: 239
   Broadcast packets sent: 2235
   Total receive errors: 0
   Receive length errors: 0
   Receive over errors: 0
   Receive CRC errors: 0
   Receive frame errors: 0
   Receive FIFO errors: 0
   Receive missed errors: 0
   Total transmit errors: 0
   Transmit aborted errors: 0
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0



# esxcli network nic stats get -n vmnic2
NIC statistics for vmnic2
   Packets received: 0
   Packets sent: 1335
   Bytes received: 0
   Bytes sent: 160167
   Receive packets dropped: 0
   Transmit packets dropped: 0
   Multicast packets received: 0
   Broadcast packets received: 0
   Multicast packets sent: 63
   Broadcast packets sent: 655
   Total receive errors: 0
   Receive length errors: 0
   Receive over errors: 0
   Receive CRC errors: 0
   Receive frame errors: 0
   Receive FIFO errors: 0
   Receive missed errors: 0
   Total transmit errors: 35
   Transmit aborted errors: 35
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0
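
The two counter dumps differ only in a few lines (vmnic2 shows 35 transmit errors, vmnic0 none). A small filter makes that easier to spot across many hosts. The following is a hedged sketch, not a VMware tool: `nonzero_errors` is a hypothetical helper that works on the text printed by the command above, and the sample is a subset of the vmnic2 output.

```python
# Sketch: list only the error/drop counters that are non-zero in the
# text output of "esxcli network nic stats get". Illustrative helper.

SAMPLE_STATS = """\
NIC statistics for vmnic2
   Packets received: 0
   Packets sent: 1335
   Receive packets dropped: 0
   Total receive errors: 0
   Total transmit errors: 35
   Transmit aborted errors: 35
   Transmit carrier errors: 0
"""


def nonzero_errors(stats_text):
    """Return {counter name: value} for error/dropped counters above zero."""
    found = {}
    for line in stats_text.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if not sep or not value.isdigit():
            continue  # header line or anything that is not "name: number"
        is_error_counter = "error" in name.lower() or "dropped" in name.lower()
        if is_error_counter and int(value) > 0:
            found[name] = int(value)
    return found


if __name__ == "__main__":
    for counter, count in nonzero_errors(SAMPLE_STATS).items():
        print(counter, count)
```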


...
... Before vmnic0
...
2023-02-10T18:57:19.715Z: [netCorrelator] 83502331us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down
...
... Then vmnic2
...
2023-02-10T19:44:04.385Z: [netCorrelator] 35172694us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2023-02-10T19:45:02.264Z: [netCorrelator] 91778300us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2023-02-10T19:45:58.970Z: [netCorrelator] 148483739us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2023-02-10T19:46:14.772Z: [netCorrelator] 164285571us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2023-02-10T19:46:46.376Z: [netCorrelator] 195888427us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2023-02-10T19:47:02.178Z: [netCorrelator] 211690401us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2023-02-10T19:47:33.831Z: [netCorrelator] 243343442us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
...
...
2023-02-11T09:19:49.774Z: [netCorrelator] 2082595168us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2023-02-11T09:20:05.626Z: [netCorrelator] 2098446905us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2023-02-11T09:20:37.279Z: [netCorrelator] 2130099896us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2023-02-11T09:20:53.082Z: [netCorrelator] 2145901802us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2023-02-11T09:21:24.685Z: [netCorrelator] 2177504665us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2023-02-11T09:21:40.487Z: [netCorrelator] 2193306622us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2023-02-11T09:22:37.143Z: [netCorrelator] 2249961967us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
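
To see how regular the flapping is, the link-down events above can be counted per vmnic and the gaps between them measured. The sketch below is an illustration only: the regular expression is written against the log lines quoted in this post, and the names `parse_link_down` and `flap_gaps` are hypothetical.

```python
# Sketch: count "linkstate down" events per vmnic and compute the time
# between consecutive events. Pattern matches the log lines quoted above.
import re
from collections import defaultdict
from datetime import datetime

LINK_DOWN = re.compile(
    r"^(?P<ts>\S+Z): \[netCorrelator\] \d+us: "
    r"\[vob\.net\.vmnic\.linkstate\.down\] vmnic (?P<nic>\S+) linkstate down"
)

SAMPLE_LOG = """\
2023-02-10T18:57:19.715Z: [netCorrelator] 83502331us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down
...
2023-02-10T19:44:04.385Z: [netCorrelator] 35172694us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2023-02-10T19:45:02.264Z: [netCorrelator] 91778300us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2023-02-10T19:45:58.970Z: [netCorrelator] 148483739us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
"""


def parse_link_down(log_text):
    """Return {vmnic name: [datetime, ...]} for every link-down line."""
    events = defaultdict(list)
    for line in log_text.splitlines():
        match = LINK_DOWN.match(line.strip())
        if match:
            when = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
            events[match.group("nic")].append(when)
    return events


def flap_gaps(timestamps):
    """Return the seconds between consecutive events, in time order."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    for nic, stamps in parse_link_down(SAMPLE_LOG).items():
        print(nic, len(stamps), flap_gaps(stamps))
```

Run over the full log, the gaps would show whether the flaps follow a fixed retry interval, which could be useful detail for a support case.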

Hello,

Please share the outputs of:
/opt/mellanox/bin/mlxlink -d mt4127_pciconf0 -e -m -c
/opt/mellanox/bin/mlxlink -d mt4127_pciconf1 -e -m -c

Thank you,
Viki

Thanks for your reply. The situation has been more stable since Saturday, with no more flapping. I removed the interface that had issues at that time from the VMware vSwitch config and waited ~2h. Then I added it back and no more flapping occurred.

But I still have no idea what triggered this or whether it is really fixed. The network team did not find anything unusual.

For about a year we have had all kinds of strange issues. That is when the network team deployed Cisco ACI and we started using HPE servers with 25G Mellanox adapters.

We have those flapping issues, often after reboots or other changes. I also have some servers where an interface is completely down after a reboot of the server or switch. Only powering down and removing all power cables from the server solves this problem (or something like ‘mlxfwreset -d 37:00.1 -l 4 reset’; this kills the server, but at least nobody has to go on-site). So we are currently not very happy with the situation, but we don’t know if it is more a Cisco ACI or a Mellanox issue.

server #1

/opt/mellanox/bin/mlxlink -d mt4127_pciconf0 -e -m -c

Operational Info
----------------
State                           : Active
Physical state                  : ETH_AN_FSM_ENABLE
Speed                           : 25G
Width                           : 1x
FEC                             : Firecode FEC
Loopback Mode                   : No Loopback
Auto Negotiation                : ON

Supported Info
--------------
Enabled Link Speed (Ext.)       : 0x00000052 (25G,10G,1G)
Supported Cable Speed (Ext.)    : 0x00000052 (25G,10G,1G)

Troubleshooting Info
--------------------
Status Opcode                   : 0
Group Opcode                    : N/A
Recommendation                  : No issue was observed

Tool Information
----------------
Firmware Version                : 26.34.1002
amBER Version                   : 2.08
MFT Version                     : mft 4.22.1.11

EYE Opening Info
----------------
Physical Grade                  :  13692
Height Eye Opening [mV]         :     20
Phase  Eye Opening [psec]       :     19

Module Info
-----------
Identifier                      : SFP28/SFP+
Compliance                      : 100GBASE-SR4 or 25GBASE-SR
Cable Technology                : N/A
Cable Type                      : Optical Module (separated)
OUI                             : Other
Vendor Name                     : Amphenol
Vendor Part Number              : XP-85B1-02D-H1
Vendor Serial Number            : ACU2112748
Rev                             : 1.0
Wavelength [nm]                 : 850
Transfer Distance [m]           : 10
Attenuation (5g,7g,12g) [dB]    : N/A
FW Version                      : N/A
Digital Diagnostic Monitoring   : Yes
Power Class                     : 1.0 W max
CDR RX                          : OFF
CDR TX                          : OFF
LOS Alarm                       : N/A
Temperature [C]                 : 46 [-5..75]
Voltage [mV]                    : 3308.2 [3050..3550]
Bias Current [mA]               : 5 [1..12]
Rx Power Current [dBm]          : 0 [-12..4]
Tx Power Current [dBm]          : 0 [-10..4]
IB Cable Width                  : N/A
Memory Map Revision             : 0
Linear Direct Drive             : 0
Cable Breakout                  : N/A
SMF Length                      : N/A
MAX Power                       : 0
Cable Rx AMP                    : 0
Cable Rx Emphasis               : 0
Cable Rx Post Emphasis          : 0
Cable Tx Equalization           : 0
Wavelength Tolerance            : 0.0nm
Module State                    : N/A
DataPath state [per lane]       : N/A
Rx Output Valid [per lane]      : 0
Nominal bit rate                : 25.750Gb/s
Rx Power Type                   : Average power
Manufacturing Date              : 21_06_22
Active Set Host Compliance Code : N/A
Active Set Media Compliance Code: N/A
Error Code Response             : N/A
Module FW Fault                 : N/A
DataPath FW Fault               : N/A
Tx Fault [per lane]             : 0
Tx LOS [per lane]               : N/A
Tx CDR LOL [per lane]           : 0
Rx LOS [per lane]               : 0
Rx CDR LOL [per lane]           : 0
Tx Adaptive EQ Fault [per lane] : N/A

Physical Counters and BER Info
------------------------------
Time Since Last Clear [Min]     : 4165.7
Effective Physical Errors       : 0
Effective Physical BER          : 15E-255
Raw Physical Errors Per Lane    : 0
Raw Physical BER                : 15E-255

/opt/mellanox/bin/mlxlink -d mt4127_pciconf1 -e -m -c

Operational Info
----------------
State                           : Active
Physical state                  : ETH_AN_FSM_ENABLE
Speed                           : 25G
Width                           : 1x
FEC                             : Firecode FEC
Loopback Mode                   : No Loopback
Auto Negotiation                : ON

Supported Info
--------------
Enabled Link Speed (Ext.)       : 0x00000052 (25G,10G,1G)
Supported Cable Speed (Ext.)    : 0x00000052 (25G,10G,1G)

Troubleshooting Info
--------------------
Status Opcode                   : 0
Group Opcode                    : N/A
Recommendation                  : No issue was observed

Tool Information
----------------
Firmware Version                : 26.34.1002
amBER Version                   : 2.08
MFT Version                     : mft 4.22.1.11

EYE Opening Info
----------------
Physical Grade                  :  13356
Height Eye Opening [mV]         :     16
Phase  Eye Opening [psec]       :     18

Module Info
-----------
Identifier                      : SFP28/SFP+
Compliance                      : 100GBASE-SR4 or 25GBASE-SR
Cable Technology                : N/A
Cable Type                      : Optical Module (separated)
OUI                             : Other
Vendor Name                     : Amphenol
Vendor Part Number              : XP-85B1-02D-H1
Vendor Serial Number            : ACU2112734
Rev                             : 1.0
Wavelength [nm]                 : 850
Transfer Distance [m]           : 10
Attenuation (5g,7g,12g) [dB]    : N/A
FW Version                      : N/A
Digital Diagnostic Monitoring   : Yes
Power Class                     : 1.0 W max
CDR RX                          : OFF
CDR TX                          : OFF
LOS Alarm                       : N/A
Temperature [C]                 : 47 [-5..75]
Voltage [mV]                    : 3309.1 [3050..3550]
Bias Current [mA]               : 5 [1..12]
Rx Power Current [dBm]          : 0 [-12..4]
Tx Power Current [dBm]          : -2 [-10..4]
IB Cable Width                  : N/A
Memory Map Revision             : 0
Linear Direct Drive             : 0
Cable Breakout                  : N/A
SMF Length                      : N/A
MAX Power                       : 0
Cable Rx AMP                    : 0
Cable Rx Emphasis               : 0
Cable Rx Post Emphasis          : 0
Cable Tx Equalization           : 0
Wavelength Tolerance            : 0.0nm
Module State                    : N/A
DataPath state [per lane]       : N/A
Rx Output Valid [per lane]      : 0
Nominal bit rate                : 25.750Gb/s
Rx Power Type                   : Average power
Manufacturing Date              : 21_06_22
Active Set Host Compliance Code : N/A
Active Set Media Compliance Code: N/A
Error Code Response             : N/A
Module FW Fault                 : N/A
DataPath FW Fault               : N/A
Tx Fault [per lane]             : 0
Tx LOS [per lane]               : N/A
Tx CDR LOL [per lane]           : 0
Rx LOS [per lane]               : 0
Rx CDR LOL [per lane]           : 0
Tx Adaptive EQ Fault [per lane] : N/A

Physical Counters and BER Info
------------------------------
Time Since Last Clear [Min]     : 4085.1
Effective Physical Errors       : 0
Effective Physical BER          : 15E-255
Raw Physical Errors Per Lane    : 0
Raw Physical BER                : 15E-255

server #2

/opt/mellanox/bin/mlxlink -d mt4127_pciconf0 -e -m -c

Operational Info
----------------
State                           : Active
Physical state                  : ETH_AN_FSM_ENABLE
Speed                           : 25G
Width                           : 1x
FEC                             : Firecode FEC
Loopback Mode                   : No Loopback
Auto Negotiation                : ON

Supported Info
--------------
Enabled Link Speed (Ext.)       : 0x00000052 (25G,10G,1G)
Supported Cable Speed (Ext.)    : 0x00000052 (25G,10G,1G)

Troubleshooting Info
--------------------
Status Opcode                   : 0
Group Opcode                    : N/A
Recommendation                  : No issue was observed

Tool Information
----------------
Firmware Version                : 26.34.1002
amBER Version                   : 2.08
MFT Version                     : mft 4.22.1.11

EYE Opening Info
----------------
Physical Grade                  :  12298
Height Eye Opening [mV]         :     18
Phase  Eye Opening [psec]       :     18

Module Info
-----------
Identifier                      : SFP28/SFP+
Compliance                      : 100GBASE-SR4 or 25GBASE-SR
Cable Technology                : N/A
Cable Type                      : Optical Module (separated)
OUI                             : Other
Vendor Name                     : Amphenol
Vendor Part Number              : XP-85B1-02D-H1
Vendor Serial Number            : ACU2112745
Rev                             : 1.0
Wavelength [nm]                 : 850
Transfer Distance [m]           : 10
Attenuation (5g,7g,12g) [dB]    : N/A
FW Version                      : N/A
Digital Diagnostic Monitoring   : Yes
Power Class                     : 1.0 W max
CDR RX                          : OFF
CDR TX                          : OFF
LOS Alarm                       : N/A
Temperature [C]                 : 46 [-5..75]
Voltage [mV]                    : 3307.7 [3050..3550]
Bias Current [mA]               : 5 [1..12]
Rx Power Current [dBm]          : 0 [-12..4]
Tx Power Current [dBm]          : -2 [-10..4]
IB Cable Width                  : N/A
Memory Map Revision             : 0
Linear Direct Drive             : 0
Cable Breakout                  : N/A
SMF Length                      : N/A
MAX Power                       : 0
Cable Rx AMP                    : 0
Cable Rx Emphasis               : 0
Cable Rx Post Emphasis          : 0
Cable Tx Equalization           : 0
Wavelength Tolerance            : 0.0nm
Module State                    : N/A
DataPath state [per lane]       : N/A
Rx Output Valid [per lane]      : 0
Nominal bit rate                : 25.750Gb/s
Rx Power Type                   : Average power
Manufacturing Date              : 21_06_22
Active Set Host Compliance Code : N/A
Active Set Media Compliance Code: N/A
Error Code Response             : N/A
Module FW Fault                 : N/A
DataPath FW Fault               : N/A
Tx Fault [per lane]             : 0
Tx LOS [per lane]               : N/A
Tx CDR LOL [per lane]           : 0
Rx LOS [per lane]               : 0
Rx CDR LOL [per lane]           : 0
Tx Adaptive EQ Fault [per lane] : N/A

Physical Counters and BER Info
------------------------------
Time Since Last Clear [Min]     : 4083.3
Effective Physical Errors       : 0
Effective Physical BER          : 15E-255
Raw Physical Errors Per Lane    : 0
Raw Physical BER                : 15E-255

/opt/mellanox/bin/mlxlink -d mt4127_pciconf1 -e -m -c

Operational Info
----------------
State                           : Active
Physical state                  : ETH_AN_FSM_ENABLE
Speed                           : 25G
Width                           : 1x
FEC                             : Firecode FEC
Loopback Mode                   : No Loopback
Auto Negotiation                : ON

Supported Info
--------------
Enabled Link Speed (Ext.)       : 0x00000052 (25G,10G,1G)
Supported Cable Speed (Ext.)    : 0x00000052 (25G,10G,1G)

Troubleshooting Info
--------------------
Status Opcode                   : 0
Group Opcode                    : N/A
Recommendation                  : No issue was observed

Tool Information
----------------
Firmware Version                : 26.34.1002
amBER Version                   : 2.08
MFT Version                     : mft 4.22.1.11

EYE Opening Info
----------------
Physical Grade                  :  11530
Height Eye Opening [mV]         :     14
Phase  Eye Opening [psec]       :     17

Module Info
-----------
Identifier                      : SFP28/SFP+
Compliance                      : 100GBASE-SR4 or 25GBASE-SR
Cable Technology                : N/A
Cable Type                      : Optical Module (separated)
OUI                             : Other
Vendor Name                     : Amphenol
Vendor Part Number              : XP-85B1-02D-H1
Vendor Serial Number            : ACU2112769
Rev                             : 1.0
Wavelength [nm]                 : 850
Transfer Distance [m]           : 10
Attenuation (5g,7g,12g) [dB]    : N/A
FW Version                      : N/A
Digital Diagnostic Monitoring   : Yes
Power Class                     : 1.0 W max
CDR RX                          : OFF
CDR TX                          : OFF
LOS Alarm                       : N/A
Temperature [C]                 : 47 [-5..75]
Voltage [mV]                    : 3318.4 [3050..3550]
Bias Current [mA]               : 5 [1..12]
Rx Power Current [dBm]          : 0 [-12..4]
Tx Power Current [dBm]          : 0 [-10..4]
IB Cable Width                  : N/A
Memory Map Revision             : 0
Linear Direct Drive             : 0
Cable Breakout                  : N/A
SMF Length                      : N/A
MAX Power                       : 0
Cable Rx AMP                    : 0
Cable Rx Emphasis               : 0
Cable Rx Post Emphasis          : 0
Cable Tx Equalization           : 0
Wavelength Tolerance            : 0.0nm
Module State                    : N/A
DataPath state [per lane]       : N/A
Rx Output Valid [per lane]      : 0
Nominal bit rate                : 25.750Gb/s
Rx Power Type                   : Average power
Manufacturing Date              : 21_06_22
Active Set Host Compliance Code : N/A
Active Set Media Compliance Code: N/A
Error Code Response             : N/A
Module FW Fault                 : N/A
DataPath FW Fault               : N/A
Tx Fault [per lane]             : 0
Tx LOS [per lane]               : N/A
Tx CDR LOL [per lane]           : 0
Rx LOS [per lane]               : 0
Rx CDR LOL [per lane]           : 0
Tx Adaptive EQ Fault [per lane] : N/A

Physical Counters and BER Info
------------------------------
Time Since Last Clear [Min]     : 4083.6
Effective Physical Errors       : 0
Effective Physical BER          : 15E-255
Raw Physical Errors Per Lane    : 0
Raw Physical BER                : 15E-255

# show interface ethernet 1/22
Ethernet1/22 is up
admin state is up, Dedicated Interface
  Hardware: 100/1000/10000/25000/auto Ethernet, address:xxxxxx.05b6 (bia xxxxxac.05b6)
  MTU 9000 bytes, BW 25000000 Kbit, DLY 1 usec
  reliability 254/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, medium is broadcast
  Port mode is trunk
  full-duplex, 25 Gb/s, media type is 25G
  FEC (forward-error-correction) : cl74-fc-fec

This is the port configuration on the Cisco side. FEC is set to cl74-fc-fec.

We use the “845398-B21 HPE 25Gb SFP28 SR 100m” transceiver (HPE 25G and 100G Transceivers); according to the mlxlink info this is an Amphenol SFP.
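
The switch and the adapter name the same FEC mode differently: "cl74" refers to IEEE 802.3 Clause 74, which is the Firecode (BASE-R) FEC that mlxlink reports as "Firecode FEC". A rough way to compare the two labels is sketched below. The keyword matching is a heuristic assumed for illustration, not an official mapping; `normalize_fec` and `fec_matches` are hypothetical names, and the `cl91-rs-fec` label is an assumption about how the switch would name RS-FEC, since it does not appear in this thread.

```python
# Sketch: normalize FEC labels from mlxlink and from the switch output
# so they can be compared. Keyword matching is a heuristic assumption.


def normalize_fec(label):
    """Map a vendor FEC label to 'firecode', 'rs', 'none' or 'unknown'."""
    text = label.lower()
    if "fc-fec" in text or "firecode" in text:
        return "firecode"  # e.g. "Firecode FEC", "cl74-fc-fec"
    if "rs-fec" in text:
        return "rs"  # e.g. "RS-FEC (528,514)"; "cl91-rs-fec" is assumed
    if "no-fec" in text or "no fec" in text:
        return "none"
    return "unknown"  # e.g. "inherit", "N/A"


def fec_matches(nic_label, switch_label):
    """True only if both labels are recognized and name the same FEC."""
    nic, switch = normalize_fec(nic_label), normalize_fec(switch_label)
    return nic != "unknown" and nic == switch


if __name__ == "__main__":
    # Values taken from the outputs in this thread.
    print(fec_matches("Firecode FEC", "cl74-fc-fec"))
```

With the values posted here, both ends report Firecode FEC, so an operational FEC mismatch is not visible in these outputs.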

Vendor Name                     : Amphenol
Vendor Part Number              : XP-85B1-02D-H1
Vendor Serial Number            : ACU2112745

I don’t see much info about FEC mode in HPE or Amphenol specs.

Hi @vikiz, does the information I provided help? I got some feedback from my network team that Cisco knows about similar issues with Mellanox adapters in the past.

Hi @ralf.gross1

Please be aware that the cable is not supported. Additionally, the CDR in the modules is currently disabled when it should be enabled. To investigate the issue further, more information and logs are needed. The best course of action is to initiate a support case for thorough debugging.

Thanks,
Chen
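
The CDR state mentioned above is visible in the "Module Info" section of the `mlxlink -m` output ("CDR RX" and "CDR TX"). A small check like the following could list which ports report it as off. This is only a sketch that reads the state; it does not enable CDR, and the thread does not say how to do that. `cdr_state` is a hypothetical helper, and treating any value that starts with "ON" as enabled is an assumption, since only "OFF" appears in the outputs posted here.

```python
# Sketch: read the CDR RX/TX state from "mlxlink -m" text output.
# Only reports the state; it does not change any module setting.

SAMPLE_MODULE_INFO = """\
Vendor Name                     : Amphenol
Vendor Part Number              : XP-85B1-02D-H1
Power Class                     : 1.0 W max
CDR RX                          : OFF
CDR TX                          : OFF
LOS Alarm                       : N/A
"""


def cdr_state(module_info_text):
    """Return {'CDR RX': bool, 'CDR TX': bool} for the lines that are present."""
    state = {}
    for line in module_info_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in ("CDR RX", "CDR TX"):
            # Assumption: an enabled CDR is printed as a value starting with "ON".
            state[key] = value.strip().upper().startswith("ON")
    return state


if __name__ == "__main__":
    for direction, enabled in cdr_state(SAMPLE_MODULE_INFO).items():
        print(direction, "enabled" if enabled else "disabled")
```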

Thanks for your feedback. This is the first time I have heard about CDR. Is this a setting that has to be configured statically on the adapter/module and the switch port, or should it be negotiated? I don’t see an option for it in the mlxlink command. How would I enable CDR?

Regarding opening a case, can I do this for an OEM adapter from HPE? I created a case with HPE weeks ago but received no feedback.

We also received feedback from Cisco, they believe the issue is related to this:

3rd Party Switches Link Is Down Due to Auto-Negotiation (nvidia.com)

Hi @ralf.gross1

I can’t confirm with certainty that this is the same issue, further debugging is necessary.
Please create a support case based on your entitlement by sending an email to EnterpriseSupport@nvidia.com and mention the card PSID.

Thanks,
Chen

Can someone please tell me where/how I can enable CDR? Our network team does not see anything regarding this on the switches, and I don’t see an option in mlxlink. I see that some interfaces have it enabled, others not. And I can find no information about this.