SN2010M Switches - high pause & discard packets

First, let me point out that I'm not a networking guy, so please bear with me.

We recently purchased an HPE HCI VMware solution consisting of three HPE ProLiant servers, an HPE Nimble storage array, and a pair of HPE/Mellanox SN2010M switches. The servers each have 4 x 25Gb SFP28 NICs, the Nimble also has 4 x 25Gb SFP28 NICs, and the two switches are connected/stacked/MLAG'd via 100Gb QSFP28 DAC.

HPE engineers configured and installed the whole setup as part of the installation service. Each server has a bonded pair of 25Gb NICs (for management and LAN purposes), and the other two 25Gb NICs are used for multipath communication with the Nimble storage (i.e. iSCSI-A and iSCSI-B). The Nimble has two controllers, again each with two 25Gb NICs (i.e. iSCSI-A and iSCSI-B). The servers and Nimble are connected to both switches for HA purposes.

On the Mellanox switches we're seeing high levels of pause and discard packets. This only applies to the MLAG ports and the server management ports. We don't see any pause or discard packets on any of the ports used for iSCSI.

Jumbo frames are enabled on the switches, the Nimble, and the servers. Flow control is set to 'global' for all switch ports with the exception of the MLAG ports, which are set to PFC.

So far we have updated the servers and storage unit with the latest patches/firmware and replaced all the DAC cables; however, the issue remains.
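For context, the pause and discard figures come from the switch CLI. On Onyx-based SN2010M switches the per-port counters can be read with something along these lines (the interface number is an example; this is a sketch, not the exact session we ran):

```shell
# Per-interface statistics, including Rx discards and pause frames sent/received
show interfaces ethernet 1/1 counters
```

Clearing the counters before a monitoring window makes it easy to see whether the pause/discard numbers are still climbing.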

We have tickets raised with HPE and VMware, but both are struggling to identify the root cause. Does anyone have any suggestions or advice on what the issue could be?

Hi Phil,

Joey here from the Nvidia support team; I will be assisting you with this case.

I understand that you are connecting 3 HPE servers and an HPE storage array to a pair of HPE/Mellanox SN2010M switches running in MLAG. Could you provide a detailed diagram of all the devices and connections, and tell us which ports are showing high levels of pause and discard packets? Could you also collect a sysdump or cl-support from both switches and upload them to the case for analysis? In addition, what is the impact of this issue apart from the high levels of pause and discard packets? Thanks.

Hi Joey,

We have three HPE servers (VMware/ESXi) which each have four 25Gb SFP28 NICs and two HPE servers (Windows) that each have a single 10Gb SFP+ NIC. The HPE storage array has four 25Gb SFP28 NICs. In addition, we also have an HP 2530-48G-PoE+ Switch (J9772A) connected via 1Gb. The SN2010M MLAG'd pair is our core switch stack and only server hardware is connected to it. The HP 2530-48G is our edge switch and only client devices (workstations, phones, printers, etc.) are connected to it. Hopefully the table below gives a clear idea of how everything is connected.

| Device | Switch | Switch Port | Transceiver SKU | SKU Description |
|---|---|---|---|---|
| ProLiant #1 (ESXi) Mgmt Port #1 | SN2010M #1 | Port #1 | 487655-B21 | HPE BLc 10G SFP+ to SFP+ 3m DAC Cable |
| ProLiant #1 (ESXi) Mgmt Port #2 | SN2010M #2 | Port #1 | 487655-B21 | HPE BLc 10G SFP+ to SFP+ 3m DAC Cable |
| ProLiant #1 (ESXi) iSCSI Port #1 | SN2010M #1 | Port #2 | 844477-B21 | HPE 25Gb SFP28 to SFP28 3m DAC |
| ProLiant #1 (ESXi) iSCSI Port #2 | SN2010M #2 | Port #2 | 844477-B21 | HPE 25Gb SFP28 to SFP28 3m DAC |
| ProLiant #2 (ESXi) Mgmt Port #1 | SN2010M #1 | Port #3 | 487655-B21 | HPE BLc 10G SFP+ to SFP+ 3m DAC Cable |
| ProLiant #2 (ESXi) Mgmt Port #2 | SN2010M #2 | Port #3 | 487655-B21 | HPE BLc 10G SFP+ to SFP+ 3m DAC Cable |
| ProLiant #2 (ESXi) iSCSI Port #1 | SN2010M #1 | Port #4 | 844477-B21 | HPE 25Gb SFP28 to SFP28 3m DAC |
| ProLiant #2 (ESXi) iSCSI Port #2 | SN2010M #2 | Port #4 | 844477-B21 | HPE 25Gb SFP28 to SFP28 3m DAC |
| ProLiant #3 (ESXi) Mgmt Port #1 | SN2010M #1 | Port #5 | 487655-B21 | HPE BLc 10G SFP+ to SFP+ 3m DAC Cable |
| ProLiant #3 (ESXi) Mgmt Port #2 | SN2010M #2 | Port #5 | 487655-B21 | HPE BLc 10G SFP+ to SFP+ 3m DAC Cable |
| ProLiant #3 (ESXi) iSCSI Port #1 | SN2010M #1 | Port #6 | 844477-B21 | HPE 25Gb SFP28 to SFP28 3m DAC |
| ProLiant #3 (ESXi) iSCSI Port #2 | SN2010M #2 | Port #6 | 844477-B21 | HPE 25Gb SFP28 to SFP28 3m DAC |
| Nimble Ctrl A iSCSI Port #1 | SN2010M #1 | Port #7 | R0R42A | HPE 25Gb SFP28 SR 30m Transceiver |
| Nimble Ctrl B iSCSI Port #1 | SN2010M #1 | Port #8 | R0R42A | HPE 25Gb SFP28 SR 30m Transceiver |
| Nimble Ctrl A iSCSI Port #2 | SN2010M #2 | Port #7 | R0R42A | HPE 25Gb SFP28 SR 30m Transceiver |
| Nimble Ctrl B iSCSI Port #2 | SN2010M #2 | Port #8 | R0R42A | HPE 25Gb SFP28 SR 30m Transceiver |
| ProLiant #4 (Windows) Port #1 | SN2010M #1 | Port #13 | 10119555-3030LF | HPE 10Gb SFP+ 3m DAC |
| ProLiant #5 (Windows) Port #1 | SN2010M #1 | Port #16 | 10119555-3030LF | HPE 10Gb SFP+ 3m DAC |
| HP 2530-48G Switch | SN2010M #1 | Port #18 | JD089A | HPE 1000BASE-T SFP |
| HP 2530-48G Switch | SN2010M #2 | Port #18 | JD089A | HPE 1000BASE-T SFP |
| Stacking Port #1 | SN2010M #1 | Port #21 | JL271A | HPE X240 100G QSFP28 to QSFP28 1m DAC |
| Stacking Port #2 | SN2010M #1 | Port #22 | JL271A | HPE X240 100G QSFP28 to QSFP28 1m DAC |
| Stacking Port #1 | SN2010M #2 | Port #21 | JL271A | HPE X240 100G QSFP28 to QSFP28 1m DAC |
| Stacking Port #2 | SN2010M #2 | Port #22 | JL271A | HPE X240 100G QSFP28 to QSFP28 1m DAC |

The ports showing heavy pause & discard packets are:

Switch #1: 1, 3, 5, 13, 16, 21 & 22
Switch #2: 1, 3, 5, 21 & 22

Hi Phil,

Thanks for the detailed information and the sysdump from one switch. I looked into the stats of all ports and here is a summary:
1/1, 1/3 and 1/5 show Rx discard packets, and only send pause frames out (none are received);
1/13, 1/16, 1/21 and 1/22 show no Rx discard packets, and both send and receive pause frames.

After looking into the config, I can see you are using PFC with priority 4 for traffic on ports 1/2, 1/4, 1/6, 1/7 and 1/8, and flow control (global pause) with the default priority 0 for traffic on ports 1/1, 1/3, 1/5, 1/13 and 1/16. Since the switch is sending pause frames out via 1/1, 1/3 and 1/5 but still showing Rx discards, I would suggest you check whether flow control is enabled on the ProLiant #1/2/3 (ESXi) mgmt ports; the ProLiant servers should stop sending traffic to the switch after receiving pause frames. I can also see that the HP 2530 switch has only two 1G uplinks to the SN2010 switches, which may cause congestion when the servers send a high volume of traffic to any device behind the HP 2530.

In addition, I found some WJH (What Just Happened) records in the sysdump showing the switch received traffic from the HP 2530 switch via port mpo25/Eth1/18 on 11/15 and 11/16; that traffic should be forwarded out via this port, so receiving it there is unexpected. Can you please provide the output of the following commands from both MLAG switches?

show stp

show mlag

show mlag-vip

Thanks Joey, that was very useful. I took a closer look at the ProLiant NICs and found flow control (both Tx and Rx) was disabled on 2 of the 4 NICs on each ProLiant. I'm not sure why or how, as I understand flow control should be enabled by default in VMware ESXi 7. Flow control is now enabled on all ProLiant NICs and I've cleared the switch counters so we can monitor over the next few days.
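For anyone hitting the same problem, this is roughly how the NIC flow-control settings can be inspected on an ESXi host (a sketch; verify the exact `set` options on your own build):

```shell
# Show the current pause (flow control) parameters for all physical NICs
esxcli network nic pauseParams list

# The 'set' subcommand changes them per vmnic; check the options first
esxcli network nic pauseParams set --help
```

The `list` output shows Rx/Tx pause state per vmnic, which makes it easy to spot NICs where flow control was left disabled.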

I will send you the mlag switch command results via PM now.

Hi Phil,

Sorry for the wrong command, please use this one:

show spanning-tree

I looked at the MLAG output and it looks good to me. Please monitor the stats of the ports in question and see how it goes.

Ok, thanks. I'll PM you the spanning-tree results now.

After 24 hours, we're still seeing high levels of Rx discards on the three VMware ProLiant mgmt ports, and high Tx pause packets on the MLAG ports and the Windows ProLiant server ports.

Hi Phil,

I think it's reasonable to see Tx pause on ports when they receive a high volume of traffic and can't forward it out to other ports. Here we have two 1G ports connected to the HP 2530 switch, so there will be congestion whenever traffic flows from the servers connected via 10G ports to any device behind the HP 2530. In that scenario the SN2010 switches send pause frames to those servers asking them to hold their traffic, and the servers should react by reducing their transmit rate until the pause frames stop.
If that works, the SN2010 switches should not see any Rx discard packets, so I would again suggest you check the flow-control config on the server side. Thanks.
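To put rough numbers on the pause mechanism: an IEEE 802.3x pause frame carries a 16-bit pause_time value measured in quanta of 512 bit times, so how long a sender is asked to hold traffic depends on the link speed. A minimal sketch of that arithmetic (the quanta and link-speed values below are illustrative, not taken from this case):

```python
# IEEE 802.3x pause frames express pause time in quanta of 512 bit times.
PAUSE_QUANTUM_BITS = 512

def pause_duration_s(quanta: int, link_bps: float) -> float:
    """Seconds a receiver of the pause frame is asked to hold traffic."""
    return quanta * PAUSE_QUANTUM_BITS / link_bps

# The maximum pause_time (0xFFFF quanta) on a 10 Gb/s management link:
print(f"Max single pause on 10G: {pause_duration_s(0xFFFF, 10e9) * 1e3:.3f} ms")

# The same pause_time value holds a 1 Gb/s link ten times longer:
print(f"Max single pause on 1G:  {pause_duration_s(0xFFFF, 1e9) * 1e3:.3f} ms")
```

The pause frame only asks the peer to hold briefly; a sender that ignores pause frames (flow control disabled on the NIC) keeps transmitting into the congested port, which is exactly what produces Rx discards on the switch.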

Hi Phil,

Just following up: were you able to check the flow-control config on the server side? Thanks.

Hi Joey,

Since enabling flow control on all ESXi ports, we're seeing far fewer pause/discard packets. Thanks for all your help.