MLAG LACP-rate mismatch for Linux host

Hello Community,

I have two SN2700 switches in MLAG and a CentOS 8 host connected through an mlag-port-channel made of two ConnectX-5 NICs, using one port on each NIC.

The NICs are bonded in mode 4 (802.3ad/LACP) on the host end.

Connectivity seems healthy, interfaces are all up, LACP works, pings indicate no packet loss.

But: there is a substantial LACP rate mismatch:

Mlag-port-channel 7:


LACPDUs Marker Marker Marker Rsp Marker Rsp LACPDUs LACPDUs Illegal Unknown

Port Sent Recv Sent Recv Sent Recv


1/7 0 0 0 0 103482 3993 0 0

SAN array connected in the exact same manner has no mismatch.

I tested all combinations of LACP rate “0” and “1”, following “https://community.mellanox.com/s/article/troubleshoot-lag-mlag-lacp-pdu-rate-issues”, but none of them seems to change that.

The only other thing that might play into it is that “spanning-tree bpdufilter enable” is configured on the switch ethernet ports?

An LACP mismatch is widely considered a misconfiguration, and since this is an iSCSI network, I’d like to have it as clean as possible…

Could anyone kindly give me a tip?

Hi Frank,

Thank you for reaching out to NVIDIA support.

Based on the LACP counters output for Mpo7, we can see that the switch is sending out LACPDUs at a much faster rate than its remote partner. This means that the host is configured for LACP rate fast (bond-lacp-rate 1), since each side transmits at the rate its partner requests.

The default LACP rate in Onyx is slow (bond-lacp-rate 0), which asks the link partner to transmit LACP control packets every 30 seconds. You can check the LACP rate with the following command:

switch(config)# show lacp interfaces ethernet 1/7
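
For illustration, the arithmetic behind this reading of the counters can be sketched as follows. It assumes the standard periodic intervals (1 second for fast, 30 seconds for slow) and ignores any extra LACPDUs sent on state changes; the function name is made up for the example.

```python
# Periodic LACPDU transmit intervals defined by IEEE 802.3ad.
FAST_PERIOD_S = 1    # partner requested "fast" (short timeout)
SLOW_PERIOD_S = 30   # partner requested "slow" (long timeout)


def expected_lacpdus(elapsed_s: int, period_s: int) -> int:
    """Approximate number of periodic LACPDUs sent during elapsed_s seconds."""
    return elapsed_s // period_s


if __name__ == "__main__":
    day = 24 * 60 * 60
    fast = expected_lacpdus(day, FAST_PERIOD_S)
    slow = expected_lacpdus(day, SLOW_PERIOD_S)
    print(f"fast: {fast}/day, slow: {slow}/day, ratio {fast // slow}:1")
```

A port that sends at the fast rate while its partner sends at the slow rate would therefore show a Sent:Recv ratio near 30:1. The 103482:3993 from the original post is roughly 26:1.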

Can you please perform the following:

  • Configure the host for lacp-rate 0 to match the switch default LACP rate.

  • Clear the LACP counters on both switches using the following command:

switch(config)# clear counters interface mlag-port-channel 7

  • Monitor the LACP counters on both switches:

switch(config)# show lacp counters

Let me know if you have any questions.

Thanks,

Pratik Pande

Hi Pratik,

thank you very much for offering help!

I had actually done that already in the run-up to this post:

I tested with both ends set to slow and with both ends set to fast, but the mismatch seems to persist.

The iSCSI array connected to the MLAG does not exhibit this issue, just the Linux server (CentOS 8.3).

I then double-checked against “https://community.mellanox.com/s/article/troubleshoot-lag-mlag-lacp-pdu-rate-issues”.

Now, after following your recommended steps, the situation looks like this:

###########################

switch1 [mlag-vip-domain1: master] # clear counters interface mlag-port-channel 7

Mlag-port-channel 7:


LACPDUs Marker Marker Marker Rsp Marker Rsp LACPDUs LACPDUs Illegal Unknown

Port Sent Recv Sent Recv Sent Recv


1/7 0 0 0 0 253705 148486 0 0

###########################

Edit: I realised later that the mismatch does not apply to the other member connection:

###########################

switch2 [mlag-vip-domain1: standby] # clear counters interface mlag-port-channel 7

switch2 [mlag-vip-domain1: standby] # show lacp counters

Mlag-port-channel 7:


LACPDUs Marker Marker Marker Rsp Marker Rsp LACPDUs LACPDUs Illegal Unknown

Port Sent Recv Sent Recv Sent Recv


1/8 0 0 0 0 73607 70784 0 0

###########################
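
To put the two member ports side by side, the gap can be expressed as a share of the sent LACPDUs. This is only a small sketch using the counter values quoted in this post; the function name is hypothetical.

```python
def mismatch_percent(sent: int, recv: int) -> float:
    """Share of sent LACPDUs without a received counterpart, in percent."""
    return (sent - recv) / sent * 100.0


# LACPDUs Sent / Recv as quoted for the two Mpo7 member ports.
PORTS = {
    "switch1 1/7": (253705, 148486),
    "switch2 1/8": (73607, 70784),
}

if __name__ == "__main__":
    for port, (sent, recv) in PORTS.items():
        print(f"{port}: {mismatch_percent(sent, recv):.1f}% more sent than received")
```

With these numbers, port 1/7 on switch1 is far more lopsided than port 1/8 on switch2, which matches the observation that only one member connection is affected.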

###########################

switch1 [mlag-vip-domain1: master] # show lacp interfaces ethernet 1/7

Port: 1/7

Port State: Bundle

MLAG Channel Group: 7

Pseudo mlag-port-channel: Mpo7

LACP port-priority: 32768

LACP Rate: Fast

LACP Activity: Active

LACP Timeout: Short

Aggregation State: Aggregation, Sync, Collecting, Distributing,


LACP Port Admin Oper Port Port

Port State Priority Key Key Number State


1/7 Bundle 32768 29007 29007 0x7 0x3f

switch2 [mlag-vip-domain1: standby] # show lacp interfaces ethernet 1/8

Port: 1/8

Port State: Bundle

MLAG Channel Group: 7

Pseudo mlag-port-channel: Mpo7

LACP port-priority: 32768

LACP Rate: Fast

LACP Activity: Active

LACP Timeout: Short

Aggregation State: Aggregation, Sync, Collecting, Distributing,


LACP Port Admin Oper Port Port

Port State Priority Key Key Number State


1/8 Bundle 32768 29007 29007 0x8 0x3f

###########################
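
The “Port State” value 0x3f in the last column can be decoded bit by bit. The sketch below assumes that this column is the LACP actor state byte as laid out in IEEE 802.3ad; the table and function names are made up for illustration.

```python
from typing import List

# LACP state bits per IEEE 802.3ad, least significant bit first.
LACP_STATE_BITS = [
    (0x01, "Activity"),         # 1 = active, 0 = passive
    (0x02, "Timeout"),          # 1 = short timeout (asks partner for fast rate)
    (0x04, "Aggregation"),
    (0x08, "Synchronization"),
    (0x10, "Collecting"),
    (0x20, "Distributing"),
    (0x40, "Defaulted"),
    (0x80, "Expired"),
]


def decode_lacp_state(state: int) -> List[str]:
    """Return the names of all bits that are set in an LACP state byte."""
    return [name for mask, name in LACP_STATE_BITS if state & mask]


if __name__ == "__main__":
    print(decode_lacp_state(0x3F))
```

Under that assumption, 0x3f means active, short timeout, aggregatable, in sync, collecting and distributing, which agrees with the “LACP Timeout: Short” and “Aggregation State” lines printed for both ports.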

The port numbers differ because, although the cabling is otherwise symmetric, the host ports of the port-channel are on different NICs.

###########################

switch1 [mlag-vip-domain1: master] # show running-config interface mlag-port-channel 7

interface mlag-port-channel 7

interface mlag-port-channel 7 mtu 7936 force

interface mlag-port-channel 7 no shutdown

interface mlag-port-channel 7 switchport access vlan 20

interface mlag-port-channel 7 spanning-tree bpdufilter enable

interface mlag-port-channel 7 spanning-tree port type edge

interface mlag-port-channel 7 dcb priority-flow-control mode on force

switch2 [mlag-vip-domain1: standby] # show running-config interface mlag-port-channel 7

interface mlag-port-channel 7

interface mlag-port-channel 7 mtu 7936 force

interface mlag-port-channel 7 no shutdown

interface mlag-port-channel 7 switchport access vlan 20

interface mlag-port-channel 7 spanning-tree bpdufilter enable

interface mlag-port-channel 7 spanning-tree port type edge

interface mlag-port-channel 7 dcb priority-flow-control mode on force

###########################

###########################

switch1 [mlag-vip-domain1: master] # show running-config interface eth 1/7

interface ethernet 1/7 speed 40G force

interface ethernet 1/7 mtu 7936 force

interface ethernet 1/7 mlag-channel-group 7 mode active

interface ethernet 1/7 lacp rate fast

switch2 [mlag-vip-domain1: standby] # show running-config interface eth 1/8

interface ethernet 1/8 speed 40G force

interface ethernet 1/8 mtu 7936 force

interface ethernet 1/8 mlag-channel-group 7 mode active

interface ethernet 1/8 lacp rate fast

###########################

###########################

ifcfg-bond0:

BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"

###########################

The only straw I am currently clutching at is that, after creating bond0, I only had the opportunity to restart the entire network stack, which usually does the job (networking on L2/L3 works fine).

I could not reboot the entire server, since it is in production use…
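
Whether the running bond really picked up lacp_rate can be read back from the bonding driver without a reboot. A minimal sketch, assuming the bond is named bond0 as in the ifcfg file and that the kernel exposes /sys/class/net/bond0/bonding/lacp_rate in the usual “name number” form (for example “fast 1”); both helper names are made up for illustration.

```python
from pathlib import Path
from typing import Tuple


def parse_sysfs_bonding_value(raw: str) -> Tuple[str, int]:
    """Split a bonding sysfs value such as 'fast 1' into (name, number)."""
    name, number = raw.split()
    return name, int(number)


def read_lacp_rate(bond: str = "bond0") -> Tuple[str, int]:
    """Read the LACP rate the bonding driver is currently using for `bond`."""
    path = Path("/sys/class/net") / bond / "bonding" / "lacp_rate"
    return parse_sysfs_bonding_value(path.read_text())
```

Calling read_lacp_rate("bond0") on the host should return ("fast", 1) if lacp_rate=1 is in effect, and ("slow", 0) otherwise.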

Do you possibly have an idea?

Thank you,

Hilmar

Hilmar,

Since we have no way of knowing whether changing the LACP rate on the server side has any effect (the server is not under NVIDIA’s responsibility), try changing the LACP rate on the switch to fast:

https://docs.mellanox.com/pages/viewpage.action?pageId=49160752#LinkAggregationGroup(LAG)-lacp(interface)

On both switches:

(config)# interface ethernet 1/7 lacp rate fast

(config)# interface ethernet 1/8 lacp rate fast

You can check for the LACP packets (and thus their interval) on the host side using tcpdump:

On the Linux server:

tcpdump -i <interface> ether proto 0x8809

Hello Eddie,

thank you very much for sticking with it, I really appreciate it!

And sorry for being so late; I had to take care of another problem…

Actually, I had already tried all sensible combinations, in the hope that one of the two states, or even just switching between them, would change the behaviour,

but unfortunately, it didn’t.

###########################

ifcfg-bond0:

DEVICE="bond0"

[ … ]

fast rate:

BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"

slow rate:

#BONDING_OPTS="mode=4 miimon=100 lacp_rate=0"

#BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer2+3"

###########################

###########################

switch1 [mlag-vip-domain1: master] # show lacp counters

Mlag-port-channel 4:

LACPDUs Marker Marker Marker Rsp Marker Rsp LACPDUs LACPDUs Illegal Unknown

Port Sent Recv Sent Recv Sent Recv


1/4 0 0 0 0 14298 14304 0 0

Mlag-port-channel 7:

1/7 0 0 0 0 512988 397901 0 0

###########################

I then ran the tcpdump you recommended, and it confirmed that LACPDUs are exchanged roughly once per second:

###########################

[root@troubled-host $

tcpdump -i enp65s0f0 ether proto 0x8809

dropped privs to tcpdump

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on enp65s0f0, link-type EN10MB (Ethernet), capture size 262144 bytes

11:06:48.208294 LACPv1, length 110

11:06:49.214371 LACPv1, length 110

[ … ]

11:07:03.216880 LACPv1, length 110

11:07:03.684935 LACPv1, length 110

11:07:04.222151 LACPv1, length 110

^C

32 packets captured

32 packets received by filter

0 packets dropped by kernel

###########################
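
The intervals can also be computed from the capture instead of read by eye. The sketch below parses tcpdump lines in the default timestamp format shown above and returns the gaps between consecutive LACPDUs. It assumes the capture does not cross midnight, and the function names are hypothetical. Note that a capture without a direction or source filter contains both sent and received LACPDUs, so consecutive lines may belong to different directions.

```python
import re
from typing import Iterable, List

# Matches default tcpdump timestamps such as "11:07:03.216880 LACPv1, ...".
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.(\d{6}) LACPv1")


def timestamps_us(lines: Iterable[str]) -> List[int]:
    """Convert each LACPv1 line to microseconds since midnight."""
    result = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            hours, minutes, seconds, micros = (int(g) for g in match.groups())
            result.append(((hours * 60 + minutes) * 60 + seconds) * 1_000_000 + micros)
    return result


def intervals_s(lines: Iterable[str]) -> List[float]:
    """Gaps in seconds between consecutive LACPDUs in the capture."""
    stamps = timestamps_us(lines)
    return [(later - earlier) / 1_000_000 for earlier, later in zip(stamps, stamps[1:])]


# Last three lines of the capture quoted above.
SAMPLE = """\
11:07:03.216880 LACPv1, length 110
11:07:03.684935 LACPv1, length 110
11:07:04.222151 LACPv1, length 110
"""

if __name__ == "__main__":
    print(intervals_s(SAMPLE.splitlines()))
```

For the three sample lines the two gaps add up to about one second, and 32 packets in roughly 16 seconds is about two per second, which would be consistent with each direction sending once per second. Adding an `ether src` condition with the host or switch MAC to the filter separates the two directions.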

Because the mismatch appeared to narrow over a prolonged time, I tried resetting the counters and comparing against a short-term count.

This time, however, resetting the counters for the specific interface simply did not work; only a global reset worked.

At first it looked as if the mismatch had been resolved, but it eventually came back and kept growing over time.

After a couple of minutes:

###########################

switch1 [mlag-vip-domain1: master] # show lacp counters

[ … ]

Mlag-port-channel 3:


LACPDUs Marker Marker Marker Rsp Marker Rsp LACPDUs LACPDUs Illegal Unknown

Port Sent Recv Sent Recv Sent Recv


1/3 0 0 0 0 12 12 0 0

Mlag-port-channel 4:


LACPDUs Marker Marker Marker Rsp Marker Rsp LACPDUs LACPDUs Illegal Unknown

Port Sent Recv Sent Recv Sent Recv


1/4 0 0 0 0 13 13 0 0

Mlag-port-channel 7:


LACPDUs Marker Marker Marker Rsp Marker Rsp LACPDUs LACPDUs Illegal Unknown

Port Sent Recv Sent Recv Sent Recv


1/7 0 0 0 0 379 365 0 0

###########################

This is with LACP rate = fast at both ends.
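
As a rough plausibility check, the slow ports can serve as a clock for the fast one. The sketch assumes that ports 1/3 and 1/4 run at the default slow rate (one LACPDU every 30 seconds), that port 1/7 runs at the fast rate (one per second), and that the global reset cleared all counters at the same moment; the function name is made up.

```python
SLOW_PERIOD_S = 30  # assumed rate on ports 1/3 and 1/4
FAST_PERIOD_S = 1   # configured rate on port 1/7


def elapsed_window_s(min_slow_count: int, max_slow_count: int) -> tuple:
    """Elapsed time range implied by the LACPDU counts seen on slow ports."""
    return min_slow_count * SLOW_PERIOD_S, max_slow_count * SLOW_PERIOD_S


if __name__ == "__main__":
    low, high = elapsed_window_s(12, 13)
    print(f"elapsed: {low}..{high} s")
    for label, count in (("1/7 sent", 379), ("1/7 recv", 365)):
        inside = low // FAST_PERIOD_S <= count <= high // FAST_PERIOD_S
        print(f"{label}: {count} LACPDUs, inside fast-rate window: {inside}")
```

Counts of 12 and 13 on the slow ports imply roughly 360 to 390 seconds since the reset. Both 379 sent and 365 received fall inside that range, so over this short interval both directions look like they run at about one LACPDU per second.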

For technical reasons I have to split my post here so that it fits into the editor…

Hello Eddie,

In continuation of my last post, here is something I wanted to add:

I compared both interfaces again, one from the troubled port-channel 7 and one from the healthy port-channel 4, and I can’t see any difference, except that interface 1/4 is still at the default “slow”, and the SAN side probably is as well.

On the SAN end I unfortunately have to guess, since there is no CLI available and no explicit GUI item to control the LACP rate either.

Current configuration, with no adjustments applied in the meantime:

###########################

switch1 [mlag-vip-domain1: master] # show interfaces mlag-port-channel summary

MLAG Port-Channel Summary:


Group Type Local Peer

Port-Channel Ports Ports

(D/U/P/S) (D/P/S/I) (D/P/S/I)


1 Mpo3(U) LACP Eth1/3(P) Eth1/3(P)

2 Mpo4(U) LACP Eth1/4(P) Eth1/4(P)

3 Mpo7(U) LACP Eth1/7(P) Eth1/8(P)

[ … ]

###########################

###########################

switch1 [mlag-vip-domain1: master] # show running-config interface mlag-port-channel 7

interface mlag-port-channel 7

interface mlag-port-channel 7 mtu 7936 force

interface mlag-port-channel 7 no shutdown

interface mlag-port-channel 7 switchport access vlan 211

interface mlag-port-channel 7 spanning-tree bpdufilter enable

interface mlag-port-channel 7 spanning-tree port type edge

interface mlag-port-channel 7 dcb priority-flow-control mode on force

switch1 [mlag-vip-domain1: master] #

###########################

###########################

switch1 [mlag-vip-domain1: master] # show running-config interface eth 1/7

interface ethernet 1/7 speed 40G force

interface ethernet 1/7 mtu 7936 force

interface ethernet 1/7 mlag-channel-group 7 mode active

interface ethernet 1/7 lacp rate fast

switch1 [mlag-vip-domain1: master] #

switch1 [mlag-vip-domain1: master] # show interface eth 1/7

Eth1/7:

Admin state : Enabled

Operational state : Up

Last change in operational status: 5d and 23:47:27 ago (11 oper change)

Boot delay time : 0 sec

Mac address : b8:59:9f:7d:29:70

MTU : 7936 bytes (Maximum packet size 7958 bytes)

Fec : auto

Operational Fec : no-fec

Flow-control : receive off send off

Supported speeds : 1G 10G 25G 40G 50G 56G 100G

Advertised speeds : 40G

Actual speed : 40G

Auto-negotiation : Enabled

Width reduction mode : Unknown

Switchport mode : access

MAC learning mode : Enabled

Forwarding mode : inherited cut-through

Telemetry sampling: Disabled TCs: N/A

Telemetry threshold: Disabled TCs: N/A

Telemetry threshold level: N/A

Last clearing of “show interface” counters: Never

60 seconds ingress rate : 968 bits/sec, 121 bytes/sec, 1 packets/sec

60 seconds egress rate : 1128 bits/sec, 141 bytes/sec, 2 packets/sec

Rx:

247550 packets

0 unicast packets

247550 multicast packets

0 broadcast packets

31682620 bytes

0 discard packets

0 error packets

0 fcs errors

0 undersize packets

0 oversize packets

0 pause packets

0 unknown control opcode

0 symbol errors

0 discard packets by storm control

Tx:

273634 packets

1179 unicast packets

265847 multicast packets

6608 broadcast packets

35007140 bytes

0 discard packets

0 error packets

0 hoq discard packets

switch1 [mlag-vip-domain1: master] #

###########################

For comparison, from the other, non-mismatched mlag-port-channel 4:

###########################

switch1 [mlag-vip-domain1: master] # show running-config interface eth 1/4

interface ethernet 1/4 speed 40G force

interface ethernet 1/4 mtu 7936 force

interface ethernet 1/4 mlag-channel-group 4 mode active

switch1 [mlag-vip-domain1: master] #

switch1 [mlag-vip-domain1: master] # show interface eth 1/4

Eth1/4:

Admin state : Enabled

Operational state : Up

Last change in operational status: 4d and 23:26:52 ago (7 oper change)

Boot delay time : 0 sec

Mac address : b8:59:9f:7d:29:7a

MTU : 7936 bytes (Maximum packet size 7958 bytes)

Fec : auto

Operational Fec : no-fec

Flow-control : receive off send off

Supported speeds : 1G 10G 25G 40G 50G 56G 100G

Advertised speeds : 40G

Actual speed : 40G

Auto-negotiation : Enabled

Width reduction mode : Unknown

Switchport mode : access

MAC learning mode : Enabled

Forwarding mode : inherited cut-through

Telemetry sampling: Disabled TCs: N/A

Telemetry threshold: Disabled TCs: N/A

Telemetry threshold level: N/A

Last clearing of “show interface” counters: Never

60 seconds ingress rate : 40 bits/sec, 5 bytes/sec, 1 packets/sec

60 seconds egress rate : 88 bits/sec, 11 bytes/sec, 1 packets/sec

Rx:

22128 packets

12 unicast packets

14315 multicast packets

7801 broadcast packets

2332694 bytes

0 discard packets

0 error packets

0 fcs errors

0 undersize packets

0 oversize packets

0 pause packets

0 unknown control opcode

0 symbol errors

0 discard packets by storm control

Tx:

32428 packets

1105 unicast packets

28789 multicast packets

2534 broadcast packets

4782627 bytes

0 discard packets

0 error packets

0 hoq discard packets

switch1 [mlag-vip-domain1: master] #

###########################

In the meantime I had the opportunity to reboot the Linux server, but that did not change anything…

At this point I wonder where the “delta” frames actually end up,

since neither interface reports any errors or discarded packets.

Or is it possible that different metrics are at play?
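
One way to cross-check the two sets of counters is the average frame size from “show interface”. This is only a sketch: it assumes the switch byte counters include the 4-byte FCS, and it relies on an LACPDU frame being 128 bytes on the wire (14-byte Ethernet header, the 110-byte LACPDU that tcpdump reports, and the FCS). The function name is made up.

```python
# 14-byte Ethernet header + 110-byte LACPDU + 4-byte FCS.
LACPDU_WIRE_BYTES = 14 + 110 + 4


def average_frame_bytes(total_bytes: int, packets: int) -> float:
    """Average frame size derived from an interface's byte and packet counters."""
    return total_bytes / packets


if __name__ == "__main__":
    # Rx counters quoted above for Eth1/7 and Eth1/4.
    for port, (rx_bytes, rx_packets) in {
        "Eth1/7": (31682620, 247550),
        "Eth1/4": (2332694, 22128),
    }.items():
        avg = average_frame_bytes(rx_bytes, rx_packets)
        print(f"{port}: {avg:.1f} bytes/frame (LACPDU = {LACPDU_WIRE_BYTES})")
```

An Rx average very close to 128 bytes on Eth1/7, together with zero unicast and broadcast packets, would suggest that essentially everything received on that port is LACPDUs. In that case the Rx multicast counter and the LACPDUs Recv counter could be compared over the same interval to see whether they count the same frames.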

Thank you again for your patience,

Best

Hilmar