LAG problems

jpkspam · September 12, 2017, 9:18am

My team is currently standing up a new cluster that has an SN2700 core Ethernet switch on our boot network. LAG links are working fine between this core and the leaf switches in the new cluster. We also have an older cluster with an SX1036 Ethernet switch serving as its core switch, and LAG links are also working fine between that core and the older leaf switches. Several of us have tried to get LAG working between the SX1036 and the SN2700, and we can’t get a working link (a single link works fine). We’ve done the typical troubleshooting, looking for bad cables, ports, etc. We can find no differences when comparing the configurations and status of the working LAG links and the failing link.
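For anyone repeating this kind of comparison, here is a minimal sketch of how it could be scripted. It assumes the configuration lines for each LAG have already been saved as plain text. The function names (`normalize`, `diff_lags`) and the sample configuration lines are hypothetical and only for illustration; they are not taken from either switch.

```python
import difflib
import re


def normalize(config_text, lag_id):
    """Return sorted config lines with the LAG number replaced by a placeholder."""
    lines = []
    for raw in config_text.splitlines():
        line = " ".join(raw.split())  # collapse runs of whitespace
        if not line:
            continue
        # Mask the LAG number so two different LAGs can be compared line by line.
        line = re.sub(rf"\bport-channel {lag_id}\b", "port-channel <LAG>", line)
        line = re.sub(rf"\bchannel-group {lag_id}\b", "channel-group <LAG>", line)
        lines.append(line)
    return sorted(lines)


def diff_lags(working_text, working_id, failing_text, failing_id):
    """Return only the lines that differ between two LAG configurations."""
    a = normalize(working_text, working_id)
    b = normalize(failing_text, failing_id)
    diff = list(difflib.unified_diff(a, b, "working", "failing", n=0, lineterm=""))
    # Drop the two file-header lines and the "@@" hunk markers.
    return [d for d in diff[2:] if not d.startswith("@@")]


if __name__ == "__main__":
    # Hypothetical sample input, not real output from either switch.
    working = """
    interface port-channel 1
    interface port-channel 1 switchport mode trunk
    """
    failing = """
    interface port-channel 2
    interface port-channel 2 switchport mode hybrid
    """
    for line in diff_lags(working, 1, failing, 2):
        print(line)
```

In this made-up sample, only the two `switchport mode` lines are reported, because the LAG numbers are masked before the comparison. The same idea would apply to status output, as long as fields that legitimately differ (port numbers, counters) are masked first.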

The SX1036 is a PPC switch and is running a much older firmware:

Product name: MLNX-OS

Product release: 3.4.3002

Build ID: #1-dev

Build date: 2015-07-30 20:13:15

Target arch: ppc

Target hw: m460ex

Built by: jenkins@fit74

Version summary: PPC_M460EX 3.4.3002 2015-07-30 20:13:15 ppc

Product model: ppc

than the SN2700 (X86):

Product name: MLNX-OS

Product release: 3.6.3200

Build ID: #1-dev

Build date: 2017-03-09 17:55:58

Target arch: x86_64

Target hw: x86_64

Built by: jenkins@e3f42965d5ee

Version summary: X86_64 3.6.3200 2017-03-09 17:55:58 x86_64

Product model: x86onie

The obvious thing to try is updating the firmware on the SX1036, but this cluster is in production and our team is nervous about messing with that core switch, as it’s pretty critical to our infrastructure. Would a firmware mismatch cause this behavior?

I have seen documentation indicating that MLAG doesn’t work between PPC and X86 switches. I sure hope that’s not the case for LAG…

khwaja · September 29, 2017, 1:20pm

Hi Rick,

LAG should work fine between the SX1036 and the SN2700. Only MLAG has the limitation that the CPU architecture must match on both switches.

Can you please verify your configs?

Is this a regular LACP port channel between the two switches?

What is the status of the second port you are bundling into the LACP port channel? Is it up, down, or suspended?

Please share the details.

Thanks

Khwaja

jpkspam · October 6, 2017, 7:19pm

We’re not using LACP. We actually got it working by changing the port mode to “hybrid” instead of “trunk”. All of our other LAG links work fine in trunk mode. We figure there’s a misconfiguration somewhere in our system causing this, but we have a bunch of switches running at this point. The hybrid workaround has bumped this pretty low on the priority queue, particularly as any debugging would likely bring down a link critical to production work. But I’m all ears if somebody has an idea why we have this issue. Thanks.
