I’m relatively new to Nvidia Cumulus, but I have extensive experience with Onyx and other platforms.
I’m observing some strange reboot behavior in an MLAG cluster consisting of two HPE SN2010M switches running Cumulus 5.8.
The setup is a straightforward, simple layer 2 network:
- 2 x HPE SN2010M switches
- 2 x 100G peerlinks between the switches
- Each of the 3 ESXi hosts is connected via 2 ports per switch using a standard vSwitch (no LACP or bonding, as bonding isn’t feasible due to iSCSI traffic also traversing these interfaces)
- Both switches have an MLAG LACP uplink to a third switch (bond25)
The reboot behavior is as follows:
- Under normal operation, everything works perfectly
- After rebooting one of the switches, everything still looks fine while it is down
- Once the rebooted switch comes back online, the host ports come up and the ESXi hosts start forwarding traffic to those ports (as expected)
- However, it seems that the traffic is black-holed at this point, possibly because the peerlink is not yet fully operational
- After approximately 45–60 seconds, everything starts working perfectly again.
nv show mlag consistency-checker → no inconsistency
I get these warnings in the log files: “Conflict ignored: Native Vlan mismatch on peerlink between clag peers”. Is this normal? I have double-checked the config, and both switches have an identical peerlink config.
The behavior is reproducible every time.
I have already reduced the MLAG initDelay to 10 seconds, which reduced the outage from 3 minutes to the current 45–60 seconds.
Here is the clagd.log of the rebooted switch:
2025-03-30T08:54:02.110764+00:00 cumulus clagd[6835]: Beginning execution of clagd version 1.4.0
2025-03-30T08:54:02.111374+00:00 cumulus clagd[6835]: Invoked with: /usr/sbin/clagd --daemon linklocal peerlink.4094 44:38:39:BE:EF:AA --priority 32768 --backupIp 192.168.100.222 --backupVrf mgmt --initDelay 1
2025-03-30T08:54:02.592214+00:00 cumulus clagd[6835]: macAddr = 44:38:39:be:ef:aa
2025-03-30T08:54:06.385155+00:00 cumulus clagd[6835]: Allowing duplicate LACP partner MACs
2025-03-30T08:54:06.385483+00:00 cumulus clagd[6835]: Role is now secondary
2025-03-30T08:54:07.389071+00:00 cumulus clagd[6835]: Thread to receive from CSU Manager – Started
2025-03-30T08:54:07.392647+00:00 cumulus clagd[6835]: CSU Cold Boot
2025-03-30T08:54:07.395248+00:00 cumulus clagd[6835]: UP message received
2025-03-30T08:54:07.395427+00:00 cumulus clagd[6835]: Network layer info message received
2025-03-30T08:54:07.432244+00:00 cumulus clagd[6835]: [7685]Init NetlinkThreadT
2025-03-30T08:54:07.434357+00:00 cumulus clagd[6835]: [7686]Init RefreshMacsT
2025-03-30T08:54:07.435810+00:00 cumulus clagd[6835]: [7688]Init UpdateMacsInHwT
2025-03-30T08:54:07.437173+00:00 cumulus clagd[6835]: [7689]Init RemoteDeleteT
2025-03-30T08:54:07.438582+00:00 cumulus clagd[6835]: [7690]Init FdbKernelMsgT
2025-03-30T08:54:07.440031+00:00 cumulus clagd[6835]: [7691]Init UpdateMcastsInHw
2025-03-30T08:54:07.442391+00:00 cumulus clagd[6835]: [7694]Init DelayNeighNotifThreadT
2025-03-30T08:54:07.443719+00:00 cumulus clagd[6835]: Ignoring RTM_NEWNEIGH and RTM_DELNEIGH notifications
2025-03-30T08:54:07.446711+00:00 cumulus clagd[6835]: [7695]Init UpdateFromKernelT
2025-03-30T08:54:07.449192+00:00 cumulus clagd[6835]: [7696]Init UpdateToKernelT
2025-03-30T08:54:07.451294+00:00 cumulus clagd[6835]: [7697]Init SyncDelayT
2025-03-30T08:54:07.452739+00:00 cumulus clagd[6835]: [7698]Init CompareVlanMapIntfCCT
2025-03-30T08:54:07.454024+00:00 cumulus clagd[6835]: [7699]Init UpdateLacpFromPeerT
2025-03-30T08:54:07.456034+00:00 cumulus clagd[6835]: [7700]Init UpdateLacpConfigT
2025-03-30T08:54:07.462540+00:00 cumulus clagd[6835]: [7704]Init UpdateVxLanFromPeerT
2025-03-30T08:54:07.463900+00:00 cumulus clagd[6835]: [7705]Init UpdateVxLanConfigT
2025-03-30T08:54:07.469581+00:00 cumulus clagd[6835]: [7706]Init HelloReloadT
2025-03-30T08:54:07.470987+00:00 cumulus clagd[6835]: [7707]Init InitDelayT
2025-03-30T08:54:07.474444+00:00 cumulus clagd[6835]: [7708]Init HelloTxT
2025-03-30T08:54:07.478790+00:00 cumulus clagd[6835]: [7709]Init HelloRxT
2025-03-30T08:54:07.527801+00:00 cumulus clagd[6835]: [7710]Init CmdRecvT
2025-03-30T08:54:07.573379+00:00 cumulus clagd[6835]: HealthCheck: role via backup is secondary
2025-03-30T08:54:07.580087+00:00 cumulus clagd[6835]: HealthCheck: backup active
2025-03-30T08:54:07.600098+00:00 cumulus clagd[6835]: Conflict ignored: Peerlink vlans mismatch between clag peers
2025-03-30T08:54:07.600364+00:00 cumulus clagd[6835]: Conflict ignored: Native Vlan mismatch on peerlink between clag peers
2025-03-30T08:54:07.715132+00:00 cumulus clagd[6835]: Conflict ignored: Peerlink vlans mismatch between clag peers
2025-03-30T08:54:07.732912+00:00 cumulus clagd[6835]: Conflict ignored: Native Vlan mismatch on peerlink between clag peers
2025-03-30T08:54:07.733290+00:00 cumulus clagd[6835]: Conflict ignored: Peerlink vlans mismatch between clag peers
2025-03-30T08:54:07.733495+00:00 cumulus clagd[6835]: Conflict ignored: Native Vlan mismatch on peerlink between clag peers
2025-03-30T08:54:07.754753+00:00 cumulus clagd[6835]: Conflict ignored: Peerlink vlans mismatch between clag peers
2025-03-30T08:54:07.754970+00:00 cumulus clagd[6835]: Conflict ignored: Native Vlan mismatch on peerlink between clag peers
2025-03-30T08:54:07.797409+00:00 cumulus clagd[6835]: Conflict ignored: Peerlink vlans mismatch between clag peers
2025-03-30T08:54:07.797817+00:00 cumulus clagd[6835]: Conflict ignored: Native Vlan mismatch on peerlink between clag peers
2025-03-30T08:54:07.939101+00:00 cumulus clagd[6835]: Listening RTM_NEWNEIGH and RTM_DELNEIGH notifications
2025-03-30T08:54:08.018596+00:00 cumulus clagd[6835]: Using fe80::fe6a:1cff:fed4:e8d0 as remote Peer IP
2025-03-30T08:54:08.476260+00:00 cumulus clagd[6835]: HealthCheck: Delayed bring up of clag bonds for 1 seconds
2025-03-30T08:54:08.594790+00:00 cumulus clagd[6835]: (bond25): Setting ad_actor_system to 44:38:39:be:ef:aa
2025-03-30T08:54:08.612276+00:00 cumulus clagd[6835]: Conflict ignored (bond25): Native vlan mismatch
2025-03-30T08:54:08.613736+00:00 cumulus clagd[6835]: Initial config loaded
2025-03-30T08:54:08.614432+00:00 cumulus clagd[6835]: Conflict ignored (bond25): Native vlan mismatch
2025-03-30T08:54:08.615991+00:00 cumulus clagd[6835]: [7750]Init PeerSendT
2025-03-30T08:54:08.623588+00:00 cumulus clagd[6835]: [7751]Init PeerRecvT
2025-03-30T08:54:08.629344+00:00 cumulus clagd[6835]: [7752]Init PeerlinkChangeT
2025-03-30T08:54:08.641008+00:00 cumulus clagd[6835]: PeerLinkChange delayed updates
2025-03-30T08:54:08.648473+00:00 cumulus clagd[6835]: Conflict ignored (bond25): Vlans mismatch on the peerlink
2025-03-30T08:54:08.651692+00:00 cumulus clagd[6835]: Conflict ignored: Peerlink vlans mismatch between clag peers
2025-03-30T08:54:08.652388+00:00 cumulus clagd[6835]: Conflict ignored: Native Vlan mismatch on peerlink between clag peers
2025-03-30T08:54:08.653768+00:00 cumulus clagd[6835]: Conflict ignored (bond25): Native vlan mismatch
2025-03-30T08:54:09.418490+00:00 cumulus clagd[6835]: The peer switch is active.
2025-03-30T08:54:09.419812+00:00 cumulus clagd[6835]: Conflict ignored (bond25): Native vlan mismatch
2025-03-30T08:54:09.420236+00:00 cumulus clagd[6835]: Ignoring RTM_NEWNEIGH and RTM_DELNEIGH notifications
2025-03-30T08:54:09.421482+00:00 cumulus clagd[6835]: Using current peerlink role: secondary
2025-03-30T08:54:09.423568+00:00 cumulus clagd[6835]: Peerlink: role is now primary; elected
2025-03-30T08:54:09.424165+00:00 cumulus clagd[6835]: HealthCheck: role via backup is primary
2025-03-30T08:54:09.923190+00:00 cumulus clagd[6835]: Listening RTM_NEWNEIGH and RTM_DELNEIGH notifications
2025-03-30T08:54:09.980104+00:00 cumulus clagd[6835]: Resync Requested From Peer
2025-03-30T08:54:10.418567+00:00 cumulus clagd[6835]: HealthCheck: Delayed bring up of clag bonds for 1 seconds
2025-03-30T08:54:10.418804+00:00 cumulus clagd[6835]: Resync Requested From Peer
2025-03-30T08:54:10.636264+00:00 cumulus clagd[6835]: Initial neigh sync to peer done.
2025-03-30T08:54:10.637169+00:00 cumulus clagd[6835]: Initial data sync to peer done.
2025-03-30T08:54:11.444224+00:00 cumulus clagd[6835]: Initial data sync from peer done.
2025-03-30T08:54:11.444455+00:00 cumulus clagd[6835]: Initial handshake done.
2025-03-30T08:54:11.445466+00:00 cumulus clagd[6835]: UpdateProtoDownFlags Interface bond25 add flags 0x0 del flags 0x1 final flags 0x0
2025-03-30T08:54:12.643439+00:00 cumulus clagd[6835]: Initial STP param sync done to peer
2025-03-30T08:54:19.750053+00:00 cumulus clagd[6835]: bond25 moved from down to up
2025-03-30T08:54:19.758382+00:00 cumulus clagd[6835]: bond25 is now dual connected.
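To put numbers on the startup sequence, here is a minimal Python sketch that computes the elapsed time between a few of the log events above. It assumes every line keeps the ISO-8601 timestamp prefix shown in the excerpt. The helper names `parse` and `elapsed` are my own and are not part of Cumulus.

```python
from datetime import datetime

# Four lines copied verbatim from the clagd.log excerpt above.
LOG = """\
2025-03-30T08:54:02.110764+00:00 cumulus clagd[6835]: Beginning execution of clagd version 1.4.0
2025-03-30T08:54:09.418490+00:00 cumulus clagd[6835]: The peer switch is active.
2025-03-30T08:54:11.444455+00:00 cumulus clagd[6835]: Initial handshake done.
2025-03-30T08:54:19.758382+00:00 cumulus clagd[6835]: bond25 is now dual connected.
"""


def parse(line):
    """Split one log line into (timestamp, message)."""
    stamp, _host, rest = line.split(" ", 2)
    _tag, message = rest.split(": ", 1)
    return datetime.fromisoformat(stamp), message


def elapsed(log_text):
    """Return (seconds since first line, message) for every line."""
    events = [parse(line) for line in log_text.splitlines() if line.strip()]
    start = events[0][0]
    return [((ts - start).total_seconds(), msg) for ts, msg in events]


for seconds, message in elapsed(LOG):
    print(f"+{seconds:7.3f}s  {message}")
```

On this excerpt, bond25 is reported as dual connected roughly 17.6 seconds after clagd starts. That is well short of the 45–60 second outage, so the clagd log on its own does not seem to account for the whole window.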
Switch Config 1
- header:
    model: MSN2010
    nvue-api-version: nvue_v1
    rev-id: 1.0
    version: Cumulus Linux 5.8.0
- set:
    bridge:
      domain:
        br_default:
          vlan:
            4,6,9,11-12,15,24,30,40,110-111: {}
    interface:
      bond25:
        bond:
          member:
            swp16: {}
            swp17: {}
          mlag:
            enable: on
            id: 25
        bridge:
          domain:
            br_default: {}
        type: bond
      eth0:
        ip:
          address:
            192.168.100.221/24: {}
        type: eth
      peerlink:
        bond:
          member:
            swp21: {}
            swp22: {}
        type: peerlink
      peerlink.4094:
        base-interface: peerlink
        type: sub
        vlan: 4094
      swp1-9,16-17:
        link:
          speed: 25G
      swp1-13:
        bridge:
          domain:
            br_default:
              stp:
                admin-edge: on
      swp1-13,16-17,21-22:
        type: swp
      swp16-17:
        link:
          fec: rs
          state:
            up: {}
      swp21-22:
        link:
          auto-negotiate: off
          speed: 100G
    mlag:
      backup:
        192.168.100.222:
          vrf: mgmt
      enable: on
      init-delay: 1
      mac-address: 44:38:39:BE:EF:AA
      peer-ip: linklocal
    service:
      dns:
        mgmt:
          server:
            x.x.x.x: {}
      ntp:
        mgmt:
          server:
            x.x.x.x: {}
    vrf:
      mgmt:
        router:
          static:
            0.0.0.0/0:
              address-family: ipv4-unicast
              via:
                192.168.x.x:
                  type: ipv4-address
Switch Config 2
- header:
    model: MSN2010
    nvue-api-version: nvue_v1
    rev-id: 1.0
    version: Cumulus Linux 5.8.0
- set:
    bridge:
      domain:
        br_default:
          vlan:
            4,6,9,11-12,15,24,30,40,110-111: {}
    interface:
      bond25:
        bond:
          member:
            swp16: {}
            swp17: {}
          mlag:
            enable: on
            id: 25
        bridge:
          domain:
            br_default: {}
        type: bond
      eth0:
        ip:
          address:
            192.168.100.222/24: {}
        type: eth
      peerlink:
        bond:
          member:
            swp21: {}
            swp22: {}
        type: peerlink
      peerlink.4094:
        base-interface: peerlink
        type: sub
        vlan: 4094
      swp1-9,16-17:
        link:
          speed: 25G
      swp1-13:
        bridge:
          domain:
            br_default:
              stp:
                admin-edge: on
      swp1-13,16-17:
        link:
          state:
            up: {}
      swp1-13,16-17,21-22:
        type: swp
      swp16-17:
        link:
          fec: rs
      swp21-22:
        link:
          auto-negotiate: off
          speed: 100G
    mlag:
      backup:
        192.168.100.221:
          vrf: mgmt
      enable: on
      init-delay: 1
      mac-address: 44:38:39:BE:EF:AA
      peer-ip: linklocal
      priority: 32768
    service:
      dns:
        mgmt:
          server:
            x.x.x.x: {}
      ntp:
        mgmt:
          server:
            x.x.x.x: {}
    vrf:
      mgmt:
        router:
          static:
            0.0.0.0/0:
              address-family: ipv4-unicast
              via:
                192.168.x.x:
                  type: ipv4-address
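To check whether the two configs really are identical, a small Python sketch can flatten each one into key paths and compare them. The dicts below are hand-transcribed fragments of the link-state and mlag sections only. Their nesting is my assumption about the usual NVUE layout, and the names `switch1`, `switch2`, `flatten` and `diff` are hypothetical.

```python
def flatten(tree, prefix=()):
    """Yield (path, value) for every leaf of a nested dict."""
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict) and value:
            yield from flatten(value, path)
        else:
            # Scalars and empty dicts ({}) both count as leaves.
            yield path, value


def diff(a, b):
    """Return paths only in a, paths only in b, and paths whose values differ."""
    fa, fb = dict(flatten(a)), dict(flatten(b))
    only_a = sorted(p for p in fa if p not in fb)
    only_b = sorted(p for p in fb if p not in fa)
    changed = sorted(p for p in fa if p in fb and fa[p] != fb[p])
    return only_a, only_b, changed


# Fragments transcribed from Switch Config 1 and Switch Config 2 (assumed nesting).
switch1 = {
    "interface": {"swp16-17": {"link": {"fec": "rs", "state": {"up": {}}}}},
    "mlag": {"backup": {"192.168.100.222": {"vrf": "mgmt"}},
             "init-delay": 1, "peer-ip": "linklocal"},
}
switch2 = {
    "interface": {"swp1-13,16-17": {"link": {"state": {"up": {}}}},
                  "swp16-17": {"link": {"fec": "rs"}}},
    "mlag": {"backup": {"192.168.100.221": {"vrf": "mgmt"}},
             "init-delay": 1, "peer-ip": "linklocal", "priority": 32768},
}

only_1, only_2, changed = diff(switch1, switch2)
for label, paths in (("only on switch 1", only_1), ("only on switch 2", only_2)):
    for path in paths:
        print(f"{label}: {' / '.join(path)}")
```

Apart from the expected backup IP difference, this shows two asymmetries. Switch 2 explicitly sets link state up on swp1-13,16-17, while switch 1 sets it only on swp16-17. Switch 2 also sets priority 32768. The clagd invocation in the log uses backup IP 192.168.100.222, so it appears to come from switch 1, and it also shows --priority 32768, which suggests that value is the default. Whether either difference is related to the outage is not something this comparison can tell.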