Hey guys!
I have an issue I can't figure out so far.
We had a pair of Spectrum MSN3700 switches connected to each other with MLAG.
After an issue with one of the two that required a complete reinstallation, the first one is no longer able to start the clagd service.
I get the following logs from journalctl:
Aug 11 10:11:21 cumulus systemd[1]: Starting Cumulus Linux Multi-Chassis LACP Bonding Daemon…
Aug 11 10:11:21 cumulus clagd[3492775]: Clag Initializing
Aug 11 10:11:21 cumulus clagd[3492775]: Cleanup is executing.
Aug 11 10:11:21 cumulus clagd[3492850]: RTNETLINK answers: No such device
Aug 11 10:11:21 cumulus clagd[3492775]: Cleanup is finished
Aug 11 10:11:22 cumulus clagd[3492774]: Beginning execution of clagd version 1.4.0
Aug 11 10:11:22 cumulus clagd[3492774]: Invoked with: /usr/sbin/clagd --daemon linklocal peerlink.4094 44:38:39:BE:EF:BA --priority 32768 --backupIp 172.16.20.252 --initDelay 180 --debug 0xfffffffff
Aug 11 10:11:22 cumulus clagd[3492774]: macAddr = 44:38:39:be:ef:ba
Aug 11 10:11:23 cumulus clagd[3492774]: Allowing duplicate LACP partner MACs
Aug 11 10:11:23 cumulus clagd[3492774]: Role is now secondary
Aug 11 10:11:24 cumulus clagd[3492774]: Thread to receive from CSU Manager – Started
Aug 11 10:11:31 cumulus clagd[3492774]: [Thread-1] -----Thread 140398500529920 "CS Manager PUB/SUB" hangs -----
  File "/usr/lib/python3.7/threading.py", line 885, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3/dist-packages/clag/clagdcsu.py", line 266, in handlePubSubMesgs
    self.csuLock.acquire()
Aug 11 10:11:31 cumulus clagd[3492774]: [Thread-1] ---------Thread 140398577858368 "MainThread" hangs ---------
  File "/usr/sbin/clagd", line 7119, in <module>
    main()
  File "/usr/sbin/clagd", line 7067, in main
    ClagRun(nlm)
  File "/usr/sbin/clagd", line 6993, in ClagRun
    csu_client = clagdcsu.CSUClient(Log, Intf, Parser)
  File "/usr/lib/python3/dist-packages/clag/clagdcsu.py", line 55, in __init__
    self.sendLoadCompleteMsg()
  File "/usr/lib/python3/dist-packages/clag/clagdcsu.py", line 96, in sendLoadCompleteMsg
    resp = self.comm_socket.recv()
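Reading the two stacks together, the main thread is parked in comm_socket.recv() inside sendLoadCompleteMsg, while the "CS Manager PUB/SUB" thread is parked in csuLock.acquire(). My guess (not confirmed against the clagd source) is that the main thread holds csuLock while it waits for a reply from the CSU manager that never arrives, so the PUB/SUB thread can never get the lock. Here is a minimal Python sketch of that pattern, with timeouts so it terminates. The lock name csu_lock and the socket pair standing in for comm_socket are hypothetical stand-ins, not clagd's real objects:

```python
import socket
import threading

# Hypothetical stand-in for clagdcsu's csuLock (assumption: the main thread
# holds it during sendLoadCompleteMsg).
csu_lock = threading.Lock()


def pubsub_worker(results, timeout):
    # Mirrors handlePubSubMesgs: tries to take the lock before handling a message.
    got_it = csu_lock.acquire(timeout=timeout)
    if got_it:
        csu_lock.release()
    results["pubsub_got_lock"] = got_it


def send_load_complete(timeout=0.5):
    """Hold the lock while waiting for a reply that never comes."""
    results = {}
    # Connected socket pair whose peer never answers, standing in for the
    # CSU manager side of comm_socket (assumption: no reply is ever sent).
    local, peer = socket.socketpair()
    local.settimeout(timeout)
    worker = threading.Thread(target=pubsub_worker, args=(results, timeout))
    with csu_lock:
        worker.start()
        local.sendall(b"load-complete")
        try:
            local.recv(1024)
            results["main_got_reply"] = True
        except socket.timeout:
            results["main_got_reply"] = False
        # Still holding the lock, so the worker's acquire() can only time out.
        worker.join()
    local.close()
    peer.close()
    return results


print(send_load_complete(0.2))
```

Without the timeouts, both calls would block forever, which matches what the hang detector reports. If this guess is right, the real question is why the CSU manager never answers the load-complete message when clagd is started by systemd.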
Has anyone seen this? I'm on Cumulus Linux 5.6. How can I pinpoint where the issue comes from?
During my investigation I was able to run clagd from the CLI without issues, but every time I start it through systemd it fails.