Persistent Link Flap Issue on Ethernet Switch

Hello,

We are running an NVIDIA SuperPOD setup.

We are experiencing frequent link flaps on the Ethernet switch used in this configuration. Has anyone else encountered the same issue?

When we check an affected link, we see persistent low alarms, while temperature and other parameters remain normal.

So far we have been working around the issue by replacing transceivers or cables.
However, it recurs so often that we suspect a hardware defect.

We are using Cumulus Linux version 5.11.1.

If you’ve already swapped transceivers and cables and it still flaps, that pretty much rules out the obvious physical-layer parts (optics and cabling).

Next things you can check:

Move the link to a different switch port / different line card and see whether the flapping stays with the original port or moves with the link.

Try a direct, simple setup (no LACP/MLAG, no special configs) just to exclude protocol-side churn.

Check the switch logs to see which side drops first (local PHY or the remote end).

If it consistently happens on the same port/ASIC area after all swaps, this really starts to look like a switch-side hardware or PHY issue.
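
To see whether the flaps really cluster on the same ports, a rough sketch like the one below can rank ports by how often their carrier has toggled. It reads the standard Linux `carrier_changes` counter under `/sys/class/net`, which counts carrier transitions (both up and down) for each interface. The `swp` prefix is an assumption based on the usual Cumulus front-panel port naming, and the function names are just illustrative, so adjust to your setup.

```python
from pathlib import Path


def read_carrier_changes(base="/sys/class/net", prefix="swp"):
    """Return {interface: carrier_changes} for interfaces whose name starts with prefix.

    The "swp" prefix is an assumption (typical Cumulus front-panel naming).
    """
    counts = {}
    for iface in sorted(Path(base).glob(prefix + "*")):
        counter = iface / "carrier_changes"
        try:
            counts[iface.name] = int(counter.read_text().strip())
        except (OSError, ValueError):
            # Counter missing or unreadable for this interface; skip it.
            continue
    return counts


def rank_flapping(counts, top=5):
    """Return the interfaces with the most carrier transitions, highest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    for name, changes in rank_flapping(read_carrier_changes()):
        print(f"{name}: {changes} carrier transitions")
```

Running it before and after each cable or transceiver swap and comparing the counts per port should show whether the flapping moves with the optics or stays with a particular port.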