Hi,
I have a bunch of hosts talking RoCEv2 across LACP link aggregates that terminate on two different switches (EVPN/ESI), and I’ve noticed that the switches keep blocking ports because they see duplicate MAC addresses, i.e. MAC address table fluctuation. I can reliably reproduce this when running ib_send_bw with the -R (rdma_cm) flag; the behavior doesn’t seem to surface if I leave -R out. The problem also doesn’t show up if I disable the ports connecting to one of the two switches, nor if I heavily load the setup with iperf3.
What I’ve done so far in an attempt to understand what is going on:
On the hosts involved, I’ve run tcpdump with mlx5_bond_0 as the capture device and filtered out the MACs of the cards that form the bond. I can still see egress frames that I believe shouldn’t be visible on the hosts, as if the host were indeed sending out traffic with the wrong source MAC. From the host NIC’s perspective, this tcpdump output could of course also occur if the switch started to behave like a hub. However, that doesn’t seem to be the case, since I don’t see these frames when I run ib_send_bw without -R or use other tools such as iperf3 to load the links. Hub-like behavior also wouldn’t explain the duplicate MAC detection triggering, as the switches at least appear to be learning MACs.
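To make the check above repeatable, here is a minimal sketch of how the capture could be post-processed. It assumes the capture was printed with tcpdump's -e and -n options, so that each line carries a "src MAC > dst MAC" pair. The MAC addresses, the sample lines, and the helper name foreign_src_frames are placeholders for illustration, not values from my setup. It also does not tell egress from ingress, it only flags source MACs that are not on the bond.

```python
import re

# Matches the "src > dst" MAC pair that `tcpdump -e -n` prints per frame.
MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
FRAME_RE = re.compile(rf"({MAC}) > ({MAC})", re.IGNORECASE)


def foreign_src_frames(lines, bond_macs):
    """Return (src, dst) pairs whose source MAC is not one of bond_macs."""
    own = {m.lower() for m in bond_macs}
    hits = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match is None:
            continue  # line without a link-level header, skip it
        src, dst = match.group(1).lower(), match.group(2).lower()
        if src not in own:
            hits.append((src, dst))
    return hits


# Placeholder values (assumptions): replace with the real bond member MACs
# and with lines read from the actual capture.
BOND_MACS = ["aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"]
SAMPLE = [
    "12:00:00.000001 aa:bb:cc:00:00:01 > aa:bb:cc:00:00:09, ethertype IPv4 (0x0800), length 98: 10.0.0.1.4791 > 10.0.0.9.4791: UDP, length 56",
    "12:00:00.000002 DE:AD:BE:EF:00:01 > aa:bb:cc:00:00:09, ethertype IPv4 (0x0800), length 98: 10.0.0.1.4791 > 10.0.0.9.4791: UDP, length 56",
    "some line without a MAC pair",
]

for src, dst in foreign_src_frames(SAMPLE, BOND_MACS):
    print(f"unexpected source MAC {src} -> {dst}")
```

Running the same script against captures taken with and without -R should show whether the unexpected source MACs only appear in the rdma_cm case.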
I would appreciate any thoughts on what might be going on here. For example, is there a known reason why the source MAC would differ from what is on the NIC when using -R with ib_send_bw over RoCE LAG, and why would the source MAC behavior depend on whether -R is used? What troubleshooting steps should I consider taking next?
Cards are MCX755106AS-HEA_Ax running FW 28.39.3004, on AlmaLinux 9.6 with kernel 5.14.0-570.44.1.el9_6.x86_64; ib_send_bw version is 6.23.
Any insight on the matter is greatly appreciated,
–
Vesa