ConnectX-6 Lx VF-LAG + ASAp²: Repeated SW fallback due to spontaneous TX PF recalculation despite healthy LACP

Hi,

We are running OpenStack with VF-LAG and OVS Kernel Hardware Offload (ASAp² / HWOL) on ConnectX-6 Lx NICs. We are experiencing repeated VM performance degradation and would like to request your assistance.

The root cause we have identified is spontaneous TX PF recalculation occurring inside the NIC firmware, despite LACP being fully healthy and lacp_rate aligned on both ends. Each time the active TX PF switches, ASAp² flows reset and re-establish, causing repeated SW fallback. This results in softirq spikes, VM application latency, and TCP retransmission surges.


1. Environment

Item           Details
OS             Ubuntu 24.04.3
Kernel         6.8.0-88-generic, 6.8.0-90-generic
OVS            2.17 (Kernel Offload, non-DPDK)
OVN            22.03.3 (Raft Cluster)
OpenStack      Antelope (2023.1) / Neutron 22.2.2
Bond           802.3ad, hash, lacp_rate fast
Switchdev      embedded-switch-mode: switchdev, 40 VFs
Server         PowerEdge R7615
NIC            Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
NIC Firmware   26.41.1000
NIC Driver     MLNX_OFED_LINUX-24.04-0.6.6.0

2. Configuration

2-1. OVS

other_config : {hw-offload="true", max-idle="30000", vlan-limit="0"}
ovs_version  : "2.17.9"

2-2. VF-LAG and ASAp² (HWOL)

Both PFs are in switchdev mode, and VF-LAG is correctly configured with a single shared FDB:

root@host:~# devlink dev eswitch show pci/0000:82:00.0
pci/0000:82:00.0: mode switchdev inline-mode none encap-mode basic

root@host:~# devlink dev eswitch show pci/0000:82:00.1
pci/0000:82:00.1: mode switchdev inline-mode none encap-mode basic

mlx5_core kernel log (dmesg):

mlx5_core 0000:82:00.0: shared_fdb:1 mode:hash
mlx5_core 0000:82:00.0: Operation mode is single FDB

3. Problem Statement

The root cause is spontaneous TX PF recalculation within the NIC firmware. The consequence is repeated ASAp² flow reset and SW fallback.

To be specific:

  • LACP is healthy. No link flap, no LACP timeout, no renegotiation.

  • lacp_rate is aligned to the same value on both ends.

  • Despite this, the active TX PF switches spontaneously between enp130s0f0np0 and enp130s0f1np1 at irregular intervals.

  • Each switchover causes ASAp² (HWOL) flows to reset. During the re-establishment window, traffic falls back to SW path.

  • This SW fallback repeats every time the TX PF is recalculated, resulting in recurring softirq spikes, VM latency, and TCP retransmissions.

  • RX is unaffected throughout.

The irregularity of the switchover intervals and the absence of any driver-level event strongly suggest the TX PF is being re-determined inside the firmware, independent of the host network stack.
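
For anyone trying to reproduce the observation, here is a minimal sketch of how the switchover could be detected from periodic per-PF TX byte counter samples. How the counters are collected is left open (the physical-port counters from `ethtool -S` are one option, assumed here). The function names and the "dominant PF" heuristic are illustrative assumptions, not part of any NVIDIA tool. In hash mode both PFs can carry traffic at once, so this only flags a change in which PF carries most of it.

```python
from typing import Dict, List, Optional, Tuple

# One sample: (timestamp, {interface name: cumulative TX byte counter}).
Sample = Tuple[float, Dict[str, int]]


def dominant_tx_pf(prev: Dict[str, int], curr: Dict[str, int]) -> Optional[str]:
    """Return the interface whose TX counter grew the most, or None if idle."""
    deltas = {name: curr[name] - prev[name] for name in curr if name in prev}
    if not deltas:
        return None
    name, delta = max(deltas.items(), key=lambda kv: kv[1])
    return name if delta > 0 else None


def find_switchovers(samples: List[Sample]) -> List[Tuple[float, str, str]]:
    """Return (timestamp, old_pf, new_pf) each time the dominant TX PF changes."""
    events = []
    last = None
    for (_, prev), (ts, curr) in zip(samples, samples[1:]):
        active = dominant_tx_pf(prev, curr)
        if active is None:
            continue  # no traffic in this interval, nothing to conclude
        if last is not None and active != last:
            events.append((ts, last, active))
        last = active
    return events
```

The resulting timestamps can then be lined up against dmesg and the bonding state to check whether any host-side event coincides with a switchover.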


4. ASAp² (HWOL) Is Confirmed Operational

The issue is not a general ASAp² failure. Offloading is functioning correctly outside of the transition window. The problem is that each TX PF recalculation triggers a flow reset cycle.

TC offload status during TX switchover (host24):

VF               not_in_hw   in_hw
enp130s0f1vf18   4~7         2~4
enp130s0f1vf36   3~8         8~10

The not_in_hw entries are due to inherently non-offloadable flow characteristics such as CT/NAT, and are unrelated to the TX switchover.
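
As a rough illustration of how such counts can be gathered, the sketch below tallies the `in_hw` and `not_in_hw` flags in saved `tc -s filter show` output for a representor. The helper name is hypothetical and the exact output layout depends on the iproute2 version, so treat this as a sketch under those assumptions. Matching whole tokens matters because `not_in_hw` contains `in_hw` as a substring.

```python
from collections import Counter


def count_offload_flags(tc_output: str) -> Counter:
    """Count filters flagged in_hw / not_in_hw in `tc filter show` text."""
    counts = Counter()
    for line in tc_output.splitlines():
        tokens = line.split()
        # Compare whole tokens: a substring test would count
        # "not_in_hw" lines as "in_hw" too.
        if "not_in_hw" in tokens:
            counts["not_in_hw"] += 1
        elif "in_hw" in tokens:
            counts["in_hw"] += 1
    return counts
```

Running this on snapshots taken before, during, and after a switchover would show whether the `in_hw` count actually drops during the transition window.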

OVS datapath statistics captured during a switchover event:

lookups: hit:92473181 missed:1535066 lost:0

The miss count reflects SW fallback during the HWOL flow re-establishment period, not a persistent offload failure.
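
One caveat when reading this line: the counters are cumulative, so a single snapshot gives a lifetime average rather than the miss rate during the switchover itself. A small sketch of parsing the `lookups:` line and comparing two snapshots follows. The function names are our own, and the regex assumes the line format shown above.

```python
import re
from typing import Dict

LOOKUPS_RE = re.compile(r"lookups:\s*hit:(\d+)\s+missed:(\d+)\s+lost:(\d+)")


def parse_lookups(line: str) -> Dict[str, int]:
    """Extract hit/missed/lost counters from a datapath 'lookups:' line."""
    m = LOOKUPS_RE.search(line)
    if m is None:
        raise ValueError("not a lookups line: %r" % line)
    hit, missed, lost = (int(g) for g in m.groups())
    return {"hit": hit, "missed": missed, "lost": lost}


def miss_ratio(stats: Dict[str, int]) -> float:
    """Fraction of lookups that missed the datapath flow table."""
    total = stats["hit"] + stats["missed"]
    return stats["missed"] / total if total else 0.0


def delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Counter growth between two snapshots, e.g. around one switchover."""
    return {key: after[key] - before[key] for key in after}
```

For the snapshot above, `miss_ratio` comes out at roughly 1.6% of all lookups since the datapath was created. Taking `delta` across a single switchover and computing `miss_ratio` on it isolates the misses that belong to that event.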


5. No Driver-Level Event on TX Switchover

During TX PF switchover, no lag map log appears in dmesg. For reference, a legitimate LACP renegotiation produces clear logs:

mlx5_core 0000:82:00.0: lag map active ports: 1
mlx5_core 0000:82:00.0: lag map active ports: 1, 2
mlx5_core 0000:82:00.0: lag map active ports: 2
mlx5_core 0000:82:00.0: lag map active ports: 1, 2

No bond driver-level events are observed either. The TX PF switchover occurs without any mlx5 driver or bond driver notification, which is consistent with our assessment that the TX port is being re-determined internally within the firmware.
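
To make the "no lag map log" check repeatable, a sketch like the following could scan saved kernel log lines for the `lag map active ports` message. The regex is based only on the four lines quoted above and the function name is hypothetical. An empty result around a switchover timestamp only shows that the driver logged no remap, not what the firmware did.

```python
import re
from typing import Iterable, List, Tuple

# Pattern derived from the log lines quoted in this post.
LAG_MAP_RE = re.compile(r"mlx5_core (\S+): lag map active ports: ([\d, ]+)")


def parse_lag_map_events(lines: Iterable[str]) -> List[Tuple[str, List[int]]]:
    """Return (pci_device, active_ports) for each lag map line found."""
    events = []
    for line in lines:
        m = LAG_MAP_RE.search(line)
        if m is None:
            continue  # unrelated kernel message
        ports = [int(p) for p in m.group(2).split(",") if p.strip()]
        events.append((m.group(1), ports))
    return events
```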


6. Observed Patterns (Attached Graphs)

Three graphs are attached to illustrate how the issue manifests.

  • Graph 1: Under active traffic, softirq accumulates progressively with each TX PF switchover and does not recover between transitions. This represents the worst-case scenario where SW fallback compounds over time.

  • Graph 2: Host with VMs deployed (ASAp² ports registered) but minimal traffic. TX switchover still occurs, but softirq impact is negligible (~0.3%). This confirms that the flow reset cycle itself is consistent, but the impact scales with traffic volume at the time of switchover.

  • Graph 3: Moderate traffic scenario showing softirq spikes on each TX PF switchover, with partial recovery between events. The irregular intervals between switchovers are clearly visible, ruling out any periodic or externally scheduled trigger.
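
The irregularity claim can also be put in numbers rather than read off a graph. The sketch below computes the gaps between switchover timestamps and their coefficient of variation (standard deviation divided by mean). A value near 0 would indicate a periodic trigger, while a large value indicates irregular spacing. The function name is ours, and any threshold for "irregular" is a judgment call.

```python
from statistics import mean, pstdev
from typing import Dict, List, Union


def interval_stats(timestamps: List[float]) -> Dict[str, Union[float, List[float]]]:
    """Gaps between consecutive events and their coefficient of variation."""
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    avg = mean(gaps)
    # Coefficient of variation: 0.0 means perfectly periodic events.
    cv = pstdev(gaps) / avg if avg > 0 else 0.0
    return {"gaps": gaps, "mean": avg, "cv": cv}
```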


7. Potential Regression — Legacy vs. Current Environment

The same configuration on our legacy environment did not exhibit this issue. The problem began after the following upgrades, and we are not able to confirm whether this is a firmware, driver, or kernel interaction issue.

Item           Legacy Environment (Healthy)   Current Environment (Issue Present)
OS             Ubuntu 22.04                   Ubuntu 24.04.3
Kernel         5.15.0-128                     6.8.0-88-generic, 6.8.0-90-generic
NIC Driver     MLNX_OFED 5.8-3.0.7            MLNX_OFED_LINUX-24.04-0.6.6.0
NIC Firmware   26.36.1010 (DEL0000000031)     26.41.1000 (DEL0000000031)

8. Questions

  1. Is spontaneous TX PF recalculation within the NIC firmware a known behavior in VF-LAG mode, even when LACP is healthy?

  2. Is there a known issue where internal TX PF recalculation triggers ASAp² flow reset in MLNX_OFED 24.04 or FW 26.41.1000?

  3. Is there a way to suppress or pin the active TX PF to prevent recalculation, as a workaround?


Thank you for your time and assistance.

Hi @kyoon,

Thank you for posting your query on our community.
Please note that, given the complexity of the issue described, a valid support entitlement will be required to debug it.

That said, since these are Dell PSID adapters, support for firmware/driver behavior on these systems is typically handled through Dell. We recommend engaging Dell support on this issue.

Thanks,
Bhargavi