Hi,
We are running OpenStack with VF-LAG and OVS Kernel Hardware Offload (ASAp² / HWOL) on ConnectX-6 Lx NICs. We are experiencing repeated VM performance degradation and would like to request your assistance.
Based on our analysis, we believe the root cause is spontaneous TX PF recalculation inside the NIC firmware, even though LACP is fully healthy and lacp_rate is aligned on both ends. Each time the active TX PF switches, the ASAp² flows are reset and re-established, and traffic repeatedly falls back to the SW path. This results in softirq spikes, increased VM application latency, and surges in TCP retransmissions.
1. Environment
| Item | Details |
|---|---|
| OS | Ubuntu 24.04.3 |
| Kernel | 6.8.0-88-generic, 6.8.0-90-generic |
| OVS | 2.17 (Kernel Offload, non-DPDK) |
| OVN | 22.03.3 (Raft Cluster) |
| OpenStack | Antelope (2023.1) / Neutron 22.2.2 |
| Bond | 802.3ad, hash, lacp_rate fast |
| Switchdev | embedded-switch-mode: switchdev, 40 VFs |
| Server | PowerEdge R7615 |
| NIC | Mellanox Technologies MT2894 Family [ConnectX-6 Lx] |
| NIC Firmware | 26.41.1000 |
| NIC Driver | MLNX_OFED_LINUX-24.04-0.6.6.0 |
2. Configuration
2-1. OVS
```
other_config        : {hw-offload="true", max-idle="30000", vlan-limit="0"}
ovs_version         : "2.17.9"
```
2-2. VF-LAG and ASAp² (HWOL)
Both PFs are in switchdev mode, and VF-LAG is correctly configured with a single shared FDB:
```
root@host:~# devlink dev eswitch show pci/0000:82:00.0
pci/0000:82:00.0: mode switchdev inline-mode none encap-mode basic
root@host:~# devlink dev eswitch show pci/0000:82:00.1
pci/0000:82:00.1: mode switchdev inline-mode none encap-mode basic
```

Kernel log (dmesg):

```
mlx5_core 0000:82:00.0: shared_fdb:1 mode:hash
mlx5_core 0000:82:00.0: Operation mode is single FDB
```
3. Problem Statement
In summary, we suspect that the TX PF is being spontaneously recalculated within the NIC firmware, and that each recalculation resets the ASAp² flows and forces repeated fallback to the SW path.
Specifically:

- LACP is healthy: no link flaps, no LACP timeouts, and no renegotiation.
- lacp_rate is set to the same value on both ends.
- Despite this, the active TX PF switches spontaneously between enp130s0f0np0 and enp130s0f1np1 at irregular intervals.
- Each switchover resets the ASAp² (HWOL) flows. During the re-establishment window, traffic falls back to the SW path.
- This SW fallback recurs on every TX PF recalculation, causing recurring softirq spikes, VM latency, and TCP retransmissions.
- RX is unaffected throughout.
The irregularity of the switchover intervals and the absence of any driver-level event strongly suggest the TX PF is being re-determined inside the firmware, independent of the host network stack.
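To make the switchover observable independently of dmesg, one option is to sample per-PF TX byte counters at a fixed interval and label whichever PF carried most of the TX bytes in each interval. The sketch below is hypothetical and is not the method behind our graphs: the helper names, the 90% dominance threshold, and the use of `/sys/class/net/<ifname>/statistics/tx_bytes` are all assumptions. In switchdev mode, hardware-offloaded VF traffic may not be reflected in the uplink netdev counters, so the physical-port counters from `ethtool -S` (for example `tx_bytes_phy`) may be a better source.

```python
"""Hypothetical sketch: infer which PF is carrying TX from two counter samples."""
import time
from pathlib import Path
from typing import Dict, Optional

# PF names taken from this report; adjust for other hosts.
PFS = ("enp130s0f0np0", "enp130s0f1np1")


def read_tx_bytes(ifname: str) -> int:
    # Standard Linux netdev counter; assumption: it may exclude offloaded traffic.
    return int(Path(f"/sys/class/net/{ifname}/statistics/tx_bytes").read_text())


def dominant_tx_pf(before: Dict[str, int], after: Dict[str, int],
                   threshold: float = 0.9) -> Optional[str]:
    """Return the PF that sent at least `threshold` of the TX bytes between
    two samples, or None if traffic was split across PFs or idle."""
    deltas = {pf: after[pf] - before[pf] for pf in before}
    total = sum(deltas.values())
    if total <= 0:
        return None
    pf, delta = max(deltas.items(), key=lambda kv: kv[1])
    return pf if delta / total >= threshold else None


def watch(interval: float = 1.0) -> None:
    """Print a timestamped line whenever the dominant TX PF changes (runs forever)."""
    prev = {pf: read_tx_bytes(pf) for pf in PFS}
    last: Optional[str] = None
    while True:
        time.sleep(interval)
        cur = {pf: read_tx_bytes(pf) for pf in PFS}
        active = dominant_tx_pf(prev, cur)
        if active is not None:
            if last is not None and active != last:
                print(f"{time.strftime('%H:%M:%S')} TX moved {last} -> {active}")
            last = active
        prev = cur
```

Calling `watch()` on an affected host would log each switchover with a timestamp, which could then be lined up against the softirq graphs.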
4. ASAp² (HWOL) Is Confirmed Operational
The issue is not a general ASAp² failure. Offloading is functioning correctly outside of the transition window. The problem is that each TX PF recalculation triggers a flow reset cycle.
TC filter offload status during a TX switchover (host24), as the number of filters per VF:
| VF | not_in_hw filters | in_hw filters |
|---|---|---|
| enp130s0f1vf18 | 4-7 | 2-4 |
| enp130s0f1vf36 | 3-8 | 8-10 |
We attribute the not_in_hw entries to flows whose characteristics are not offloadable in our setup, such as certain CT/NAT flows; they are unrelated to the TX switchover.
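For reproducibility, counts like those in the table can be obtained by tallying the offload flags that `tc -s filter show dev <rep> ingress` prints for each filter. A minimal sketch, assuming the `in_hw` and `not_in_hw` tokens appear as separate words in the output, as they do in recent iproute2 releases; the function names are hypothetical.

```python
"""Hypothetical sketch: count offloaded vs. non-offloaded TC filters on a representor."""
import re
import subprocess
from typing import Dict

# Word boundaries keep "in_hw" from matching inside "not_in_hw" or "in_hw_count".
_IN_HW = re.compile(r"\bin_hw\b")
_NOT_IN_HW = re.compile(r"\bnot_in_hw\b")


def count_offload_flags(tc_output: str) -> Dict[str, int]:
    """Count filters reported as offloaded (in_hw) and not offloaded (not_in_hw)."""
    return {
        "in_hw": len(_IN_HW.findall(tc_output)),
        "not_in_hw": len(_NOT_IN_HW.findall(tc_output)),
    }


def tc_offload_status(dev: str) -> Dict[str, int]:
    # Runs the real tc CLI; requires iproute2 and sufficient privileges.
    out = subprocess.run(
        ["tc", "-s", "filter", "show", "dev", dev, "ingress"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_offload_flags(out)
```

Running `tc_offload_status("enp130s0f1vf18")` in a loop during a switchover would show whether the in_hw count dips while the flows are re-established.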
OVS datapath statistics captured during a switchover event:

```
lookups: hit:92473181 missed:1535066 lost:0
```
The miss count reflects SW fallback during the HWOL flow re-establishment period, not a persistent offload failure.
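The `lookups` counters reported by `ovs-dpctl show` (or `ovs-appctl dpctl/show`) are cumulative since the datapath was created, so a single snapshot mixes all history. A small sketch, assuming the line format shown above: it parses the counters and computes the miss ratio between two snapshots, so that diffing snapshots taken just before and just after a switchover would isolate the misses attributable to that event. The function names are hypothetical.

```python
"""Hypothetical sketch: compute the OVS datapath miss ratio between two snapshots."""
import re
from typing import Dict

_LOOKUPS = re.compile(r"lookups:\s+hit:(\d+)\s+missed:(\d+)\s+lost:(\d+)")


def parse_lookups(line: str) -> Dict[str, int]:
    m = _LOOKUPS.search(line)
    if m is None:
        raise ValueError(f"no lookups counters in: {line!r}")
    hit, missed, lost = (int(g) for g in m.groups())
    return {"hit": hit, "missed": missed, "lost": lost}


def miss_ratio(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Fraction of kernel datapath lookups that missed (sent to userspace as
    upcalls) between two snapshots."""
    hit = after["hit"] - before["hit"]
    missed = after["missed"] - before["missed"]
    total = hit + missed
    return missed / total if total else 0.0
```

Diffed against an all-zero baseline, the single snapshot above gives a cumulative miss ratio of about 1.63%.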
5. No Driver-Level Event on TX Switchover
During a TX PF switchover, no "lag map" message appears in dmesg. For comparison, a genuine LACP renegotiation produces clear log entries:

```
mlx5_core 0000:82:00.0: lag map active ports: 1
mlx5_core 0000:82:00.0: lag map active ports: 1, 2
mlx5_core 0000:82:00.0: lag map active ports: 2
mlx5_core 0000:82:00.0: lag map active ports: 1, 2
```
No bond driver events are observed either. The TX PF switchover occurs without any notification from the mlx5 driver or the bond driver, which supports our assessment that the TX port is being re-determined internally within the firmware.
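To correlate switchovers with driver events, the `lag map active ports` messages can be extracted from `dmesg` (or `journalctl -k`) output and compared against the switchover times. A minimal sketch, assuming the message format shown above; the function name is hypothetical, and timestamp parsing is left out because it depends on the dmesg options in use.

```python
"""Hypothetical sketch: extract mlx5 'lag map' events from kernel log text."""
import re
from typing import List, Tuple

_LAG_MAP = re.compile(
    r"mlx5_core (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"lag map active ports: (?P<ports>[\d, ]+)"
)


def lag_map_events(log_text: str) -> List[Tuple[str, Tuple[int, ...]]]:
    """Return (PCI address, active ports) for every lag map message found."""
    events = []
    for m in _LAG_MAP.finditer(log_text):
        ports = tuple(int(p) for p in m.group("ports").split(",") if p.strip())
        events.append((m.group("pci"), ports))
    return events
```

An empty result over a window that contains a TX switchover is exactly the condition described in this section.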
6. Observed Patterns (Attached Graphs)
Three graphs are attached to illustrate how the issue manifests.
- Graph 1: Under active traffic, softirq load accumulates with each TX PF switchover and does not recover between transitions. This is the worst case, in which the SW fallback compounds over time.
- Graph 2: Host with VMs deployed (ASAp² ports registered) but minimal traffic. TX switchovers still occur, but the softirq impact is negligible (~0.3%). This indicates that the flow reset cycle occurs regardless of load, while its impact scales with the traffic volume at the time of the switchover.
- Graph 3: Moderate traffic, with softirq spikes at each TX PF switchover and partial recovery between events. The irregular intervals between switchovers are clearly visible, which argues against a periodic or externally scheduled trigger.
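For reference, softirq percentages like the ~0.3% quoted above can be derived from `/proc/stat` by diffing the aggregate `cpu` line between two samples. The sketch below illustrates that approach and is not necessarily the tooling behind the attached graphs. It assumes the proc(5) field order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice) and excludes guest and guest_nice from the total because the kernel already counts them in user and nice. The function names are hypothetical.

```python
"""Hypothetical sketch: softirq share of CPU time between two /proc/stat samples."""
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(x) for x in fields[1:]]


def softirq_percent(before: str, after: str) -> float:
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [y - x for x, y in zip(b, a)]
    # user..steal only; guest/guest_nice are already included in user/nice.
    total = sum(deltas[:8])
    return 100.0 * deltas[6] / total if total else 0.0


def sample_cpu_line(path: str = "/proc/stat") -> str:
    # The first line of /proc/stat is the aggregate across all CPUs.
    with open(path) as f:
        return f.readline()
```

Sampling `sample_cpu_line()` once per second and feeding consecutive pairs to `softirq_percent` yields a host-wide softirq time series; per-CPU lines (`cpu0`, `cpu1`, ...) would be needed to spot a single saturated core.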
7. Potential Regression: Legacy vs. Current Environment
The same configuration did not exhibit this issue in our legacy environment. The problem began after the following upgrades; we have not been able to determine whether it stems from the firmware, the driver, the kernel, or an interaction between them.
| Item | Legacy Environment (Healthy) | Current Environment (Issue Present) |
|---|---|---|
| OS | Ubuntu 22.04 | Ubuntu 24.04.3 |
| Kernel | 5.15.0-128 | 6.8.0-88-generic, 6.8.0-90-generic |
| NIC Driver | MLNX_OFED 5.8-3.0.7 | MLNX_OFED_LINUX-24.04-0.6.6.0 |
| NIC Firmware | 26.36.1010 (DEL0000000031) | 26.41.1000 (DEL0000000031) |
8. Questions
- Is spontaneous TX PF recalculation within the NIC firmware a known behavior in VF-LAG mode, even when LACP is healthy?
- Is there a known issue in MLNX_OFED 24.04 or FW 26.41.1000 where internal TX PF recalculation triggers ASAp² flow resets?
- As a workaround, is there a way to suppress TX PF recalculation or to pin the active TX PF?
Thank you for your time and assistance.


