Environment
- DOCA SDK: 3.3.0 (doca-host_3.3.0-088000-26.01)
- NIC: ConnectX-6 Dx (MT4125)
- Firmware: 22.43.1014
- OS: Ubuntu 24.04, kernel 6.18.1
- Mode: switch,isolated,hws (hardware steering, eSwitch switchdev)
- Device probe args: dv_flow_en=2,fdb_def_rule_en=0,dv_xmeta_en=4
Summary
When an LPM pipe has multiple action templates (e.g., 3 templates with indices 0, 1, 2), adding entries with action_idx=0, removing them, and then adding new entries with action_idx=2 to the same pipe results in corrupted action data on the new entries. The entry matches packets correctly (counters increment), but changeable action fields (e.g., ip4.dst_ip) contain stale values (0) instead of the values passed at entry creation.
The bug does not reproduce with libdoca_flow_trace (trace/debug library) — only with the release libdoca_flow.
Reproduction
Pipe setup
LPM pipe with 3 action templates:
// Template 0: MAC rewrite only (connected routes)
SET_MAC_ADDR(actions_conn.outer.eth.dst_mac, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff);
SET_MAC_ADDR(actions_conn.outer.eth.src_mac, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff);
// Template 1: encap (not relevant to this bug)
// ...
// Template 2: MAC rewrite + dst_ip rewrite (DNAT)
SET_MAC_ADDR(actions_dnat.outer.eth.dst_mac, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff);
SET_MAC_ADDR(actions_dnat.outer.eth.src_mac, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff);
actions_dnat.outer.l3_type = DOCA_FLOW_L3_TYPE_IP4;
actions_dnat.outer.ip4.dst_ip = 0xffffffff; // changeable
actions_arr[0] = &actions_conn;
actions_arr[1] = &actions_encap;
actions_arr[2] = &actions_dnat;
Steps to reproduce
1. Create LPM pipe with 3 action templates as above
2. Add entry A: 10.123.123.7/32 with action_idx=0 (connected, no dst_ip field)
3. Process entry: doca_flow_entries_process(port, 0, timeout, 1) → callback SUCCESS
4. Remove entry A: doca_flow_pipe_remove_entry(0, NO_WAIT, entry_A)
5. Process removal: doca_flow_entries_process(port, 0, timeout, 0)
6. Add entry B: 203.0.113.1/32 with action_idx=2, actions.outer.ip4.dst_ip = inet_addr("10.123.123.7")
7. Process entry: doca_flow_entries_process(port, 0, timeout, 1) → callback SUCCESS; doca_flow_pipe_entry_get_status(entry_B) → DOCA_FLOW_ENTRY_STATUS_SUCCESS
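For anyone rebuilding the reproducer, the two prefixes used in the steps above can be turned into address/mask pairs with a small helper. This is only a sketch: `struct lpm_key`, `prefix_to_mask_be` and `lpm_key_from_cidr` are hypothetical names that are not part of DOCA, and the DOCA calls themselves are deliberately left out.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper type (not a DOCA struct): IPv4 LPM key,
 * both fields in network byte order. */
struct lpm_key {
    uint32_t addr_be; /* prefix address, host bits cleared */
    uint32_t mask_be; /* prefix mask */
};

/* Convert a prefix length (0..32) to a network-byte-order mask. */
static uint32_t prefix_to_mask_be(unsigned prefix_len)
{
    assert(prefix_len <= 32);
    if (prefix_len == 0)
        return 0; /* shifting a 32-bit value by 32 is undefined in C */
    return htonl(0xffffffffu << (32 - prefix_len));
}

/* Parse a dotted-quad address plus prefix length.
 * Returns 0 on success, -1 on malformed input. */
static int lpm_key_from_cidr(const char *addr, unsigned prefix_len,
                             struct lpm_key *out)
{
    struct in_addr in;

    if (prefix_len > 32 || inet_pton(AF_INET, addr, &in) != 1)
        return -1;
    out->mask_be = prefix_to_mask_be(prefix_len);
    out->addr_be = in.s_addr & out->mask_be;
    return 0;
}
```

For entry B, `lpm_key_from_cidr("203.0.113.1", 32, &key)` yields the values that would presumably be copied into the match and match mask `outer.ip4.dst_ip` fields before the LPM entry is added.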
Expected behavior
Packets matching 203.0.113.1/32 should have dst_ip rewritten to 10.123.123.7.
Actual behavior
- Entry B matches packets (per-entry non-shared counter increments)
- Entry B status is DOCA_FLOW_ENTRY_STATUS_SUCCESS
- But dst_ip is rewritten to 0.0.0.0 instead of 10.123.123.7
- tcpdump inside the destination VM confirms dst=0.0.0.0
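The tcpdump check can also be automated in a test harness. The sketch below classifies the destination address of a captured IPv4 header; `check_ipv4_dst` and the `dst_check` enum are hypothetical names, and it assumes the buffer starts at the IPv4 header (any Ethernet or outer encapsulation already stripped).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes for the check below. */
enum dst_check {
    DST_BAD_PKT = -1, /* too short or not IPv4 */
    DST_OK = 0,       /* destination equals the expected address */
    DST_ZEROED = 1,   /* destination is 0.0.0.0 (the symptom reported here) */
    DST_OTHER = 2     /* some other unexpected destination */
};

/* Compare the destination address of a raw IPv4 header with the
 * expected address (network byte order). */
static enum dst_check check_ipv4_dst(const uint8_t *ip_hdr, size_t len,
                                     uint32_t expected_be)
{
    uint32_t dst_be;

    assert(ip_hdr != NULL);
    if (len < 20 || (ip_hdr[0] >> 4) != 4)
        return DST_BAD_PKT;
    /* Destination address occupies bytes 16..19 of the IPv4 header. */
    memcpy(&dst_be, ip_hdr + 16, sizeof(dst_be));
    if (dst_be == expected_be)
        return DST_OK;
    if (dst_be == 0)
        return DST_ZEROED;
    return DST_OTHER;
}
```

With `expected_be` set to the network-order value of 10.123.123.7, a packet showing the behavior described above would be reported as `DST_ZEROED`.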
Key observations
- Fresh pipe (no prior entries removed): action_idx=2 with dst_ip rewrite works correctly
- After remove+add cycle (steps 2-7 above): action_idx=2 dst_ip field corrupted to 0
- Workaround: calling doca_flow_pipe_control_add_entry() on the same port with a match that covers the same packet flow (e.g., matching the same outer IPv6 SID) “refreshes” the HW steering state and fixes the corrupted action. The control entry can be removed immediately afterwards; the add operation itself is sufficient. doca_flow_entries_process() with any max_processed_entries value does NOT fix the issue; only doca_flow_pipe_control_add_entry() with a matching flow region does.
- Trace library (libdoca_flow_trace) does NOT reproduce this bug; only the release library (libdoca_flow) does. This suggests the trace library has additional internal state validation/initialization that the release library omits.
Impact
This bug prevents reliable use of LPM pipes with multiple action templates in long-running applications where entries are dynamically added and removed (e.g., routing daemons, NAT gateways). Action fields that are present in one template but absent in another (like ip4.dst_ip in template 2 but not in template 0) get corrupted when entries cycle through remove+add with template switching.
Workaround
After adding LPM entries following a remove+add cycle with different action_idx, trigger a control pipe add operation with a match pattern that covers the same packet flow. This forces HW steering recalculation:
// After LPM entry add with different action_idx:
doca_flow_pipe_control_add_entry(0, control_pipe,
&matching_flow_match, &mask, NULL, &actions, NULL, NULL,
NULL, priority, &fwd, &status, &flush_entry);
doca_flow_entries_process(port, 0, timeout, 1);
// Entry can be immediately removed after
doca_flow_pipe_remove_entry(0, DOCA_FLOW_ENTRY_FLAGS_NO_WAIT, flush_entry);
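To avoid issuing the control-pipe refresh on every add, an application could track which action indices have been removed from a pipe and refresh only when a new entry uses a different one. The following is a minimal sketch of that bookkeeping, not part of DOCA: `struct lpm_pipe_tracker` and the `tracker_*` functions are hypothetical, and it assumes at most 32 action templates per pipe and that one refresh clears the condition (the latter is inferred from the observations above, not confirmed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-pipe bookkeeping: bit i is set when an entry with
 * action_idx == i has been removed since the last refresh. */
struct lpm_pipe_tracker {
    uint32_t removed_mask;
};

/* Record the removal of an entry that used the given action_idx. */
static void tracker_on_remove(struct lpm_pipe_tracker *t, uint32_t action_idx)
{
    assert(action_idx < 32);
    t->removed_mask |= 1u << action_idx;
}

/* True when an add with action_idx follows the removal of an entry
 * that used a different action_idx. */
static bool tracker_needs_refresh(const struct lpm_pipe_tracker *t,
                                  uint32_t action_idx)
{
    assert(action_idx < 32);
    return (t->removed_mask & ~(1u << action_idx)) != 0;
}

/* Call after the control-pipe add/remove workaround has been applied. */
static void tracker_on_refresh(struct lpm_pipe_tracker *t)
{
    t->removed_mask = 0;
}
```

In use, the application would call `tracker_on_remove()` next to each LPM entry removal, check `tracker_needs_refresh()` after each LPM add, and, when it returns true, run the control-pipe sequence shown above followed by `tracker_on_refresh()`.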
Minimal reproducer
Available upon request. The issue was discovered in a hardware-offloaded SRv6 L3 router using DOCA Flow 3.3.0 on ConnectX-6 Dx, where overlay LPM pipe entries transition between connected routes (action_idx=0, MAC rewrite only) and DNAT routes (action_idx=2, MAC + dst_ip rewrite) during VRF lifecycle events.