DOCA Flow 3.3.0: LPM pipe action data corruption after entry remove+add with different action_idx

Environment

  • DOCA SDK: 3.3.0 (doca-host_3.3.0-088000-26.01)
  • NIC: ConnectX-6 Dx (MT4125)
  • Firmware: 22.43.1014
  • OS: Ubuntu 24.04, kernel 6.18.1
  • Mode: switch,isolated,hws (hardware steering, eSwitch switchdev)
  • Device probe args: dv_flow_en=2,fdb_def_rule_en=0,dv_xmeta_en=4

Summary

When an LPM pipe has multiple action templates (e.g., 3 templates with indices 0, 1, 2), adding entries with action_idx=0, removing them, and then adding new entries with action_idx=2 to the same pipe results in corrupted action data on the new entries. The entry matches packets correctly (counters increment), but changeable action fields (e.g., ip4.dst_ip) contain stale values (0) instead of the values passed at entry creation.

The bug does not reproduce with libdoca_flow_trace (trace/debug library) — only with the release libdoca_flow.

Reproduction

Pipe setup

LPM pipe with 3 action templates:

// Template 0: MAC rewrite only (connected routes)
SET_MAC_ADDR(actions_conn.outer.eth.dst_mac, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff);
SET_MAC_ADDR(actions_conn.outer.eth.src_mac, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff);

// Template 1: encap (not relevant to this bug)
// ...

// Template 2: MAC rewrite + dst_ip rewrite (DNAT)
SET_MAC_ADDR(actions_dnat.outer.eth.dst_mac, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff);
SET_MAC_ADDR(actions_dnat.outer.eth.src_mac, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff);
actions_dnat.outer.l3_type = DOCA_FLOW_L3_TYPE_IP4;
actions_dnat.outer.ip4.dst_ip = 0xffffffff; // changeable

actions_arr[0] = &actions_conn;
actions_arr[1] = &actions_encap;
actions_arr[2] = &actions_dnat;

Steps to reproduce

  1. Create LPM pipe with 3 action templates as above
  2. Add entry A: 10.123.123.7/32 with action_idx=0 (connected, no dst_ip field)
  3. Process entry: doca_flow_entries_process(port, 0, timeout, 1) → callback SUCCESS
  4. Remove entry A: doca_flow_pipe_remove_entry(0, NO_WAIT, entry_A)
  5. Process removal: doca_flow_entries_process(port, 0, timeout, 0)
  6. Add entry B: 203.0.113.1/32 with action_idx=2, actions.outer.ip4.dst_ip = inet_addr("10.123.123.7")
  7. Process entry: doca_flow_entries_process(port, 0, timeout, 1) → callback SUCCESS
  8. doca_flow_pipe_entry_get_status(entry_B) → DOCA_FLOW_ENTRY_STATUS_SUCCESS
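
Step 6 passes the rewrite address through inet_addr(), so a byte-order mistake is worth ruling out first. The sketch below is a standalone check, not part of the router code. It assumes ip4.dst_ip expects network byte order and confirms that the parsed address holds the octets 10, 123, 123, 7 in memory. The function names are hypothetical, and inet_pton() stands in for inet_addr() only because it reports parse errors unambiguously.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the in-memory bytes of a network-order IPv4
 * address into out[4], so that out[0] is the first octet on the wire. */
void be32_to_octets(uint32_t addr_be, uint8_t out[4])
{
    memcpy(out, &addr_be, 4);
}

/* Returns 1 if the dotted-quad string parses and its network-order bytes
 * equal the expected octets, 0 otherwise. */
int dst_ip_bytes_match(const char *dotted, const uint8_t expect[4])
{
    struct in_addr a;
    uint8_t got[4];

    assert(dotted != NULL && expect != NULL);
    if (inet_pton(AF_INET, dotted, &a) != 1)
        return 0; /* not a valid IPv4 dotted quad */
    be32_to_octets(a.s_addr, got);
    return memcmp(got, expect, 4) == 0;
}
```

A byte-order error would show up on the wire as 7.123.123.10, not as 0.0.0.0, so the all-zero rewrite observed here points away from the caller's address conversion.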

Expected behavior

Packets matching 203.0.113.1/32 should have dst_ip rewritten to 10.123.123.7.

Actual behavior

  • Entry B matches packets (per-entry non-shared counter increments)
  • Entry B status is DOCA_FLOW_ENTRY_STATUS_SUCCESS
  • But dst_ip is rewritten to 0.0.0.0 instead of 10.123.123.7
  • tcpdump inside the destination VM confirms dst=0.0.0.0
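
The tcpdump check can be automated in a test harness. The following is a minimal sketch, assuming the capture buffer starts at the IPv4 header as seen by the destination VM. The enum and function names are hypothetical. It separates the zeroed-address symptom described above from any other wrong rewrite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical verdicts for a captured packet's destination address. */
enum dst_verdict {
    DST_EXPECTED = 0,   /* dst matches the address set at entry creation */
    DST_ZEROED = 1,     /* dst is 0.0.0.0, the symptom in this report */
    DST_OTHER = 2,      /* dst is some other unexpected address */
    DST_BAD_PACKET = 3  /* buffer is too short or not IPv4 */
};

/* hdr must point at the first byte of the IPv4 header; the destination
 * address occupies header bytes 16..19. */
enum dst_verdict classify_ipv4_dst(const uint8_t *hdr, size_t len,
                                   const uint8_t expect[4])
{
    static const uint8_t zero[4] = {0, 0, 0, 0};
    const uint8_t *dst;

    assert(hdr != NULL && expect != NULL);
    if (len < 20 || (hdr[0] >> 4) != 4)
        return DST_BAD_PACKET;
    dst = hdr + 16;
    if (memcmp(dst, expect, 4) == 0)
        return DST_EXPECTED;
    if (memcmp(dst, zero, 4) == 0)
        return DST_ZEROED;
    return DST_OTHER;
}
```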

Key observations

  1. Fresh pipe (no prior entries removed): action_idx=2 with dst_ip rewrite works correctly
  2. After remove+add cycle (steps 2-7 above): action_idx=2 dst_ip field corrupted to 0
  3. Workaround: calling doca_flow_pipe_control_add_entry() on the same port with a match that covers the same packet flow (e.g., matching the same outer IPv6 SID) “refreshes” the HW steering state and fixes the corrupted action. The control entry can be immediately removed after — the add operation itself is sufficient.
  4. doca_flow_entries_process() with any max_processed_entries value does NOT fix the issue — only doca_flow_pipe_control_add_entry() with a matching flow region does.
  5. Trace library (libdoca_flow_trace) does NOT reproduce this bug — only the release library (libdoca_flow). This suggests the trace library has additional internal state validation/initialization that the release library omits.

Impact

This bug prevents reliable use of LPM pipes with multiple action templates in long-running applications where entries are dynamically added and removed (e.g., routing daemons, NAT gateways). Action fields that are present in one template but absent in another (like ip4.dst_ip in template 2 but not in template 0) get corrupted when entries cycle through remove+add with template switching.

Workaround

After adding an LPM entry whose action_idx differs from that of a previously removed entry, add a control pipe entry whose match covers the same packet flow. In our testing this appears to force a HW steering recalculation:

// After LPM entry add with different action_idx:
doca_flow_pipe_control_add_entry(0, control_pipe,
    &matching_flow_match, &mask, NULL, &actions, NULL, NULL,
    NULL, priority, &fwd, &status, &flush_entry);
doca_flow_entries_process(port, 0, timeout, 1);
// Entry can be immediately removed after
doca_flow_pipe_remove_entry(0, DOCA_FLOW_ENTRY_FLAGS_NO_WAIT, flush_entry);
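
To avoid issuing the control pipe add on every LPM insert, the application can track when the triggering condition has occurred. The sketch below is pure bookkeeping with no DOCA calls, and all names in it are hypothetical. It rests on two assumptions that this report does not establish: that only removals of entries with a different action_idx matter, and that one refresh is sufficient until the next such removal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ACTION_TEMPLATES 8 /* assumption: upper bound for this sketch */

/* Per-pipe state: bit i is set when an entry using action_idx i has been
 * removed since the last refresh. */
struct lpm_refresh_tracker {
    uint32_t removed_mask;
};

/* Call after removing an LPM entry that used action_idx. */
void tracker_on_remove(struct lpm_refresh_tracker *t, uint32_t action_idx)
{
    assert(t != NULL && action_idx < MAX_ACTION_TEMPLATES);
    t->removed_mask |= 1u << action_idx;
}

/* Call when adding an LPM entry. Returns true if an entry with a
 * different action_idx was removed since the last refresh, i.e. the
 * control pipe workaround should be applied after this add. */
bool tracker_on_add(const struct lpm_refresh_tracker *t, uint32_t action_idx)
{
    assert(t != NULL && action_idx < MAX_ACTION_TEMPLATES);
    return (t->removed_mask & ~(1u << action_idx)) != 0;
}

/* Call after the control pipe add has been processed. */
void tracker_on_refresh(struct lpm_refresh_tracker *t)
{
    assert(t != NULL);
    t->removed_mask = 0;
}
```

If either assumption turns out to be wrong, the safe fallback is to apply the workaround after every LPM add on an affected pipe.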

Minimal reproducer

Available upon request. The issue was discovered in a hardware-offloaded SRv6 L3 router using DOCA Flow 3.3.0 on ConnectX-6 Dx, where overlay LPM pipe entries transition between connected routes (action_idx=0, MAC rewrite only) and DNAT routes (action_idx=2, MAC + dst_ip rewrite) during VRF lifecycle events.

We downgraded to 3.2.1 and will upgrade back once this is fixed.