Hi everyone,
I’ve been using the DPDK rte_flow asynchronous API to create a large number of rules at run-time, and I’ve been surprised by how long a single rte_flow_async_create() call takes. For context, I timed it like this:
uint64_t start_cycles = rte_get_tsc_cycles();
entry->src_rule_handle = rte_flow_async_create(dpdk_port_id, ctx->flow_queue_id,
        &async_op_params, Ptrs->offload.src_template_tables[dpdk_port_id],
        src_pattern, pattern_index, src_actions, 0, entry, &error);
uint64_t end_cycles = rte_get_tsc_cycles();
The difference (end_cycles - start_cycles) averages around 1 million cycles per call.
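In case the surrounding context matters, here is a stripped-down sketch of the batched enqueue path I’m timing (table/pattern/action setup omitted; BATCH_SIZE and the helper name are just illustrative):

```c
#include <stdio.h>
#include <stdint.h>
#include <rte_cycles.h>
#include <rte_flow.h>

#define BATCH_SIZE 256 /* arbitrary example batch size */

static void
insert_batch(uint16_t port_id, uint32_t queue_id,
             struct rte_flow_template_table *table,
             const struct rte_flow_item pattern[],
             const struct rte_flow_action actions[])
{
    struct rte_flow_op_attr op_attr = { .postpone = 1 }; /* defer doorbell */
    struct rte_flow_op_result results[BATCH_SIZE];
    struct rte_flow_error error;
    uint64_t start = rte_get_tsc_cycles();

    for (int i = 0; i < BATCH_SIZE; i++) {
        /* Enqueue only: the rule is not guaranteed to be in hardware
         * until the operation is drained via rte_flow_pull(). */
        struct rte_flow *h = rte_flow_async_create(port_id, queue_id,
                &op_attr, table, pattern, 0 /* pattern tmpl idx */,
                actions, 0 /* actions tmpl idx */,
                NULL /* user_data */, &error);
        if (h == NULL)
            break;
    }
    rte_flow_push(port_id, queue_id, &error); /* one doorbell for the batch */

    int done = 0;
    while (done < BATCH_SIZE) {
        int n = rte_flow_pull(port_id, queue_id, results,
                              BATCH_SIZE - done, &error);
        if (n < 0)
            break; /* ignoring per-op status checks in this sketch */
        done += n;
    }
    printf("%.1f cycles/rule amortized\n",
           (double)(rte_get_tsc_cycles() - start) / BATCH_SIZE);
}
```

Even when I amortize over a batch like this, the per-rule cost stays in the same ballpark.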
My specs:
Device Type: ConnectX6DX
Part Number: MCX623106AN-CDA_Ax
Description: ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56; PCIe 4.0/3.0 x16;
PSID: MT_0000000359
Versions: Current Available
FW 22.46.1006 N/A
PXE 3.8.0100 N/A
UEFI 14.39.0013 N/A
And I’m using DPDK version 24.11.2.
I have tried the following mitigations:
- I tried toggling the `postpone` flag, with little effect.
- I made sure to avoid contention by running the test with one thread only.
- I used four different NICs to verify my results.
- I always made sure to run with `dv_flow_en=2`.
- I tried different groups, actions, and patterns. None of them had any effect on performance, except setting `src_pattern` to NULL, although that matches everything.
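For clarity, this is what I mean by toggling `postpone` (a sketch; the enqueue/drain details are omitted and the helper name is illustrative):

```c
#include <stdbool.h>
#include <stdint.h>
#include <rte_flow.h>

/* postpone = 1: the PMD may batch the work and defer the doorbell until
 * rte_flow_push(); postpone = 0: it may submit per operation. I saw
 * little difference between the two settings. */
static void
create_with_postpone(uint16_t port_id, uint32_t queue_id,
                     struct rte_flow_template_table *table,
                     const struct rte_flow_item pattern[],
                     const struct rte_flow_action actions[],
                     bool postpone)
{
    struct rte_flow_error error;
    struct rte_flow_op_attr op_attr = { .postpone = postpone ? 1 : 0 };

    rte_flow_async_create(port_id, queue_id, &op_attr, table,
                          pattern, 0, actions, 0, NULL, &error);
    if (postpone)
        rte_flow_push(port_id, queue_id, &error);
}
```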
Am I doing something wrong? I’m not seeing anywhere near the rule-insertion throughput advertised for this API.