Performance drop of the regex accelerator on BlueField-2

Hi, I am testing the performance of the regex accelerator on BlueField-2 using RXPBench. I find that the processing throughput drops as the number of cores (processes) increases.

Setting

The regex buffer length is set to 64 B and the batch size to 64. RXPBench runs on the BlueField ARM OS with a fixed number of bytes per core, and I vary the number of cores.

Results

# of cores | total regex bytes (B) | regex perf (Mpps) | regex perf (Gbps) | max latency (us) | min latency (us) | avg latency (us) | avg regex buffer len (B)
1 | 7884421000 | 4.748 | 2.431 | 3623.535 | 6.465 | 102.07 | 64
2 | 15768842000 | 4.7585 | 2.4363 | 3772.23 | 5.985 | 202.175 | 64
3 | 23653263000 | 4.7005 | 2.4067 | 3872.445 | 4.35 | 250.84 | 64
4 | 31537684000 | 4.4552 | 2.2811 | 3997.83 | 4.83 | 319.51 | 64
5 | 39422105000 | 4.1812 | 2.1408 | 24079.245 | 4.605 | 398.985 | 64
6 | 47306526000 | 3.9934 | 2.0446 | 10226.4 | 5.535 | 502.495 | 64
7 | 55190947000 | 3.851 | 1.9717 | 11718.555 | 3.6 | 564.245 | 64
8 | 63075368000 | 3.6674 | 1.8777 | 45690.3 | 5.415 | 697.18 | 64
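
As a sanity check on the units, the Gbps column appears to follow directly from the Mpps column and the 64 B buffer length. Here is a minimal sketch, assuming that each "packet" is one 64 B regex buffer and that 1 Gbps = 1000 Mbps (both are assumptions, not something RXPBench documents here); the helper name `mpps_to_gbps` is made up for illustration.

```python
# Rows copied from the results table: (cores, Mpps, reported Gbps).
ROWS = [
    (1, 4.748, 2.431),
    (2, 4.7585, 2.4363),
    (3, 4.7005, 2.4067),
    (4, 4.4552, 2.2811),
    (5, 4.1812, 2.1408),
    (6, 3.9934, 2.0446),
    (7, 3.851, 1.9717),
    (8, 3.6674, 1.8777),
]

BUF_LEN_BYTES = 64  # regex buffer length from the Setting section


def mpps_to_gbps(mpps, buf_len_bytes=BUF_LEN_BYTES):
    """Convert millions of buffers per second to Gbps.

    Assumes one buffer per "packet" and decimal units (1 Gbps = 1000 Mbps).
    """
    return mpps * buf_len_bytes * 8 / 1000.0


for cores, mpps, reported_gbps in ROWS:
    derived_gbps = mpps_to_gbps(mpps)
    print(f"{cores} cores: reported {reported_gbps:.4f} Gbps, "
          f"derived {derived_gbps:.4f} Gbps")
```

If the derived and reported values agree for every row, the two throughput columns carry the same information, so either one can be used to judge the drop.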

As the results show, the regex accelerator throughput drops by about 23% from 1 core to 8 cores (4.748 to 3.6674 Mpps). Intuitively, I do not think this is due to queue contention, but I have no idea what causes the drop. Could you please help me with this? Is there a hardware limitation on the accelerator, or are there other insights that would explain this result? Thanks. :)
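
The size of the drop can be computed directly from the Mpps column of the table above. This is a minimal sketch that takes the 1-core row as the baseline (that choice of baseline is an assumption; the 2-core row is marginally higher), and `relative_drop` is just an illustrative helper name.

```python
# Throughput in Mpps per core count, copied from the results table.
MPPS_BY_CORES = {
    1: 4.748,
    2: 4.7585,
    3: 4.7005,
    4: 4.4552,
    5: 4.1812,
    6: 3.9934,
    7: 3.851,
    8: 3.6674,
}


def relative_drop(baseline, value):
    """Fractional drop of `value` relative to `baseline` (negative means a gain)."""
    return (baseline - value) / baseline


baseline = MPPS_BY_CORES[1]  # assumed baseline: the single-core run
for cores in sorted(MPPS_BY_CORES):
    drop = relative_drop(baseline, MPPS_BY_CORES[cores])
    print(f"{cores} cores: {MPPS_BY_CORES[cores]:.4f} Mpps, "
          f"drop vs 1 core = {drop * 100:.1f}%")
```

Throughput is roughly flat up to 3 cores and then declines steadily, which is the pattern I would like to understand.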

Hello,

Where do the posted results come from? In other words, what exactly are you running?
(Reference link: RXPBench :: NVIDIA DOCA SDK Documentation).

As a baseline, is this based on the latest DOCA/FW version?

Which guidelines did you follow when using RXPBench?

If a deeper investigation is needed, a support case will have to be opened with NVIDIA.

Sophie.

Hi Sophie,

Thanks a lot for your response. :)

FYI, the RXPBench guideline I used is from here.

I used DOCA_v1.2.1_BlueField_OS_Ubuntu_20.04-5.4.0-1023-bluefield-5.5-2.1.7.0-3.8.5.12027-1.signed-aarch64.bfb, which is provided with DOCA v1.2.1, to install the BlueField OS, so the DOCA version should be v1.2.1.

For the FW, I installed MLNX_OFED_LINUX-5.5-1.0.3.2-ubuntu20.04-x86_64.tgz.

Thanks again for your quick reply.

Best regards.

Hello,

So you are on the latest and greatest version.
I am checking internally whether there is a penalty or limitation when using RXPBench with a specific number of cores, but to my knowledge that would be odd.

Sophie.

Hello,

I inquired about possible penalties/limitations on the number of cores used by RXPBench, and the answer I received was:
In general, yes. Multicore applications cannot scale linearly with cores.
I am not sure whether you are running the benchmark from the host or the DPU. However, my suggestion would be to open a support case so that this can be dissected and investigated further, since multiple factors are in play here.

Sophie.

Hi Sophie,

Thanks a lot for your response. :)

I run RXPBench from the DPU. I am curious whether this performance drop is due to hardware design limitations that are a black box to me. Also, are there ways to mitigate it through proper software design? I hope to discuss this in detail.

BTW, could you please tell me how to open a support case?

Best regards.