Hi, I am testing the performance of the regex accelerator on BlueField-2 using RXPBench. I find that the processing throughput drops as I increase the number of cores (processes).
Setting
The regex buffer length is set to 64 B and the batch size is 64. RXPBench runs on the BlueField ARM OS; the number of bytes per core is fixed and the number of cores is varied.
Results
| # of cores | total regex bytes / B | regex perf / Mpps | regex perf / Gbps | max latency / us | min latency / us | avg latency / us | avg regex buffer len / B |
|---|---|---|---|---|---|---|---|
| 1 | 7884421000 | 4.748 | 2.431 | 3623.535 | 6.465 | 102.07 | 64 |
| 2 | 15768842000 | 4.7585 | 2.4363 | 3772.23 | 5.985 | 202.175 | 64 |
| 3 | 23653263000 | 4.7005 | 2.4067 | 3872.445 | 4.35 | 250.84 | 64 |
| 4 | 31537684000 | 4.4552 | 2.2811 | 3997.83 | 4.83 | 319.51 | 64 |
| 5 | 39422105000 | 4.1812 | 2.1408 | 24079.245 | 4.605 | 398.985 | 64 |
| 6 | 47306526000 | 3.9934 | 2.0446 | 10226.4 | 5.535 | 502.495 | 64 |
| 7 | 55190947000 | 3.851 | 1.9717 | 11718.555 | 3.6 | 564.245 | 64 |
| 8 | 63075368000 | 3.6674 | 1.8777 | 45690.3 | 5.415 | 697.18 | 64 |
As the results show, the regex accelerator throughput drops from 4.748 Mpps with 1 core to 3.6674 Mpps with 8 cores (about 23%), while the average latency grows from about 102 us to about 697 us. Intuitively, I do not think this is due to queue contention, but I cannot identify the reason for the drop. Is there a hardware limitation on the accelerator, or is there another explanation for this result? Thanks. :)
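To make the size of the drop easy to check, here is a minimal Python sketch that recomputes the Gbps column from the Mpps column and reports the throughput loss relative to the 1-core run. The numbers are copied from the table above; the helper names `mpps_to_gbps` and `relative_drop` are my own and are not part of RXPBench. It assumes every job carries exactly one 64 B buffer, as the "avg regex buffer len" column suggests.

```python
# Throughput figures (Mpps) copied from the results table, keyed by core count.
BUFFER_LEN_BYTES = 64  # assumption: one 64 B buffer per job

mpps_by_cores = {
    1: 4.748, 2: 4.7585, 3: 4.7005, 4: 4.4552,
    5: 4.1812, 6: 3.9934, 7: 3.851, 8: 3.6674,
}


def mpps_to_gbps(mpps, buffer_len_bytes=BUFFER_LEN_BYTES):
    """Convert millions of jobs per second to gigabits per second."""
    return mpps * buffer_len_bytes * 8 / 1000.0


def relative_drop(baseline, value):
    """Fractional loss of `value` compared with `baseline`."""
    return (baseline - value) / baseline


baseline = mpps_by_cores[1]
for cores, mpps in sorted(mpps_by_cores.items()):
    print(
        f"{cores} core(s): {mpps:.4f} Mpps, "
        f"{mpps_to_gbps(mpps):.4f} Gbps, "
        f"drop vs 1 core: {relative_drop(baseline, mpps) * 100:.1f}%"
    )
```

The recomputed Gbps values should line up with the Gbps column, which suggests the two throughput columns are the same measurement in different units.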
Hello,
Where do the posted results come from? In other words, what exactly are you running?
(Reference link: RXPBench :: NVIDIA DOCA SDK Documentation).
As a baseline, is this on the latest DOCA/FW version?
Which guidelines did you follow when using RXPBench?
Should deeper investigation be needed, a support case will need to be opened with NVIDIA.
Sophie.
Hi Sophie,
Thanks a lot for your response. :)
FYI, the RXPBench guideline I used is from here.
I used DOCA_v1.2.1_BlueField_OS_Ubuntu_20.04-5.4.0-1023-bluefield-5.5-2.1.7.0-3.8.5.12027-1.signed-aarch64.bfb, which is provided with DOCA v1.2.1, to install the BlueField OS, so the DOCA version should be v1.2.1.
For the FW, I installed MLNX_OFED_LINUX-5.5-1.0.3.2-ubuntu20.04-x86_64.tgz.
Thanks again for your quick reply.
Best regards.
Hello,
So you are on the latest and greatest version.
I am checking internally whether there is a penalty/limitation when using RXPBench with a specific number of cores, but to my knowledge that would be odd.
Sophie.
Hello,
I inquired about possible penalties/limitations on the number of cores used by RXPBench, and the answer provided was:
In general yes. Multicore application can’t scale linearly with cores.
I am not sure whether you are running the benchmark from the host or the DPU. However, my suggestion would be to open a support case in order to dissect and further investigate this, as multiple factors are in play here.
Sophie.
Hi Sophie,
Thanks a lot for your response. :)
I run RXPBench from the DPU. I am curious whether this performance drop is due to a hardware design issue that is a black box to me. Besides, are there ways to mitigate it through proper software design? I hope to discuss this in detail.
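As a starting point for that discussion, here is a rough Python sketch that applies Little's law (jobs in flight = throughput x average latency) to the numbers I posted. It assumes the reported average latency is per job and that both columns are steady-state averages; I do not know how RXPBench actually defines its latency figure, so please treat the output as an estimate only. The function name `in_flight_jobs` is my own.

```python
# (throughput in Mpps, average latency in us) copied from my results table.
results = {
    1: (4.748, 102.07),
    2: (4.7585, 202.175),
    3: (4.7005, 250.84),
    4: (4.4552, 319.51),
    5: (4.1812, 398.985),
    6: (3.9934, 502.495),
    7: (3.851, 564.245),
    8: (3.6674, 697.18),
}


def in_flight_jobs(mpps, avg_latency_us):
    """Little's law estimate: (1e6 jobs/s) * (1e-6 s) cancels to a job count."""
    return mpps * avg_latency_us


for cores, (mpps, latency_us) in sorted(results.items()):
    total = in_flight_jobs(mpps, latency_us)
    print(
        f"{cores} core(s): ~{total:.0f} jobs in flight in total, "
        f"~{total / cores:.0f} per core"
    )
```

If the assumption holds, the total number of outstanding jobs grows with the core count while the aggregate throughput does not, which would be consistent with jobs waiting longer in front of a shared resource rather than being processed faster.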
BTW, could you please tell me how to open a support case?
Best regards.