Hi, I am testing the raw throughput of a BlueField DPU (part number MBF1L516A-CSNAT). I find that a 64B-packet flow cannot reach the line rate of 100 Gbps (~148.8 Mpps), and I am not sure whether my measurement strategy is correct. I measured two directions:
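For reference, the 148.8 Mpps figure follows from the 20B of per-frame wire overhead (8B preamble/SFD + 12B inter-frame gap) on top of the 64B frame; a quick sanity check (plain arithmetic, no DPDK-specific assumptions):

```python
# Line rate in packets/s for 100GbE at a given frame size.
# Every frame carries 20B of wire overhead: 8B preamble/SFD + 12B inter-frame gap.
LINK_BPS = 100e9          # 100 Gbit/s
WIRE_OVERHEAD = 20        # bytes per frame on the wire

def line_rate_mpps(frame_bytes: int) -> float:
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD) * 8
    return LINK_BPS / bits_per_frame / 1e6

print(f"64B: {line_rate_mpps(64):.1f} Mpps")  # ~148.8 Mpps
```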
- The throughput when sending packets from the local BlueField OS to the local host
- The throughput when sending packets from the local host to the local BlueField OS
- The host runs Ubuntu 20.04 with Linux kernel 5.4.0; the BlueField OS is Ubuntu 20.04 with Linux kernel 5.4.0-1008-bluefield
- DPDK-21.11, pktgen-21.11.0
- `ethtool -A $NIC rx off tx off` to disable pause frames
- We use 14 cores for pktgen TX on the local host and 14 cores for pktgen RX on the BlueField. When sending 64B packets from the local host to the local BlueField, the maximum send throughput is ~44 Mpps and the corresponding receive throughput is ~31 Mpps. When the packet size is increased to 1500B, the throughput reaches ~110 Gbps, which is roughly the limit of the PCIe bandwidth
- We use 14 cores for pktgen RX on the local host and 14 cores for pktgen TX on the BlueField. When sending 64B packets from the local BlueField to the local host, the maximum send throughput is ~32 Mpps, while the receive throughput is ~19 Mpps. When the packet size is increased to 1500B, the throughput also reaches ~110 Gbps
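For reproducibility, the commands were along these lines (interface name, PCI address, and core lists are placeholders for our setup; the pktgen core mapping is a sketch, not our exact invocation):

```shell
# Disable Ethernet flow control (pause frames) on both sides
ethtool -A $NIC rx off tx off

# pktgen-dpdk 21.11.0: core 0 for the pktgen UI, cores 1-14 for port 0
./pktgen -l 0-14 -n 4 -a $PCI_ADDR -- -P -m "[1-14].0"
```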
- Can this card reach line rate using DPDK? Or does this mean that BlueField's ARM cores cannot generate or receive packets at line rate?
- When sending from the local host to the local BlueField, why can't the host generate packets at line rate? Is there some back pressure from the BlueField?
- Are there better ways to measure the throughput of a BlueField card?
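For context on the gap, here is how the observed 64B receive rates translate into wire throughput, and how little packet rate 1500B frames actually require (plain arithmetic; the 20B accounts for preamble/SFD plus inter-frame gap):

```python
# Convert observed packet rates to wire throughput and compare with line rate.
WIRE_OVERHEAD = 20   # bytes per frame: preamble/SFD + inter-frame gap
LINK_GBPS = 100.0

def wire_gbps(mpps: float, frame_bytes: int) -> float:
    return mpps * 1e6 * (frame_bytes + WIRE_OVERHEAD) * 8 / 1e9

# Observed 64B receive rates from the tests above
print(f"host->BF: {wire_gbps(31, 64):.1f} of {LINK_GBPS} Gbps")  # ~20.8 Gbps
print(f"BF->host: {wire_gbps(19, 64):.1f} of {LINK_GBPS} Gbps")  # ~12.8 Gbps

# Packet rate needed to fill the link with 1500B frames
need_mpps = LINK_GBPS * 1e9 / ((1500 + WIRE_OVERHEAD) * 8) / 1e6
print(f"1500B line rate: {need_mpps:.2f} Mpps")                  # ~8.22 Mpps
```

So at 1500B the cores only need ~8.2 Mpps to saturate the link, well within what they sustain, while at 64B they would need ~148.8 Mpps.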