BlueField-2 DPU's RDMA performance is weaker than ConnectX-4

Our network devices are InfiniBand BlueField-2 DPUs, each equipped with two 100 Gb/s network ports and one ConnectX-6 NIC. All of our DPUs run in Separated Host Mode.

Recently I found that our DPUs do not perform as well as CX6 NICs on RDMA workloads, and are even weaker than CX4 NICs. To demonstrate this, I used perftest to gather statistics. Note that our hosts lock the CPU frequency to a fixed value, and there is no difference between the machines other than the network device.

My command is `ib_read_bw -s 64 --duration 5 --run_infinitely -q 6 -l 512 -t 512`. I chose a CX4 pair, a CX6 pair, and a DPU pair to show the gap:
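For anyone who wants to reproduce the measurement, the full pairwise run looks roughly like the following sketch. `SERVER_HOST` is a placeholder for the passive side's address; the flags are exactly the ones from the command above:

```shell
# Passive side: start the server first and let it wait for a connection.
ib_read_bw -s 64 --duration 5 --run_infinitely -q 6 -l 512 -t 512

# Active side: point the client at the server; it prints a result line
# periodically because of --run_infinitely.
# SERVER_HOST is a placeholder for the server's IB address or hostname.
ib_read_bw -s 64 --duration 5 --run_infinitely -q 6 -l 512 -t 512 SERVER_HOST
```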

| Pair | Throughput |
| ---- | ---------- |
| CX4  | 2180 M/sec |
| CX6  | 2450 M/sec |
| DPU  | 1904 M/sec |

These results are shocking to me, and I am wondering why.
My guess is that the ConnectX-6 on the DPU forcibly pre-allocates some resources to the embedded switch or the Arm subsystem, so the host cannot use the full capability of the CX6. But I haven't found any evidence to confirm this.

Looking forward to discussing this with someone, thanks.
