Dear Mellanox Community,
I created a small application that mimics the concept of ib_send_bw to get familiar with libibverbs and programming for InfiniBand hardware. With the source code of ib_send_bw and resources like rdmamojo.com it was fairly straightforward to get something up and running. However, my own test application performs much worse on IBV_WR_SEND work requests than ib_send_bw does.
The code is available here:
GitHub - stnot/ib_test https://github.com/stnot/ib_test
I run this on a cluster of two nodes connected through an 18-port Mellanox 56 Gbit/s switch, with 56 Gbit/s HCAs installed in the nodes. Please let me know if you need more information about my setup to analyze this issue.
Running ib_send_bw -a prints the following output:
Send BW Test
Dual-port : OFF Device : mlx4_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
RX depth : 512
CQ Moderation : 100
Mtu : 2048[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
local address: LID 0x04 QPN 0x02bd PSN 0x8f0e46
remote address: LID 0x08 QPN 0x0341 PSN 0xd9ee2e
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
2 1000 0.00 11.00 5.765330
4 1000 0.00 35.07 9.192796
8 1000 0.00 72.67 9.524639
16 1000 0.00 145.20 9.516145
32 1000 0.00 276.78 9.069401
64 1000 0.00 584.29 9.573034
128 1000 0.00 1173.28 9.611550
256 1000 0.00 2108.89 8.637993
512 1000 0.00 3693.42 7.564126
1024 1000 0.00 4143.79 4.243243
2048 1000 0.00 4385.30 2.245272
4096 1000 0.00 4457.75 1.141185
8192 1000 0.00 4486.35 0.574253
16384 1000 0.00 4509.63 0.288616
32768 1000 0.00 4514.77 0.144473
65536 1000 0.00 4517.63 0.072282
131072 1000 0.00 4518.87 0.036151
262144 1000 0.00 4519.43 0.018078
524288 1000 0.00 4519.53 0.009039
1048576 1000 0.00 4519.82 0.004520
2097152 1000 0.00 4519.94 0.002260
4194304 1000 0.00 4519.97 0.001130
8388608 1000 0.00 4519.97 0.000565
I also changed some settings so that the values are (as far as I can tell) comparable to the current settings of my own implementation:
ib_send_bw --rx-depth=100 --tx-depth=100 --size=1024 --iters=100000
Send BW Test
Dual-port : OFF Device : mlx4_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
RX depth : 100
CQ Moderation : 100
Mtu : 2048[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
local address: LID 0x04 QPN 0x02be PSN 0xd3bf9c
remote address: LID 0x08 QPN 0x0342 PSN 0x13b557
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
1024 100000 0.00 2842.02 2.910229
Running my own code linked above, I get much lower throughput than the ib_send_bw results:
./ib_server 1024 10000 msg
*** 10000 MESSAGE_SEND resulted in an average latency of 8.50us ***
114.918097 MB/sec
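Doing the math on that result: 1024 bytes / 8.50 us ≈ 120 * 10^6 bytes/s, which is just about the 114.9 MB/sec reported above. So the throughput appears to be completely determined by the per-message latency, as if only one message is in flight at a time.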
I profiled the various sections of my code. The ibv_poll_cq loops consume over 99% of the execution time and return 0 (no work completion) most of the time. I suspect that something is misconfigured and adds extra processing time to each send and/or receive request posted to the queue, but I haven't been able to figure out the exact cause so far.
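If I understand ib_send_bw correctly, it keeps many sends outstanding (up to tx-depth) and drains the CQ in batches, whereas I suspect my loop effectively waits for each completion before posting the next send. A simplified sketch of the pipelined pattern I mean (variable names, batch size and setup are just my assumptions, not the actual perftest code; qp, cq and sge are created elsewhere and the remote side is assumed to keep enough receives posted):

#include <infiniband/verbs.h>

/* Sketch of a pipelined send loop: keep up to tx_depth sends in flight
 * and poll completions in batches. Requires max_send_wr >= tx_depth. */
static int run_sends(struct ibv_qp *qp, struct ibv_cq *cq,
                     struct ibv_sge *sge, int iters, int tx_depth)
{
    struct ibv_send_wr wr = {0}, *bad_wr = NULL;
    struct ibv_wc wc[16];
    int posted = 0, completed = 0;

    wr.sg_list    = sge;
    wr.num_sge    = 1;
    wr.opcode     = IBV_WR_SEND;
    wr.send_flags = IBV_SEND_SIGNALED;  /* every WR signaled here; signaling only every Nth WR would reduce completions further */

    while (completed < iters) {
        /* keep up to tx_depth sends outstanding instead of one at a time */
        while (posted < iters && posted - completed < tx_depth) {
            if (ibv_post_send(qp, &wr, &bad_wr))
                return -1;
            posted++;
        }

        /* drain completions in batches so ibv_poll_cq rarely returns 0 */
        int n = ibv_poll_cq(cq, 16, wc);
        if (n < 0)
            return -1;
        for (int i = 0; i < n; i++) {
            if (wc[i].status != IBV_WC_SUCCESS)
                return -1;
            completed++;
        }
    }
    return 0;
}

Even with every WR signaled as in this sketch, keeping tx_depth sends in flight should hide most of the per-message latency, and, if I read the output above correctly, ib_send_bw additionally signals only every 100th send (CQ Moderation : 100) to reduce completion overhead. Is that the kind of difference that would explain my numbers, or is something else wrong in my code?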
I would appreciate it if someone from the community could take a look at my code and point out anywhere I am using libibverbs incorrectly or inefficiently, or any improperly configured parameters that cause this performance loss. If you need more data about my setup or any other information that would help analyze this issue, please let me know and I will gladly provide it.