ib_send_bw performance puzzle

I have been experimenting with ib_send_bw at different message sizes, and I noticed that the best performance is reached when the size is a power of two. With sizes 1 byte smaller or 1 byte larger, the results are surprisingly low. For example, with a 65536-byte message, ib_send_bw reaches ~97 Gbps; with 65535 or 65537 bytes, throughput drops to 20-25 Gbps. Please see the attached file.
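For what it's worth, a quick back-of-envelope sketch (assuming the 4096-byte MTU shown in the perftest output below) shows that wire segmentation barely differs between these sizes, so simple per-packet overhead is unlikely to explain a 4x drop:

```shell
#!/bin/sh
# Illustrative only: count how many MTU-sized segments each message
# size needs on the wire, and the size of the final segment.
mtu=4096
for size in 65535 65536 65537; do
  segs=$(( (size + mtu - 1) / mtu ))      # ceiling division
  last=$(( size - (segs - 1) * mtu ))     # bytes in the last segment
  echo "$size bytes -> $segs segments, last segment $last bytes"
done
# 65535 bytes -> 16 segments, last segment 4095 bytes
# 65536 bytes -> 16 segments, last segment 4096 bytes
# 65537 bytes -> 17 segments, last segment 1 bytes
```

One extra near-empty packet for 65537 bytes is well under 1% overhead, which points toward something host-side (e.g. buffer alignment or copy behavior) rather than the wire.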

My server setup:

OS: Ubuntu Linux 12.04

OFED: MLNX_OFED_LINUX-3.2-2.0.0.0-ubuntu12.04-x86_64

NIC: Mellanox ConnectX-4 VPI dual-port NIC MCX456A-ECAT

Connection: 100GbE via a Mellanox SN2700 100GbE switch

ib.pdf (4.08 KB)

Hello Weijia,

I'm not seeing performance drops at different byte sizes. Please see below.

[root@mti-mar-s5 ~]# ib_send_bw -s 65535 --report_gbits


* Waiting for client to connect... *


Send BW Test

Dual-port : OFF Device : mlx5_1

Number of qps : 1 Transport type : IB

Connection type : RC Using SRQ : OFF

RX depth : 512

CQ Moderation : 100

Mtu : 4096[B]

Link type : Ethernet

Gid index : 0

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet


local address: LID 0000 QPN 0x01e5 PSN 0xd8cff4

GID: 00:00:00:00:00:00:00:00:00:00:255:255:12:12:12:05

remote address: LID 0000 QPN 0x01e5 PSN 0xe4ea8a

GID: 00:00:00:00:00:00:00:00:00:00:255:255:12:12:12:06


#bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]

Conflicting CPU frequency values detected: 1200.042000 != 1244.515000. CPU Frequency is not max.

65535 1000 0.00 97.64 0.186239


[root@mti-mar-s5 ~]# ib_send_bw -s 65536 --report_gbits


* Waiting for client to connect... *


Send BW Test

Dual-port : OFF Device : mlx5_1

Number of qps : 1 Transport type : IB

Connection type : RC Using SRQ : OFF

RX depth : 512

CQ Moderation : 100

Mtu : 4096[B]

Link type : Ethernet

Gid index : 0

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet


local address: LID 0000 QPN 0x01e6 PSN 0x87adfd

GID: 00:00:00:00:00:00:00:00:00:00:255:255:12:12:12:05

remote address: LID 0000 QPN 0x01e6 PSN 0x8fcb08

GID: 00:00:00:00:00:00:00:00:00:00:255:255:12:12:12:06


#bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]

Conflicting CPU frequency values detected: 1200.132000 != 1268.234000. CPU Frequency is not max.

65536 1000 0.00 97.66 0.186275


[root@mti-mar-s5 ~]# ib_send_bw -s 65537 --report_gbits


* Waiting for client to connect... *


Send BW Test

Dual-port : OFF Device : mlx5_1

Number of qps : 1 Transport type : IB

Connection type : RC Using SRQ : OFF

RX depth : 512

CQ Moderation : 100

Mtu : 4096[B]

Link type : Ethernet

Gid index : 0

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet


local address: LID 0000 QPN 0x01e7 PSN 0x4d3748

GID: 00:00:00:00:00:00:00:00:00:00:255:255:12:12:12:05

remote address: LID 0000 QPN 0x01e7 PSN 0x84c3b

GID: 00:00:00:00:00:00:00:00:00:00:255:255:12:12:12:06


#bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]

Conflicting CPU frequency values detected: 1199.953000 != 1225.828000. CPU Frequency is not max.

65537 1000 0.00 97.51 0.185982


[root@mti-mar-s5 ~]#

Cheers,

~Rage

I had a similar problem with FDR and two adapters visible at the system level. Try disabling one of the adapters; you can do that with:

# lspci | grep Mellanox → obtain the IDs (suppose they are 05:00.0 and 81:00.0)

# vi /etc/udev/rules.d/88-infiniband-remove-adapter.rules

ACTION=="add", KERNEL=="0000:05:00.0", SUBSYSTEM=="pci", RUN+="/bin/sh -c 'echo 1 > /sys/bus/pci/devices/0000:05:00.0/remove'"

ACTION=="add", KERNEL=="0000:81:00.0", SUBSYSTEM=="pci", RUN+="/bin/sh -c 'echo 1 > /sys/bus/pci/devices/0000:81:00.0/remove'"

(Writing 1 to the sysfs remove attribute is what detaches the device; keep only the rule for the adapter you actually want to disable.)

Reboot the server and hopefully only one adapter will be available; then repeat the tests and see if the problem persists. By the way, did you try configuring bonding? If so, can you share your /etc/network/interfaces configuration?
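If you do try bonding, a minimal /etc/network/interfaces stanza for Ubuntu 12.04 (with the ifenslave package installed) might look like the sketch below; the interface names eth2/eth3 and the address are placeholders, not taken from this thread:

```
# /etc/network/interfaces -- illustrative bonding sketch only;
# interface names, address, and bonding mode are placeholders
auto bond0
iface bond0 inet static
    address 12.12.12.5
    netmask 255.255.255.0
    bond-slaves eth2 eth3
    bond-mode active-backup
    bond-miimon 100
```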

Also, look at:

/sys/module/ib_ipoib/parameters/recv_queue_size

/sys/module/ib_ipoib/parameters/send_queue_size

to see whether they are set to 512 or 128 entries; you should have at least 512. If not, change them in /etc/modprobe.d/ib_ipoib.conf by adjusting (or commenting out) the options line. Did you try datagram mode, with an MTU of 2044?
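As an illustration (verify the parameter names on your system with `modinfo ib_ipoib` before applying), the options line in that file could be set along these lines:

```
# /etc/modprobe.d/ib_ipoib.conf -- illustrative; verify parameter names
# with `modinfo ib_ipoib` before applying
options ib_ipoib send_queue_size=512 recv_queue_size=512
```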

Sure, the syntax is very simple:

Server:

ib_send_bw -s 1025

Client:

ib_send_bw -s 1025

Please see the screen shot attached.

By the way, here is the output of ibv_devinfo. By default, ib_send_bw chooses mlx5_1. However, even if I use "-d mlx5_0" or "-d mlx5_1" to specify a device explicitly, the same problem persists.

hca_id: mlx5_1

transport: InfiniBand (0)

fw_ver: 12.14.2036

node_guid: 7cfe:9003:0032:797b

sys_image_guid: 7cfe:9003:0032:797a

vendor_id: 0x02c9

vendor_part_id: 4115

hw_ver: 0x0

board_id: MT_2190110032

phys_port_cnt: 1

Device ports:

port: 1

state: PORT_ACTIVE (4)

max_mtu: 4096 (5)

active_mtu: 4096 (5)

sm_lid: 0

port_lid: 0

port_lmc: 0x00

link_layer: Ethernet

hca_id: mlx5_0

transport: InfiniBand (0)

fw_ver: 12.14.2036

node_guid: 7cfe:9003:0032:797a

sys_image_guid: 7cfe:9003:0032:797a

vendor_id: 0x02c9

vendor_part_id: 4115

hw_ver: 0x0

board_id: MT_2190110032

phys_port_cnt: 1

Device ports:

port: 1

state: PORT_ACTIVE (4)

max_mtu: 4096 (5)

active_mtu: 4096 (5)

sm_lid: 1

port_lid: 1

port_lmc: 0x00

link_layer: InfiniBand

Hello Weijia,

Please provide the exact syntax you used, along with the results, so we can try to reproduce it. Perhaps it's unique to your environment.

Cheers,

~Rage