To learn RDMA, I found an example on the Internet that is similar to the one provided by Mellanox, but when I ran it on two machines I hit the following problems:
1. There is a large gap between the bandwidth measured by this code and the bandwidth measured by perftest.
2. In addition, using GID 0 or 2 on one of the two machines significantly reduces the bandwidth (a minimal sketch of where the GID index enters the QP setup follows this list).
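For reference, as far as I understand, selecting the GID index on RoCE comes down to the sgid_index field of the address vector when the QP is moved to RTR. The sketch below is not the actual example code; the helper name and parameters (qp_to_rtr, remote_qpn, remote_gid, gid_index) are placeholders I am using to show that step:

/* Minimal sketch: bringing an RC QP to RTR on RoCE, where sgid_index
 * selects the local GID entry (and therefore RoCE v1 vs v2).
 * remote_qpn, remote_gid and gid_index are placeholder parameters. */
#include <infiniband/verbs.h>
#include <string.h>

static int qp_to_rtr(struct ibv_qp *qp, uint32_t remote_qpn,
                     const union ibv_gid *remote_gid, uint8_t gid_index)
{
    struct ibv_qp_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.qp_state           = IBV_QPS_RTR;
    attr.path_mtu           = IBV_MTU_1024;      /* must not exceed active_mtu */
    attr.dest_qp_num        = remote_qpn;
    attr.rq_psn             = 0;
    attr.max_dest_rd_atomic = 16;                /* allow incoming RDMA READs */
    attr.min_rnr_timer      = 12;
    attr.ah_attr.is_global      = 1;             /* GRH is mandatory on RoCE */
    attr.ah_attr.port_num       = 1;
    attr.ah_attr.grh.dgid       = *remote_gid;
    attr.ah_attr.grh.sgid_index = gid_index;     /* 0/2 = RoCE v1, 1/3 = v2 on these hosts */
    attr.ah_attr.grh.hop_limit  = 1;             /* single L2 hop */
    return ibv_modify_qp(qp, &attr,
                         IBV_QP_STATE | IBV_QP_AV | IBV_QP_PATH_MTU |
                         IBV_QP_DEST_QPN | IBV_QP_RQ_PSN |
                         IBV_QP_MAX_DEST_RD_ATOMIC | IBV_QP_MIN_RNR_TIMER);
}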
Machine A:
Configuration:
hca_id: mlx5_bond_0
transport: InfiniBand (0)
fw_ver: 20.39.3004
node_guid: 1070:fd03:00e5:f118
sys_image_guid: 1070:fd03:00e5:f118
vendor_id: 0x02c9
vendor_part_id: 4123
hw_ver: 0x0
board_id: MT_0000000224
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
DEV PORT INDEX GID IPv4 VER DEV
--- ---- ----- --- ------------ --- ---
mlx5_bond_0 1 0 fe80:0000:0000:0000:b0fc:4eff:feb3:1112 v1 bond0
mlx5_bond_0 1 1 fe80:0000:0000:0000:b0fc:4eff:feb3:1112 v2 bond0
mlx5_bond_0 1 2 0000:0000:0000:0000:0000:ffff:0a77:2e3d 10.119.46.61 v1 bond0
mlx5_bond_0 1 3 0000:0000:0000:0000:0000:ffff:0a77:2e3d 10.119.46.61 v2 bond0
Perftest run using GID index 1:
---------------------------------------------------------------------------------------
RDMA_Read BW Test
RX depth: 1
post_list: 1
inline_size: 0
Dual-port : OFF Device : mlx5_bond_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
CQ Moderation : 1
Mtu : 1024[B]
Link type : Ethernet
GID index : 1
Outstand reads : 16
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x1659 PSN 0xd4858a OUT 0x10 RKey 0x203e00 VAddr 0x007f38d0d07000
GID: 254:128:00:00:00:00:00:00:176:252:78:255:254:179:17:18
remote address: LID 0000 QPN 0x1c86 PSN 0xc2e51a OUT 0x10 RKey 0x013f00 VAddr 0x007f123fc62000
GID: 254:128:00:00:00:00:00:00:100:155:154:255:254:172:09:41
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MiB/sec] BW average[MiB/sec] MsgRate[Mpps]
65536 1000 10829.53 10829.17 0.173267
---------------------------------------------------------------------------------------
Machine B:
hca_id: mlx5_bond_0
transport: InfiniBand (0)
fw_ver: 20.39.3004
node_guid: e8eb:d303:0032:b212
sys_image_guid: e8eb:d303:0032:b212
vendor_id: 0x02c9
vendor_part_id: 4123
hw_ver: 0x0
board_id: MT_0000000224
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
DEV PORT INDEX GID IPv4 VER DEV
--- ---- ----- --- ------------ --- ---
mlx5_bond_0 1 0 fe80:0000:0000:0000:649b:9aff:feac:0929 v1 bond0
mlx5_bond_0 1 1 fe80:0000:0000:0000:649b:9aff:feac:0929 v2 bond0
mlx5_bond_0 1 2 0000:0000:0000:0000:0000:ffff:0a77:2e3e 10.119.46.62 v1 bond0
mlx5_bond_0 1 3 0000:0000:0000:0000:0000:ffff:0a77:2e3e 10.119.46.62 v2 bond0
n_gids_found=4
Perftest run using GID index 0:
RDMA_Read BW Test
RX depth: 1
post_list: 1
inline_size: 0
Dual-port : OFF Device : mlx5_bond_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
CQ Moderation : 1
Mtu : 1024[B]
Link type : Ethernet
GID index : 1
Outstand reads : 16
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x1659 PSN 0xd4858a OUT 0x10 RKey 0x203e00 VAddr 0x007f38d0d07000
GID: 254:128:00:00:00:00:00:00:176:252:78:255:254:179:17:18
remote address: LID 0000 QPN 0x1c86 PSN 0xc2e51a OUT 0x10 RKey 0x013f00 VAddr 0x007f123fc62000
GID: 254:128:00:00:00:00:00:00:100:155:154:255:254:172:09:41
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MiB/sec] BW average[MiB/sec] MsgRate[Mpps]
65536 1000 10829.53 10829.17 0.173267
---------------------------------------------------------------------------------------
If I test with the example code, the bandwidth is about 0.0124 GB/s when M1 uses GID 0 and M2 uses GID 0 or GID 1, and about 6 GB/s when both M1 and M2 use GID 1. I would like to know what optimizations the perftest code makes, or what deficiencies in the example code cause such a large difference in the measured bandwidth.
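My only guess so far is that perftest keeps many work requests in flight (the run above reports "Outstand reads : 16"), whereas simple examples typically post one RDMA READ and wait for its completion before posting the next. A rough sketch of that pipelining pattern, with placeholder names (run_reads, local_buf, remote_addr, rkey) that are not from perftest or the example, is below:

/* Rough sketch (placeholder names, not perftest's actual code): keep up to
 * DEPTH RDMA READ work requests outstanding instead of one at a time. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

#define DEPTH    16        /* matches "Outstand reads : 16" above */
#define MSG_SIZE 65536

static int run_reads(struct ibv_qp *qp, struct ibv_cq *cq, struct ibv_mr *mr,
                     void *local_buf, uint64_t remote_addr, uint32_t rkey,
                     int iterations)
{
    int posted = 0, completed = 0;
    while (completed < iterations) {
        /* Refill the pipeline up to DEPTH outstanding reads. */
        while (posted - completed < DEPTH && posted < iterations) {
            struct ibv_sge sge = {
                .addr   = (uintptr_t)local_buf,
                .length = MSG_SIZE,
                .lkey   = mr->lkey,
            };
            struct ibv_send_wr wr, *bad_wr = NULL;
            memset(&wr, 0, sizeof(wr));
            wr.wr_id               = posted;
            wr.sg_list             = &sge;
            wr.num_sge             = 1;
            wr.opcode              = IBV_WR_RDMA_READ;
            wr.send_flags          = IBV_SEND_SIGNALED;
            wr.wr.rdma.remote_addr = remote_addr;
            wr.wr.rdma.rkey        = rkey;
            if (ibv_post_send(qp, &wr, &bad_wr))
                return -1;
            posted++;
        }
        /* Reap completions in batches rather than one by one. */
        struct ibv_wc wc[DEPTH];
        int n = ibv_poll_cq(cq, DEPTH, wc);
        if (n < 0)
            return -1;
        for (int i = 0; i < n; i++) {
            if (wc[i].status != IBV_WC_SUCCESS)
                return -1;
            completed++;
        }
    }
    return 0;
}

(This assumes the QP was created with max_send_wr of at least DEPTH and max_rd_atomic set accordingly in the RTS transition.) Is this the main difference, or is there something else I am missing?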