I am trying to set up an RDMA connection between the sub OS of a SmartNIC in host server A and a ConnectX-5 in host server B. An overview of the architecture is shown in the following figure.
After configuring the IP addresses of the ConnectX-6 and the ConnectX-5, sub OS B and the ConnectX-5 can ping each other. However, when I try to test the RDMA connection with ibv_rc_pingpong and ib_send_bw, both fail with the errors shown below. I have read some documentation but still don't know how to fix this issue.
As for ibv_rc_pingpong, the output is:
Host B:
ibv_rc_pingpong -d mlx5_1 -g 0
local address: LID 0x0000, QPN 0x000a2c, PSN 0x1561b0, GID fe80::9a03:9bff:feaf:c63f
remote address: LID 0x0000, QPN 0x0004c7, PSN 0x9dd50f, GID fe80::e:cff:fe74:cc0d
Sub OS B in SmartNIC:
ibv_rc_pingpong -d mlx5_3 -g 0 192.168.100.7
local address: LID 0x0000, QPN 0x0004c7, PSN 0x9dd50f, GID fe80::e:cff:fe74:cc0d
remote address: LID 0x0000, QPN 0x000a2c, PSN 0x1561b0, GID fe80::9a03:9bff:feaf:c63f
Failed status transport retry counter exceeded (12) for wr_id 2
parse WC failed 1
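For reference, both commands above use -g 0, which on these ports is the link-local fe80:: GID, while the test itself targets the IPv4 address 192.168.100.7. RoCE GID entries that are derived from an IPv4 address are IPv4-mapped IPv6 addresses (::ffff:a.b.c.d). The following is only a minimal sketch for working out which GID to look for in each table; the helper name ipv4_to_gid is hypothetical and not part of any Mellanox tool, and it assumes a plain dotted-quad input.

```shell
#!/bin/sh
# ipv4_to_gid ADDR: print the IPv4-mapped GID for a dotted-quad address,
# written in the fully expanded form that GID tables usually display.
# Hypothetical helper, for illustration only.
ipv4_to_gid() {
    # Split the dotted quad into four positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    # Each octet becomes two hex digits of the last 32 bits of the GID.
    printf '0000:0000:0000:0000:0000:ffff:%02x%02x:%02x%02x\n' "$1" "$2" "$3" "$4"
}

# Example: ipv4_to_gid 192.168.100.7
```

If an entry matching the computed value exists in the GID table on each side, passing its index to -g instead of 0 may be worth trying; this is an assumption to test, not a confirmed fix.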
As for ib_send_bw:
Host B:
sudo ib_send_bw -a -c UD -d mlx5_1 -i 1
- Waiting for client to connect… *
Max msg size in UD is MTU 1024
Changing to this MTU
Send BW Test
Dual-port : OFF Device : mlx5_1
Number of qps : 1 Transport type : IB
Connection type : UD Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
RX depth : 1000
CQ Moderation : 100
Mtu : 1024[B]
Link type : Ethernet
GID index : 3
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
Found Incompatibility issue with GID types.
Please Try to use a different IP version.
Host A:
sudo ib_send_bw -a -c UD -d mlx5_3 -i 1 192.168.100.7
Max msg size in UD is MTU 1024
Changing to this MTU
Send BW Test
Dual-port : OFF Device : mlx5_3
Number of qps : 1 Transport type : IB
Connection type : UD Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 128
CQ Moderation : 100
Mtu : 1024[B]
Link type : Ethernet
GID index : 1
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
Found Incompatibility issue with GID types.
Please Try to use a different IP version.
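Since the two runs above report different GID indexes (3 on mlx5_1, 1 on mlx5_3), it may help to dump each device's GID table together with the RoCE type of every entry and compare them. The following is a minimal sketch, assuming the usual sysfs layout (/sys/class/infiniband/<dev>/ports/<port>/gids and gid_attrs/types). The function name list_gids and its ROOT parameter are hypothetical and only there so the function can be tried against a copy of the tree.

```shell
#!/bin/sh
# list_gids ROOT DEV PORT: print "index gid type" for every populated GID entry.
# ROOT is normally /sys/class/infiniband.
list_gids() {
    root=$1
    dev=$2
    port=$3
    dir="$root/$dev/ports/$port"
    for f in "$dir"/gids/*; do
        [ -r "$f" ] || continue
        idx=${f##*/}
        gid=$(cat "$f" 2>/dev/null) || continue
        # Unused table slots read back as all zeros; skip them.
        [ "$gid" = "0000:0000:0000:0000:0000:0000:0000:0000" ] && continue
        # The type file may be unreadable for some entries; report "unknown" then.
        type=$(cat "$dir/gid_attrs/types/$idx" 2>/dev/null || echo unknown)
        echo "$idx $gid $type"
    done
}

# Example: list_gids /sys/class/infiniband mlx5_1 1
```

Running this on both machines and then passing an index whose address family and RoCE version match on both sides (through the -x option of ib_send_bw) would be the comparison to make. Entries are printed in glob order, so index 10 appears before index 2.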
The environment is listed below:
Host B:
MLNX_OFED_LINUX-5.1-2.3.7.1 (OFED-5.1-2.3.7)
enp130s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.100.7 netmask 255.0.0.0 broadcast 192.255.255.255
inet6 fe80::9a03:9bff:feaf:c63f prefixlen 64 scopeid 0x20
ether 98:03:9b:af:c6:3f txqueuelen 1000 (Ethernet)
RX packets 339 bytes 33516 (33.5 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2054 bytes 455475 (455.4 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
hca_id: mlx5_0
transport: InfiniBand (0)
fw_ver: 16.28.2006
node_guid: 9803:9b03:00af:c63e
sys_image_guid: 9803:9b03:00af:c63e
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: MT_0000000008
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
hca_id: mlx5_1
transport: InfiniBand (0)
fw_ver: 16.28.2006
node_guid: 9803:9b03:00af:c63f
sys_image_guid: 9803:9b03:00af:c63e
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: MT_0000000008
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
Sub OS B in SmartNIC:
MLNX_OFED_LINUX-5.1-2.3.0 (OFED-5.1-2.3.0)
p1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.100.13 netmask 255.0.0.0 broadcast 192.255.255.255
ether 0c:42:a1:a4:8a:15 txqueuelen 1000 (Ethernet)
RX packets 1187 bytes 355265 (346.9 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 100 bytes 9058 (8.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ibv_devinfo
hca_id: mlx5_2
transport: InfiniBand (0)
fw_ver: 24.28.2006
node_guid: 02aa:8cff:fed4:152e
sys_image_guid: 0c42:a103:00a4:8a10
vendor_id: 0x02c9
vendor_part_id: 41686
hw_ver: 0x0
board_id: MT_0000000477
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
hca_id: mlx5_3
transport: InfiniBand (0)
fw_ver: 24.28.2006
node_guid: 020e:0cff:fe74:cc0d
sys_image_guid: 0c42:a103:00a4:8a10
vendor_id: 0x02c9
vendor_part_id: 41686
hw_ver: 0x0
board_id: MT_0000000477
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
Besides, it is strange that mlx5_1 and mlx5_0 in the SmartNIC don't have any GIDs, while mlx5_3 and mlx5_2 do.
show_gids mlx5_1
DEV PORT INDEX GID IPv4 VER DEV
n_gids_found=0