Unable to set up an RDMA connection between ConnectX-5 and ConnectX-6 on a SmartNIC

I am trying to set up an RDMA connection between the sub OS of the SmartNIC in host server A and the ConnectX-5 in host server B. An overview of the architecture is shown in the following figure.

[Figure: architecture overview]

After configuring the IP addresses of the ConnectX-6 and ConnectX-5, sub OS B and the ConnectX-5 can ping each other. However, when I try to test the RDMA connection with ibv_rc_pingpong and ib_send_bw, errors occur. I have read some documentation but still don’t know how to fix this issue.

As for ibv_rc_pingpong, the output is:

hostB:

ibv_rc_pingpong -d mlx5_1 -g 0

local address: LID 0x0000, QPN 0x000a2c, PSN 0x1561b0, GID fe80::9a03:9bff:feaf:c63f

remote address: LID 0x0000, QPN 0x0004c7, PSN 0x9dd50f, GID fe80::e:cff:fe74:cc0d

sub OS B in smartnic:

ibv_rc_pingpong -d mlx5_3 -g 0 192.168.100.7

local address: LID 0x0000, QPN 0x0004c7, PSN 0x9dd50f, GID fe80::e:cff:fe74:cc0d

remote address: LID 0x0000, QPN 0x000a2c, PSN 0x1561b0, GID fe80::9a03:9bff:feaf:c63f

Failed status transport retry counter exceeded (12) for wr_id 2

parse WC failed 1

As for ib_send_bw:

host B

sudo ib_send_bw -a -c UD -d mlx5_1 -i 1


* Waiting for client to connect... *

Max msg size in UD is MTU 1024

Changing to this MTU


Send BW Test

Dual-port : OFF Device : mlx5_1

Number of qps : 1 Transport type : IB

Connection type : UD Using SRQ : OFF

PCIe relax order: ON

ibv_wr* API : ON

RX depth : 1000

CQ Moderation : 100

Mtu : 1024[B]

Link type : Ethernet

GID index : 3

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet


Found Incompatibility issue with GID types.

Please Try to use a different IP version.

hostA

sudo ib_send_bw -a -c UD -d mlx5_3 -i 1 192.168.100.7

Max msg size in UD is MTU 1024

Changing to this MTU


Send BW Test

Dual-port : OFF Device : mlx5_3

Number of qps : 1 Transport type : IB

Connection type : UD Using SRQ : OFF

PCIe relax order: ON

ibv_wr* API : ON

TX depth : 128

CQ Moderation : 100

Mtu : 1024[B]

Link type : Ethernet

GID index : 1

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet


Found Incompatibility issue with GID types.

Please Try to use a different IP version.

The environment is as follows:

Host B:

MLNX_OFED_LINUX-5.1-2.3.7.1 (OFED-5.1-2.3.7)

enp130s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500

inet 192.168.100.7 netmask 255.0.0.0 broadcast 192.255.255.255

inet6 fe80::9a03:9bff:feaf:c63f prefixlen 64 scopeid 0x20<link>

ether 98:03:9b:af:c6:3f txqueuelen 1000 (Ethernet)

RX packets 339 bytes 33516 (33.5 KB)

RX errors 0 dropped 0 overruns 0 frame 0

TX packets 2054 bytes 455475 (455.4 KB)

TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

hca_id: mlx5_0

transport: InfiniBand (0)

fw_ver: 16.28.2006

node_guid: 9803:9b03:00af:c63e

sys_image_guid: 9803:9b03:00af:c63e

vendor_id: 0x02c9

vendor_part_id: 4119

hw_ver: 0x0

board_id: MT_0000000008

phys_port_cnt: 1

port: 1

state: PORT_DOWN (1)

max_mtu: 4096 (5)

active_mtu: 1024 (3)

sm_lid: 0

port_lid: 0

port_lmc: 0x00

link_layer: Ethernet

hca_id: mlx5_1

transport: InfiniBand (0)

fw_ver: 16.28.2006

node_guid: 9803:9b03:00af:c63f

sys_image_guid: 9803:9b03:00af:c63e

vendor_id: 0x02c9

vendor_part_id: 4119

hw_ver: 0x0

board_id: MT_0000000008

phys_port_cnt: 1

port: 1

state: PORT_ACTIVE (4)

max_mtu: 4096 (5)

active_mtu: 1024 (3)

sm_lid: 0

port_lid: 0

port_lmc: 0x00

link_layer: Ethernet

sub OS B in smartnic

MLNX_OFED_LINUX-5.1-2.3.0 (OFED-5.1-2.3.0)

p1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500

inet 192.168.100.13 netmask 255.0.0.0 broadcast 192.255.255.255

ether 0c:42:a1:a4:8a:15 txqueuelen 1000 (Ethernet)

RX packets 1187 bytes 355265 (346.9 KiB)

RX errors 0 dropped 0 overruns 0 frame 0

TX packets 100 bytes 9058 (8.8 KiB)

TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

ibv_devinfo

hca_id: mlx5_2

transport: InfiniBand (0)

fw_ver: 24.28.2006

node_guid: 02aa:8cff:fed4:152e

sys_image_guid: 0c42:a103:00a4:8a10

vendor_id: 0x02c9

vendor_part_id: 41686

hw_ver: 0x0

board_id: MT_0000000477

phys_port_cnt: 1

port: 1

state: PORT_ACTIVE (4)

max_mtu: 4096 (5)

active_mtu: 1024 (3)

sm_lid: 0

port_lid: 0

port_lmc: 0x00

link_layer: Ethernet

hca_id: mlx5_3

transport: InfiniBand (0)

fw_ver: 24.28.2006

node_guid: 020e:0cff:fe74:cc0d

sys_image_guid: 0c42:a103:00a4:8a10

vendor_id: 0x02c9

vendor_part_id: 41686

hw_ver: 0x0

board_id: MT_0000000477

phys_port_cnt: 1

port: 1

state: PORT_ACTIVE (4)

max_mtu: 4096 (5)

active_mtu: 1024 (3)

sm_lid: 0

port_lid: 0

port_lmc: 0x00

link_layer: Ethernet

Besides, it is strange that mlx5_1 and mlx5_0 on the SmartNIC don’t have GIDs while mlx5_3 and mlx5_2 do.

show_gids mlx5_1

DEV PORT INDEX GID IPv4 VER DEV


n_gids_found=0
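To cross-check an empty show_gids result, the GID table can also be read directly from sysfs. The following is only a minimal sketch: it assumes the usual kernel layout under /sys/class/infiniband/<dev>/ports/<port>/ (gids, gid_attrs/types, gid_attrs/ndevs), and the function name list_gids and the SYSFS_ROOT override are hypothetical, added only so the function can be tried without hardware.

```shell
#!/usr/bin/env bash
# Sketch: list populated GID table entries for one RDMA device port.
# Assumes the standard sysfs layout; SYSFS_ROOT is a hypothetical override
# that exists only to make the function testable without an RDMA device.
list_gids() {
    local dev="$1" port="${2:-1}"
    local base="${SYSFS_ROOT:-/sys/class/infiniband}/$dev/ports/$port"
    local f idx gid type ndev
    for f in "$base"/gids/*; do
        [ -r "$f" ] || continue
        idx="${f##*/}"
        gid="$(cat "$f" 2>/dev/null)" || continue
        # Unused slots read back empty or as an all-zero GID; skip them.
        [ -z "$gid" ] && continue
        [ "$gid" = "0000:0000:0000:0000:0000:0000:0000:0000" ] && continue
        # Type and netdev attributes may be unreadable for some entries.
        type="$(cat "$base/gid_attrs/types/$idx" 2>/dev/null)"
        ndev="$(cat "$base/gid_attrs/ndevs/$idx" 2>/dev/null)"
        # Columns: index, GID, RoCE type, backing netdev.
        printf '%s\t%s\t%s\t%s\n' "$idx" "$gid" "${type:-unknown}" "${ndev:-unknown}"
    done
}
```

Running `list_gids mlx5_1 1` and `list_gids mlx5_3 1` on the SmartNIC side would show, per index, the RoCE type and the netdev each GID belongs to. If nothing is printed for a device, that device has no populated GID entries on that port.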

The SmartNIC has both a real physical port and a representor port toward the host; please make sure you use the real port.
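One way to check which network interface an RDMA device sits behind is to compare its link-local GID with the value derived from the interface MAC address. The sketch below assumes the default RoCE GID is the modified EUI-64 form of the MAC (fe80::/64 prefix, universal/local bit flipped, ff:fe inserted in the middle); the function name mac_to_link_local_gid is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: derive the expected link-local GID from a MAC address,
# assuming the modified EUI-64 scheme. Output uses the fully expanded
# form so it can be compared against show_gids-style listings.
mac_to_link_local_gid() {
    local mac b1 b2 b3 b4 b5 b6
    # Normalise hex digits to lowercase.
    mac="$(printf '%s' "$1" | tr 'A-F' 'a-f')"
    IFS=: read -r b1 b2 b3 b4 b5 b6 <<< "$mac"
    # Flip the universal/local bit (0x02) of the first octet.
    b1="$(printf '%02x' $(( 0x$b1 ^ 0x02 )))"
    # Insert ff:fe between the third and fourth octets.
    printf 'fe80:0000:0000:0000:%s%s:%sff:fe%s:%s%s\n' \
        "$b1" "$b2" "$b3" "$b4" "$b5" "$b6"
}

mac_to_link_local_gid 98:03:9b:af:c6:3f
# prints fe80:0000:0000:0000:9a03:9bff:feaf:c63f
```

Applied to the values in the post, host B's MAC 98:03:9b:af:c6:3f gives the GID that mlx5_1 reports. The MAC of p1, 0c:42:a1:a4:8a:15, would give fe80:0000:0000:0000:0e42:a1ff:fea4:8a15, which is not the GID shown for mlx5_3 (fe80::e:cff:fe74:cc0d). Under the assumption above, this would be consistent with mlx5_3 not being the device behind p1.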

Your tests use different GID indexes (one side uses 1, the other 3), which causes the RoCE versions on the two sides to differ. Please use the same GID index on both sides with -x.
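Choosing the index can be sketched as a small filter over show_gids-style output. This assumes whitespace-separated rows in the column order of the header quoted in the question (DEV PORT INDEX GID IPv4 VER DEV); the function name pick_gid_index is hypothetical. What has to agree between the two hosts is the RoCE version and address family behind the index, so the index numbers themselves may differ per host.

```shell
#!/usr/bin/env bash
# Sketch: print the GID index whose IPv4 and RoCE version columns match.
# Reads show_gids-style rows on stdin: DEV PORT INDEX GID IPv4 VER NETDEV.
# Rows without an IPv4 column (link-local GIDs) never match the address.
pick_gid_index() {
    local ip="$1" ver="$2"
    awk -v ip="$ip" -v ver="$ver" \
        '$5 == ip && $6 == ver { print $3; exit }'
}
```

For example, `show_gids mlx5_1 | pick_gid_index 192.168.100.7 v2` on host B would print the index of the RoCE v2 entry for that address, if one exists. Doing the same on the other side with its own address and passing each result via -x to ib_send_bw (or -g to ibv_rc_pingpong) keeps both ends on the same RoCE version.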