Can't use RDMA on Azure NC24rs_v3

I tried to use InfiniBand on a Standard_NC24rs_v3 VM running Ubuntu 16.04, but it failed. I installed

https://www.mellanox.com/downloads/ofed/MLNX_OFED-4.7-3.2.9.0/MLNX_OFED_LINUX-4.7-3.2.9.0-ubuntu18.04-x86_64.tgz, which is linked from https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/hpc/enable-infiniband

When I run ib_write_bw, both the server and the client report errors.

perfkit@pkb-61175e23-0:~$ ib_write_bw


* Waiting for client to connect... *


RDMA_Write BW Test

Dual-port : OFF Device : mlx4_0

Number of qps : 1 Transport type : IB

Connection type : RC Using SRQ : OFF

CQ Moderation : 1

Mtu : 2048[B]

Link type : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet


local address: LID 0x57 QPN 0xb1603 PSN 0x4d16b6 RKey 0x78010800 VAddr 0x007f1b9e28a000

remote address: LID 0x6c QPN 0xb943f PSN 0x618c07 RKey 0xa00108b9 VAddr 0x007f0a88ff8000


#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]

ethernet_read_keys: Couldn't read remote address

Unable to read to socket/rdam_cm

Failed to exchange data between server and clients

perfkit@pkb-61175e23-1:~$ ib_write_bw pkb-61175e23-0


RDMA_Write BW Test

Dual-port : OFF Device : mlx4_0

Number of qps : 1 Transport type : IB

Connection type : RC Using SRQ : OFF

TX depth : 128

CQ Moderation : 1

Mtu : 2048[B]

Link type : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet


local address: LID 0x6c QPN 0xb943f PSN 0x618c07 RKey 0xa00108b9 VAddr 0x007f0a88ff8000

remote address: LID 0x57 QPN 0xb1603 PSN 0x4d16b6 RKey 0x78010800 VAddr 0x007f1b9e28a000


#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]

Completion with error at client

Failed status 12: wr_id 0 syndrom 0x81

scnt=128, ccnt=0

Failed to complete run_iter_bw function successfully
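
For anyone reading the client output above: the number in `Failed status 12` is a libibverbs work-completion status code. The sketch below maps it to its symbolic name. It assumes the declaration order of `enum ibv_wc_status` in `infiniband/verbs.h`; the table is copied by hand and the helper name `wc_status_name` is hypothetical, not part of perftest or libibverbs.

```python
# Names of enum ibv_wc_status in declaration order (values 0..21),
# transcribed by hand from infiniband/verbs.h (an assumption of this sketch).
IBV_WC_STATUS = [
    "IBV_WC_SUCCESS", "IBV_WC_LOC_LEN_ERR", "IBV_WC_LOC_QP_OP_ERR",
    "IBV_WC_LOC_EEC_OP_ERR", "IBV_WC_LOC_PROT_ERR", "IBV_WC_WR_FLUSH_ERR",
    "IBV_WC_MW_BIND_ERR", "IBV_WC_BAD_RESP_ERR", "IBV_WC_LOC_ACCESS_ERR",
    "IBV_WC_REM_INV_REQ_ERR", "IBV_WC_REM_ACCESS_ERR", "IBV_WC_REM_OP_ERR",
    "IBV_WC_RETRY_EXC_ERR", "IBV_WC_RNR_RETRY_EXC_ERR",
    "IBV_WC_LOC_RDD_VIOL_ERR", "IBV_WC_REM_INV_RD_REQ_ERR",
    "IBV_WC_REM_ABORT_ERR", "IBV_WC_INV_EECN_ERR",
    "IBV_WC_INV_EEC_STATE_ERR", "IBV_WC_FATAL_ERR",
    "IBV_WC_RESP_TIMEOUT_ERR", "IBV_WC_GENERAL_ERR",
]


def wc_status_name(code):
    """Return the symbolic name for a numeric work-completion status."""
    if 0 <= code < len(IBV_WC_STATUS):
        return IBV_WC_STATUS[code]
    return "unknown status %d" % code


print(wc_status_name(12))
```

Status 12 is `IBV_WC_RETRY_EXC_ERR`: the sender exhausted its transport retries without receiving an acknowledgement from the remote QP. Since the two sides did exchange LID/QPN values over the TCP side channel, this typically points at the RDMA data path between the hosts rather than at name resolution or the socket connection.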

Hello Hao,

Thank you for posting your question on the Mellanox Community. We also noticed that you opened a Mellanox Support ticket as you have a valid support contract.

We will assist you further through the support ticket you opened.

Thank you,

~Mellanox Technical Support

Hello Hao, Martijn,

Could you please share the solution you came up with? I’ve encountered exactly the same error during this test.

I’m using two ConnectX-4 Lx cards (installed on separate machines in the same LAN).

Any tips or advice would be much appreciated!

Hamed

ib_send_bw -x 4 192.168.1.2


Send BW Test

Dual-port : OFF Device : mlx5_0

Number of qps : 1 Transport type : IB

Connection type : RC Using SRQ : OFF

TX depth : 128

CQ Moderation : 100

Mtu : 1024[B]

Link type : Ethernet

Gid index : 4

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet


local address: LID 0000 QPN 0x0702 PSN 0xca8817

GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:03

remote address: LID 0000 QPN 0x066a PSN 0xcb0f9c

GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:02


#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]

Completion with error at client

Failed status 12: wr_id 0 syndrom 0x81

scnt=128, ccnt=0
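
A side note on the GID lines in this output: they can be turned back into IP addresses to confirm which interface address the selected GID index (`-x 4`) maps to. The sketch below assumes perftest prints the 16 GID bytes in decimal separated by colons, which the `255:255` pair in the output suggests; the helper name `gid_to_ip` is hypothetical.

```python
import ipaddress


def gid_to_ip(gid_text):
    """Convert a perftest-style GID string into an IP address object.

    Assumes 16 colon-separated decimal byte values, as in the log above.
    """
    raw = bytes(int(part) for part in gid_text.split(":"))
    if len(raw) != 16:
        raise ValueError("expected 16 GID bytes, got %d" % len(raw))
    addr = ipaddress.IPv6Address(raw)
    # An IPv4-mapped GID (::ffff:a.b.c.d) is returned as the IPv4 address.
    return addr.ipv4_mapped or addr


local = gid_to_ip("00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:03")
remote = gid_to_ip("00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:02")
print(local, remote)
```

Both GIDs here decode to IPv4-mapped addresses (192.168.1.3 locally, 192.168.1.2 remotely), so the test ran over RoCE using the IPv4 addresses of the two interfaces.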