ib_send_bw test with ConnectX-3 VPI adapter card over XenServer 6.2 fails with message: Failed to modify QP 100 to RTR

Hi all,

We are testing a ConnectX-3 card with XenServer 6.2. We have two nodes: the first runs XenServer 6.2 (dom0) and the second runs CentOS 6.4. MLNX_OFED version 2.2-1.0.1 is installed on both nodes.

If we run ibping between the two nodes, everything seems to work. But when we run an ib_send_bw test, the result depends on which node acts as the server. If the server is the CentOS 6.4 node, the test completes correctly:

Server → CentOS 6.4 and Client → XenServer 6.2 dom0

(Server output)

-bash-4.1$ ib_send_bw -d mlx4_0

************************************

* Waiting for client to connect… *

************************************

---------------------------------------------------------------------------------------

Send BW Test

Dual-port : OFF Device : mlx4_0

Number of qps : 1 Transport type : IB

Connection type : RC Using SRQ : OFF

RX depth : 512

CQ Moderation : 100

Mtu : 2048[B]

Link type : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x13 QPN 0x0063 PSN 0x27cecf

remote address: LID 0x1b QPN 0x0865 PSN 0x54507

---------------------------------------------------------------------------------------

#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]

65536 1000 0.00 5946.02 0.095136

---------------------------------------------------------------------------------------

(Client output)

[root@xenserver ~]# ib_send_bw 192.168.1.14 -d mlx4_0

---------------------------------------------------------------------------------------

Send BW Test

Dual-port : OFF Device : mlx4_0

Number of qps : 1 Transport type : IB

Connection type : RC Using SRQ : OFF

TX depth : 128

CQ Moderation : 100

Mtu : 2048[B]

Link type : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x1b QPN 0x0865 PSN 0x54507

remote address: LID 0x13 QPN 0x0063 PSN 0x27cecf

---------------------------------------------------------------------------------------

#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]

65536 1000 5906.23 5904.40 0.094470

---------------------------------------------------------------------------------------

However, if the server is the XenServer 6.2 dom0 node, the test fails:

Server → XenServer 6.2 dom0 and Client → CentOS 6.4

(Server output)

[root@xenserver ~]# ib_send_bw -d mlx4_0

************************************

* Waiting for client to connect… *

************************************

---------------------------------------------------------------------------------------

Send BW Test

Dual-port : OFF Device : mlx4_0

Number of qps : 1 Transport type : IB

Connection type : RC Using SRQ : OFF

RX depth : 512

CQ Moderation : 100

Mtu : 2048[B]

Link type : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x1b QPN 0x0866 PSN 0x301c00

remote address: LID 0x13 QPN 0x0064 PSN 0x86e97e

ethernet_read_keys: Couldn’t read remote address

Unable to read to socket/rdam_cm

Failed to exchange data between server and clients

(Client output)

-bash-4.1$ ib_send_bw 192.168.1.17 -d mlx4_0

---------------------------------------------------------------------------------------

Send BW Test

Dual-port : OFF Device : mlx4_0

Number of qps : 1 Transport type : IB

Connection type : RC Using SRQ : OFF

TX depth : 128

CQ Moderation : 100

Mtu : 128[B]

Link type : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x13 QPN 0x0064 PSN 0x86e97e

remote address: LID 0x1b QPN 0x0866 PSN 0x301c00

Failed to modify QP 100 to RTR

Unable to Connect the HCA’s through the link

Can anyone help me with this error?

Did you disable the firewall (iptables) and SELinux on both nodes?
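In case it helps, here is a rough sketch of how the current state could be checked on each node before changing anything. It is read-only and assumes a CentOS 6-style userland with `getenforce` and `iptables` available; the function names are just made up for this example. As far as I know, ib_send_bw exchanges its connection data over TCP port 18515 by default, so that is the port to look for in the rules.

```shell
#!/bin/sh
# Read-only checks; run as root on both nodes. Nothing here changes the setup.

# Print the current SELinux mode (Enforcing, Permissive or Disabled),
# or "unknown" if getenforce is missing or fails.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce 2>/dev/null || echo "unknown"
    else
        echo "unknown"
    fi
}

# List the filter-table rules numerically. Look for DROP/REJECT rules that
# would cover the TCP port used for the out-of-band exchange
# (assumed here to be the perftest default, 18515).
show_iptables() {
    if command -v iptables >/dev/null 2>&1; then
        iptables -L -n 2>/dev/null || echo "iptables: could not list rules (root needed?)"
    else
        echo "iptables: not installed"
    fi
}

echo "SELinux mode: $(selinux_mode)"
show_iptables
```

If the output shows SELinux in Enforcing mode or restrictive rules, `setenforce 0` and `service iptables stop` should switch them off temporarily for a test run on CentOS 6.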

How did you define the Ethernet ports?

I also believe that, on CentOS, there was a bug in an old perftest version that might be causing this. Please try a more recent version.
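To compare the perftest versions on the two nodes, something like the following could be run on each. This is only a sketch: it assumes perftest was installed as an RPM package named `perftest` (which I believe is the case with MLNX_OFED), and the function name is hypothetical.

```shell
#!/bin/sh
# Print the installed perftest package version as reported by rpm.
perftest_version() {
    if command -v rpm >/dev/null 2>&1; then
        # rpm -q prints name-version-release when the package is installed.
        rpm -q perftest 2>/dev/null || echo "perftest: not installed via rpm"
    else
        echo "perftest: rpm not available on this host"
    fi
}

perftest_version
```

If the two nodes report different versions, installing the same, newer perftest on both sides would be the first thing to try.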