MLNX_OFED_LINUX-2.4-1.0.4 - test problems

Hello,

Yesterday I downloaded the most recent Mellanox OFED release (MLNX_OFED_LINUX-2.4-1.0.4 (OFED-2.4-1.0.4)) for our new ConnectX-4 cards, and I am running into problems with the following test tools: ib_read_bw, ib_read_lat, ib_send_bw, ib_send_lat, ib_write_bw, ib_write_lat.

Here are the error messages I get from the ib_write_bw test; the other benchmarks fail the same way:

[aotto@lab16 ~]$ ib_write_bw


* Waiting for client to connect... *


RDMA_Write BW Test

Dual-port : OFF Device : mlx5_0

Number of qps : 1 Transport type : IB

Connection type : RC Using SRQ : OFF

CQ Moderation : 100

Mtu : 4096[B]

Link type : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet


local address: LID 0x02 QPN 0x002f PSN 0xc34b21 RKey 0x009f76 VAddr 0x007f11f9990000

remote address: LID 0x03 QPN 0x002f PSN 0x6b9d23 RKey 0x009363 VAddr 0x007fa679400000


#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]

ethernet_read_keys: Couldn’t read remote address

Unable to read to socket/rdam_cm

Failed to exchange data between server and clients

[aotto@lab17 ~]$ ib_write_bw -a lab16


RDMA_Write BW Test

Dual-port : OFF Device : mlx5_0

Number of qps : 1 Transport type : IB

Connection type : RC Using SRQ : OFF

TX depth : 128

CQ Moderation : 100

Mtu : 4096[B]

Link type : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet


local address: LID 0x03 QPN 0x002f PSN 0x6b9d23 RKey 0x009363 VAddr 0x007fa679400000

remote address: LID 0x02 QPN 0x002f PSN 0xc34b21 RKey 0x009f76 VAddr 0x007f11f9990000


#bytes   #iterations   BW peak[MB/sec]   BW average[MB/sec]   MsgRate[Mpps]
2        5000          11.40             9.82                 5.147955
4        5000          34.56             34.47                9.036457
8        5000          68.84             67.93                8.904261
16       5000          137.69            135.93               8.908591
32       5000          276.46            275.92               9.041272
64       5000          555.11            553.14               9.062630
128      5000          1105.84           1101.91              9.026843
256      5000          2203.01           2195.79              8.993952
512      5000          4272.00           4090.32              8.376984
1024     5000          6551.23           6244.85              6.394722
2048     5000          7748.52           7602.42              3.892439
4096     5000          8075.72           8067.92              2.065387
8192     5000          10593.14          10491.92             1.342965
16384    5000          10475.85          10382.35             0.664471
32768    5000          10469.75          10464.48             0.334863
65536    5000          10534.17          10533.56             0.168537

mlx5: lab17: got completion with error:

00000000 00000000 00000000 00000000

00000000 00000000 00000000 00000000

00000000 00000000 00000000 00000000

00000000 00008813 0800002f 408079d1

Problems with warm up

The test always stops when trying to send more than 65536 bytes. Do you know what the problem could be and how to fix it? Could it be related to the drivers?

If you need more information, let me know and I will post it.

Thank you very much for your help.

Cheers,

Adam

Hi Adam,

Please open a ticket with support@mellanox.com.

In this particular use case, when running the perftest tools across all message sizes from lowest to highest, the -a flag must be given on both the sender and the receiver. In the output you posted, the server on lab16 was started without -a, while the client on lab17 used it.
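
As a rough sketch of what that looks like for the six tools mentioned above, the snippet below only prints the matching command pairs and does not run any benchmark. The helper name perftest_pair is made up for illustration and is not part of perftest; lab16 is simply the server host taken from the output in this thread.

```shell
# Dry run: prints the commands to type on each host, runs nothing.
# perftest_pair is a hypothetical helper, not a perftest command.
perftest_pair() {
    tool="$1"      # perftest tool name, e.g. ib_write_bw
    server="$2"    # host that runs the server side, e.g. lab16
    # The server side is started first, with -a and no host argument.
    printf 'on %s: %s -a\n' "$server" "$tool"
    # The client side uses the same -a flag and names the server host.
    printf 'on client: %s -a %s\n' "$tool" "$server"
}

for tool in ib_read_bw ib_read_lat ib_send_bw ib_send_lat ib_write_bw ib_write_lat; do
    perftest_pair "$tool" lab16
done
```

For ib_write_bw this prints "on lab16: ib_write_bw -a" followed by "on client: ib_write_bw -a lab16", so the only change from the commands in the original post is the added -a on the server side.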