Hello,
Yesterday I downloaded the most recent Mellanox OFED release (MLNX_OFED_LINUX-2.4-1.0.4 (OFED-2.4-1.0.4)) for our new ConnectX-4 cards, and I am running into problems with the following test tools: ib_read_bw, ib_read_lat, ib_send_bw, ib_send_lat, ib_write_bw and ib_write_lat.
Here are the error messages I get for the ib_write_bw test; the other benchmarks fail in the same way. Server side (lab16) first, then client side (lab17):
[aotto@lab16 ~]$ ib_write_bw
* Waiting for client to connect... *
RDMA_Write BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
CQ Moderation : 100
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
local address: LID 0x02 QPN 0x002f PSN 0xc34b21 RKey 0x009f76 VAddr 0x007f11f9990000
remote address: LID 0x03 QPN 0x002f PSN 0x6b9d23 RKey 0x009363 VAddr 0x007fa679400000
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
ethernet_read_keys: Couldn’t read remote address
Unable to read to socket/rdam_cm
Failed to exchange data between server and clients
[aotto@lab17 ~]$ ib_write_bw -a lab16
RDMA_Write BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
TX depth : 128
CQ Moderation : 100
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
local address: LID 0x03 QPN 0x002f PSN 0x6b9d23 RKey 0x009363 VAddr 0x007fa679400000
remote address: LID 0x02 QPN 0x002f PSN 0xc34b21 RKey 0x009f76 VAddr 0x007f11f9990000
#bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
2          5000           11.40              9.82                 5.147955
4          5000           34.56              34.47                9.036457
8          5000           68.84              67.93                8.904261
16         5000           137.69             135.93               8.908591
32         5000           276.46             275.92               9.041272
64         5000           555.11             553.14               9.062630
128        5000           1105.84            1101.91              9.026843
256        5000           2203.01            2195.79              8.993952
512        5000           4272.00            4090.32              8.376984
1024       5000           6551.23            6244.85              6.394722
2048       5000           7748.52            7602.42              3.892439
4096       5000           8075.72            8067.92              2.065387
8192       5000           10593.14           10491.92             1.342965
16384      5000           10475.85           10382.35             0.664471
32768      5000           10469.75           10464.48             0.334863
65536      5000           10534.17           10533.56             0.168537
mlx5: lab17: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00008813 0800002f 408079d1
Problems with warm up
The test always stops when it tries to send messages larger than 65536 bytes. Do you know what the problem might be and how to solve it? Could it be related to the drivers?
If you need more information, let me know and I will post it.
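As a sanity check on the numbers before the failure, here is a rough sketch (my own helper functions, not part of the perftest tools) that recomputes the MsgRate column from the average bandwidth. It assumes that "MB" in the BW columns means 2^20 bytes, which is the interpretation under which the reported values line up. Under that assumption the last completed row (65536 bytes) works out to roughly 88 Gbit/s, so the link itself looks healthy up to that message size.

```python
# Sanity check for the ib_write_bw table above.
# Assumption: "MB" in the BW columns means 2**20 bytes.
# The function names are hypothetical, chosen only for this illustration.

MIB = 1 << 20


def msg_rate_mpps(msg_bytes, bw_mb_per_sec):
    """Message rate in millions of messages per second implied by a bandwidth."""
    return bw_mb_per_sec * MIB / msg_bytes / 1e6


def gbit_per_sec(bw_mb_per_sec):
    """Convert a bandwidth in MB/sec (2**20 bytes) to Gbit/s (10**9 bits)."""
    return bw_mb_per_sec * MIB * 8 / 1e9


if __name__ == "__main__":
    # Last row that completed: 65536 bytes, BW average 10533.56 MB/sec.
    print(f"{msg_rate_mpps(65536, 10533.56):.6f} Mpps")  # 0.168537 Mpps
    print(f"{gbit_per_sec(10533.56):.2f} Gbit/s")        # 88.36 Gbit/s
```

The recomputed message rate matches the 0.168537 Mpps reported for the 65536-byte row, so the table is internally consistent and the failure only shows up on the next message size.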
Thank you very much for your help.
Cheers,
Adam