Both computers are running RHEL 7.6. All 8 NICs are configured the same way.
On the server, I am running the following commands in three separate windows:
$ ib_send_bw -d mlx5_0 -c UD --report_gbits -i 1 -F --run_infinitely
$ ib_send_bw -d mlx5_1 -c UD -p 18520 --report_gbits -i 1 -F --run_infinitely
$ ib_send_bw -d mlx5_2 -c UD -p 18521 --report_gbits -i 1 -F --run_infinitely
On the client, the commands being run are:
$ ib_send_bw -d mlx5_0 -c UD -i 1 -F --report_gbits --run_infinitely 10.10.10.3
$ ib_send_bw -d mlx5_1 -c UD -i 1 -F -p 18520 --report_gbits --run_infinitely 10.10.10.4
$ ib_send_bw -d mlx5_2 -c UD -i 1 -F -p 18521 --report_gbits --run_infinitely 10.10.10.5
The first two instances run fine. The third instance fails with the following output.
Server:
$ ib_send_bw -d mlx5_2 -c UD -p 18521 --report_gbits -i 1 -F --run_infinitely
* Waiting for client to connect... *
Send BW Test
Dual-port : OFF Device : mlx5_2
Number of qps : 1 Transport type : IB
Connection type : UD Using SRQ : OFF
RX depth : 1000
CQ Moderation : 100
Mtu : 4096[B]
Link type : Ethernet
GID index : 3
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
local address: LID 0000 QPN 0x010f PSN 0xc8809d
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:10:05
remote address: LID 0000 QPN 0x0111 PSN 0x7a219e
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:10:09
ethernet_read_keys: Couldn't read remote address
Unable to read to socket/rdam_cm
Failed to exchange data between server and clients
Client:
$ ib_send_bw -d mlx5_2 -c UD -i 1 -F -p 18521 --report_gbits --run_infinitely 10.10.10.5
Send BW Test
Dual-port : OFF Device : mlx5_2
Number of qps : 1 Transport type : IB
Connection type : UD Using SRQ : OFF
TX depth : 128
CQ Moderation : 100
Mtu : 4096[B]
Link type : Ethernet
GID index : 3
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
local address: LID 0000 QPN 0x0111 PSN 0x7a219e
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:10:09
remote address: LID 0000 QPN 0x010f PSN 0xc8809d
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:10:05
libibverbs: resolver: Neighbour doesn't have a hw addr
libibverbs: resolver: Unspecific failure
libibverbs: Neigh resolution process failed
Failed to create AH for UD
Unable to Connect the HCA's through the link
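Since the client error points at neighbour resolution, one check that may help narrow this down is whether the client's kernel neighbour table holds a hardware address for each server IP used above. The following is only a minimal sketch, assuming iproute2's `ip neigh show` is available on the client; `has_lladdr` and `check_peer` are hypothetical helper names, not part of perftest or libibverbs.

```shell
#!/bin/sh
# Sketch: report whether the kernel neighbour table has a link-layer
# (hardware) address for each peer IP given on the command line.
# Helper names are hypothetical; only `ip neigh show` from iproute2 is assumed.

# has_lladdr: succeed if the given `ip neigh` output contains an lladdr field.
has_lladdr() {
    case "$1" in
        *" lladdr "*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_peer: look up one peer IP and print whether it is resolved.
check_peer() {
    entry=$(ip neigh show to "$1" 2>/dev/null)
    if has_lladdr "$entry"; then
        echo "$1: resolved ($entry)"
    else
        echo "$1: no hw addr (${entry:-no entry})"
    fi
}

# Check every address passed as an argument; does nothing if none are given.
for peer in "$@"; do
    check_peer "$peer"
done
```

Saved under a hypothetical name such as check_neigh.sh, it could be run on the client as `sh check_neigh.sh 10.10.10.3 10.10.10.4 10.10.10.5`. An address reported without a hw addr would be consistent with the "Neighbour doesn't have a hw addr" message, though it would not by itself explain why resolution fails.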
Any thoughts or suggestions would be most appreciated.
Thanks,
Terry