INFINIBAND RDMA_CM_EVENT_ADDR_ERROR

Hello, I am currently testing with a Mellanox ConnectX-3 adapter.

The pingpong test included in the Mellanox install package (ibv_rc_pingpong) works.

However, tests such as rping and udaddy, which are mentioned in the post "HowTo Enable, Verify and Troubleshoot RDMA", do not run at all.

The error output is shown below.

Server side (c1n14):

sungho@c1n14:~$ udaddy

udaddy: starting server

Client side (c1n15):

sungho@c1n15:~$ udaddy -s 172.23.10.30

udaddy: starting client

udaddy: connecting

udaddy: event: RDMA_CM_EVENT_ADDR_ERROR, error: -19

test complete

return status -19
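For readers decoding the status, here is a minimal Python sketch. It assumes the value printed by udaddy is a negated Linux errno, which is how librdmacm events usually report failures. The helper name describe_status is hypothetical and only used for illustration.

```python
import errno
import os


def describe_status(status: int) -> str:
    """Translate a negated errno value (assumed convention) into its symbolic name and message."""
    code = -status
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


# -19 is the status shown in the udaddy output above.
print(describe_status(-19))
```

Under that assumption, -19 maps to ENODEV, which suggests that address resolution did not end up on an RDMA-capable device.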

I have two servers connected through a switch.

The InfiniBand interfaces can all ping each other,

and all the network interfaces are installed and running.

However, I have doubts about the ARP table,

because it does not look like the hosts are connected properly (listed below).

The information for the two servers is below.

Do you think I need to add static entries to the ARP table, or is there something fundamentally wrong?

server (A)

sungho@c1n14:/usr/bin$ ibstat

CA 'mlx4_0'

CA type: MT4099

Number of ports: 1

Firmware version: 2.42.5000

Hardware version: 1

Node GUID: 0x7cfe9003009a7c30

System image GUID: 0x7cfe9003009a7c33

Port 1:

State: Active

Physical state: LinkUp

Rate: 56

Base lid: 3

LMC: 0

SM lid: 3

Capability mask: 0x0251486a

Port GUID: 0x7cfe9003009a7c31

Link layer: InfiniBand

Kernel IP routing table

Destination Gateway Genmask Flags Metric Ref Use Iface

0.0.0.0 172.23.1.1 0.0.0.0 UG 0 0 0 enp1s0f0

172.23.0.0 0.0.0.0 255.255.0.0 U 0 0 0 enp1s0f0

172.23.0.0 0.0.0.0 255.255.0.0 U 0 0 0 ib0

sungho@c1n14:/usr/bin$ arp -n

Address HWtype HWaddress Flags Mask Iface

172.23.10.1 ether 0c:c4:7a:3a:35:88 C enp1s0f0

172.23.10.15 ether 0c:c4:7a:3a:35:72 C enp1s0f0

172.23.1.1 ether 00:1b:21:5b:6a:a8 C enp1s0f0

enp1s0f0 Link encap:Ethernet HWaddr 0c:c4:7a:3a:35:70

inet addr:172.23.10.14 Bcast:172.23.255.255 Mask:255.255.0.0

inet6 addr: fe80::ec4:7aff:fe3a:3570/64 Scope:Link

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:12438 errors:0 dropped:5886 overruns:0 frame:0

TX packets:5861 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:2356740 (2.3 MB) TX bytes:836306 (836.3 KB)

ib0 Link encap:UNSPEC HWaddr A0-00-02-20-FE-80-00-00-00-00-00-00-00-00-00-00

inet addr:172.23.10.30 Bcast:172.23.255.255 Mask:255.255.0.0

inet6 addr: fe80::7efe:9003:9a:7c31/64 Scope:Link

UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:8 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:256

RX bytes:0 (0.0 B) TX bytes:616 (616.0 B)

lo Link encap:Local Loopback

inet addr:127.0.0.1 Mask:255.0.0.0

inet6 addr: ::1/128 Scope:Host

UP LOOPBACK RUNNING MTU:65536 Metric:1

RX packets:189 errors:0 dropped:0 overruns:0 frame:0

TX packets:189 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1

RX bytes:13912 (13.9 KB) TX bytes:13912 (13.9 KB)

server (B)

sungho@c1n15:~$ ibstat

CA 'mlx4_0'

CA type: MT4099

Number of ports: 1

Firmware version: 2.42.5000

Hardware version: 1

Node GUID: 0x7cfe9003009a6360

System image GUID: 0x7cfe9003009a6363

Port 1:

State: Active

Physical state: LinkUp

Rate: 56

Base lid: 1

LMC: 0

SM lid: 3

Capability mask: 0x02514868

Port GUID: 0x7cfe9003009a6361

Link layer: InfiniBand

sungho@c1n15:~$ route -n

Kernel IP routing table

Destination Gateway Genmask Flags Metric Ref Use Iface

0.0.0.0 172.23.1.1 0.0.0.0 UG 0 0 0 enp1s0f0

172.23.0.0 0.0.0.0 255.255.0.0 U 0 0 0 enp1s0f0

172.23.0.0 0.0.0.0 255.255.0.0 U 0 0 0 ib0

sungho@c1n15:~$ arp -n

Address HWtype HWaddress Flags Mask Iface

172.23.10.14 ether 0c:c4:7a:3a:35:70 C enp1s0f0

172.23.10.1 ether 0c:c4:7a:3a:35:88 C enp1s0f0

172.23.10.30 ether 0c:c4:7a:3a:35:70 C enp1s0f0

172.23.1.1 ether 00:1b:21:5b:6a:a8 C enp1s0f0

enp1s0f0 Link encap:Ethernet HWaddr 0c:c4:7a:3a:35:72

inet addr:172.23.10.15 Bcast:172.23.255.255 Mask:255.255.0.0

inet6 addr: fe80::ec4:7aff:fe3a:3572/64 Scope:Link

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:19432 errors:0 dropped:5938 overruns:0 frame:0

TX packets:8783 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:8246898 (8.2 MB) TX bytes:1050793 (1.0 MB)

ib0 Link encap:UNSPEC HWaddr A0-00-02-20-FE-80-00-00-00-00-00-00-00-00-00-00

inet addr:172.23.10.31 Bcast:172.23.255.255 Mask:255.255.0.0

inet6 addr: fe80::7efe:9003:9a:6361/64 Scope:Link

UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:16 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:256

RX bytes:0 (0.0 B) TX bytes:1232 (1.2 KB)

lo Link encap:Local Loopback

inet addr:127.0.0.1 Mask:255.0.0.0

inet6 addr: ::1/128 Scope:Host

UP LOOPBACK RUNNING MTU:65536 Metric:1

RX packets:109 errors:0 dropped:0 overruns:0 frame:0

TX packets:109 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1

RX bytes:7992 (7.9 KB) TX bytes:7992 (7.9 KB)

Hi Sungho,

Thank you for posting your question on the Mellanox Community.

In your environment, where multiple interfaces share the same address range, please bind each test (udaddy, rping and/or ib_send_bw) to the address of the interface on which you want it to run.
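The "same address range" condition can be checked against the values in the question. The following is a minimal Python sketch, not a diagnostic tool. The addresses and netmasks are copied from the ifconfig output posted for c1n14, and the helper name overlapping_pairs is hypothetical.

```python
import ipaddress

# Addresses and netmasks copied from the ifconfig output posted for c1n14.
interfaces = {
    "enp1s0f0": ("172.23.10.14", "255.255.0.0"),
    "ib0": ("172.23.10.30", "255.255.0.0"),
}


def overlapping_pairs(ifaces):
    """Return (name_a, name_b, network) for each pair of interfaces whose subnets overlap."""
    nets = {
        name: ipaddress.ip_interface(f"{addr}/{mask}").network
        for name, (addr, mask) in ifaces.items()
    }
    names = sorted(nets)
    return [
        (a, b, nets[a])
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if nets[a].overlaps(nets[b])
    ]


for a, b, net in overlapping_pairs(interfaces):
    print(f"{a} and {b} both sit in {net}")
```

Both interfaces fall in 172.23.0.0/16, which matches the two identical routes in the posted routing tables. It is also consistent with the arp -n output on c1n15, where 172.23.10.30 is cached with the Ethernet MAC of c1n14's enp1s0f0, so the ping to the ib0 address appears to have been answered over Ethernet. Binding each tool to the ib0 address removes that ambiguity.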

For example:

rping - Server

rping -d -s -a

rping - Client

rping -d -c -a

udaddy - Server

udaddy -b

udaddy - Client

udaddy -b -s

ib_send_bw - Server

ib_send_bw -d -p --report_gbits -R -a -F

Example: # ib_send_bw -d mlx5_0 -p 1 --report_gbits -R -a -F

ib_send_bw - Client

ib_send_bw -d -p --report_gbits -a -R -F

Example: # ib_send_bw -d mlx4_0 -p1 1.1.1.101 --report_gbits -a -R -F
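Applied to the hosts in the question, the rping and udaddy templates above could be filled in as sketched below. The argument values are assumptions and are not part of the original reply: 172.23.10.30 and 172.23.10.31 are the ib0 addresses from the posted ifconfig output, rping -a is taken to be the server address, and udaddy -b and -s are taken to be the bind and server addresses, as their man pages describe. The function name bound_commands is hypothetical.

```python
import shlex

# Assumption: ib0 addresses taken from the ifconfig output in the question.
SERVER_IB = "172.23.10.30"  # c1n14, ib0
CLIENT_IB = "172.23.10.31"  # c1n15, ib0


def bound_commands(server_ip, client_ip):
    """Build rping/udaddy argument lists that name the IPoIB addresses explicitly."""
    return {
        "rping server": ["rping", "-d", "-s", "-a", server_ip],
        "rping client": ["rping", "-d", "-c", "-a", server_ip],
        "udaddy server": ["udaddy", "-b", server_ip],
        "udaddy client": ["udaddy", "-b", client_ip, "-s", server_ip],
    }


# Print each command line so it can be pasted on the matching host.
for role, argv in bound_commands(SERVER_IB, CLIENT_IB).items():
    print(f"{role}: {shlex.join(argv)}")
```

The sketch only prints the command lines. Each one still has to be run on the matching host, with the server side started first.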

In our lab, we have seen no issues running the above tests. All tests established and confirmed RDMA connectivity.

If you are still experiencing issues when running the provided examples, we recommend that you open a support case with Mellanox Technical Support.

Thanks.

Cheers,

~Martijn