RDMA/InfiniBand cannot open hosts (iberror: discover failed), port state: Down

I am facing an issue while configuring RDMA and InfiniBand on my two nodes. The two nodes are connected, and I have installed the recommended software libraries and packages.
However, the port state is Down and the physical state is Disabled. When I try to enable the port, I get a "can't open MAD port" error:

ibwarn: [5630] mad_rpc_open_port: can't open UMAD port ((null):0)

src/ibnetdisc.c:784; can't open MAD port ((null):0)

/usr/sbin/ibnetdiscover: iberror: failed: discover failed

I have also tried running the commands with sudo, but I still get the error. Can you point me to what the issue could be?

Here is my ibstatus output:

Infiniband device 'mlx5_0' port 1 status:
default gid: fe80:0000:0000:0000:1270:fdff:fe6e:43e0
base lid: 0x0
sm lid: 0x0
state: 1: DOWN
phys state: 3: Disabled
rate: 100 Gb/sec (4X EDR)
link_layer: Ethernet
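
For readers hitting the same output: the last line reports link_layer: Ethernet, which means the port is running RDMA over Ethernet (RoCE) rather than native InfiniBand. MAD-based tools such as ibnetdiscover generally expect an InfiniBand link layer, which would be consistent with the "can't open UMAD port" error; treat that as a likely explanation, not a confirmed diagnosis. A minimal sketch for listing each port's link layer and state directly from sysfs is shown here. The function name list_rdma_ports and its optional base-directory argument are made up for illustration; the /sys/class/infiniband layout is the standard Linux one.

```shell
# Hypothetical helper: print "<device> port <n>: <link_layer>, <state>" for each RDMA port.
# The optional first argument overrides the sysfs base directory (handy for testing).
list_rdma_ports() {
    base="${1:-/sys/class/infiniband}"
    for port_dir in "$base"/*/ports/*; do
        # Skip when the glob matched nothing or the expected file is missing.
        [ -r "$port_dir/link_layer" ] || continue
        dev=$(basename "$(dirname "$(dirname "$port_dir")")")
        port=$(basename "$port_dir")
        layer=$(cat "$port_dir/link_layer")
        state=$(cat "$port_dir/state" 2>/dev/null)
        echo "$dev port $port: $layer, ${state:-unknown}"
    done
}
```

Calling list_rdma_ports with no arguments reads the live sysfs tree. A port that reports Ethernet is configured through its network interface rather than through a subnet manager.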

I figured it out and am sharing the answer for others. The issue was the network interface: you need to find which network interface belongs to the InfiniBand device and check its status.

root@dtn0:~# /etc/init.d/openibd status

 HCA driver loaded

 Configured Mellanox EN devices:
 ens11np0

 Currently active Mellanox devices:


 The following OFED modules are loaded:

 rdma_ucm
 rdma_cm
 ib_ipoib
 mlx5_core
 mlx5_ib
 ib_uverbs
 ib_umad
 ib_cm
 ib_core
 mlxfw

After that, I just assigned an IP address and netmask to the interface, and I was able to use the interface and reach the network.

root@dtn0:~# ifconfig ens11np0 10.0.0.50/24
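
The same step can also be done with the iproute2 `ip` tool, which has largely replaced ifconfig on current distributions. A minimal sketch, assuming root privileges: the wrapper name configure_iface and the DRY_RUN variable are invented for this example, and the interface name and address are the ones used in the command above.

```shell
# Hypothetical wrapper: bring an interface up and assign an address with iproute2.
# With DRY_RUN=1 the commands are printed instead of executed.
configure_iface() {
    iface="$1"
    cidr="$2"
    if [ -z "$iface" ] || [ -z "$cidr" ]; then
        echo "usage: configure_iface <interface> <address/prefix>" >&2
        return 2
    fi
    run=""
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"
    # Bring the link up first, then add the address; stop if the first step fails.
    $run ip link set dev "$iface" up &&
    $run ip addr add "$cidr" dev "$iface"
}
```

Running `DRY_RUN=1 configure_iface ens11np0 10.0.0.50/24` shows the two `ip` commands without changing anything, which is a safe way to check them before running as root.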


Hi baka_laowai, sorry to trouble you. I ran into the same problem and tried assigning an IP address and netmask to the interface, but the port state is still Down and the physical state is still Disabled. I also tried to enable the port, but the MAD port cannot be opened. Could you please help?
More details follow:

/etc/init.d/openibd status

HCA driver loaded

Configured Mellanox EN devices:
enp33s0np0

Currently active Mellanox devices:
enp33s0np0

The following OFED modules are loaded:

ib_ipoib
mlx5_core
mlx5_ib
ib_uverbs
ib_umad
ib_cm
ib_core
mlxfw

ibstatus

Infiniband device 'mlx5_0' port 1 status:
default gid: fe80:0000:0000:0000:526b:4bff:fe28:4fd0
base lid: 0x0
sm lid: 0x0
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 100 Gb/sec (4X EDR)
link_layer: Ethernet

Error message:
ibwarn: [329771] mad_rpc_open_port: can't open UMAD port ((null):1)
ibping: iberror: failed: Failed to open '(null)' port '1'
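
Worth noting: the ibstatus output in this post shows the port as ACTIVE and LinkUp with link_layer: Ethernet, so the link itself looks healthy. ibping works over MADs, so it is probably not the right test for a port running over Ethernet; this is an assumption based on the output, not something confirmed in the thread. One common alternative is rping from the librdmacm utilities, which connects by IP address. The sketch here only prints the server and client command lines. The helper name rping_cmd is hypothetical, and 10.0.0.50 is simply the address from the first post.

```shell
# Hypothetical helper: print an rping command line for the given role.
# rping options used: -s server, -c client, -a address, -v verbose, -C ping count.
rping_cmd() {
    role="$1"
    addr="$2"
    case "$role" in
        server) echo "rping -s -a $addr -v" ;;
        client) echo "rping -c -a $addr -v -C 5" ;;
        *) echo "usage: rping_cmd server|client <ip-address>" >&2; return 2 ;;
    esac
}

rping_cmd server 10.0.0.50   # run the printed command on the node that owns this address
rping_cmd client 10.0.0.50   # run the printed command on the other node
```

If the client completes its pings, RDMA traffic is flowing between the nodes even though the MAD-based diagnostics fail.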