I am facing an issue while configuring RDMA and InfiniBand on my two nodes. The two nodes are connected, and I have installed the recommended software libraries and packages.
However, the port state is DOWN and the physical state is Disabled. When I try to enable the port, I get a "can't open MAD port" error:
ibwarn: [5630] mad_rpc_open_port: can't open UMAD port ((null):0)
src/ibnetdisc.c:784; can't open MAD port ((null):0)
/usr/sbin/ibnetdiscover: iberror: failed: discover failed
I have also tried running the commands with sudo, but I still get the error. Can anyone tell me what the issue could be?
Here is my ibstatus output:
Infiniband device 'mlx5_0' port 1 status:
default gid: fe80:0000:0000:0000:1270:fdff:fe6e:43e0
base lid: 0x0
sm lid: 0x0
state: 1: DOWN
phys state: 3: Disabled
rate: 100 Gb/sec (4X EDR)
link_layer: Ethernet
I figured it out, and I am sharing the answer for others. The issue was the network interface: you need to find which network interface corresponds to the InfiniBand device and check its status.
root@dtn0:~# /etc/init.d/openibd status
HCA driver loaded
Configured Mellanox EN devices:
ens11np0
Currently active Mellanox devices:
The following OFED modules are loaded:
rdma_ucm
rdma_cm
ib_ipoib
mlx5_core
mlx5_ib
ib_uverbs
ib_umad
ib_cm
ib_core
mlxfw
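The `openibd status` script above comes with Mellanox OFED. As a sketch that does not depend on it, the same device-to-interface mapping can be read from sysfs: each port under `/sys/class/infiniband/<device>/ports/` exposes `link_layer` and `state` files, and the network interface(s) on the same PCI function appear under `/sys/class/infiniband/<device>/device/net/`. The function name `list_rdma_netdevs` and the `SYSFS_ROOT` override are hypothetical, added only so the function can be pointed at a copy of the tree. If installed, `ibdev2netdev` (Mellanox OFED) and `rdma link show` (iproute2) report similar information.

```shell
#!/usr/bin/env bash
# Sketch: for every RDMA device port found in sysfs, print its link layer,
# state, and the network interface(s) on the same PCI function.
# list_rdma_netdevs and SYSFS_ROOT are hypothetical names; SYSFS_ROOT only
# exists so the function can be pointed at a copy of /sys for testing.
list_rdma_netdevs() {
  local root="${SYSFS_ROOT:-/sys}"
  local devdir portdir dev port netdevs
  for devdir in "$root"/class/infiniband/*; do
    [ -d "$devdir" ] || continue    # no RDMA devices: the glob stays unexpanded
    dev=$(basename "$devdir")
    # Netdevs sharing the PCI function are listed under device/net/
    netdevs=$(ls "$devdir/device/net" 2>/dev/null | tr '\n' ' ')
    for portdir in "$devdir"/ports/*; do
      [ -d "$portdir" ] || continue
      port=$(basename "$portdir")
      printf '%s port %s: link_layer=%s state=%s netdev=%s\n' \
        "$dev" "$port" "$(cat "$portdir/link_layer")" \
        "$(cat "$portdir/state")" "${netdevs% }"
    done
  done
}

list_rdma_netdevs
```

On a host like the one in the first post, one would expect a line reporting `link_layer=Ethernet` next to the `ens11np0` interface.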
After that, I assigned an IP address and netmask to the interface, and I was able to use the interface and reach the network.
root@dtn0:~# ifconfig ens11np0 10.0.0.50/24
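`ifconfig` belongs to the legacy net-tools package. A minimal sketch of the same step with iproute2, which also brings the link up explicitly, assuming the interface name and address from the post; `configure_rdma_netdev` and the `DRY_RUN` switch are hypothetical, and the real commands need root. An address assigned this way does not persist across reboots.

```shell
#!/usr/bin/env bash
# Sketch: assign an IPv4 address to the netdev and bring the link up with
# iproute2. configure_rdma_netdev and DRY_RUN are hypothetical names;
# with DRY_RUN=1 the commands are printed instead of executed.
configure_rdma_netdev() {
  local ifname="$1" cidr="$2"
  if [ -z "$ifname" ] || [ -z "$cidr" ]; then
    echo "usage: configure_rdma_netdev IFNAME ADDR/PREFIX" >&2
    return 2
  fi
  local run=""
  [ "${DRY_RUN:-0}" = 1 ] && run="echo"
  $run ip addr add "$cidr" dev "$ifname"   # e.g. 10.0.0.50/24, as in the post
  $run ip link set dev "$ifname" up        # make sure the interface is up
}

# Values from the post above; drop DRY_RUN=1 (and run as root) to apply them.
DRY_RUN=1 configure_rdma_netdev ens11np0 10.0.0.50/24
```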
Hi baka_laowai, sorry to trouble you. I have the same problem as you. I tried assigning an IP address and netmask to the interface, but the port state is still DOWN and the physical state is still Disabled. I also tried to enable the port, but the MAD port still cannot be opened. Could you please help me?
Here are more details:
/etc/init.d/openibd status
HCA driver loaded
Configured Mellanox EN devices:
enp33s0np0
Currently active Mellanox devices:
enp33s0np0
The following OFED modules are loaded:
ib_ipoib
mlx5_core
mlx5_ib
ib_uverbs
ib_umad
ib_cm
ib_core
mlxfw
ibstatus
Infiniband device 'mlx5_0' port 1 status:
default gid: fe80:0000:0000:0000:526b:4bff:fe28:4fd0
base lid: 0x0
sm lid: 0x0
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 100 Gb/sec (4X EDR)
link_layer: Ethernet
Error message:
ibwarn: [329771] mad_rpc_open_port: can't open UMAD port ((null):1)
ibping: iberror: failed: Failed to open '(null)' port '1'
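Both ibstatus dumps in this thread report `link_layer: Ethernet` together with `base lid: 0x0` and `sm lid: 0x0`. That suggests the port is running in Ethernet (RoCE) mode rather than InfiniBand mode. As far as I understand, tools such as `ibping` and `ibnetdiscover` reach the fabric through InfiniBand management datagrams (MADs) via the UMAD interface and cannot open a port whose link layer is Ethernet, which would match the `can't open UMAD port ((null):1)` error even when the port is ACTIVE/LinkUp. The sketch below summarizes ibstatus-style output and flags non-InfiniBand ports; `check_ib_ports` is a hypothetical name, and it assumes the output format shown in this thread.

```shell
#!/usr/bin/env bash
# Sketch: summarize ibstatus-style output and flag ports whose link layer is
# not InfiniBand. check_ib_ports is a hypothetical helper name; it assumes the
# ibstatus output format shown in this thread.
check_ib_ports() {
  awk -v q="'" '
    /^Infiniband device/ {
      dev = $3; gsub(q, "", dev)   # strip the quotes around the device name
      port = $5
    }
    /^[ \t]*state:/ {
      state = $0; sub(/^[^:]*:[ \t]*[0-9]+:[ \t]*/, "", state)   # "1: DOWN" -> "DOWN"
    }
    /^[ \t]*phys state:/ {
      phys = $0; sub(/^[^:]*:[ \t]*[0-9]+:[ \t]*/, "", phys)     # "3: Disabled" -> "Disabled"
    }
    /^[ \t]*link_layer:/ {
      printf "%s port %s: state=%s phys=%s link_layer=%s\n", dev, port, state, phys, $2
      if ($2 != "InfiniBand")
        print "  warning: link layer is not InfiniBand; MAD-based tools such as ibping and ibnetdiscover will likely fail"
    }
  '
}

# Example: the second ibstatus output from this thread.
check_ib_ports <<'EOF'
Infiniband device 'mlx5_0' port 1 status:
default gid: fe80:0000:0000:0000:526b:4bff:fe28:4fd0
base lid: 0x0
sm lid: 0x0
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 100 Gb/sec (4X EDR)
link_layer: Ethernet
EOF
# prints a summary line plus the warning, because link_layer is Ethernet
```

On an Ethernet link layer, connectivity between the nodes is more naturally checked with ordinary IP tools against the addresses assigned to the netdevs.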