Hello there,
I am trying to run nvme discover against an NVMe target.
The nvme utility freezes for some time and then reports:
# nvme discover -t rdma -a 172.16.0.35
Failed to write to /dev/nvme-fabrics: I/O Error
failed to add controller, error failed to write to nvme-fabrics device
dmesg reports:
nvme nvme2: I/O tag 0 (0000) opcode 0x7f (Fabrics Cmd) QID 0 timeout
nvme nvme2: Connect command failed, error wo/DNR bit: 881
nvme nvme2: failed to connect queue: 0 ret=881
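For reference, the 881 in the dmesg lines above is a decimal status code. Here is a small sketch for converting it to hex so it can be looked up in the kernel's include/linux/nvme.h (the helper name nvme_status_hex is my own, not part of nvme-cli):

```shell
# Hypothetical helper: print a decimal NVMe status code from dmesg as hex.
nvme_status_hex() {
    printf '0x%x\n' "$1"
}

nvme_status_hex 881   # prints 0x371
```

If I read the header correctly, 0x371 is NVME_SC_HOST_ABORTED_CMD, which would be consistent with the Connect command timing out on the host side rather than being rejected by the target.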
The network works fine, and the target responds to pings:
# ping 172.16.0.35
PING 172.16.0.35 (172.16.0.35) 56(84) bytes of data.
64 bytes from 172.16.0.35: icmp_seq=1 ttl=63 time=0.174 ms
64 bytes from 172.16.0.35: icmp_seq=2 ttl=63 time=0.122 ms
^C
--- 172.16.0.35 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1056ms
rtt min/avg/max/mdev = 0.122/0.148/0.174/0.026 ms
I have two more identical boxes, and they work fine: nvme discover and nvme connect both run with no errors.
# lspci -v | grep Mellanox
01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Subsystem: Mellanox Technologies ConnectX-5 Ex EN network interface card, 100GbE dual-port QSFP28, PCIe4.0 x16, tall bracket; MCX516A-CDAT
01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Subsystem: Mellanox Technologies ConnectX-5 Ex EN network interface card, 100GbE dual-port QSFP28, PCIe4.0 x16, tall bracket; MCX516A-CDAT
21:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Subsystem: Mellanox Technologies ConnectX-5 Ex EN network interface card, 100GbE dual-port QSFP28, PCIe4.0 x16, tall bracket; MCX516A-CDAT
21:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
# ethtool -i ens4f0np0
driver: mlx5_core
version: 6.12.34-6.12-alt1
firmware-version: 16.35.4030 (MT_0000000013)
expansion-rom-version:
bus-info: 0000:21:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
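Since the two working boxes are supposed to be identical, one thing I can compare across them is the firmware reported by ethtool. This is a minimal sketch, assuming the same interface name on every host (fw_version is a hypothetical helper, not an ethtool feature):

```shell
# Hypothetical helper: extract the firmware-version field
# from `ethtool -i` output supplied on stdin.
fw_version() {
    awk -F': ' '/^firmware-version:/ { print $2 }'
}

# Example using part of the output shown above, fed in as text.
printf 'driver: mlx5_core\nfirmware-version: 16.35.4030 (MT_0000000013)\n' | fw_version
```

On a live host this would be run as `ethtool -i ens4f0np0 | fw_version`, and the result compared between the failing box and the two working ones.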
# lsmod | grep nvme
nvme_rdma 49152 0
nvme_fabrics 36864 1 nvme_rdma
rdma_cm 155648 6 rpcrdma,ib_srpt,nvme_rdma,ib_iser,ib_isert,rdma_ucm
ib_core 516096 13 rdma_cm,ib_ipoib,rpcrdma,ib_srpt,nvme_rdma,iw_cm,ib_iser,ib_umad,ib_isert,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
# lsmod | grep rdma
nvme_rdma 49152 0
nvme_fabrics 36864 1 nvme_rdma
rpcrdma 454656 0
sunrpc 843776 1 rpcrdma
rdma_ucm 32768 0
rdma_cm 155648 6 rpcrdma,ib_srpt,nvme_rdma,ib_iser,ib_isert,rdma_ucm
iw_cm 61440 1 rdma_cm
ib_cm 155648 3 rdma_cm,ib_ipoib,ib_srpt
ib_uverbs 200704 2 rdma_ucm,mlx5_ib
ib_core 516096 13 rdma_cm,ib_ipoib,rpcrdma,ib_srpt,nvme_rdma,iw_cm,ib_iser,ib_umad,ib_isert,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
# lsmod | grep mlx5
mlx5_ib 491520 0
ib_uverbs 200704 2 rdma_ucm,mlx5_ib
ib_core 516096 13 rdma_cm,ib_ipoib,rpcrdma,ib_srpt,nvme_rdma,iw_cm,ib_iser,ib_umad,ib_isert,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
mlx5_core 2666496 1 mlx5_ib
psample 16384 2 openvswitch,mlx5_core
tls 151552 2 bonding,mlx5_core
pci_hyperv_intf 12288 1 mlx5_core
Any ideas would be greatly appreciated, including hints for debugging.