Hello!
I have two identical nodes with Mellanox ConnectX-5 (MCX516A-CDAT) NICs.
All network settings are identical, and I have already changed and swapped everything. Details:
# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 13 (trixie)"
NAME="Debian GNU/Linux"
VERSION_ID="13"
VERSION="13 (trixie)"
VERSION_CODENAME=trixie
DEBIAN_VERSION_FULL=13.4
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
# uname -r   # PVE kernel
6.17.13-2-pve
# ethtool -i ens4f0np0
driver: mlx5_core
version: 6.17.13-2-pve
firmware-version: 16.35.8008 (MT_0000000013)
expansion-rom-version:
bus-info: 0000:21:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
# ethtool -i ens9f0np0
driver: mlx5_core
version: 6.17.13-2-pve
firmware-version: 16.35.8008 (MT_0000000013)
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
Set up NVMe kernel debugging (dynamic debug):
echo "module rdma_cm +p" > /sys/kernel/debug/dynamic_debug/control
echo "module nvme_rdma +p" > /sys/kernel/debug/dynamic_debug/control
echo "module mlx5_ib +p" > /sys/kernel/debug/dynamic_debug/control
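The three echo commands above can be wrapped in a helper that also switches the messages off again afterwards, since mlx5_ib in particular is very chatty. This is a sketch: the function name and the DYNDBG_CONTROL override (useful only for testing) are hypothetical; the default path is the standard debugfs location, which exists only while debugfs is mounted. Run it as root.

```shell
# Hypothetical helper: toggle dynamic-debug output for rdma_cm,
# nvme_rdma and mlx5_ib. DYNDBG_CONTROL is an assumed override;
# by default the standard debugfs control file is used.
nvme_rdma_debug() (
    control=${DYNDBG_CONTROL:-/sys/kernel/debug/dynamic_debug/control}
    case "$1" in
        on)  flag=+p ;;
        off) flag=-p ;;
        *)   echo "usage: nvme_rdma_debug on|off" >&2; exit 2 ;;
    esac
    # The control file only exists while debugfs is mounted.
    if [ ! -w "$control" ]; then
        echo "cannot write $control (is debugfs mounted?)" >&2
        exit 1
    fi
    # One query per write, same as the manual echo commands.
    for mod in rdma_cm nvme_rdma mlx5_ib; do
        echo "module $mod $flag" > "$control" || exit 1
    done
)
```

Usage: `nvme_rdma_debug on` before reproducing the problem, `nvme_rdma_debug off` when done.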
In another session, start:
dmesg -w
Then try NVMe discovery:
nvme discover -t rdma -a 172.16.0.35
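Before running discovery it may be worth confirming on both nodes that the RoCE port is up from the RDMA stack's point of view, not only at the netdev level. A minimal sketch, assuming iproute2's `rdma link show` prints lines of the form `link <dev>/<port> state <STATE> ...`; the function name is hypothetical and port 1 of rocep33s0f0 is an assumption.

```shell
# Print the state of one RDMA port from `rdma link show` output on stdin.
# Assumes the iproute2 line format "link <dev>/<port> state <STATE> ...".
# Exits non-zero if the port is not listed.
rdma_port_state() {
    awk -v port="$1" '
        $1 == "link" && $2 == port {
            for (i = 3; i < NF; i++)
                if ($i == "state") { print $(i + 1); found = 1; exit }
        }
        END { if (!found) exit 1 }
    '
}
```

Usage: `rdma link show | rdma_port_state rocep33s0f0/1` should print ACTIVE on both nodes.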
On the working node (node3), the dmesg session shows:
[ 309.553040] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x406
[ 309.553685] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x407
[ 309.554249] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x408
[ 309.554785] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x409
[ 309.555255] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x40a
[ 309.555723] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x40b
[ 309.556268] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x40c
[ 309.556863] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x40d
[ 309.557349] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x40e
[ 309.557800] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x40f
[ 309.558289] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x410
[ 309.558781] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x411
[ 309.559267] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x412
[ 309.559748] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x413
[ 309.560279] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x414
[ 309.560977] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x415
[ 309.561469] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x416
[ 309.561957] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x417
[ 309.562483] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x418
[ 309.562974] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x419
[ 309.563459] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x41a
[ 309.563941] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x41b
[ 309.564475] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x41c
[ 309.564971] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x41d
[ 309.565457] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x41e
[ 309.565932] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x41f
[ 309.566453] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x420
[ 309.566943] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x421
[ 309.567430] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x422
[ 309.567950] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x423
[ 309.568641] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x424
[ 309.569124] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x425
[ 309.569608] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x426
[ 309.570086] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x427
[ 309.570579] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x428
[ 309.571064] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x429
[ 309.571530] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x42a
[ 309.572002] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x42b
[ 309.572508] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x42c
[ 309.572987] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x42d
[ 309.573453] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x42e
[ 309.573942] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x42f
[ 309.574481] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x430
[ 309.574966] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x431
[ 309.575432] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x432
[ 309.575905] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x433
[ 309.576437] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x434
[ 309.576916] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x435
[ 309.577421] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x436
[ 309.577893] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x437
[ 309.578401] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x438
[ 309.578880] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x439
[ 309.579399] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x43a
[ 309.579863] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x43b
[ 309.580372] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x43c
[ 309.580853] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x43d
[ 309.581324] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x43e
[ 309.581806] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x43f
[ 309.582345] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x440
[ 309.582817] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x441
[ 309.583404] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x442
[ 309.583870] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x443
[ 309.584394] infiniband rocep33s0f0: mlx5_ib_create_cq:1034:(pid 2163): cqn 0x444
[ 309.584401] infiniband rocep33s0f0: calc_sq_size:601:(pid 2163): wqe_size 256
[ 309.585843] infiniband rocep33s0f0: create_qp:3149:(pid 2163): QP type 2, ib qpn 0x133D, mlx qpn 0x133d, rcqn 0x406, scqn 0x406, ece 0x0
[ 309.585854] infiniband rocep33s0f0: get_tx_affinity:4061:(pid 2163): Set tx affinity 0x2 to qpn 0x133d
[ 309.595898] infiniband rocep33s0f0: poll_soft_wc:595:(pid 1327): polled software generated completion on CQ 0x402
[ 309.597544] infiniband rocep33s0f0: poll_soft_wc:595:(pid 1327): polled software generated completion on CQ 0x402
[ 309.598285] nvme nvme0: queue_size 128 > ctrl sqsize 64, clamping down
[ 309.598296] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.16.0.35:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:cb8002e8-0929-4b34-aefa-6ec66ebfc1a4
[ 309.598891] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 309.625907] infiniband rocep33s0f0: poll_soft_wc:595:(pid 1327): polled software generated completion on CQ 0x402
[ 309.626261] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626266] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf5
[ 309.626396] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626400] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626402] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626404] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626406] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626407] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626409] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626411] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626413] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626414] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626416] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626418] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626419] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626421] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626423] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626424] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626426] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626428] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626429] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626431] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626433] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626434] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626436] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626438] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626439] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626441] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626443] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626444] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626446] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626448] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626449] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626451] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626454] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626455] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626457] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626459] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626460] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626462] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626464] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626465] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626467] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626469] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626470] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626472] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626474] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626475] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626477] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626479] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626480] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626482] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626484] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626485] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626487] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626489] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626490] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626492] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626494] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626495] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626497] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626499] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626500] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626502] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 309.626504] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Requestor error cqe on cqn 0x406:
[ 309.626505] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf5
[ 309.626606] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 309.626608] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
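To compare the error CQEs on both nodes without reading every line, a small awk filter can count them by side and syndrome. This is a sketch: the function name is hypothetical, and the syndrome names are my reading of the MLX5_CQE_SYNDROME_* values in the kernel's include/linux/mlx5/device.h (0x5 work request flush error, 0x15 transport retry counter exceeded), so treat them as hints. Vendor syndromes are left as raw values.

```shell
# Count mlx5 error CQEs in dmesg text on stdin, grouped by side
# (Requestor/Responder), syndrome and vendor syndrome.
# Syndrome names are an assumed reading of MLX5_CQE_SYNDROME_*;
# vendor syndromes are not decoded.
summarize_cqe_errors() {
    awk '
        function name(s) {
            if (s == "0x5")  return "WR_FLUSH_ERR"
            if (s == "0x15") return "TRANSPORT_RETRY_EXC_ERR"
            if (s == "0x16") return "RNR_RETRY_EXC_ERR"
            return "?"
        }
        /error cqe on cqn/ {
            # The word before "error" is Requestor or Responder.
            for (i = 2; i <= NF; i++) if ($i == "error") side = $(i - 1)
            next
        }
        /vendor syndrome/ && side != "" {
            # Line ends with: "syndrome 0x5, vendor syndrome 0xf9"
            syn = $(NF - 3)
            sub(/,$/, "", syn)
            count[side " syndrome " syn " (" name(syn) ") vendor " $NF]++
            side = ""
        }
        END { for (k in count) print count[k], k }
    ' | LC_ALL=C sort -k1,1nr -k2
}
```

Usage: `dmesg | summarize_cqe_errors` on each node. Flush errors (0x5) are the status outstanding work requests complete with once a QP has moved to the error state; on the working node they appear right after `Removing ctrl`, which looks like normal teardown. On node1 the first error CQE is instead a requestor error with syndrome 0x15.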
Despite these errors, discovery succeeds on this node:
Discovery Log Number of Records 1, Generation counter 2
=====Discovery Log Entry 0======
trtype: rdma
adrfam: ipv4
subtype: nvme subsystem
treq: not specified
portid: 0
trsvcid: 4420
subnqn: nqn.2020-02.huawei.nvme:nvm-subsystem-sn-XXXXXXXXXXXXXXXXXXXXXX
traddr: 172.16.0.35
eflags: none
rdma_prtype: roce-v2
rdma_qptype: connected
rdma_cms: rdma-cm
rdma_pkey: 0000
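Since the working node returns a discovery log entry, the equivalent explicit `nvme connect` command can be derived from it and tried on both nodes. A sketch: the function name is hypothetical, it only prints the command, and it uses the standard nvme-cli flags -t (transport), -a (traddr), -s (trsvcid) and -n (subsystem NQN). nvme-cli's `connect-all` does discovery and connect in one step; printing the command just keeps the parameters visible.

```shell
# Print the `nvme connect` command matching the first entry of an
# `nvme discover` log read on stdin (printed, not executed).
discovery_to_connect() {
    awk '
        /Discovery Log Entry 1=/ { exit }   # only look at entry 0
        {
            key = $1; sub(/:$/, "", key)
            # Value is everything after the first "key: " (subnqn contains colons).
            val = $0; sub(/^[ \t]*[a-z_]+:[ \t]*/, "", val)
        }
        key == "trtype"  { tr = val }
        key == "traddr"  { addr = val }
        key == "trsvcid" { svc = val }
        key == "subnqn"  { nqn = val }
        END {
            if (tr == "" || addr == "" || svc == "" || nqn == "") exit 1
            printf "nvme connect -t %s -a %s -s %s -n %s\n", tr, addr, svc, nqn
        }
    '
}
```

Usage: `nvme discover -t rdma -a 172.16.0.35 | discovery_to_connect` on node3, then run the printed command on both nodes.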
root@node3:~#
The problem node (node1) does not discover the NVMe target:
root@node1:~# nvme discover -t rdma -a 172.16.0.35 -vv
kernel supports: instance cntlid transport traddr trsvcid nqn queue_size nr_io_queues reconnect_delay ctrl_loss_tmo keep_alive_tmo hostnqn host_traddr host_iface hostid duplicate_connect disable_sqflow hdr_digest data_digest nr_write_queues nr_poll_queues tos keyring tls_key fast_io_fail_tmo discovery dhchap_secret dhchap_ctrl_secret tls concat recovery_delay
connect ctrl, 'nqn=nqn.2014-08.org.nvmexpress.discovery,transport=rdma,traddr=172.16.0.35,trsvcid=4420,hostnqn=nqn.2014-08.org.nvmexpress:uuid:717a9176-ac73-4ea3-829e-e4ccf0b5735f,hostid=717a9176-ac73-4ea3-829e-e4ccf0b5735f,ctrl_loss_tmo=600'
Failed to write to /dev/nvme-fabrics: Input/output error
failed to add controller, error failed to write to nvme-fabrics device
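The -vv output shows the exact option string written to /dev/nvme-fabrics. A sketch (hypothetical function name) that splits it into sorted key=value lines, so the options used on node1 and node3 can be compared with diff. hostnqn and hostid are expected to differ; any other difference would be worth a closer look.

```shell
# Split the option string from the "connect ctrl, '...'" line printed by
# `nvme discover -vv` into sorted key=value lines for diffing.
connect_opts() {
    sed -n "s/.*connect ctrl, '\([^']*\)'.*/\1/p" | tr ',' '\n' | LC_ALL=C sort
}
```

Usage: `nvme discover -t rdma -a 172.16.0.35 -vv 2>&1 | connect_opts > opts.node1` (the 2>&1 in case the verbose lines go to stderr), the same on node3, then `diff opts.node1 opts.node3`.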
In dmesg:
[356612.263017] nvme nvme0: I/O tag 0 (0000) opcode 0x7f (Fabrics Cmd) QID 0 timeout
[356612.263054] nvme nvme0: Connect command failed, error wo/DNR bit: 881
[356612.263463] nvme nvme0: failed to connect queue: 0 ret=881
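The status in `error wo/DNR bit: 881` is printed in decimal; 881 is 0x371. Assuming the driver prints its internal status word (status code in bits 0-7, status code type in bits 8-10, DNR at 0x4000), that is type 0x3 (path-related) with code 0x71, which the Linux NVMe headers name NVME_SC_HOST_ABORTED_CMD. If so, the host cancelled the Connect command itself after the QID 0 timeout, rather than the target rejecting it. A small decoder under that assumption (hypothetical function name; only this one code is named):

```shell
# Decode the status printed by the nvme driver, assuming the Linux
# internal layout: SC bits 0-7, SCT bits 8-10, DNR bit 14 (0x4000).
decode_nvme_status() (
    s=$1
    name=unknown
    # Only the code seen in this post is named; extend as needed.
    case $(( s & 0x7ff )) in
        881) name=NVME_SC_HOST_ABORTED_CMD ;;   # 0x371
    esac
    printf 'status 0x%x sct 0x%x sc 0x%02x dnr %d (%s)\n' \
        "$s" $(( (s >> 8) & 0x7 )) $(( s & 0xff )) $(( (s >> 14) & 1 )) "$name"
)
```

Usage: `decode_nvme_status 881` prints `status 0x371 sct 0x3 sc 0x71 dnr 0 (NVME_SC_HOST_ABORTED_CMD)`.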
The detailed dmesg log on node1 (with dynamic debug enabled):
[ 4922.196036] nvme nvme0: address resolved (0): status 0 id 000000002ea79baa
[ 4922.196396] infiniband rocep33s0f0: calc_sq_size:601:(pid 14440): wqe_size 256
[ 4922.196729] infiniband rocep33s0f0: create_qp:3149:(pid 14440): QP type 2, ib qpn 0x133F, mlx qpn 0x133f, rcqn 0x406, scqn 0x406, ece 0x0
[ 4922.196739] infiniband rocep33s0f0: get_tx_affinity:4061:(pid 14440): Set tx affinity 0x2 to qpn 0x133f
[ 4922.205568] nvme nvme0: route resolved (2): status 0 id 000000002ea79baa
[ 4922.205636] infiniband rocep33s0f0: poll_soft_wc:595:(pid 1333): polled software generated completion on CQ 0x402
[ 4922.206434] nvme nvme0: established (9): status 0 id 000000002ea79baa
[ 4922.206449] infiniband rocep33s0f0: poll_soft_wc:595:(pid 1333): polled software generated completion on CQ 0x402
[ 4929.694236] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Requestor error cqe on cqn 0x406:
[ 4929.694245] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x15, vendor syndrome 0x81
[ 4929.694475] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694479] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf4
[ 4929.694552] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694556] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694558] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694559] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694561] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694563] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694565] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694566] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694568] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694570] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694571] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694573] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694575] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694576] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694578] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694580] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694581] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694583] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694585] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694586] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694588] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694590] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694591] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694593] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694595] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694597] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694598] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694600] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694602] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694603] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694605] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694607] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694612] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694614] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694615] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694618] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694619] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694621] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694623] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694625] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694627] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694628] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694630] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694632] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694634] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694636] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694638] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694640] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694642] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694644] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694646] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694648] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694650] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694652] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694654] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694656] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694658] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694659] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694662] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694663] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4929.694665] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4929.694667] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4983.306731] nvme nvme0: I/O tag 0 (0000) opcode 0x7f (Fabrics Cmd) QID 0 timeout
[ 4983.306795] nvme nvme0: Connect command failed, error wo/DNR bit: 881
[ 4983.306978] infiniband rocep33s0f0: poll_soft_wc:595:(pid 1333): polled software generated completion on CQ 0x402
[ 4983.307079] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Requestor error cqe on cqn 0x406:
[ 4983.307084] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4983.307124] nvme nvme0: disconnected (10): status 0 id 000000002ea79baa
[ 4983.307128] nvme nvme0: disconnect received - connection closed
[ 4983.307285] infiniband rocep33s0f0: mlx5_poll_one:527:(pid 0): Responder error cqe on cqn 0x406:
[ 4983.307287] infiniband rocep33s0f0: mlx5_poll_one:530:(pid 0): syndrome 0x5, vendor syndrome 0xf9
[ 4983.307295] nvme nvme0: failed to connect queue: 0 ret=881
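The timing between events in this log can be measured directly: from `established` to the first error CQE, and to the Connect timeout. A sketch, with a hypothetical function name, expecting the `[seconds.microseconds]` timestamps shown here and matching the two events as plain substrings:

```shell
# Seconds between the first dmesg line (stdin) containing $1 and the
# first later line containing $2. Exits non-zero if either is missing.
dmesg_delta() {
    awk -v a="$1" -v b="$2" '
        {
            # Timestamp is the number inside the leading "[ ... ]".
            if (!match($0, /^\[ *[0-9]+\.[0-9]+\]/)) next
            ts = substr($0, 2, RLENGTH - 2) + 0
        }
        t0 == "" && index($0, a) { t0 = ts; next }
        t0 != "" && index($0, b) { printf "%.6f\n", ts - t0; found = 1; exit }
        END { if (!found) exit 1 }
    '
}
```

On the node1 log above, `dmesg_delta established 'error cqe'` gives 7.487802 and `dmesg_delta established 'QID 0 timeout'` gives 61.100297; the latter is roughly one minute, presumably the default 60 s nvme_core admin_timeout.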
Neither nvme discover nor nvme connect works on node1.