We are configuring NVMe over RDMA.
Running nvme discover finds the remote NVMe target as expected.
But when we run nvme connect, it fails with an error.
The output of dmesg shows the following:
[5365309.262528] mlx5_cmd_check:810:(pid 923941): create_mkey(0x200) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x35e6ec)
[5365309.262539] nvme nvme1: failed to initialize pi mr pool sized 128 for qid 1
[5365309.262551] nvme nvme1: rdma connection establishment failed (-22)
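In case it helps when reporting this, here is a minimal Python sketch that pulls the command name, status, and syndrome out of an mlx5_cmd_check line like the one above. The regular expression is an assumption based only on the format of that one line, and the function name parse_mlx5_cmd_failure is hypothetical; other driver versions may format the message differently.

```python
import re

# Pattern derived from the single dmesg line quoted above (an assumption,
# not an official description of the mlx5 log format).
_MLX5_FAILURE = re.compile(
    r"(?P<cmd>\w+)\((?P<opcode>0x[0-9a-fA-F]+)\) "
    r"op_mod\((?P<op_mod>0x[0-9a-fA-F]+)\) failed, "
    r"status (?P<status>[^()]+?)\((?P<status_code>0x[0-9a-fA-F]+)\), "
    r"syndrome \((?P<syndrome>0x[0-9a-fA-F]+)\)"
)


def parse_mlx5_cmd_failure(line):
    """Return the failure fields as a dict, or None if the line does not match."""
    match = _MLX5_FAILURE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = fields["status"].strip()
    return fields


if __name__ == "__main__":
    sample = (
        "[5365309.262528]mlx5_cmd_check:810:(pid 923941): create_mkey(0x200) "
        "op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x35e6ec)"
    )
    print(parse_mlx5_cmd_failure(sample))
```

The syndrome value is usually the most useful field to quote when asking for support, since it identifies the specific firmware check that rejected the command.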
I am also facing the same problem when configuring NVMeOF.
GPUDirect Storage without network (just NVMe) works well.
My Linux kernel version is 5.4.0-100-generic on Ubuntu 20.04.2 LTS. I tried both MLNX_OFED_LINUX-5.8-1.0.1.1-ubuntu20.04-x86_64 and MLNX_OFED_LINUX-5.4-3.7.5.0-ubuntu20.04-x86_64, and neither worked.
Is your adapter configured in IB mode or RoCE mode? There might be something blocking IB from working. Please configure it in RoCE mode and see if that works for you.
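To check which mode each port is currently in, the link layer can be read from sysfs. This is a minimal sketch, assuming the usual /sys/class/infiniband/<device>/ports/<port>/link_layer layout, where a value of Ethernet means RoCE and InfiniBand means IB. The function name link_layers is hypothetical.

```python
from pathlib import Path


def link_layers(sysfs_root="/sys/class/infiniband"):
    """Map 'device/port' to the contents of its link_layer file."""
    root = Path(sysfs_root)
    result = {}
    if not root.is_dir():
        # No RDMA devices visible (or the driver is not loaded).
        return result
    for dev in sorted(root.iterdir()):
        ports = dev / "ports"
        if not ports.is_dir():
            continue
        for port in sorted(ports.iterdir()):
            layer_file = port / "link_layer"
            if layer_file.is_file():
                result[f"{dev.name}/{port.name}"] = layer_file.read_text().strip()
    return result


if __name__ == "__main__":
    modes = {"Ethernet": "RoCE", "InfiniBand": "IB"}
    for port, layer in link_layers().items():
        print(f"{port}: {layer} ({modes.get(layer, 'unknown')})")
```

This only reports the current mode; switching a port between IB and Ethernet is done with the vendor tools and is not shown here.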