When using nvme connect, we hit an issue: "mlx5_cmd_check:810:(pid 923941): create_mkey(0x200) op_mod(0x0)"

Describe the bug

We are configuring "NVMe over RDMA".
nvme discover finds the remote NVMe target as expected.
But when we run nvme connect, we get an error.
dmesg shows the following:

[5365309.262528] mlx5_cmd_check:810:(pid 923941): create_mkey(0x200) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x35e6ec)
[5365309.262539] nvme nvme1: failed to initialize pi mr pool sized 128 for qid 1
[5365309.262551] nvme nvme1: rdma connection establishment failed (-22)
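For context on what the second message means: in mainline kernels from roughly v5.8 onward (and the matching MLNX_OFED nvme_rdma backports), the "failed to initialize pi mr pool" error comes from the queue setup path in drivers/nvme/host/rdma.c, where the driver allocates a pool of integrity-capable memory regions whenever the HCA advertises T10-PI support. A rough, version-dependent excerpt of that path (names may differ in your exact kernel/OFED source):

static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
{
	/* ... QP and completion queue setup elided ... */

	if (queue->pi_support) {
		/*
		 * Allocate a pool of IB_MR_TYPE_INTEGRITY MRs. On mlx5 each
		 * allocation issues a create_mkey command to firmware; the
		 * "status bad parameter(0x3)" line above is that command
		 * being rejected, which then fails the queue setup.
		 */
		ret = ib_mr_pool_init(queue->qp, &queue->qp->sig_mrs,
				      queue->queue_size, IB_MR_TYPE_INTEGRITY,
				      pages_per_mr, pages_per_mr);
		if (ret) {
			dev_err(queue->ctrl->ctrl.device,
				"failed to initialize PI MR pool sized %d for QID %d\n",
				queue->queue_size, idx);
			goto out_destroy_ring;
		}
	}

	/* ... */
}

So the connection establishment failure (-22) is a direct consequence of the create_mkey rejection while building the PI MR pool for I/O queue 1.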

We followed the steps in this guide:

https://enterprise-support.nvidia.com/s/article/howto-configure-nvme-over-fabrics

We followed the commands above, but hit this issue at the nvme connect step.

I am also facing the same problem when configuring NVMe-oF.
GPUDirect Storage without the network path (local NVMe only) works well.

My Linux kernel version is 5.4.0-100-generic on Ubuntu 20.04.2 LTS. I tried both MLNX_OFED_LINUX-5.8-1.0.1.1-ubuntu20.04-x86_64 and MLNX_OFED_LINUX-5.4-3.7.5.0-ubuntu20.04-x86_64, and neither worked.

My kernel version is the same as yours. Have you tried any other versions?

No. I have only tried changing MLNX_OFED versions (which was not the solution).

Which MLNX_OFED and kernel versions did you deploy? Did you succeed?

I configured it according to the official URL, but I encountered the same problem and could not solve it. Do you know how to solve it?

Are your ports configured in IB mode or RoCE mode? Something might be blocking IB from working. Please configure RoCE mode and see if it works for you.

Please check known issue #3735400 here:
https://docs.nvidia.com/networking/display/mlnxofedv24070610/known+issues

Try the suggested solution and let me know if it helps.

For IB RDMA: if you don't need PI (T10 protection information) support, disable it in nvme_rdma_configure_admin_queue() by always setting pi_capable = false, as sketched below.
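A minimal sketch of that workaround, assuming a mainline kernel around v5.8+ (or the matching MLNX_OFED nvme_rdma source), where pi_capable is derived from the device's integrity-handover capability; the exact capability flag name varies by version (older kernels check device_cap_flags & IB_DEVICE_INTEGRITY_HANDOVER, newer ones kernel_cap_flags & IBK_INTEGRITY_HANDOVER):

static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
		bool new)
{
	bool pi_capable = false;

	/* ... admin queue allocation elided ... */

	/*
	 * Original logic: advertise PI when the HCA reports integrity
	 * handover. Comment this check out (or force pi_capable to false
	 * afterwards) so the driver never creates the IB_MR_TYPE_INTEGRITY
	 * MR pool that triggers the create_mkey failure.
	 *
	 * if (ctrl->device->dev->attrs.device_cap_flags &
	 *     IB_DEVICE_INTEGRITY_HANDOVER)
	 *         pi_capable = true;
	 */
	pi_capable = false;

	/* ... rest of the admin queue setup uses pi_capable as before ... */
}

After the change, rebuild and reload the nvme_rdma module (the mlnx-nvme DKMS package when MLNX_OFED is installed) and retry nvme connect. The trade-off is that end-to-end T10-PI/metadata support will no longer be available on that controller.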