Hello,
I have two identical servers, one of which is having problems with RDMA. The problem is already described well in the topic "ConnectX-5 25GbE missing RDMA devices" (Adapters and Cables / Ethernet Adapter Cards, NVIDIA Developer Forums).
However, the problem was not solved in that topic. I have a Mellanox card with part number MCX623106PE-CDAT, so I am using firmware 22.37.1014 (MT_0000000606) and driver 5.8-1.1.2. This should be in line with the recommendation, but it still does not work.
I even replaced it with a Mellanox ConnectX-6 with part number 0F6FXM (DELL), and the problem is the same.
The strange thing is that it works on the other server, and it previously worked on the “faulty” server with the same configuration. I have a fresh Ubuntu 22.04 installation (kernel 5.15.0-75-generic), so nothing is installed that could prevent it from working.
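As a minimal sketch of how the two servers could be compared, assuming the standard Linux sysfs layout in which registered RDMA devices appear under /sys/class/infiniband and network interfaces under /sys/class/net, the following Python script lists both. The function names and the `sysfs_root` parameter are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def list_entries(directory):
    """Return the sorted entry names of a directory, or [] if it does not exist."""
    path = Path(directory)
    if not path.is_dir():
        return []
    return sorted(entry.name for entry in path.iterdir())


def rdma_summary(sysfs_root="/sys"):
    """Collect RDMA device names and network interface names from sysfs.

    Assumption: RDMA devices are registered under <root>/class/infiniband
    and network interfaces under <root>/class/net.
    """
    root = Path(sysfs_root)
    return {
        "rdma_devices": list_entries(root / "class" / "infiniband"),
        "net_interfaces": list_entries(root / "class" / "net"),
    }


if __name__ == "__main__":
    summary = rdma_summary()
    print("RDMA devices:", summary["rdma_devices"] or "none found")
    print("Network interfaces:", summary["net_interfaces"] or "none found")
```

Running it on both servers and comparing the output would show whether, on the faulty one, the card appears as a network interface but not as an RDMA device.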
Any suggestions on how to fix this?