Hello,
I am attempting to run soft RoCE on one machine and communicate with an X4 card in a different computer.
I am running CentOS 7.4, and I installed MLNX_OFED version 4.2-1.2.0.0. The install finished without error, and I ran the service restart command when prompted. I then tried to set up soft RoCE following the directions here: HowTo Configure Soft-RoCE https://community.mellanox.com/s/article/howto-configure-soft-roce .
When I run rxe_cfg status or rxe_cfg start, the script complains that the rdma_rxe module is not loaded (with no other errors, even in verbose mode). When I run lsmod | grep rdma_rxe, I see that rdma_rxe is in fact loaded. One small difference from the instructions above: on my system rdma_rxe is using mlx_compat, not ib_core (even though ib_core is loaded and used by mlx_compat). I assume this is a wrapper used by Mellanox in newer versions of the OFED. I have also tried running modprobe rdma_rxe; it loads without any error messages, and dmesg does not show any errors from the kernel. Reloading the module and restarting the machine made no difference.
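In case it helps anyone reproduce the check, here is a minimal Python sketch of what I am looking at with lsmod. It assumes the usual /proc/modules layout (name, size, reference count, comma-separated list of modules using it, state, address). The helper name module_info is my own and is not part of rxe_cfg or MLNX_OFED.

```python
import os


def module_info(proc_modules_text, name):
    """Return (refcount, users) for a module, or None if it is not listed.

    'users' are the modules that depend on 'name', i.e. the "Used by"
    column that lsmod prints.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == name:
            # The users field is "-" when empty, otherwise "a,b," with a
            # trailing comma.
            users = [u for u in fields[3].split(",") if u and u != "-"]
            return int(fields[2]), users
    return None


# Only read the live table when it exists (i.e. on Linux).
if os.path.exists("/proc/modules"):
    with open("/proc/modules") as f:
        text = f.read()
    for mod in ("rdma_rxe", "mlx_compat", "ib_core"):
        print(mod, module_info(text, mod))
```

A result of None for rdma_rxe would mean the module really is missing; anything else means the kernel has it loaded, which is what I see even though rxe_cfg reports otherwise.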
After ‘starting’ rxe_cfg, running rxe_cfg add <adapter_name> does nothing. It does not create any IB devices associated with the NIC, and I still see the ‘rdma_rxe module is not loaded’ message.
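To confirm that no IB device was created, independently of rxe_cfg, something like the following sketch can be used. It assumes RDMA devices show up as entries under /sys/class/infiniband, which is the standard sysfs location; the function name list_ib_devices is just illustrative.

```python
import os


def list_ib_devices(sysfs_dir="/sys/class/infiniband"):
    """Return the sorted names of RDMA devices registered in sysfs.

    An empty list means the directory is missing or has no entries,
    i.e. no IB/RoCE device is currently registered.
    """
    if not os.path.isdir(sysfs_dir):
        return []
    return sorted(os.listdir(sysfs_dir))


print(list_ib_devices())
```

On my machine this stays empty after rxe_cfg add, which matches the behaviour described above.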
I searched around quite a bit and could not find anything that helped. I have also tried the same steps with version 4.2-1.0.0.0 of MLNX_OFED. This computer did have an X4 card in it when I first installed the OFED package. I took it out in case it was preventing soft RoCE from working on other NICs, restarted, re-installed OFED, and repeated the same troubleshooting without the Mellanox card installed.
Any help would be appreciated.