Soft RoCE not working (no errors)


I am trying to run soft RoCE and communicate with an X4 card in a different computer.

I am running CentOS 7.4, and I installed MLNX_OFED version 4.2- The install finished without error, and I ran the service restart command when prompted. I then tried to set up soft RoCE following the directions here: HowTo Configure Soft-RoCE.

When I run rxe_cfg status or rxe_cfg start, the script complains that the rdma_rxe module is not loaded (with no other errors, even in verbose mode). When I run lsmod | grep rdma_rxe, I see that rdma_rxe is in fact loaded. One small variation from the above instructions on my system: rdma_rxe is using mlx_compat, not ib_core (even though ib_core is loaded and used by mlx_compat). I figured this is some wrapper used by Mellanox in newer versions of the OFED. I have also tried running modprobe rdma_rxe: it loads without any error messages, and dmesg shows no errors from the kernel. I have also tried reloading the module and restarting the machine.
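
For anyone reproducing the lsmod check, here is a minimal sketch of the same test run directly against /proc/modules, which is where lsmod gets its data. The function names module_loaded and module_depends_on are made up for illustration and are not part of MLNX_OFED or rxe_cfg. The sketch assumes the usual /proc/modules layout, where the first field is the module name and the fourth field is a comma-separated list of the modules that use it.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MLNX_OFED or rxe_cfg).
# Both take an optional second argument so they can be pointed at a
# saved copy of /proc/modules instead of the live file.

# Succeed if module $1 is listed in the /proc/modules-style file $2.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# Print every module that lists $1 among its users, i.e. the modules
# that $1 depends on (for example mlx_compat or ib_core).
module_depends_on() {
    awk -v m="$1" '{
        n = split($4, users, ",")
        for (i = 1; i <= n; i++)
            if (users[i] == m) print $1
    }' "${2:-/proc/modules}"
}
```

Running module_loaded rdma_rxe && echo loaded confirms whether the kernel lists the module, independently of what rxe_cfg reports, and module_depends_on rdma_rxe shows which modules rdma_rxe is holding a reference on.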

After ‘starting’ rxe_cfg, running rxe_cfg add <adapter_name> does nothing. It does not load any IB devices associated with the NIC, and I still see the ‘rdma_rxe module is not loaded’ message.
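
One way to double-check whether any IB device was actually created is to look at sysfs directly. The sketch below assumes the standard /sys/class/infiniband directory, where the RDMA core exposes one entry per registered device. The function name list_ib_devices is hypothetical, and the device names that would appear (such as an rxe device) depend on the system.

```shell
#!/bin/sh
# Hypothetical helper: list RDMA device names under a sysfs-style
# directory. $1 defaults to /sys/class/infiniband; returns 1 if the
# directory does not exist (for example when ib_core is not loaded).
list_ib_devices() {
    dir="${1:-/sys/class/infiniband}"
    [ -d "$dir" ] || return 1
    for dev in "$dir"/*; do
        # An empty directory leaves the glob unexpanded, so skip it.
        if [ -e "$dev" ]; then
            basename "$dev"
        fi
    done
    return 0
}
```

If list_ib_devices prints nothing after rxe_cfg add <adapter_name>, no RDMA device was registered for the NIC, which matches the behavior described above.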

I looked around a bunch and could not find anything that helped. I have also tried the same steps with version 4.2- of MLNX_OFED. This computer did have an X4 card in it when I first installed the OFED package. I took it out in case it was preventing soft RoCE from working on other NICs, restarted, reinstalled OFED, and repeated the same troubleshooting without the Mellanox card installed.

Any help would be appreciated.

I don’t know how helpful this might be, but I’ve seen notes saying that soft-roce and the hardware driver are mutually exclusive. Try uninstalling the driver for the card (not just pulling the card) and then see whether soft-roce configures and runs.