Need help understanding RoCE packet reception on a ConnectX-5 Ethernet NIC (how does the firmware work?)

We are porting parts of the Mellanox OFED stack (release 5.1-2.5.8.0 for Ubuntu 18.04) to another OS, using the ConnectX-5 Ethernet NIC. Our question: how does the NIC firmware steer a RoCE v2 packet received on the wire to the appropriate destination QP? And what conditions could cause a RoCE v2 packet to be silently dropped, so that it never shows up at the expected (programmed) QP?
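For reference, here is a minimal C sketch of the header fields involved, assuming an untagged Ethernet frame carrying IPv4 (no VLAN tag, no IPv6). A RoCE v2 packet is recognized by UDP destination port 4791, and the destination QP is the 24-bit DestQP field of the Base Transport Header (BTH) that follows the UDP header. This only illustrates which fields identify the target QP; it is not the ConnectX-5 firmware's steering logic, and the function name rocev2_dest_qp is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN      14
#define ETHERTYPE_IPV4   0x0800
#define IPPROTO_UDP_NUM  17
#define UDP_HDR_LEN      8
#define ROCEV2_UDP_PORT  4791   /* IANA-assigned RoCE v2 UDP destination port */
#define BTH_LEN          12

/* Read a big-endian 16-bit value. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

/* Hypothetical helper: returns the 24-bit BTH DestQP of a frame, or -1 if
 * the frame is not an untagged IPv4 RoCE v2 packet or is truncated. */
long rocev2_dest_qp(const uint8_t *frame, size_t len)
{
    if (len < ETH_HDR_LEN) return -1;
    if (rd16(frame + 12) != ETHERTYPE_IPV4) return -1;   /* no VLAN/IPv6 handling */

    const uint8_t *ip = frame + ETH_HDR_LEN;
    size_t rem = len - ETH_HDR_LEN;
    if (rem < 20 || (ip[0] >> 4) != 4) return -1;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;              /* IPv4 header length */
    if (ihl < 20 || rem < ihl) return -1;
    if (ip[9] != IPPROTO_UDP_NUM) return -1;

    const uint8_t *udp = ip + ihl;
    rem -= ihl;
    if (rem < UDP_HDR_LEN + BTH_LEN) return -1;
    if (rd16(udp + 2) != ROCEV2_UDP_PORT) return -1;      /* UDP destination port */

    const uint8_t *bth = udp + UDP_HDR_LEN;
    /* BTH: byte 0 opcode, 1 flags, 2-3 P_Key, 4 reserved/FECN/BECN,
     * bytes 5-7 the 24-bit destination QP number. */
    return ((long)bth[5] << 16) | ((long)bth[6] << 8) | (long)bth[7];
}
```

Running something like this over a capture of the missing reply is one way to confirm that its DestQP matches the QPN programmed on the target.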

We have ported ud_pingpong to our target OS and can send RoCE packets to a Linux host running OFED 5.1-2.5.8.0. The Linux host receives the packet and replies with a RoCE packet. The rx counters on the target's ConnectX-5 NIC show that this reply is received; however, it never shows up on the programmed QP, and no completion appears on the programmed CQ. We suspect it is being dropped.
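Since ud_pingpong uses UD QPs, the UD receive-side rules are worth checking. Below is a hypothetical, simplified C model (the struct, enum and function names are made up for illustration) of common reasons a UD packet produces no completion: the QP is not yet in RTR/RTS, the packet's Q_Key differs from the QP's Q_Key, or no receive WQE is posted (UD has no RNR NAK, so the packet is silently discarded). A receive buffer too small for the 40-byte GRH area plus payload is different: it yields an error completion rather than a silent drop. Real hardware also checks the ICRC, the P_Key, and the destination MAC/IP against the GID table; those are not modeled, and the order of the checks is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a UD QP's receive-side context. */
enum qp_state { QPS_RESET, QPS_INIT, QPS_RTR, QPS_RTS, QPS_ERR };

struct ud_qp_model {
    enum qp_state state;
    uint32_t      qkey;          /* Q_Key programmed on the QP */
    unsigned      posted_recvs;  /* receive WQEs currently posted */
    size_t        recv_buf_len;  /* length of the next posted receive buffer */
};

enum rx_verdict {
    RX_DELIVERED,      /* a completion appears on the receive CQ */
    RX_DROP_QP_STATE,  /* QP not in RTR/RTS: silently dropped */
    RX_DROP_QKEY,      /* Q_Key mismatch: silently dropped */
    RX_DROP_NO_RECV,   /* no receive WQE and no RNR NAK for UD: silently dropped */
    RX_ERR_LOCAL_LEN   /* buffer too small: error completion, not silent */
};

#define GRH_LEN 40  /* UD receive buffers must reserve 40 bytes for the GRH */

enum rx_verdict ud_rx_verdict(const struct ud_qp_model *qp,
                              uint32_t pkt_qkey, size_t payload_len)
{
    if (qp->state != QPS_RTR && qp->state != QPS_RTS)
        return RX_DROP_QP_STATE;
    if (pkt_qkey != qp->qkey)
        return RX_DROP_QKEY;
    if (qp->posted_recvs == 0)
        return RX_DROP_NO_RECV;
    if (qp->recv_buf_len < GRH_LEN + payload_len)
        return RX_ERR_LOCAL_LEN;
    return RX_DELIVERED;
}
```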

The firmware version on our ConnectX-5 cards is 16.28.2006, and the board_id is MT_0000000012.

I'm also wondering about this, since I'm doing something similar. Do you have SFs (Scalable Functions) enabled?

https://docs.mellanox.com/display/BlueFieldDPUOSv370/Scalable+Functions

https://docs.mellanox.com/display/BlueFieldSWv36011699/Mediated+Devices (This should really be linked from the page above.)

No, we are not using scalable functions for our port.

Hi Glen,

For RoCE (RDMA_Write/Read) we use an RC QP and establish an RC connection. In transmitted packets, the IB transport headers carry the destination QP number (for RC) or the DCT/DCR (in the case of DC).
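To make the point about the QP number concrete, here is a minimal C sketch (bth_encode is a hypothetical helper) that serializes the 12-byte Base Transport Header carried in every RoCE packet. The 24-bit DestQP occupies bytes 5 to 7; on the receiving NIC, this is the field that has to match the QP number you programmed. The example assumes the RC opcode 0x0A (RDMA WRITE Only) and the default P_Key 0xFFFF, and leaves SE, M, PadCnt and TVer at 0 for simplicity.

```c
#include <stdint.h>

#define BTH_LEN            12
#define RC_RDMA_WRITE_ONLY 0x0A  /* RC opcode: RDMA WRITE Only */

/* Hypothetical helper: serialize a BTH in network (big-endian) byte order.
 * dest_qp and psn are 24-bit fields; higher bits are discarded. */
void bth_encode(uint8_t out[BTH_LEN], uint8_t opcode, uint16_t pkey,
                uint32_t dest_qp, int ack_req, uint32_t psn)
{
    out[0]  = opcode;
    out[1]  = 0;                         /* SE, M, PadCnt, TVer all zero */
    out[2]  = (uint8_t)(pkey >> 8);      /* P_Key */
    out[3]  = (uint8_t)pkey;
    out[4]  = 0;                         /* reserved / FECN / BECN bits */
    out[5]  = (uint8_t)(dest_qp >> 16);  /* 24-bit destination QP number */
    out[6]  = (uint8_t)(dest_qp >> 8);
    out[7]  = (uint8_t)dest_qp;
    out[8]  = ack_req ? 0x80 : 0;        /* AckReq bit, rest reserved */
    out[9]  = (uint8_t)(psn >> 16);      /* 24-bit packet sequence number */
    out[10] = (uint8_t)(psn >> 8);
    out[11] = (uint8_t)psn;
}
```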

For DC, we add the QP to a multicast group, so maybe the destination QP was not added to the multicast group.
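The multicast failure mode can be sketched as follows. In InfiniBand multicast, a packet whose BTH DestQP is 0xFFFFFF is replicated to every QP attached to the group named by the destination GID (in a verbs application the attachment is made with ibv_attach_mcast()); if no QP is attached, nothing receives it. The struct and function below are a hypothetical, simplified C model of that delivery rule, not driver or firmware code.

```c
#include <stdint.h>
#include <string.h>

#define MCAST_QPN  0xFFFFFFu  /* BTH DestQP value that denotes multicast */
#define MAX_ATTACH 8

/* Hypothetical model of one multicast group and the QPs attached to it. */
struct mcast_group {
    uint8_t  mgid[16];
    uint32_t qpns[MAX_ATTACH];
    unsigned n_attached;
};

/* Returns how many QPs receive a copy of a multicast packet addressed to
 * dgid; 0 means nobody receives it (the packet is effectively dropped). */
unsigned mcast_fanout(const struct mcast_group *groups, unsigned n_groups,
                      const uint8_t dgid[16], uint32_t dest_qp)
{
    if (dest_qp != MCAST_QPN)
        return 0;  /* not multicast: unicast steering applies instead */
    for (unsigned i = 0; i < n_groups; i++)
        if (memcmp(groups[i].mgid, dgid, 16) == 0)
            return groups[i].n_attached;
    return 0;      /* no matching group on this port */
}
```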

Best Regards,

Viki