Setup is a ConnectX-5 Ex (MCX516A-CDAT) NIC with firmware v16.30.1004 and a basic installation of Mellanox OFED 5.3-1.0.0.1, on Ubuntu 20.04.2 LTS (kernel 5.4.0-74-generic, x86_64).
The primary application catches work completion errors (ibv_wc->status) after the [ibv_poll_cq](https://www.rdmamojo.com/2013/02/15/ibv_poll_cq/) call. The QPs are set up with IBV_QPT_RAW_PACKET. There is also a printout, seemingly from the driver level itself; everything is posted below. This primary application receives UDP packets.
In an example ping-pong application (from the examples) no such errors occur. I’ve seen a post that attributed a similar error to the format of the packets themselves.
- If this is the cause here, what do the packets need to comply with?
- If that’s not the case, where is the documentation on the vendor error codes and on the larger error dump that is printed out?
Error printout below: an initial completion error code 0x4, then two 0x10, then 0x5 (this last one repeats indefinitely). The vendor error also jumps around: 0x32, then 0x99 twice, then 0xf9 indefinitely. The last 8 hex characters of the mlx5 completion error dump change each time.
mlx5: seti-node4: got completion with error:
00000000 00000000 00000000 00000000
00000000 000067ba 07000000 00000000
00000000 20009232 00000000 0000203a
000006c1 920c3204 00000000 000030e0
0: got completion error 0x4 vendor error 0x32 (wr_id 0 qp_num 0)
mlx5: seti-node4: got completion with error:
00000000 00000000 00000000 00000000
00000000 000067ba 07000000 00000000
00000000 20000099 00000000 0000203a
000006c1 000c9922 00000000 000116e0
0: got completion error 0x10 vendor error 0x99 (wr_id 1 qp_num 0)
mlx5: seti-node4: got completion with error:
00000000 00000000 00000000 00000000
00000000 000067ba 07000000 00000000
00000000 20000099 00000000 0000203a
000006c1 000c9922 00000000 000216e0
0: got completion error 0x10 vendor error 0x99 (wr_id 2 qp_num 0)
0: got completion error 0x5 vendor error 0xf9 (wr_id 3 qp_num 4665)
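For reference while reading the lines above, here is a minimal sketch that parses the application's "got completion error" lines and attaches the symbolic name of each status. The table is an assumption: it is only the subset of `enum ibv_wc_status` needed here, with values as I read them in rdma-core's `verbs.h`; in the C application itself, `ibv_wc_status_str()` gives the description directly. The helper name `parse_completion_line` is hypothetical.

```python
import re

# Assumed subset of enum ibv_wc_status (values as read from rdma-core's verbs.h).
IBV_WC_STATUS = {
    0x00: "IBV_WC_SUCCESS",
    0x04: "IBV_WC_LOC_PROT_ERR",
    0x05: "IBV_WC_WR_FLUSH_ERR",
    0x10: "IBV_WC_REM_ABORT_ERR",
}

# Matches the application's own log format as shown in the printout.
LINE_RE = re.compile(
    r"got completion error (0x[0-9a-fA-F]+) vendor error (0x[0-9a-fA-F]+) "
    r"\(wr_id (\d+) qp_num (\d+)\)"
)


def parse_completion_line(line):
    """Return the fields of one completion-error line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    return {
        "status": status,
        "status_name": IBV_WC_STATUS.get(status, "unknown"),
        "vendor_err": int(match.group(2), 16),
        "wr_id": int(match.group(3)),
        "qp_num": int(match.group(4)),
    }
```

If that table is right, the sequence reads as a local protection error (0x4), then two remote-abort errors (0x10), then flush errors (0x5). A flush error is reported for work requests that were still outstanding when the QP moved to the error state, so the repeating 0x5 is probably a consequence of the earlier failures rather than a separate problem.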
When I use the sender executable of the ping-pong example as the source of the primary application’s packets, the errors are nearly the same (this time the packets are not UDP):
mlx5: seti-node4: got completion with error:
00000000 00000000 00000000 00000000
00000000 000067ba 07000000 00000000
00000000 20009232 00000000 00000062
00001ed3 920b3204 00000000 000045e0
0: got completion error 0x4 vendor error 0x32 (wr_id 0 qp_num 0)
mlx5: seti-node4: got completion with error:
00000000 00000000 00000000 00000000
00000000 000067ba 07000000 00000000
00000000 20000099 00000000 00000062
00001ed3 000c9922 00000000 000164e0
0: got completion error 0x10 vendor error 0x99 (wr_id 1 qp_num 0)
mlx5: seti-node4: got completion with error:
00000000 00000000 00000000 00000000
00000000 000067ba 07000000 00000000
00000000 20000099 00000000 00000062
00001ed3 000d9922 00000000 000265e0
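
The 16-word hex dumps can be picked apart in the same spirit. The sketch below assumes the dump is the 64-byte error CQE printed as big-endian 32-bit words, laid out like `struct mlx5_err_cqe` in rdma-core's mlx5 provider (32 reserved bytes, `srqn`, 18 reserved bytes, `vendor_err_synd`, `syndrome`, `s_wqe_opcode_qpn`, `wqe_counter`, `signature`, `op_own`). That layout is my reading of the provider source, not vendor documentation, and the helper name `decode_err_cqe` is hypothetical.

```python
import struct


def decode_err_cqe(dump):
    """Decode the four hex lines of an mlx5 'got completion with error' dump."""
    words = dump.split()
    if len(words) != 16:
        raise ValueError("expected 16 hex words (64 bytes)")
    # Each printed word is treated as four bytes in big-endian order.
    raw = b"".join(int(word, 16).to_bytes(4, "big") for word in words)
    # Assumed layout: 32 pad, srqn, 18 pad, vendor_err_synd, syndrome,
    # s_wqe_opcode_qpn, wqe_counter, signature, op_own.
    (srqn, vendor_err_synd, syndrome, s_wqe_opcode_qpn,
     wqe_counter, signature, op_own) = struct.unpack(">32xI18xBBIHBB", raw)
    return {
        "srqn": srqn,
        "vendor_err_synd": vendor_err_synd,
        "syndrome": syndrome,
        "qpn": s_wqe_opcode_qpn & 0xFFFFFF,  # low 24 bits
        "wqe_counter": wqe_counter,
        "signature": signature,
        "opcode": op_own >> 4,  # high nibble of op_own
    }


if __name__ == "__main__":
    last_dump = """
    00000000 00000000 00000000 00000000
    00000000 000067ba 07000000 00000000
    00000000 20000099 00000000 00000062
    00001ed3 000d9922 00000000 000265e0
    """
    print(decode_err_cqe(last_dump))
```

Under that assumed layout, the vendor error in the application's line is the third byte of the second word on the last dump row (0x32 or 0x99), followed by the hardware syndrome byte (0x04 or 0x22). The last 8 hex characters that change each time would then be the WQE counter, which tracks wr_id in these runs, plus a signature byte and the opcode/ownership byte.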