mlx5_core CQE errors in kernel log (vendor syndrome 0xf9)

Hello,

I'm seeing many kernel error messages while forwarding packets with AF_XDP in zero-copy mode on ConnectX-5 NICs, and there appears to be significant packet loss.

NIC: Mellanox Technologies MT27800 Family [ConnectX-5]
Firmware: 16.35.1012
Processor: AMD EPYC 7452 32-Core Processor
Distribution: Debian 12 RC3
Kernel: 6.1.0-9-amd64

Kernel log for reference:

[Thu May 25 03:42:53 2023] mlx5_core 0000:81:00.1 enp129s0f1np1: Error cqe on cqn 0xc53, ci 0x362, qn 0x4b8e, opcode 0xd, syndrome 0x5, vendor syndrome 0xf9
[Thu May 25 03:42:53 2023] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Thu May 25 03:42:53 2023] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Thu May 25 03:42:53 2023] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Thu May 25 03:42:53 2023] 00000030: 00 00 00 00 45 00 f9 05 00 00 4b 8e e0 7e 50 d3
[Thu May 25 03:42:53 2023] mlx5_core 0000:81:00.1 enp129s0f1np1: Error cqe on cqn 0xc53, ci 0x363, qn 0x4b8e, opcode 0xd, syndrome 0x5, vendor syndrome 0xf9
[Thu May 25 03:42:53 2023] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Thu May 25 03:42:53 2023] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Thu May 25 03:42:53 2023] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Thu May 25 03:42:53 2023] 00000030: 00 00 00 00 45 00 f9 05 00 00 4b 8e e0 7f 50 d3
[Thu May 25 03:42:53 2023] mlx5_core 0000:81:00.1 enp129s0f1np1: Error cqe on cqn 0xc53, ci 0x364, qn 0x4b8e, opcode 0xd, syndrome 0x5, vendor syndrome 0xf9
[Thu May 25 03:42:53 2023] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Thu May 25 03:42:53 2023] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Thu May 25 03:42:53 2023] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Thu May 25 03:42:53 2023] 00000030: 00 00 00 00 45 00 f9 05 29 00 4b 8e e0 80 81 d3
[Thu May 25 03:42:53 2023] mlx5_core 0000:81:00.1 enp129s0f1np1: Error cqe on cqn 0xc53, ci 0x365, qn 0x4b8e, opcode 0xd, syndrome 0x5, vendor syndrome 0xf9
[Thu May 25 03:42:53 2023] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Thu May 25 03:42:53 2023] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Thu May 25 03:42:53 2023] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Thu May 25 03:42:53 2023] 00000030: 00 00 00 00 45 00 f9 05 29 00 4b 8e e0 8d 8d d3
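To check what each dump actually contains, the 64-byte hex dump printed after every "Error cqe" line can be decoded and cross-checked against the summary fields. The sketch below assumes the byte layout of `struct mlx5_err_cqe` from the Linux kernel's `include/linux/mlx5/device.h` (vendor syndrome at byte 54, syndrome at 55, WQE opcode plus queue number at 56-59, WQE counter at 60-61, CQE opcode in the high nibble of byte 63), and the partial name tables are likewise taken from those headers. Please verify both against the kernel source you are running. `parse_log`, `rebuild_cqe` and `decode_err_cqe` are hypothetical helper names, not part of any tool.

```python
import re
import struct

# Byte offsets inside the 64-byte error CQE. ASSUMPTION: these follow
# struct mlx5_err_cqe in the Linux kernel (include/linux/mlx5/device.h);
# check them against the headers of the kernel you actually run.
VENDOR_SYND_OFF = 54   # u8 vendor_err_synd
SYND_OFF = 55          # u8 syndrome
OPCODE_QPN_OFF = 56    # __be32: high 8 bits = WQE opcode, low 24 bits = queue number
WQE_COUNTER_OFF = 60   # __be16 wqe_counter
OP_OWN_OFF = 63        # u8 op_own: high nibble = CQE opcode

# Partial name tables, also assumed from the mlx5 kernel headers.
CQE_OPCODES = {0xD: "MLX5_CQE_REQ_ERR", 0xE: "MLX5_CQE_RESP_ERR"}
SYNDROMES = {0x01: "LOCAL_LENGTH_ERR", 0x02: "LOCAL_QP_OP_ERR",
             0x04: "LOCAL_PROT_ERR", 0x05: "WR_FLUSH_ERR"}

SUMMARY_RE = re.compile(
    r"Error cqe on cqn (0x[0-9a-f]+), ci (0x[0-9a-f]+), qn (0x[0-9a-f]+), "
    r"opcode (0x[0-9a-f]+), syndrome (0x[0-9a-f]+), vendor syndrome (0x[0-9a-f]+)")
DUMP_RE = re.compile(r"([0-9a-f]{8}): ((?:[0-9a-f]{2} ?){16})\s*$")
SUMMARY_KEYS = ("cqn", "ci", "qn", "opcode", "syndrome", "vendor_syndrome")


def rebuild_cqe(dump_lines):
    """Reassemble the 64-byte CQE from the hex-dump lines printed after the summary."""
    raw = bytearray(64)
    for line in dump_lines:
        m = DUMP_RE.search(line)
        if m:
            off = int(m.group(1), 16)
            chunk = bytes.fromhex(m.group(2))
            raw[off:off + len(chunk)] = chunk
    return bytes(raw)


def decode_err_cqe(raw):
    """Extract the error fields from a raw 64-byte error CQE (big-endian fields)."""
    (opcode_qpn,) = struct.unpack_from(">I", raw, OPCODE_QPN_OFF)
    (wqe_counter,) = struct.unpack_from(">H", raw, WQE_COUNTER_OFF)
    return {
        "vendor_syndrome": raw[VENDOR_SYND_OFF],
        "syndrome": raw[SYND_OFF],
        "wqe_opcode": opcode_qpn >> 24,
        "qn": opcode_qpn & 0xFFFFFF,
        "wqe_counter": wqe_counter,
        "cqe_opcode": raw[OP_OWN_OFF] >> 4,
    }


def parse_log(text):
    """Pair every 'Error cqe' summary line with the four dump lines that follow it."""
    lines = text.splitlines()
    events = []
    for i, line in enumerate(lines):
        m = SUMMARY_RE.search(line)
        if m:
            event = {k: int(v, 16) for k, v in zip(SUMMARY_KEYS, m.groups())}
            event["cqe"] = decode_err_cqe(rebuild_cqe(lines[i + 1:i + 5]))
            events.append(event)
    return events


# One entry copied from the log above.
SAMPLE = """\
[Thu May 25 03:42:53 2023] mlx5_core 0000:81:00.1 enp129s0f1np1: Error cqe on cqn 0xc53, ci 0x364, qn 0x4b8e, opcode 0xd, syndrome 0x5, vendor syndrome 0xf9
[Thu May 25 03:42:53 2023] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Thu May 25 03:42:53 2023] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Thu May 25 03:42:53 2023] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Thu May 25 03:42:53 2023] 00000030: 00 00 00 00 45 00 f9 05 29 00 4b 8e e0 80 81 d3
"""

if __name__ == "__main__":
    for ev in parse_log(SAMPLE):
        cqe = ev["cqe"]
        print(f"cqn={ev['cqn']:#x} ci={ev['ci']:#x} qn={cqe['qn']:#x} "
              f"wqe_opcode={cqe['wqe_opcode']:#x} wqe_counter={cqe['wqe_counter']:#x} "
              f"{CQE_OPCODES.get(cqe['cqe_opcode'], hex(cqe['cqe_opcode']))} "
              f"{SYNDROMES.get(cqe['syndrome'], hex(cqe['syndrome']))} "
              f"vendor_syndrome={cqe['vendor_syndrome']:#x}")
```

Feeding the full `dmesg` output to `parse_log` shows whether every error on queue 0x4b8e carries the same syndrome. As far as I know, syndrome 0x5 (WR_FLUSH_ERR) is usually reported for work requests that were flushed after the queue had already entered an error state, so the earliest error CQE in the complete log is likely to be the most informative one. The vendor syndrome 0xf9 itself is not decoded here.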

Hi Gopinath,

Thank you for posting your query on the NVIDIA Community.

Based on the information shared so far, it is unclear whether this issue occurs with the MLNX_OFED driver. If you are not using it, please install the MLNX_OFED driver on one of the supported operating systems listed on the Linux InfiniBand Drivers page.

Debian 12 is currently not supported.

If the issue persists on a supported OS with the MLNX_OFED driver, please open a support ticket by emailing Networking-support@nvidia.com so that we can debug further. Please note that a valid support contract is required to open a support ticket; the contracts team can be reached at Networking-contracts@nvidia.com.

Thanks,
Namrata.