How to identify and resolve different ERR CQE messages?

Hello, I am looking for more information on CX7 ERR CQE messages and how to resolve them. I appreciate the help!

When bringing up the interface:

root@localhost:~/ats_stuff# ifconfig eth1 up
root@localhost:~# [ 604.043519] mlx5_core 0003:01:00.0 eth1: Error cqe on cqn 0x45b, ci 0x0, qn 0x167, opcode 0xd, syndrome 0x4, vendor syndrome 0x51
[ 604.055160] mlx5_core 0003:01:00.0 eth1: ERR CQE on SQ: 0x167
[ 604.090332] mlx5_core 0003:01:00.0 eth1: Error cqe on cqn 0x45b, ci 0x1, qn 0x167, opcode 0xd, syndrome 0x4, vendor syndrome 0x51
[ 604.101966] mlx5_core 0003:01:00.0 eth1: ERR CQE on SQ: 0x167

When passing traffic:

root@localhost:~/ats_stuff# ./iperf3_netns_stress.sh eth1 eth2 -n 4
creating netns iperf_server_eth1
creating netns iperf_client_eth2
adding netdev eth1 to netns iperf_server_eth1
adding netdev eth2 to netns iperf_client_eth2
[ 1029.319999] mlx5_core 0003:01:00.0 eth1: Error cqe on cqn 0x465, ci 0x0, qn 0x2ed, opcode 0xd, syndrome 0x4, vendor syndrome 0x51
[ 1029.323059] mlx5_core 0003:01:00.1 eth2: Error cqe on cqn 0x146e, ci 0x0, qn 0x3ab, opcode 0xd, syndrome 0x4, vendor syndrome 0x51
[ 1029.343359] mlx5_core 0003:01:00.0 eth1: ERR CQE on SQ: 0x2ed
[ 1029.343361] mlx5_core 0003:01:00.1 eth2: ERR CQE on SQ: 0x3ab
[ 1029.487043] mlx5_core 0003:01:00.0 eth1: Error cqe on cqn 0x465, ci 0x1, qn 0x2ed, opcode 0xd, syndrome 0x4, vendor syndrome 0x51
[ 1029.498680] mlx5_core 0003:01:00.0 eth1: ERR CQE on SQ: 0x2ed
[ 1029.943108] mlx5_core 0003:01:00.1 eth2: Error cqe on cqn 0x146e, ci 0x1, qn 0x3ab, opcode 0xd, syndrome 0x4, vendor syndrome 0x51
[ 1029.954832] mlx5_core 0003:01:00.1 eth2: ERR CQE on SQ: 0x3ab
[ 1030.231200] mlx5_core 0003:01:00.1 eth2: Error cqe on cqn 0x146e, ci 0x2, qn 0x3ab, opcode 0xd, syndrome 0x4, vendor syndrome 0x51
[ 1030.242922] mlx5_core 0003:01:00.1 eth2: ERR CQE on SQ: 0x3ab
1: lo: mtu 65536 qdisc noop state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 94:6d:ae:38:84:7e brd ff:ff:ff:ff:ff:ff
inet 192.168.100.10/24 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::966d:aeff:fe38:847e/64 scope link
valid_lft forever preferred_lft forever
1: lo: mtu 65536 qdisc noop state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 94:6d:ae:38:84:7f brd ff:ff:ff:ff:ff:ff
inet 192.168.100.11/24 scope global eth2
valid_lft forever preferred_lft forever
inet6 fe80::966d:aeff:fe38:847f/64 scope link
valid_lft forever preferred_lft forever
spawning 4 iperf server processes
spawning 4 iperf client processes




Server listening on 5204
Server listening on 5203
Server listening on 5202
Server listening on 5201




[ 1038.423717] mlx5_core 0003:01:00.1 eth2: Error cqe on cqn 0x1469, ci 0x0, qn 0x3a5, opcode 0xd, syndrome 0x4, vendor syndrome 0x51
[ 1038.435450] mlx5_core 0003:01:00.1 eth2: ERR CQE on SQ: 0x3a5
[ 1054.776620] mlx5_core 0003:01:00.1 eth2: Error cqe on cqn 0x1482, ci 0x0, qn 0x3c3, opcode 0xd, syndrome 0x4, vendor syndrome 0x51
[ 1054.788357] mlx5_core 0003:01:00.1 eth2: ERR CQE on SQ: 0x3c3
[ 1070.905664] mlx5_core 0003:01:00.1 eth2: Error cqe on cqn 0x1473, ci 0x0, qn 0x3b1, opcode 0xd, syndrome 0x4, vendor syndrome 0x51
[ 1070.917394] mlx5_core 0003:01:00.1 eth2: ERR CQE on SQ: 0x3b1
iperf3: error - unable to send control message: Bad file descriptor
iperf3: error - unable to send control message: Bad file descriptor
iperf3: error - unable to send control message: Bad file descriptor
iperf3: error - unable to send control message: Bad file descriptor
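
Every "Error cqe" line above has the same layout, so the fields can be pulled out and grouped per interface. The following is a minimal sketch, not a driver-accurate tool. The opcode and syndrome names are the ones I recall from the Linux kernel header include/linux/mlx5/device.h, so treat that mapping as an assumption and check it against your own kernel tree. The vendor syndrome (0x51 here) is left as a raw number because I do not know of a public table for it. The function names decode_error_cqe and summarise are hypothetical.

```python
import re
from collections import Counter

# Names as recalled from include/linux/mlx5/device.h (assumption: verify
# against the kernel tree you are running).
CQE_OPCODES = {
    0xD: "REQ_ERR",   # error reported on the requester (send) side
    0xE: "RESP_ERR",  # error reported on the responder (receive) side
}
CQE_SYNDROMES = {
    0x01: "LOCAL_LENGTH_ERR",
    0x02: "LOCAL_QP_OP_ERR",
    0x04: "LOCAL_PROT_ERR",
    0x05: "WR_FLUSH_ERR",
    0x06: "MW_BIND_ERR",
    0x10: "BAD_RESP_ERR",
    0x11: "LOCAL_ACCESS_ERR",
    0x12: "REMOTE_INVAL_REQ_ERR",
    0x13: "REMOTE_ACCESS_ERR",
    0x14: "REMOTE_OP_ERR",
    0x15: "TRANSPORT_RETRY_EXC_ERR",
    0x16: "RNR_RETRY_EXC_ERR",
    0x22: "REMOTE_ABORTED_ERR",
}

# Matches the "Error cqe on cqn ..." lines exactly as they appear in the logs.
LINE_RE = re.compile(
    r"mlx5_core (?P<pci>\S+) (?P<ifname>\S+): Error cqe on cqn (?P<cqn>0x[0-9a-f]+), "
    r"ci (?P<ci>0x[0-9a-f]+), qn (?P<qn>0x[0-9a-f]+), opcode (?P<opcode>0x[0-9a-f]+), "
    r"syndrome (?P<syndrome>0x[0-9a-f]+), vendor syndrome (?P<vendor>0x[0-9a-f]+)"
)


def decode_error_cqe(line):
    """Return the decoded fields of one 'Error cqe' line, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    opcode = int(m.group("opcode"), 16)
    syndrome = int(m.group("syndrome"), 16)
    return {
        "pci": m.group("pci"),
        "ifname": m.group("ifname"),
        "cqn": int(m.group("cqn"), 16),
        "ci": int(m.group("ci"), 16),
        "qn": int(m.group("qn"), 16),
        "opcode_name": CQE_OPCODES.get(opcode, "unknown"),
        "syndrome_name": CQE_SYNDROMES.get(syndrome, "unknown"),
        "vendor_syndrome": int(m.group("vendor"), 16),  # kept raw, not decoded
    }


def summarise(lines):
    """Count error CQEs per (interface, syndrome name, vendor syndrome)."""
    counts = Counter()
    for line in lines:
        rec = decode_error_cqe(line)
        if rec is not None:
            counts[(rec["ifname"], rec["syndrome_name"], rec["vendor_syndrome"])] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the logs in this post.
    sample = [
        "[ 604.043519] mlx5_core 0003:01:00.0 eth1: Error cqe on cqn 0x45b, ci 0x0, "
        "qn 0x167, opcode 0xd, syndrome 0x4, vendor syndrome 0x51",
        "[ 1029.323059] mlx5_core 0003:01:00.1 eth2: Error cqe on cqn 0x146e, ci 0x0, "
        "qn 0x3ab, opcode 0xd, syndrome 0x4, vendor syndrome 0x51",
    ]
    for key, count in summarise(sample).items():
        print(key, count)
```

Feeding it the full output of dmesg shows whether every failure carries the same opcode, syndrome, and vendor syndrome, which is the case for all the lines posted here.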

What are the CX7 part number (PN) and firmware version?

Early firmware versions had some issues. Please update to the latest firmware.

Thank you Xiofengl,

I updated the firmware as shown below. I then ran the same commands (ifconfig up and iperf) and got exactly the same error codes.

Is there anything else I can provide to help troubleshoot?

Do you have descriptions of the various “vendor syndrome” codes?

===

root@localhost:~/ats_stuff/fw# flint -d 0003:01:00.0 -i fw-ConnectX7-rel-28_41_1000-MCX713106AS-VEA_Ax-UEFI-14.34.12-FlexBoot-3.7.400.signed.bin

Current FW version on flash:  28.40.1000
New FW version:               28.41.1000

FSMST_INITIALIZE - OK
Writing Boot image component - OK
Restoring signature - OK
-I- To load new FW run mlxfwreset or reboot machine.
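
The last flint line says the new image only loads after mlxfwreset or a reboot, so it is worth confirming that the running firmware really is 28.41.1000 before retesting. Below is a minimal sketch that reads the firmware-version field from `ethtool -i` output. The helper names are hypothetical, and the sample text is an illustrative placeholder, not output captured from this system.

```python
import re
import subprocess


def read_ethtool_info(ifname):
    """Run `ethtool -i <ifname>` and return its text output."""
    result = subprocess.run(
        ["ethtool", "-i", ifname], capture_output=True, text=True, check=True
    )
    return result.stdout


def running_fw_version(ethtool_output):
    """Return the first token of the firmware-version field, or None if absent."""
    m = re.search(r"^firmware-version:\s*(\S+)", ethtool_output, re.MULTILINE)
    return m.group(1) if m else None


def fw_is_loaded(ethtool_output, expected):
    """True when the running firmware version equals the expected one."""
    return running_fw_version(ethtool_output) == expected


# Illustrative placeholder text (assumption), shaped like `ethtool -i` output.
sample = "driver: mlx5_core\nfirmware-version: 28.40.1000 (PSID)\n"

if __name__ == "__main__":
    # On the real system: text = read_ethtool_info("eth1")
    if not fw_is_loaded(sample, "28.41.1000"):
        print("running firmware is", running_fw_version(sample),
              "- new image not loaded yet")
```

If the running version still reads 28.40.1000, the retest was done on the old firmware and says nothing yet about the new one.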
