Getting a lot of "CQE error - vendor syndrome: 0x69 syndrome: 0x1" errors, transmission interrupted

Hello!

On a ConnectX®-3 Pro EN, I receive a large number of “CQE error - vendor syndrome: 0xf9 syndrome: 0x5” messages in the kernel log. They start with a single “CQE error - vendor syndrome: 0x69 syndrome: 0x1”, followed by many messages with vendor syndrome 0xf9 and syndrome 0x5, and the transmission is interrupted. What can I do to debug or fix this?

The configuration is: Linux 4.15 on Ubuntu Xenial with the 4.0 kernel driver (OFED 4.7 gives the same result), firmware version 2.40.5000, PCIe x8.

Any advice would be much appreciated!

Hello Pawel,

Many thanks for posting your inquiry on the Mellanox Community.

Based on the information provided, syndrome 0x5 points to “Work Request Flushed Error”.
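
As a rough aid for reading these logs, here is a minimal Python sketch that counts the CQE error lines and maps the syndrome byte to a name. The name table is written from memory of the Linux mlx4 header (include/linux/mlx4/cq.h) and should be checked against your kernel source. The sample log lines, including the interface name, are made up for illustration. The vendor syndrome is not decoded here, since its meaning is not publicly documented as far as I know.

```python
import re
from collections import Counter

# Assumption: names recalled from include/linux/mlx4/cq.h; verify locally.
MLX4_CQE_SYNDROMES = {
    0x01: "LOCAL_LENGTH_ERR",
    0x02: "LOCAL_QP_OP_ERR",
    0x04: "LOCAL_PROT_ERR",
    0x05: "WR_FLUSH_ERR",
    0x06: "MW_BIND_ERR",
    0x10: "BAD_RESP_ERR",
    0x11: "LOCAL_ACCESS_ERR",
    0x12: "REMOTE_INVAL_REQ_ERR",
    0x13: "REMOTE_ACCESS_ERR",
    0x14: "REMOTE_OP_ERR",
    0x15: "TRANSPORT_RETRY_EXC_ERR",
    0x16: "RNR_RETRY_EXC_ERR",
    0x22: "REMOTE_ABORTED_ERR",
}

# Matches the message text quoted in this thread.
CQE_RE = re.compile(
    r"CQE error - vendor syndrome: 0x([0-9a-fA-F]+) syndrome: 0x([0-9a-fA-F]+)"
)


def syndrome_name(code):
    """Return the symbolic name for a syndrome byte, or 'UNKNOWN'."""
    return MLX4_CQE_SYNDROMES.get(code, "UNKNOWN")


def summarize_cqe_errors(lines):
    """Count (vendor_syndrome, syndrome) pairs found in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = CQE_RE.search(line)
        if match:
            vendor = int(match.group(1), 16)
            syndrome = int(match.group(2), 16)
            counts[(vendor, syndrome)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; in practice feed in the output of dmesg.
    sample = [
        "mlx4_en: eth2: CQE error - vendor syndrome: 0x69 syndrome: 0x1",
        "mlx4_en: eth2: CQE error - vendor syndrome: 0xf9 syndrome: 0x5",
        "mlx4_en: eth2: CQE error - vendor syndrome: 0xf9 syndrome: 0x5",
    ]
    for (vendor, syndrome), n in summarize_cqe_errors(sample).items():
        print(f"vendor=0x{vendor:x} syndrome=0x{syndrome:x} "
              f"({syndrome_name(syndrome)}) count={n}")
```

If the table is right, the first error in the log (syndrome 0x1) is a local length error, and the flush errors that follow are most likely a consequence of the queue entering an error state after it, so the first error is probably the one to investigate.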

To debug this issue further, we recommend opening a Mellanox Support ticket (a valid support contract is required) so that we can investigate in more depth. You can open a ticket by emailing support@mellanox.com.

We also noticed that your firmware is version 2.40.5000. The validated and supported firmware version for the ConnectX-3 Pro EN in combination with MLNX_OFED version 4.7 is 2.42.5000.
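
A quick way to confirm which firmware a port is actually running is to look at the firmware-version field reported by `ethtool -i <interface>`. The following is a minimal Python sketch that parses such output and compares it with the version named above. The sample text is illustrative, not captured from a real system, and the helper names are hypothetical.

```python
REQUIRED_FW = (2, 42, 5000)  # version named as validated for MLNX_OFED 4.7


def parse_fw_version(ethtool_output):
    """Extract the firmware version as a tuple of ints, or None if absent."""
    for line in ethtool_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "firmware-version":
            # Some drivers append extra text after the version; keep the
            # first whitespace-separated token only.
            token = value.strip().split()[0]
            return tuple(int(part) for part in token.split("."))
    return None


def is_older(found, required):
    """True if the found version sorts before the required one."""
    return found < required


if __name__ == "__main__":
    # Illustrative sample of `ethtool -i` output (assumed values).
    sample = """driver: mlx4_en
version: 4.0-0
firmware-version: 2.40.5000
bus-info: 0000:03:00.0
"""
    found = parse_fw_version(sample)
    if found is None:
        print("firmware-version not found")
    elif is_older(found, REQUIRED_FW):
        print("firmware is older than the validated version")
    else:
        print("firmware is at or above the validated version")
```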

You can download the latest version from the following link: https://www.mellanox.com/page/firmware_table_ConnectX3ProEN

Many thanks,

~Mellanox Technical Support

Thanks for your answer!

We advised the customer to contact the OEM provider, since this is an AOC form-factor NIC from SuperMicro.

We installed the latest firmware, 2.42.5000, provided by the OEM vendor, and the problem still persists.

Can you explain a bit more about what these errors mean? Which area should we focus on to find a solution?