ConnectX-6 Lx NMLX_ERR Miss counters detected

Hello,
One of our ESXi servers failed with these errors in a vSAN RDMA-enabled environment:

vmkernel: cpu103:2098374)<NMLX_ERR> nmlx5_core: 0000:a1:00.1: Health: Miss counters detected
vmkernel: cpu89:10107206)<NMLX_ERR> nmlx5_core: vmnic5: nmlx5_en_EcnProtocolQuery - (nmlx5_core_en_ecn.c:247) nmlx5_QueryPortCongStatus failed: 195887328 protocol R_ROCE_RP
vmkernel: cpu89:10107206)<NMLX_ERR> nmlx5_core: vmnic5: nmlx5_en_EcnProtocolQuery - (nmlx5_core_en_ecn.c:270) done, status: Failure
vmkernel: cpu89:10107206)<NMLX_ERR> nmlx5_core: core: nmlx5_GetECNCap - (nmlx5_core_main.c:281) Fail to query NMLX5_CONG_PROTOCOL_R_ROCE_RP (Failure)
vmkernel: cpu86:10107235)<NMLX_ERR> nmlx5_core: vmnic5: nmlx5_en_EcnProtocolQuery - (nmlx5_core_en_ecn.c:247) nmlx5_QueryPortCongStatus failed: 195887328 protocol R_ROCE_RP
vmkernel: cpu86:10107235)<NMLX_ERR> nmlx5_core: vmnic5: nmlx5_en_EcnProtocolQuery - (nmlx5_core_en_ecn.c:270) done, status: Failure
vmkernel: cpu86:10107235)<NMLX_ERR> nmlx5_core: core: nmlx5_GetECNCap - (nmlx5_core_main.c:281) Fail to query NMLX5_CONG_PROTOCOL
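
To see quickly which device or uplink is producing these messages, the log can be summarized per source. The following is only a sketch: the function name `summarize_nmlx_err` is made up for illustration, and the default log path `/var/log/vmkernel.log` is an assumption about where the host keeps its vmkernel log.

```shell
#!/bin/sh
# Count <NMLX_ERR> lines per source (PCI address, vmnic name, or "core").
# Usage: summarize_nmlx_err [logfile]
# The default path below is an assumption; pass the real path if it differs.
summarize_nmlx_err() {
    log="${1:-/var/log/vmkernel.log}"
    # Keep only driver error lines, extract the token that follows
    # "nmlx5_core: " (up to the next ": "), then count each distinct token.
    grep '<NMLX_ERR>' "$log" \
        | sed -n 's/.*<NMLX_ERR> nmlx5_core: \([^ ]*\):.*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

Running `summarize_nmlx_err` against the excerpt above should list vmnic5 with the highest count, followed by the core and PCI-address entries.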

Hypervisor: VMware ESXi, 8.0.2, 23305546

Adapter Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
Name vmnic5
Location PCI 0000:a1:00.1
Driver nmlx5_core

esxcli software vib list | grep nmlx5
nmlx5-cc 4.23.0.66-2vmw.802.0.0.22380479 VMW VMwareCertified 2023-11-13 host
nmlx5-core 4.23.0.66-2vmw.802.0.0.22380479 VMW VMwareCertified 2023-11-13 host
nmlx5-rdma 4.23.0.66-2vmw.802.0.0.22380479 VMW VMwareCertified 2023-11-13 host

Any idea on how to prevent this issue in the future?

Thanks

Is the CX6 Lx at the latest FW, 26.41.1000?

Is ECN currently enabled?

For example:

esxcli mellanox uplink ecn rRoceNp enable -u vmnic<#>
esxcli mellanox uplink ecn rRoceRp enable -u vmnic<#>

Are you seeing ECN statistics?
esxcli mellanox uplink ecn statistics get -u vmnic<#>

Have you followed the instructions in the Native/Inbox ESXi driver 4.23.0.66 user manual to properly configure RoCEv2/lossless point-to-point?
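
When several uplinks carry RDMA traffic, the ECN commands quoted in this reply can be generated per uplink and reviewed before anything is changed. This is a minimal sketch: the helper name `ecn_commands` and the `RUN` switch are hypothetical, and the esxcli lines are copied as quoted above without being verified against a particular driver release.

```shell
#!/bin/sh
# Print the ECN commands from this thread for each uplink given as an argument.
# With RUN=1 in the environment the commands are executed instead of printed.
# ecn_commands and RUN are illustrative names, not part of any Mellanox tooling.
ecn_commands() {
    for nic in "$@"; do
        for cmd in \
            "esxcli mellanox uplink ecn rRoceNp enable -u $nic" \
            "esxcli mellanox uplink ecn rRoceRp enable -u $nic" \
            "esxcli mellanox uplink ecn statistics get -u $nic"
        do
            if [ "${RUN:-0}" = "1" ]; then
                $cmd            # word-split on purpose: run the command
            else
                echo "$cmd"     # dry run: only show what would be executed
            fi
        done
    done
}
```

Calling `ecn_commands vmnic4 vmnic5` prints six lines to review; setting `RUN=1` on an ESXi host would execute them.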

Hello,
We have FW version 26.35.1012 because, according to the VMware HCL, we need 26.34.1002 or 26.35.1012 for vSAN 8.0 U2.

ECN is currently not enabled.

Yes, we followed them.

Thank you
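
The firmware constraint mentioned in this reply can be checked mechanically before an upgrade. The sketch below assumes the two HCL versions named above are the complete allowed set, and `fw_on_hcl` is a hypothetical helper name; the installed version is passed in by hand (it could, as an assumption, be read from the firmware field of `esxcli network nic get -n vmnic5`).

```shell
#!/bin/sh
# Firmware versions listed in the post above for vSAN 8.0 U2.
# Assumption: this list is complete; refresh it from the current HCL entry.
HCL_FW="26.34.1002 26.35.1012"

# Return 0 if the given firmware version is in the HCL list, 1 otherwise.
fw_on_hcl() {
    for v in $HCL_FW; do
        [ "$1" = "$v" ] && return 0
    done
    return 1
}

# Example with the two versions discussed in the thread.
for fw in 26.35.1012 26.41.1000; do
    if fw_on_hcl "$fw"; then
        echo "$fw: on HCL list"
    else
        echo "$fw: not on HCL list"
    fi
done
```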