We have a 6-node Windows Server 2022 Failover cluster
We use 2x Mellanox ConnectX-6 Lx adapters in each server purely for iSCSI connection to our SAN. We’re getting the following event on 2 out of the 6 servers:
ConnectX-6 Lx #4 device reports an “EQ stuck” on EQn 0x4. Attempting recovery.
I can’t seem to link this to any particular workload or action on the servers, and we don’t notice any undesired effect. A previous thread suggests it could be related to CPU utilisation or RSS configuration.
CPU utilisation is generally low (~10-15%) with RSS configured:
MaxProcessors - 4
NumberOfReceiveQueues - 8
Profile - NUMAStatic
RSS processor array is different for each interface
Any suggestions as to what could be causing this would be greatly appreciated.