Hello
I have an unmanaged HDR switch with several devices connected, and I am seeing some errors that I don't really know how to interpret or solve.
Switch: MQM8790-HS2F
Cards in Device: ConnectX-6
I installed everything fresh and still see the problems. The servers are running Ubuntu 22.04 LTS with OFED 5.8-2.0.3.0-LTS.
IPoIB is configured like this, for example:
ibp33s0:
  critical: false
  addresses:
    - 192.168.124.1/22
Now when I run ibqueryerrors I get the following output, with the counter growing in large steps every second. I had cleared the errors and counters just before this, because the counter had even overflowed.
root@telly101:/sbdata# ibqueryerrors
Errors for XXXXXXXXXX66 "Quantum Mellanox Technologies"
GUID XXXXXXXXXX66 port ALL: [VL15Dropped == 1046 (1.021K)]
GUID XXXXXXXXXX66 port 41: [VL15Dropped == 1128 (1.102K)]
## Summary: 9 nodes checked, 1 bad nodes found
## 49 ports checked, 1 ports have errors beyond threshold
## Thresholds:
## Suppressed:
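To put a number on "growing every second", two ibqueryerrors snapshots can be compared. Below is a minimal sketch, assuming the line format shown in the output above. The function names `parse_counters` and `growth_per_second` are made up for illustration, and the second snapshot value in the usage part is a hypothetical number, not a measurement from my fabric.

```python
import re

# Matches lines such as:
#   GUID XXXXXXXXXX66 port 41: [VL15Dropped == 1128 (1.102K)]
LINE_RE = re.compile(r"GUID\s+(\S+)\s+port\s+(\S+):\s+(.*)")
# Matches each "[Name == value" pair on such a line.
COUNTER_RE = re.compile(r"\[(\w+) == (\d+)")


def parse_counters(text):
    """Return {(guid, port, counter_name): value} for every counter found."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # header, summary and threshold lines are skipped
        guid, port, rest = match.groups()
        for name, value in COUNTER_RE.findall(rest):
            counters[(guid, port, name)] = int(value)
    return counters


def growth_per_second(before, after, seconds):
    """Per-second increase of each counter present in both snapshots."""
    return {
        key: (after[key] - before[key]) / seconds
        for key in after
        if key in before
    }


if __name__ == "__main__":
    first = parse_counters(
        "GUID XXXXXXXXXX66 port 41: [VL15Dropped == 1128 (1.102K)]"
    )
    # Hypothetical second snapshot, assumed to be taken 10 seconds later.
    second = parse_counters(
        "GUID XXXXXXXXXX66 port 41: [VL15Dropped == 1628 (1.590K)]"
    )
    print(growth_per_second(first, second, 10))
```

In practice the two text snapshots would come from running ibqueryerrors twice with a known pause in between. A counter that wraps between the snapshots would show up as a negative rate here.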
Also, on rare occasions this error comes up for every port that has a device connected. It was not in the first output only because I had just reset the counters. It is always the same GUID.
GUID XXXXXXXXXXXXXXXX66 port 11: [PortRcvSwitchRelayErrors == 5 (5.000)]
GUID XXXXXXXXXXXXXXXX66 port 12: [PortRcvSwitchRelayErrors == 5 (5.000)]
GUID XXXXXXXXXXXXXXXX66 port 13: [PortRcvSwitchRelayErrors == 5 (5.000)]
GUID XXXXXXXXXXXXXXXX66 port 14: [PortRcvSwitchRelayErrors == 5 (5.000)]
GUID XXXXXXXXXXXXXXXX66 port 15: [PortRcvSwitchRelayErrors == 5 (5.000)]
GUID XXXXXXXXXXXXXXXX66 port 16: [PortRcvSwitchRelayErrors == 5 (5.000)]
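Since every listed port reports the same value, it can help to group ports by counter value to see whether they increment together. This is a small sketch, again assuming the line format shown above. `ports_by_value` is a hypothetical helper name, and it only looks at the first counter on each line.

```python
import re
from collections import defaultdict

# Captures the port number plus the first "[Name == value" pair on the line.
PORT_RE = re.compile(r"port\s+(\d+):\s+\[(\w+) == (\d+)")


def ports_by_value(text, counter_name):
    """Map each value of counter_name to the sorted ports reporting it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = PORT_RE.search(line)
        if match and match.group(2) == counter_name:
            groups[int(match.group(3))].append(int(match.group(1)))
    return {value: sorted(ports) for value, ports in groups.items()}


if __name__ == "__main__":
    output = "\n".join(
        f"GUID XXXXXXXXXXXXXXXX66 port {p}: [PortRcvSwitchRelayErrors == 5 (5.000)]"
        for p in range(11, 17)
    )
    print(ports_by_value(output, "PortRcvSwitchRelayErrors"))
```

For the six lines above this gives a single group, value 5 on ports 11 to 16.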
OpenSM is running on one of the servers and is not reporting any errors.
Does anyone have an idea?
Cheers
Kilian