When I run hca_self_test.ofed to test my InfiniBand configuration, the Error Counter Check on CA #0 (HCA) fails, as shown below. I tried rebooting the machine, but the error did not go away.
$ sudo /usr/bin/hca_self_test.ofed
---- Performing Adapter Device Self Test ----
Number of CAs Detected ................. 1
PCI Device Check ....................... PASS
Kernel Arch ............................ x86_64
Host Driver Version .................... OFED-internal-5.8-1.0.1: 4.15.0-200-generic
Host Driver RPM Check .................. PASS
Firmware on CA #0 HCA .................. v12.27.1016
Host Driver Initialization ............. PASS
Number of CA Ports Active .............. 1
Port State of Port #1 on CA #0 (HCA)..... UP 4X FDR (InfiniBand)
Error Counter Check on CA #0 (HCA)...... FAIL
REASON: found errors in the following counters
Errors in /sys/class/infiniband/mlx5_0/ports/1/counters
port_rcv_errors: 320
port_rcv_switch_relay_errors: 320
Kernel Syslog Check .................... PASS
Node GUID on CA #0 (HCA) ............... ec:0d:9a:03:00:c5:db:c0
------------------ DONE ---------------------
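For reference, the failing check reports the per-port counters under /sys/class/infiniband/mlx5_0/ports/1/counters. The minimal POSIX shell sketch below prints every counter in that directory whose value is not 0. This makes it easy to compare readings before and after a reboot or a few minutes apart. It assumes that the values the self-test complains about are the non-zero entries in that directory. The function name list_nonzero_counters is hypothetical and not part of OFED.

```shell
#!/bin/sh
# Hypothetical helper (not part of OFED): print "name: value" for every
# counter file in the given directory whose value is not 0.
# Defaults to the directory reported by hca_self_test.ofed above.
list_nonzero_counters() {
    dir=${1:-/sys/class/infiniband/mlx5_0/ports/1/counters}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue                  # glob matched nothing, or not a file
        v=$(cat "$f" 2>/dev/null) || continue    # skip counters that cannot be read
        if [ "$v" != "0" ]; then
            printf '%s: %s\n' "$(basename "$f")" "$v"
        fi
    done
}
```

Running list_nonzero_counters with no argument reads the path from the self-test output. Running it twice a few minutes apart shows whether port_rcv_errors and port_rcv_switch_relay_errors are still increasing or are left over from an earlier event.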
The HCA I am using is 88:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]. The detailed information for my HCA is as follows.
$ ibstat
CA 'mlx5_0'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.27.1016
    Hardware version: 0
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 56
        Base lid: 4
        LMC: 0
        SM lid: 1
        Link layer: InfiniBand
I am at a bit of a loss and any help would be appreciated.