I'm running mlx5 cards on CentOS 8.1 and am seeing thousands of the errors shown below when running an fio test from an NFS client against the NFS server. The mount is NFSv4 over RDMA (NFSoRDMA). The cards (ConnectX-6) have the latest firmware installed. The environment details and the errors from dmesg (the same errors appear on both the client and the server) are below:
Kernel Version: 4.18.0-147.el8.x86_64
=============
mst status -v
MST modules:
MST PCI module is not loaded
MST PCI configuration module loaded
PCI devices:
DEVICE_TYPE MST PCI RDMA NET NUMA
ConnectX6(rev:0) /dev/mst/mt4123_pciconf1 af:00.0 mlx5_1 net-ib1 1
ConnectX6(rev:0) /dev/mst/mt4123_pciconf0 3b:00.0 mlx5_0 net-ib0 0
===============
MLNX_OFED_LINUX-5.0-2.1.8.0
=================
[Sun Jul 26 20:40:45 2020] infiniband mlx5_0: dump_cqe:286:(pid 28589): dump error cqe
[Sun Jul 26 20:40:45 2020] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Sun Jul 26 20:40:45 2020] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Sun Jul 26 20:40:45 2020] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Sun Jul 26 20:40:45 2020] 00000030: 00 00 00 00 00 00 d7 01 00 02 ae 2e 00 02 bf e3
[Sun Jul 26 20:40:45 2020] infiniband mlx5_0: dump_cqe:286:(pid 28589): dump error cqe
[Sun Jul 26 20:40:45 2020] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Sun Jul 26 20:40:45 2020] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Sun Jul 26 20:40:45 2020] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Sun Jul 26 20:40:45 2020] 00000030: 00 00 00 00 00 00 d7 01 00 02 ae 2f 00 02 2b e3
[Sun Jul 26 20:40:45 2020] infiniband mlx5_0: dump_cqe:286:(pid 28589): dump error cqe
[Sun Jul 26 20:40:45 2020] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Sun Jul 26 20:40:45 2020] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Sun Jul 26 20:40:45 2020] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[Sun Jul 26 20:40:45 2020] 00000030: 00 00 00 00 00 00 d7 01 00 02 ae 30 00 02 8d e2
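To make the dumps easier to read, here is a minimal sketch that decodes the `00000030:` row of each error CQE. It assumes the 64-byte CQE follows `struct mlx5_err_cqe` as defined in the upstream Linux header `include/linux/mlx5/device.h` (MLNX_OFED 5.0 could in principle differ). It also assumes that the syndrome names below match the kernel's `MLX5_CQE_SYNDROME_*` values. The function name `decode_err_cqe_tail` is purely illustrative.

```python
"""Decode the last 16 bytes (offset 0x30) of an mlx5 error CQE dump.

Assumption: the 64-byte CQE follows struct mlx5_err_cqe from the upstream
kernel header include/linux/mlx5/device.h, so the final row holds:
  0x30-0x35 reserved, 0x36 vendor_err_synd, 0x37 syndrome,
  0x38-0x3b s_wqe_opcode_qpn, 0x3c-0x3d wqe_counter,
  0x3e signature, 0x3f op_own.
"""

# CQE opcodes carried in the high nibble of op_own (error subset only).
CQE_OPCODES = {0xD: "REQ_ERR (send side)", 0xE: "RESP_ERR (receive side)"}

# Error syndromes (assumed subset of the kernel's MLX5_CQE_SYNDROME_* values).
SYNDROMES = {
    0x01: "LOCAL_LENGTH_ERR",
    0x02: "LOCAL_QP_OP_ERR",
    0x04: "LOCAL_PROT_ERR",
    0x05: "WR_FLUSH_ERR",
    0x06: "MW_BIND_ERR",
    0x10: "BAD_RESP_ERR",
    0x11: "LOCAL_ACCESS_ERR",
    0x12: "REMOTE_INVAL_REQ_ERR",
    0x13: "REMOTE_ACCESS_ERR",
    0x14: "REMOTE_OP_ERR",
    0x15: "TRANSPORT_RETRY_EXC_ERR",
    0x16: "RNR_RETRY_EXC_ERR",
    0x22: "REMOTE_ABORTED_ERR",
}


def decode_err_cqe_tail(hex_row: str) -> dict:
    """Decode the 16 hex bytes printed on the '00000030:' line of dump_cqe."""
    raw = bytes.fromhex(hex_row)  # spaces between byte pairs are ignored
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    # Upper 8 bits: send WQE opcode; lower 24 bits: QP number.
    s_wqe_opcode_qpn = int.from_bytes(raw[8:12], "big")
    op_own = raw[15]
    return {
        "vendor_err_synd": raw[6],
        "syndrome": SYNDROMES.get(raw[7], f"unknown (0x{raw[7]:02x})"),
        "opcode": CQE_OPCODES.get(op_own >> 4, f"other (0x{op_own >> 4:x})"),
        "qpn": s_wqe_opcode_qpn & 0xFFFFFF,
        "wqe_counter": int.from_bytes(raw[12:14], "big"),
    }


if __name__ == "__main__":
    rows = [
        "00 00 00 00 00 00 d7 01 00 02 ae 2e 00 02 bf e3",
        "00 00 00 00 00 00 d7 01 00 02 ae 2f 00 02 2b e3",
        "00 00 00 00 00 00 d7 01 00 02 ae 30 00 02 8d e2",
    ]
    for row in rows:
        info = decode_err_cqe_tail(row)
        print(f"qpn=0x{info['qpn']:06x} {info['opcode']} {info['syndrome']} "
              f"vendor=0x{info['vendor_err_synd']:02x}")
```

If the layout assumption holds, all three logged CQEs decode to a receive-side (RESP_ERR) completion with syndrome 0x01 (LOCAL_LENGTH_ERR) and vendor syndrome 0xd7. Only the QP number differs (0x02ae2e, 0x02ae2f, 0x02ae30). A local length error on the receive side typically means an incoming message was larger than the posted receive buffer.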