I keep getting the following errors in the kernel log. Software/hardware info is attached below.
Aug 30 11:12:53 MM kernel: [323391.181538] mlx5_core 0000:81:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0061 address=0xfca97000 flags=0x0020]
Aug 30 11:12:54 MM kernel: [323392.574617] mlx5_core 0000:81:00.0: wait_func_handle_exec_timeout:1104:(pid 105326): cmd[0]: 2RST_QP(0x50a) No done completion
Aug 30 11:12:54 MM kernel: [323392.574626] mlx5_core 0000:81:00.0: wait_func:1132:(pid 105326): 2RST_QP(0x50a) timeout.Will cause a leak of a command resource
Aug 30 11:12:54 MM kernel: [323392.574638] infiniband mlx5_1: destroy_qp_common:2625:(pid 105326): mlx5_ib: modify QP 0x0002f1 to RESET failed
Aug 30 11:12:54 MM kernel: [323392.574974] mlx5_core 0000:81:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0061 address=0xfc771800 flags=0x0000]
Aug 30 11:12:54 MM kernel: [323392.576051] mlx5_core 0000:81:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0061 address=0xfc771840 flags=0x0000]
Aug 30 11:12:54 MM kernel: [323392.576913] mlx5_core 0000:81:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0061 address=0xfc771980 flags=0x0000]
Aug 30 11:12:55 MM kernel: [323393.274775] mlx5_core 0000:81:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0061 address=0xfca97000 flags=0x0020]
Aug 30 11:12:55 MM kernel: [323393.278706] mlx5_core 0000:81:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0061 address=0xfca97040 flags=0x0020]
Aug 30 11:12:55 MM kernel: [323393.758637] amd_iommu_report_page_fault: 127 callbacks suppressed
Aug 30 11:12:55 MM kernel: [323393.758640] mlx5_core 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0061 address=0xea831400 flags=0x0000]
Aug 30 11:12:55 MM kernel: [323393.759408] mlx5_core 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0061 address=0xea831400 flags=0x0000]
$ lspci | grep -i Mellanox
81:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
81:00.1 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
c1:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
# ethtool -i eth3
driver: mlx5_core
version: 5.4-1.0.3
firmware-version: 16.31.1014 (MT_0000000010)
expansion-rom-version:
bus-info: 0000:c1:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
# uname -a
Linux MM-IDC-CPU-10-50-1-60 5.4.0-84-generic #94~18.04.1-Ubuntu SMP Thu Aug 26 23:17:46 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux