A client can’t establish a connection to the server. I don’t have a snapshot.
The OS is a VMware guest. We expose the two ConnectX-5 cards to it; mlx5_0 is OK, only mlx5_1 hangs.
Here is the snapshot taken from dmesg for our program:
We found that our program hangs at startup, inside ibv_reg_mr. We then tested with ib_send_bw, and it also hangs.
After rebooting the VMware host machine, the problem was solved.
I suspect a CQE was lost during your startup.
The issue can be caused by a PCIe hardware problem or an incorrect VM PCIe configuration.
Could you run the same test without the VM (directly on the hypervisor) to confirm whether the issue is in the PCIe hardware?
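To make the VM versus hypervisor comparison repeatable, it may help to run the same command in both places under a timeout, so that a hang is reported instead of blocking the shell. The following is only a minimal sketch: `run_with_timeout` and the 30 second default are hypothetical choices for illustration, not part of any Mellanox tool, and the command to run (for example your own program or your usual ib_send_bw invocation) is left for you to supply.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s=30.0):
    """Run cmd (a list of arguments) and classify the outcome.

    Returns "ok" if it exits with status 0, "failed" if it exits with a
    non-zero status, and "hang" if it is still running after timeout_s.
    The name and the timeout value are illustrative assumptions.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        try:
            # A process blocked inside the kernel may not die right away,
            # so do not wait for it forever.
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            pass
        return "hang"
    return "ok" if rc == 0 else "failed"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to run.
    print(run_with_timeout(sys.argv[1:]))
```

Saved as, say, `check_hang.py` (a hypothetical file name), it would be run as `python3 check_hang.py <your test command>` once in the VM and once on the hypervisor. A "hang" in the VM but "ok" on the hypervisor would point toward the VM PCIe configuration rather than the hardware.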