ibv_reg_mr hang

Our program hangs at startup in ibv_reg_mr. I then tested with ib_send_bw, and it also hangs forever; see the picture below.

Basic environment information:
OS: openEuler 20.03
Kernel: 4.19.90-2112.8.0.0131.oel
OFED: MLNX_OFED_LINUX-5.7-1.0.2.0


I already posted a thread to the linux-rdma mailing list; they say it’s an MLNX_OFED issue.

This is the server side of the ib_send_bw.

Are you running a client? Please attach its output.

Is this a GPU server?

The client can’t establish a connection to the server; I don’t have a screenshot of it.
The OS is a VMware guest. We expose the two ConnectX-5 cards to it; mlx5_0 is OK, only mlx5_1 hangs.
Here is a screenshot taken from the dmesg output for our program:


We found that our program hangs at startup in ibv_reg_mr, and a test with ib_send_bw hangs as well.
After rebooting the VMware host machine, the problem was solved.
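
For anyone who wants to compare mlx5_0 and mlx5_1 without a test blocking the terminal forever, here is a minimal sketch of a timeout wrapper. It is not something from this thread: the function name `run_pair_with_timeout`, the timeout values, and the idea of running the ib_send_bw server and client in loopback on the same host are all assumptions made for illustration.

```python
import subprocess
import time


def run_pair_with_timeout(server_cmd, client_cmd, timeout_s, startup_delay_s=1.0):
    """Start a server and a client command, then report 'ok', 'failed' or 'hung' for each.

    Hypothetical helper: a process still running after timeout_s seconds is
    reported as 'hung' and sent SIGKILL. We do not wait for it afterwards,
    because a process stuck inside the kernel may never actually exit.
    """
    quiet = {"stdout": subprocess.DEVNULL, "stderr": subprocess.DEVNULL}
    server = subprocess.Popen(server_cmd, **quiet)
    time.sleep(startup_delay_s)  # give the server time to start listening
    client = subprocess.Popen(client_cmd, **quiet)

    deadline = time.monotonic() + timeout_s
    results = {}
    for name, proc in (("server", server), ("client", client)):
        remaining = max(0.0, deadline - time.monotonic())
        try:
            rc = proc.wait(timeout=remaining)
            results[name] = "ok" if rc == 0 else "failed"
        except subprocess.TimeoutExpired:
            proc.kill()
            results[name] = "hung"
    return results


# Assumed usage (loopback on one host, one device at a time):
#   for dev in ("mlx5_0", "mlx5_1"):
#       print(dev, run_pair_with_timeout(
#           ["ib_send_bw", "-d", dev],
#           ["ib_send_bw", "-d", dev, "localhost"],
#           timeout_s=60))
```

A "hung" result only says the process did not finish in time; it does not show where it is stuck, so dmesg output is still needed to confirm the hang is in ibv_reg_mr.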

Hi abbycin,

I suspect a CQE was lost during your startup.
This can be caused by a PCIe hardware issue or a wrong VM PCIe configuration.
Could you run the same test without the VM (directly on the hypervisor) to confirm whether there is an issue in the PCIe hardware?
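
As a rough aid for the PCIe check suggested above, the following sketch reads the negotiated and maximum PCIe link speed and width that Linux exposes in sysfs for the PCI device behind an RDMA device. The helper names `read_pcie_link` and `link_is_degraded` are hypothetical, and the sketch assumes the kernel provides the `current_link_*` and `max_link_*` attributes under `/sys/class/infiniband/<dev>/device`. Inside a VM these values may describe the virtual PCIe topology rather than the physical slot, so the comparison is most meaningful on the hypervisor.

```python
from pathlib import Path

# Standard PCI sysfs attribute names; missing ones are reported as None.
LINK_ATTRS = (
    "current_link_speed",
    "max_link_speed",
    "current_link_width",
    "max_link_width",
)


def read_pcie_link(ib_dev, sysfs_root="/sys/class/infiniband"):
    """Return a dict of PCIe link attributes for an RDMA device such as 'mlx5_1'."""
    dev_dir = Path(sysfs_root) / ib_dev / "device"
    info = {}
    for attr in LINK_ATTRS:
        path = dev_dir / attr
        info[attr] = path.read_text().strip() if path.is_file() else None
    return info


def link_is_degraded(info):
    """True if the link runs below its maximum, False if not, None if unknown."""
    if any(value is None for value in info.values()):
        return None
    return (
        info["current_link_speed"] != info["max_link_speed"]
        or info["current_link_width"] != info["max_link_width"]
    )


# Assumed usage:
#   for dev in ("mlx5_0", "mlx5_1"):
#       link = read_pcie_link(dev)
#       print(dev, link, "degraded:", link_is_degraded(link))
```

A link running below its maximum is only a hint, not proof of a hardware fault, and a healthy-looking link does not rule out a lost CQE.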