HELP: I've posted a Send Request and it wasn't completed with a corresponding Work Completion. What happened?

Hello,

I have a programming problem. On the server side, ibv_post_send() returns success, but every time I run the program for about an hour, there is eventually one post whose send completion never arrives, no matter how long I wait. Is there a good way to debug this, and what are the possible causes?

Server side:

Driver: MLNX_OFED_LINUX-4.3-1.0.1.0

Hardware: Mellanox Technologies MT27640 Family

Versions   Current      Available
FW         16.26.1040   N/A
PXE        3.5.0803     N/A
UEFI       14
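One way to narrow this down is to record every signaled send when ibv_post_send() returns and clear it when the matching ibv_wc is polled, so the stuck work request can be identified by its wr_id and post time. The sketch below is a hypothetical helper (send_track, track_posted, track_completed and track_stale are illustrative names, not libibverbs API). It assumes wr_ids are unique and that only sends posted with IBV_SEND_SIGNALED (or on a QP created with sq_sig_all = 1) are recorded, since unsignaled sends do not generate a completion when they succeed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical debug aid (not part of libibverbs): record each signaled
 * send by wr_id when it is posted, clear it when its completion is
 * polled, and report entries that stay outstanding too long. */
#define TRACK_SLOTS 1024 /* assumption: >= max outstanding signaled sends */

struct send_track {
    pthread_mutex_t lock; /* post and poll run in different threads */
    uint64_t wr_id[TRACK_SLOTS];
    time_t posted_at[TRACK_SLOTS];
    int in_use[TRACK_SLOTS];
};

/* Call right after ibv_post_send() returns 0; returns -1 if the table is full. */
static int track_posted(struct send_track *t, uint64_t wr_id, time_t now)
{
    int rc = -1;
    assert(t != NULL);
    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < TRACK_SLOTS; i++) {
        if (!t->in_use[i]) {
            t->wr_id[i] = wr_id;
            t->posted_at[i] = now;
            t->in_use[i] = 1;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return rc;
}

/* Call for every polled ibv_wc, whatever its status; -1 means an unknown wr_id. */
static int track_completed(struct send_track *t, uint64_t wr_id)
{
    int rc = -1;
    assert(t != NULL);
    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < TRACK_SLOTS; i++) {
        if (t->in_use[i] && t->wr_id[i] == wr_id) {
            t->in_use[i] = 0;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return rc;
}

/* Copy up to max wr_ids outstanding for more than timeout seconds. */
static size_t track_stale(struct send_track *t, time_t now, time_t timeout,
                          uint64_t *out, size_t max)
{
    size_t n = 0;
    assert(t != NULL && out != NULL);
    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < TRACK_SLOTS && n < max; i++)
        if (t->in_use[i] && now - t->posted_at[i] > timeout)
            out[n++] = t->wr_id[i];
    pthread_mutex_unlock(&t->lock);
    return n;
}
```

For example, a watchdog thread could call track_stale() every few seconds, log the wr_id and age of anything older than a chosen threshold, and then check the QP state with ibv_query_qp() at that moment.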

Regards,

longfei

Hi,

Do you run ibv_post_send and ibv_poll_cq serially or in different threads?

Did you check whether you received an IBV_EVENT_CQ_ERR event? It is raised on a CQ overrun, in which case the CQ can no longer be used.

Regards

Marc

Thank you very much for your response.

ibv_post_send and ibv_poll_cq run in different threads, and I have not received IBV_EVENT_CQ_ERR, any error events from the CM, or any async events from ibv_get_async_event.

Regards,

longfei

Hi,

Marc, thank you very much.

Can you provide a reference procedure for handling a disconnect? To avoid leaking the memory held by posted sends, my procedure is:

call rdma_disconnect()

-> wait for the RDMA_CM_EVENT_DISCONNECTED event

-> post a final send, last_post, that marks the last send request

-> the poll_cq thread reaps the completion for last_post

-> safely destroy the QP and cm_id
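In a procedure like this, the poll thread should treat the marker's completion as "done" whatever its status: after rdma_disconnect() the QP is in the error state, so last_post (and any send still queued ahead of it) is expected to complete with IBV_WC_WR_FLUSH_ERR rather than IBV_WC_SUCCESS. Because completions from one send queue are reported in order, seeing the marker also means every earlier send on that QP has completed. A minimal sketch of the hand-off between the poll thread and the teardown path, assuming a hypothetical marker wr_id LAST_POST_WR_ID and a timeout so teardown cannot hang forever; it uses only POSIX threads and leaves the ibv_poll_cq() call to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define LAST_POST_WR_ID UINT64_MAX /* hypothetical wr_id reserved for the marker */

struct teardown_sync {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int marker_seen;
};

/* Poll thread: call for every ibv_wc it reaps. The status is deliberately
 * ignored, since after rdma_disconnect() the marker normally completes with
 * IBV_WC_WR_FLUSH_ERR. */
static void on_completion(struct teardown_sync *s, uint64_t wr_id)
{
    if (wr_id != LAST_POST_WR_ID)
        return;
    pthread_mutex_lock(&s->lock);
    s->marker_seen = 1;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Teardown path: wait up to timeout_sec seconds for the marker.
 * Returns 0 when it is safe to destroy the QP and cm_id, ETIMEDOUT otherwise. */
static int wait_for_marker(struct teardown_sync *s, int timeout_sec)
{
    struct timespec deadline;
    int rc = 0;

    assert(timeout_sec >= 0);
    clock_gettime(CLOCK_REALTIME, &deadline); /* default condvar clock */
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&s->lock);
    while (!s->marker_seen && rc == 0)
        rc = pthread_cond_timedwait(&s->cond, &s->lock, &deadline);
    if (s->marker_seen)
        rc = 0;
    pthread_mutex_unlock(&s->lock);
    return rc;
}
```

If wait_for_marker() times out, the marker's completion is itself missing, which points back at the original problem rather than at the teardown logic.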

I’m using an SRQ, but at present I don’t handle receive CQEs. Any suggestions?
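Even if the received data is not needed, receive completions still have to be polled and their buffers reposted to the SRQ with ibv_post_srq_recv(). If the SRQ runs dry, the peer's sends hit RNR (receiver not ready) NAKs, and with rnr_retry set to 7 (retry indefinitely) such a send never completes, so this is one possible cause of the missing send completion that is worth ruling out. A minimal sketch of the bookkeeping, assuming a single poll thread owns it and a hypothetical low-water mark; srq_pool, srq_on_recv_completion and srq_reposted are illustrative names, and the actual ibv_post_srq_recv() call is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping for receives posted to an SRQ, owned by the
 * single thread that polls receive completions. */
struct srq_pool {
    uint32_t posted;    /* receives currently owned by the SRQ */
    uint32_t depth;     /* target number of posted receives (<= SRQ max_wr) */
    uint32_t low_water; /* refill once posted drops below this */
};

/* Call for each successful receive completion. Returns how many receives
 * the caller should post now with ibv_post_srq_recv() (0 if none). */
static uint32_t srq_on_recv_completion(struct srq_pool *p)
{
    assert(p->posted > 0);
    p->posted--;
    return p->posted < p->low_water ? p->depth - p->posted : 0;
}

/* Call after ibv_post_srq_recv() succeeded for count receives. */
static void srq_reposted(struct srq_pool *p, uint32_t count)
{
    assert(p->posted + count <= p->depth);
    p->posted += count;
}
```

Refilling in batches below a low-water mark keeps the number of ibv_post_srq_recv() calls down; arming the SRQ limit with ibv_modify_srq() (IBV_SRQ_LIMIT) and reacting to IBV_EVENT_SRQ_LIMIT_REACHED is an alternative trigger.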