I have a programming problem. The server side of my program calls ibv_post_send, which returns success. However, every time the program has run for about an hour, there is always one post for which I never receive the send completion event, no matter how long I wait. Is there a good way to debug this, and what are the possible causes?
Hardware: Mellanox Technologies MT27640 Family
Versions:   Current      Available
  FW        16.26.1040   N/A
  PXE       3.5.0803     N/A
Do you run ibv_post_send and ibv_poll_cq serially or in different threads?
Did you check whether you received IBV_EVENT_CQ_ERR? It is caused by a CQ overrun, and once it happens your CQ can no longer be used.
Thank you very much for your response.
ibv_post_send and ibv_poll_cq run in different threads. I have not received IBV_EVENT_CQ_ERR, any other error event from the CM, or any async event from ibv_get_async_event.
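One way to narrow down which post never completes is to record every wr_id when it is posted and clear it when its completion is polled, then periodically list the entries that have been pending too long. The following is only a minimal sketch of that bookkeeping in plain C: the names send_tracker, tracker_on_post, tracker_on_completion and tracker_find_stuck are hypothetical and not part of libibverbs, and the sketch assumes each send carries a unique wr_id and that the caller guards the tracker with a mutex, since posting and polling happen in different threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OUTSTANDING 1024  /* assumption: at least the send queue depth */

struct pending_send {
    uint64_t wr_id;      /* the wr_id given to ibv_post_send */
    uint64_t posted_at;  /* caller-supplied timestamp, e.g. milliseconds */
    bool     in_use;
};

/* Not thread-safe on its own: lock around every call. */
struct send_tracker {
    struct pending_send slots[MAX_OUTSTANDING];
};

/* Record a WR just before calling ibv_post_send, so the polling thread
 * cannot see the completion before the record exists. If the post then
 * fails, undo the record with tracker_on_completion. Returns false if
 * the table is full. */
bool tracker_on_post(struct send_tracker *t, uint64_t wr_id, uint64_t now)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].wr_id = wr_id;
            t->slots[i].posted_at = now;
            t->slots[i].in_use = true;
            return true;
        }
    }
    return false;
}

/* Clear a WR when ibv_poll_cq returns its completion.
 * Returns false if the wr_id is not currently recorded. */
bool tracker_on_completion(struct send_tracker *t, uint64_t wr_id)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        if (t->slots[i].in_use && t->slots[i].wr_id == wr_id) {
            t->slots[i].in_use = false;
            return true;
        }
    }
    return false;
}

/* Copy up to out_cap wr_ids that have been pending longer than timeout
 * into out. Returns how many were copied. */
size_t tracker_find_stuck(const struct send_tracker *t, uint64_t now,
                          uint64_t timeout, uint64_t *out, size_t out_cap)
{
    assert(t != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < MAX_OUTSTANDING && n < out_cap; i++) {
        if (t->slots[i].in_use && now - t->slots[i].posted_at > timeout)
            out[n++] = t->slots[i].wr_id;
    }
    return n;
}
```

Logging the stuck wr_ids together with the send flags and opcode of the matching work request may show whether the missing post differs from the others in some way, for example an unsignaled send on a QP created with sq_sig_all set to 0, which does not produce a completion on success.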
Marc, thank you very much.
Can you provide a reference process for handling disconnect? To avoid leaking the memory of posted sends, my process is like this:
--> wait for the CM DISCONNECTED event
--> send last_post, which marks the last send post
--> the poll_cq thread gets last_post
--> safely destroy the qp/cm_id
I'm using an SRQ, but I don't handle recv CQEs at present. Any suggestions?
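The flow described above can be written down as a small state machine, which makes it easier to see when destroying the QP is allowed. This is only a sketch of the bookkeeping in plain C: conn_teardown, teardown_on_disconnect and teardown_on_send_completion are hypothetical names, not libibverbs or librdmacm calls. It assumes that completions for one send queue are reported in posting order, so that seeing the marker implies all earlier sends have completed or been flushed, and it does not cover the SRQ receive side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum conn_state {
    CONN_ACTIVE,          /* normal operation */
    CONN_DRAINING,        /* disconnect seen, waiting for the marker */
    CONN_SAFE_TO_DESTROY  /* marker completion polled */
};

struct conn_teardown {
    enum conn_state state;
    uint64_t marker_wr_id;  /* wr_id of the last_post marker */
};

/* Call from the CM event thread when the disconnect event arrives,
 * BEFORE posting the marker send, so the polling thread cannot see
 * the marker completion while the state is still CONN_ACTIVE.
 * Callers in different threads must serialize access with a lock. */
void teardown_on_disconnect(struct conn_teardown *c, uint64_t marker_wr_id)
{
    assert(c != NULL);
    if (c->state == CONN_ACTIVE) {
        c->marker_wr_id = marker_wr_id;
        c->state = CONN_DRAINING;
    }
}

/* Call from the polling thread for every send completion, whether it
 * succeeded or was flushed with an error. Returns true once it is safe
 * to destroy the QP and cm_id. */
bool teardown_on_send_completion(struct conn_teardown *c, uint64_t wr_id)
{
    assert(c != NULL);
    if (c->state == CONN_DRAINING && wr_id == c->marker_wr_id)
        c->state = CONN_SAFE_TO_DESTROY;
    return c->state == CONN_SAFE_TO_DESTROY;
}
```

In real code the polling thread would call teardown_on_send_completion with the wr_id of each work completion returned by ibv_poll_cq, and free the send buffers of any flushed work requests along the way.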