How to debug an RDMA read work request that never completes?

Hello. Over the past few days I have been tracking down a hang in my program. After some debugging I found the cause: I use rdma_post_read to post an RDMA read work request, but its completion never appears in the completion queue. Many later steps depend on that completion, so the program hangs indefinitely. Could you tell me how to debug this error?

I have just cleared all the RDMA NIC counters and re-run my program. The counter values are listed below:

duplicate_request: 0
implied_nak_seq_err: 0
lifespan: 10
local_ack_timeout_err: 0
np_cnp_sent: 5
np_ecn_marked_roce_packets: 5
out_of_buffer: 315
out_of_sequence: 0
packet_seq_err: 0
req_cqe_error: 0
req_cqe_flush_error: 0
req_remote_access_errors: 0
req_remote_invalid_request: 0
resp_cqe_error: 0
resp_cqe_flush_error: 0
resp_local_length_error: 0
resp_remote_access_errors: 0
rnr_nak_retry_err: 0
rp_cnp_handled: 0
rp_cnp_ignored: 0
rx_atomic_requests: 0
rx_dct_connect: 0
rx_icrc_encapsulated: 0
rx_read_requests: 3
rx_write_requests: 0

I see that the ‘out_of_buffer’ counter is not zero.

How can I debug this problem?

Hi. ‘out_of_buffer’ counts incoming packets that the NIC dropped because no receive buffer (receive work request) was available for them, which means the application is not keeping up with the incoming traffic. As for the RDMA read that you post with rdma_post_read but that never appears in the completion queue, there may be an error in your configuration. Could you share demo code that reproduces the issue?
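While preparing the demo code, it may help to compare counter snapshots taken immediately before and after one failing run, so that only the counters that actually moved are reported. The following is a minimal sketch, not a definitive tool. It assumes each counter is exposed as a small text file holding one integer in a single directory; on mlx5 devices this is typically something like /sys/class/infiniband/mlx5_0/ports/1/hw_counters, but the device name and port number here are assumptions, and the helper names read_counters and changed_counters are made up for illustration.

```python
from pathlib import Path


def read_counters(counter_dir):
    """Return {counter_name: int_value} for every integer-valued file in counter_dir.

    counter_dir is an assumption: pass the hw_counters directory of your
    own device and port.
    """
    counters = {}
    for entry in sorted(Path(counter_dir).iterdir()):
        if not entry.is_file():
            continue
        try:
            counters[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            # Skip files that cannot be read or do not hold a plain integer.
            continue
    return counters


def changed_counters(before, after):
    """Return {name: delta} for counters whose value differs between two snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }
```

A typical use would be to call read_counters on the directory once before starting the program and once after it hangs, then print changed_counters(before, after). Taking the snapshots on both the requester and the responder host shows which side is accumulating drops.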