Connect-IB driver trace info

While using my Connect-IB card for RDMA communication, my application hit a segmentation fault. I was able to capture some trace information using fwtrace:

[503:12:30.862093402] I2 rxs_handle_tpt_syndrome: gvmi=0x0000, tag_type=0x1, context_number=0x01f039, rxw_opcode=0x02, psn=0x9189ac

[503:12:32.737484478] I2 starting RXS-tpt/inv error handler for slice: 0x8

[503:12:32.737527108] I2 rxs_handle_tpt_syndrome: gvmi=0x0000, tag_type=0x1, context_number=0x01f039, rxw_opcode=0x02, psn=0xd0b44c

[503:12:32.739025278] I4 handle_any_sx_error: ec_id=0x00, gvmi=0x0, req_res_=0, qpi=0x1f039, error_is_send=1

[503:12:32.739035670] I4 handle_any_sx_error: ec_id=0x00, gvmi=0x0, req_res_=0, qpi=0x1f039, synd=a956

Does this mean something is wrong with my card?

Hi,

I assume the trace is from the mlxtrace tool. It is hard to tell what went wrong without seeing your code or knowing which call triggers the segfault. My guess is that these errors may be related to an issue in the MLX5_EVENT_TYPE_CQ_ERROR flow. I would also expect dmesg to print something more directly relevant.

I found the on-demand paging (ODP) feature in the user manual, and it seems like a good fit for my application. However, while evaluating its performance I ran into unexpected behavior: if I issue RDMA operations from the same buffers over thousands of iterations, the HCA still reports page faults even after I prefetch those buffers at the start. That's why I was looking at the trace output, to check whether my card is OK. Do you have any idea why this is happening?
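
To frame the question, here is a minimal sketch of the ODP registration and prefetch pattern I am describing, not my exact code. It assumes the upstream rdma-core ibv_advise_mr() prefetch API (older MLNX_OFED releases expose the same idea through the experimental ibv_exp_prefetch_mr() call); the buffer size and helper name are just illustrative.

/* Sketch: register an ODP memory region and prefetch it before the
 * RDMA loop. Buffer size and function name are placeholders. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <stdio.h>

#define BUF_SIZE (4 * 1024 * 1024)   /* example buffer size */

static struct ibv_mr *register_and_prefetch(struct ibv_pd *pd, void *buf)
{
    /* IBV_ACCESS_ON_DEMAND makes this an ODP MR: pages are not pinned,
     * so the HCA can fault them in on first access. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, BUF_SIZE,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_ON_DEMAND);
    if (!mr) {
        perror("ibv_reg_mr");
        return NULL;
    }

    /* Ask the driver to pre-populate the HCA mappings for the whole
     * buffer so the first operations do not take a page fault. This is
     * only a hint: the kernel may later invalidate the mapping (memory
     * pressure, fork, madvise, ...), and the HCA would then fault again
     * even though the buffer was prefetched once at startup. */
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = BUF_SIZE,
        .lkey   = mr->lkey,
    };
    if (ibv_advise_mr(pd, IBV_ADVISE_MR_ADVICE_PREFETCH_WRITE,
                      IBV_ADVISE_MR_FLAG_FLUSH, &sge, 1))
        fprintf(stderr, "ibv_advise_mr failed, prefetch hint ignored\n");

    return mr;
}

Even with this kind of prefetch up front, I still see occasional page faults reported by the HCA during the iterations.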