While using my Connect-IB card for RDMA communication, my application hit a segmentation fault. I was able to collect some trace information with fwtrace, as follows:
I guess the trace is from the mlxtrace tool. It is pretty hard to tell what went wrong, since we don't have your code and don't know on which call you are getting the segfault. My guess is that these errors might be related to some issue in the MLX5_EVENT_TYPE_CQ_ERROR flow. I would also expect dmesg to print something else that might be more relevant.
I found a cool feature, on-demand paging, in the user manual. This feature seems to be a good fit for my application. However, while evaluating the performance of on-demand paging, I am seeing some unexpected behavior. For example, if I just issue RDMA operations from the same buffers over thousands of iterations, I still see page faults reported by the HCA, even after I prefetch these buffers at the beginning. That's why I was looking at the trace information, to see whether my card is OK. Do you have any idea why this is happening?