My team is experiencing performance issues with the RDMA drivers included in the Linux kernel.
We are trying to write a kernel module that uses RDMA features.
The problem is the send-recv latency between two nodes when running in kernel space. Latency is significantly worse in kernel space, while user-space programs do not show this problem.
(We measured user-space send-recv performance by running IB benchmarks such as ib_send_lat and ib_send_bw. With these, we simply reach the maximum throughput of our hardware.)
Is there any way to improve RDMA performance in our kernel module?