Question about RC Read and UD Send/Recv performance on CX3 and CX4 NICs

I am using CX3 and CX4 NICs to measure the throughput of RDMA verbs (RC Read and UD Send/Recv).

When I use the same test code to measure the peak throughput of small messages on both NICs, RC Read throughput is lower than UD Send/Recv throughput on CX3, while the result is reversed on CX4.

What is the performance trend on newer generations of NICs such as CX5 or CX6?