How to monitor one-sided RDMA bandwidth at QP level?

I want to monitor one-sided RDMA bandwidth on the receiver side (whose CPU is bypassed) at the QP level.
I noticed that mlx5-based RNICs provide port counters and hardware counters (see "Understanding mlx5 Linux Counters and Status Parameters" on nvidia.com). However, port_rcv_data only reports the amount of data received at port level, and rx_read_requests/rx_write_requests only report the number of requests received at QP level, not how many bytes they carried.
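To make the gap concrete, here is a minimal sketch of what these counters can give when sampled over an interval. It assumes the usual sysfs layout under /sys/class/infiniband/<dev>/ports/<port>/ and that port_rcv_data counts in units of 4 bytes. The device name mlx5_0, port 1, and the helper names are placeholders for illustration, not part of any library.

```python
import time
from pathlib import Path

# Assumed standard sysfs root for RDMA devices; adjust if your system differs.
SYSFS_ROOT = Path("/sys/class/infiniband")

# Assumption: port_rcv_data is reported in 4-byte units.
PORT_RCV_DATA_UNIT_BYTES = 4


def counter_path(dev, port, group, name):
    """Build the sysfs path of a counter; group is 'counters' or 'hw_counters'."""
    return SYSFS_ROOT / dev / "ports" / str(port) / group / name


def read_counter(path):
    """Read a single integer counter value from a sysfs file."""
    return int(Path(path).read_text().strip())


def rate_per_second(before, after, elapsed_s, scale=1):
    """Convert two counter samples into a per-second rate."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (after - before) * scale / elapsed_s


def sample(dev="mlx5_0", port=1, interval_s=1.0):
    """Return (port-level bytes/s, RDMA write requests/s) over one interval.

    dev and port are placeholder defaults, not values from a real setup.
    """
    data = counter_path(dev, port, "counters", "port_rcv_data")
    writes = counter_path(dev, port, "hw_counters", "rx_write_requests")

    t0 = time.monotonic()
    d0, w0 = read_counter(data), read_counter(writes)
    time.sleep(interval_s)
    t1 = time.monotonic()
    d1, w1 = read_counter(data), read_counter(writes)

    elapsed = t1 - t0
    bytes_per_s = rate_per_second(d0, d1, elapsed, PORT_RCV_DATA_UNIT_BYTES)
    writes_per_s = rate_per_second(w0, w1, elapsed)
    return bytes_per_s, writes_per_s
```

Calling sample() on the receiver yields a byte rate for the whole port and a request rate, but nothing that ties bytes to an individual QP, which is exactly the missing piece.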
My question: is there any way to measure one-sided RDMA bandwidth on the receiver side (whose CPU is bypassed) at QP granularity?