When running ib_send_lat on our two RDMA systems, we measure a standard deviation of latency (i.e. jitter) of around 0.03 usec, which is close to nothing.
I’m wondering how this is even possible. The CPU still has to request the next RDMA operation (here: a send) after the last operation has completed. The time between “last operation is done” and “next operation is requested” depends entirely on the CPU and the scheduler, so we lose time there, and consequently the jitter should, or at least could, be higher.
So the next question is: how is the time measured in this tool? Does the measured interval start right after “request next operation” is done, or right after “last operation is done” has completed?
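To make the question concrete, here is a simplified, self-contained sketch of the ping-pong timestamping pattern that RDMA latency benchmarks commonly use. This is not the actual perftest code; `pingpong_once()` and `now_ns()` are placeholder stand-ins for the real verbs calls and cycle counter. In this pattern, one timestamp is taken per iteration right before the send is posted, so the host-side gap between “completion seen” and “next send posted” ends up inside the measured interval:

```c
/*
 * Illustrative sketch only -- NOT the perftest source.
 * One timestamp per iteration; the delta between consecutive
 * timestamps is treated as the round-trip time, and half of it
 * as the one-way latency. The standard deviation of those
 * per-iteration latencies is the reported jitter.
 */
#define _POSIX_C_SOURCE 199309L
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define ITERS 1000

/* Hypothetical stand-in for: post a send, then poll the CQ until the
 * matching completion of the peer's reply arrives (the ping-pong
 * turnaround). In a real tool this would be verbs calls such as
 * ibv_post_send() and ibv_poll_cq(). Here we just fake a ~2 us RTT. */
static void pingpong_once(void)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 2000 };
    nanosleep(&ts, NULL);
}

/* Monotonic wall-clock timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

int main(void)
{
    uint64_t tposted[ITERS + 1];

    /* Timestamp is taken right before each send is posted, so any
     * CPU/scheduler delay between the previous completion and this
     * post lands inside the measured delta. */
    for (int i = 0; i < ITERS; i++) {
        tposted[i] = now_ns();
        pingpong_once();
    }
    tposted[ITERS] = now_ns();

    /* One-way latency per iteration = (delta between consecutive posts) / 2 */
    double lat[ITERS], sum = 0.0;
    for (int i = 0; i < ITERS; i++) {
        lat[i] = (double)(tposted[i + 1] - tposted[i]) / 2.0 / 1000.0; /* usec */
        sum += lat[i];
    }

    double mean = sum / ITERS, var = 0.0;
    for (int i = 0; i < ITERS; i++)
        var += (lat[i] - mean) * (lat[i] - mean);

    printf("mean latency %.3f usec, stdev (jitter) %.3f usec\n",
           mean, sqrt(var / ITERS));
    return 0;
}
```

If, instead, the timer only ran from “send posted” to “completion seen” within each iteration, the posting gap would be excluded from the result; that is exactly the distinction I am asking about.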
Kind regards,
k.bodi
Hi k.bodi2,
Thank you for posting your query on NVIDIA Community.
Based on my internal check, this will require an internal escalation to our Engineering Team. In order to submit an official escalation, a valid support contract will be required.
The basic details are available in the README section of GitHub - linux-rdma/perftest: Infiniband Verbs Performance Tests, which mentions:
Source code available at → https://github.com/linux-rdma/perftest/blob/master/src/send_lat.c
If there is an active contract in place, please feel free to open a support ticket by emailing enterprisesupport@nvidia.com
For details on contracts, please feel free to contact our contracts team at Networking-contracts@nvidia.com
Thanks,
Namrata.
Thank you for your reply, but the information you provided does not help me any further: I have already found the codebase, but analyzing the code would take too much time.
Since we do not have an active support contract, we cannot proceed with this.