How does ib_send_lat measure the latency?

k.bodi2 · October 13, 2023, 7:29am

When running ib_send_lat on our two RDMA systems, we measure a standard deviation of the latency (i.e. jitter) of around 0.03 usec, which is close to nothing.

I’m wondering how this is even possible. The CPU still has to request the next RDMA operation (here: send) after the previous operation has completed. The time between “Last operation is done” and “Request next operation” depends entirely on the CPU and the scheduler, so time is lost there and the jitter should, or at least could, be higher.

So the next question is: how is the time measured in this tool? Does the measurement start right after “Request next operation” or right after “Last operation is done”?

Kind regards,
k.bodi

k.bodi2 · October 23, 2023, 8:11am

Any update please?

namrata1 · October 25, 2023, 12:31am

Hi k.bodi2,

Thank you for posting your query on NVIDIA Community.

Based on my internal check, this will require an internal escalation to our Engineering Team. In order to submit an official escalation, a valid support contract will be required.

The basic details are available in the README of the GitHub repository linux-rdma/perftest (Infiniband Verbs Performance Tests), which mentions:

The latency benchmarks measure round-trip time but report half of that as one-way
latency. This means that the results may not be accurate for asymmetrical configurations.
- Latency tests report minimum, median and maximum latency results.
  The median latency is typically less sensitive to high latency variations,
  compared to average latency measurement.
  Typically, the first value measured is the maximum value, due to warmup effects.
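
To make the README description concrete, here is a minimal sketch of how such a report could be derived from raw timestamps. It is not taken from the perftest sources: the function name, the sample values, and the assumption that one timestamp is recorded per iteration just before the send is posted are all hypothetical. Under that assumption, the difference between consecutive timestamps is one full round trip (including any CPU time spent between the completion and the next post), and half of it is reported as one-way latency.

```python
import statistics


def one_way_latency_stats(post_timestamps_usec):
    """Summarize one-way latency from per-iteration timestamps (in usec).

    Assumption (not confirmed by the README): each timestamp is taken just
    before a send is posted in a ping-pong loop, so the difference between
    consecutive timestamps is one full round trip.
    """
    if len(post_timestamps_usec) < 2:
        raise ValueError("need at least two timestamps")

    # Round-trip time of each iteration: delta between consecutive posts.
    round_trips = [
        later - earlier
        for earlier, later in zip(post_timestamps_usec, post_timestamps_usec[1:])
    ]

    # The README states that half of the round trip is reported as one-way latency.
    one_way = [rtt / 2.0 for rtt in round_trips]

    return {
        "min": min(one_way),
        "median": statistics.median(one_way),
        "max": max(one_way),
        # Population standard deviation, used here as a simple jitter figure.
        "stdev": statistics.pstdev(one_way),
    }


# Made-up timestamps: the first round trip is slow (warmup), the rest are steady.
samples = [0.0, 6.0, 8.0, 10.0, 12.0, 14.0]
stats = one_way_latency_stats(samples)
print(stats)
```

In this made-up data set the single slow first iteration sets the maximum, while the median stays at the steady-state value, which matches the README's remark about warmup effects.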

Source code available at → https://github.com/linux-rdma/perftest/blob/master/src/send_lat.c

If there is an active contract in place, please feel free to open a support ticket by emailing enterprisesupport@nvidia.com

For details on contracts, please feel free to contact our contracts team at Networking-contracts@nvidia.com

Thanks,
Namrata.

k.bodi2 · October 25, 2023, 9:58am

Thank you for your reply, but the information you provided does not help me any further: I had already found the codebase, and analyzing the code would take too much time.

Since we do not have any active contract, we cannot proceed on this.
