We are testing RDMA performance with a ConnectX-5 25G adapter on Windows 10 Enterprise.
We measure the latency of the NetworkDirect SPI write operation with a data length of 1024 bytes.
ref: GitHub - microsoft/NetworkDirect: NetworkDirect Service Provider Interface
Both the server and the client run on the same PC.
The average latency is around 50 microseconds, but occasionally the latency exceeds 100 ms.
My questions are:
Why do these high-latency samples occur, and are there any settings or configurations that avoid them?
During the test we found that the PCIe version reported by "mlx5cmd -stat" changes after the PC is restarted. The PCIe hardware version is 2, but sometimes the command shows PCIe Gen1, and the speed is affected as well. Why does this happen, and how can we avoid it?
Welcome, and thank you for posting your inquiry to the NVIDIA community!
Given the fact that the PCIe link speed varies across reboots, the integrity of the PCIe link is called into question. When the PCIe link is unable to train at the optimal link speed/width, 3 scenarios are most likely:
a) The adapter is not seated properly.
b) There's a hardware issue with the adapter.
c) There's a hardware issue with the slot (mainboard).
As the integrity of the PCIe link itself is unknown, this needs to be rectified before performance tuning / troubleshooting can be performed.
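To illustrate why a link that trains at Gen1 instead of Gen2 also affects speed, the arithmetic can be sketched as below. This is a rough estimate under stated assumptions: the lane count of 8 is an assumption (the slot width is not given in this thread), the function names are hypothetical, and the figures cover only line encoding, not PCIe protocol overhead, so usable throughput is somewhat lower.

```python
# Per-lane raw signalling rate (GT/s) and line-encoding efficiency
# for each PCIe generation (Gen1/Gen2 use 8b/10b, Gen3 uses 128b/130b).
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
}


def pcie_bandwidth_gbps(generation, lanes):
    """Return the post-encoding bandwidth of a PCIe link in Gbit/s, per direction."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_s * efficiency * lanes


def supports_line_rate(generation, lanes, line_rate_gbps=25.0):
    """Return True if the link can carry the given Ethernet line rate."""
    return pcie_bandwidth_gbps(generation, lanes) >= line_rate_gbps


if __name__ == "__main__":
    # Assumption: an x8 link. Replace with the width reported for your adapter.
    for gen in (1, 2, 3):
        bandwidth = pcie_bandwidth_gbps(gen, lanes=8)
        ok = supports_line_rate(gen, lanes=8)
        print(f"Gen{gen} x8: {bandwidth:.1f} Gbit/s, carries 25G: {ok}")
```

Under the x8 assumption, a Gen1 link provides about 16 Gbit/s, which is below the 25 Gbit/s port speed, while a Gen2 link provides about 32 Gbit/s.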
If reseating the adapter does not resolve the sporadic link speed degradation, a swap to another slot is recommended.
If the same behavior is encountered in another slot, a swap with a known-good adapter is recommended.
If the same behavior is encountered on this system with a known good adapter and/or in a different slot, then we recommend engaging your hardware vendor to assess next steps with regards to the mainboard hardware.
Once the PCIe link is validated, we have several tuning recommendations in the "Troubleshooting" section of the WinOF-2 User Manual >> https://docs.nvidia.com/networking/display/winof2v320/Troubleshooting . Relevant sections here would be "Ethernet Related Troubleshooting" and "Performance Related Troubleshooting".
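When comparing runs before and after tuning, it may help to look at the latency distribution rather than the average alone, since an average of around 50 microseconds can hide rare spikes. The following is a minimal sketch, assuming the per-operation latencies have already been collected in microseconds; the function name and the 1000-microsecond spike threshold are hypothetical and not part of NetworkDirect or WinOF-2.

```python
import math
import statistics


def summarize_latencies(samples_us, spike_threshold_us=1000.0):
    """Summarize latency samples (microseconds) and locate spikes.

    spike_threshold_us is an illustrative cut-off, not a recommended value.
    """
    if not samples_us:
        raise ValueError("no latency samples provided")

    ordered = sorted(samples_us)
    count = len(ordered)

    def percentile(p):
        # Nearest-rank percentile: smallest value with at least p% of samples at or below it.
        rank = max(1, math.ceil(p * count / 100))
        return ordered[rank - 1]

    # Keep the original sample positions so spikes can be correlated with time.
    spike_indexes = [i for i, value in enumerate(samples_us)
                     if value > spike_threshold_us]

    return {
        "count": count,
        "median_us": statistics.median(ordered),
        "p99_us": percentile(99),
        "max_us": ordered[-1],
        "spike_count": len(spike_indexes),
        "spike_indexes": spike_indexes,
    }


if __name__ == "__main__":
    # Illustrative data only: 99 samples at 50 us and one 120 ms outlier.
    samples = [50.0] * 99 + [120000.0]
    for key, value in summarize_latencies(samples).items():
        print(f"{key}: {value}")
```

Recording the positions of the spikes makes it possible to check whether they are periodic or cluster around other system activity.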
If you are unable to achieve stable performance after these steps have been followed, and you have valid support entitlement, we recommend opening a support ticket with our Enterprise Support team via the NVIDIA Enterprise Experience Support Portal: https://enterprise-support.nvidia.com/s/create-case . Our engineers will be able to assist you with determining the root cause of this degradation.
Thanks, and best regards,
NVIDIA Enterprise Experience