I’m working with Windows RDMA (Network Direct) with the WinOF-2 driver installed. I found that the write and send APIs are
extremely slow when multiple threads share one QP; it seems there may be locks inside the implementation of these APIs.
The latency of a single API call is roughly as follows:
300 ns using 1 thread
600 to 700 ns using 2 threads
5000 to 5700 ns using 5 threads
10 to 15 us (microseconds) using 10 threads
Why is this?
Can I get the source code of the WinOF-2 driver somewhere?
Thank you for posting your query on our community.
Regarding your concern about poor throughput, a single QP does not provide full line rate. Our design is built for running multiple QPs in parallel, which compensates for the single-QP rate limit. It is a design limitation, and we don’t have any tuning for it. So, to achieve full throughput, we recommend testing with two QPs.
Regarding your question about WinOF-2 source code, we would like to inform you that it is not publicly available. Our engineering team cannot provide it unless there is a special justification for such a request.
If you require further assistance with this, I suggest opening a support case so the issue can be investigated further. A support ticket can be opened by emailing Networking-support@nvidia.com.
Please note that an active support contract is required to open a case. If you do not have a current support contract, please reach out to our Contracts team at networking-contracts@nvidia.com.
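To make the "multiple QPs in parallel" recommendation above concrete, here is a rough sketch of the one-QP-per-thread pattern. It does not use the Network Direct API: MockQueuePair, post_send and run_with_qp_per_thread are hypothetical names invented for illustration, and the mutex only models an assumed per-QP serialization point (whether WinOF-2 actually takes such a lock is not confirmed in this thread).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a queue pair. This is NOT the Network Direct API;
// the mutex only models an assumed per-QP serialization point.
struct MockQueuePair {
    std::mutex lock;            // assumed lock guarding the send queue
    std::uint64_t posted = 0;   // number of work requests posted so far

    void post_send() {
        std::lock_guard<std::mutex> guard(lock);
        ++posted;
    }
};

// Each worker thread posts `sends_per_thread` requests to its own queue pair,
// so no two threads ever compete for the same per-QP lock.
// Returns the total number of requests posted across all queue pairs.
std::uint64_t run_with_qp_per_thread(std::size_t num_threads,
                                     std::uint64_t sends_per_thread) {
    assert(num_threads > 0);

    // One mock QP per thread; the vector is sized once and never resized.
    std::vector<MockQueuePair> qps(num_threads);

    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (std::size_t i = 0; i < num_threads; ++i) {
        workers.emplace_back([&qps, i, sends_per_thread] {
            for (std::uint64_t n = 0; n < sends_per_thread; ++n) {
                qps[i].post_send();  // thread i only ever touches qps[i]
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }

    // All workers have joined, so reading the counters here is race-free.
    std::uint64_t total = 0;
    for (const auto& qp : qps) {
        total += qp.posted;
    }
    return total;
}
```

Calling run_with_qp_per_thread(4, 1000), for example, starts four threads that each post to a private mock QP. In a real application each thread would create and own its own Network Direct queue pair instead of sharing one.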
Hi, does this reply mean I should not use one QP from multiple threads for Send and Write?
I may not need full line rate or full throughput; one QP is enough for my upper-layer apps, but it must support multi-threaded use.
What I care about is this: are there any methods to optimize the latency of the Send or Write API in a multi-threaded environment? The latency of a single call seems to grow linearly with the number of threads, which really upsets me.