Poor RDMA performance

Hello, I'm opening a ticket to try to improve RDMA performance on our new Hyper-V cluster.
We have four Hyper-V Windows Server 2019 Datacenter nodes (HPE DL360 Gen10), each with two ConnectX-5 100 Gb/s cards configured in a SET team and connected to two SN2100M switches. On the hosts we have configured all the Windows-side PFC policies as per the WinOF-2 2.80 manual; on the switches we have enabled RoCE and PFC with priority 3 and enabled them on the ports.
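For reference, the Windows-side RoCE/PFC state can be double-checked with the built-in DCB/QoS cmdlets; a minimal verification sketch (the comments describe what we expect to see; nothing here is specific to our exact policy names):

    # Run in an elevated PowerShell on each Hyper-V host.
    Get-NetQosPolicy           # expect an SMB Direct policy mapping NetDirect port 445 to priority 3
    Get-NetQosFlowControl      # PFC should be Enabled for priority 3 only
    Get-NetQosTrafficClass     # the ETS traffic class carrying priority 3
    Get-NetAdapterQos          # per-port view of what is actually applied on the ConnectX-5 adapters
    Get-NetAdapterRdma         # RDMA should show Enabled on the physical ports (and host vNICs, if used)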
Running tests with DiskSpd, the maximum speed over RDMA is about 2,426 MB/s. I would expect the speed to be considerably higher.
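For context, a typical large-block DiskSpd run against the remote node looks something like the sketch below; the block size, duration, thread count, queue depth and path are illustrative assumptions, not the exact command used:

    # Illustrative only: 1 MiB sequential reads, 4 threads, 8 outstanding I/Os per thread,
    # local caching disabled, against a 50 GB test file on the remote share (hypothetical path).
    diskspd.exe -b1M -d60 -o8 -t4 -w0 -Sh -c50G \\node2\d$\test\testfile.dat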
I have checked and double-checked everything several times, but I can't find a solution.

Hello,

Have you used the RDMA performance tools that come embedded with our driver (nd_write/read/send_bw)? A typical invocation is sketched below.
We do not test performance with the DiskSpd utility.
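A back-to-back run of those tools usually follows a server/client pattern like the sketch below; the flags are quoted from memory of the WinOF-2 manual and the IP address is a placeholder, so please confirm against the tool's own help output for your driver build:

    # On the "server" node, listening on its RoCE interface (placeholder IP):
    nd_send_bw -a -S 192.168.100.11
    # On the "client" node, connecting to that address (-a sweeps the message sizes):
    nd_send_bw -a -C 192.168.100.11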

Did you test back-to-back (b2b), omitting the switches?

Are you monitoring the RDMA traffic via Perfmon?
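If it helps, the same data can also be sampled from PowerShell; the counter paths below come from the standard "RDMA Activity" set, so first confirm your build exposes them:

    # Confirm the counter set exists, then sample RDMA throughput every 2 seconds for a minute.
    Get-Counter -ListSet 'RDMA Activity'
    $counters = '\RDMA Activity(*)\RDMA Inbound Bytes/sec', '\RDMA Activity(*)\RDMA Outbound Bytes/sec'
    Get-Counter -Counter $counters -SampleInterval 2 -MaxSamples 30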

Did you validate that the recommended BIOS settings for maximum performance have been applied on both Windows servers?

Have you consulted the WinOF-2 User Manual section "Tunable Performance Parameters" (i.e. MTU, receive/send buffers, RSS, etc.)?

Note: make sure the MTU matches on the switches/ports.
RSS configuration is very important to distribute traffic across the cores belonging to the NUMA node closest to the HCA; a rough sketch follows.
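Something along these lines, where the adapter name and the processor numbers are placeholders to be adjusted for your platform:

    # Find which NUMA node the ConnectX-5 port sits on (adapter name is a placeholder).
    Get-NetAdapterHardwareInfo -Name 'SLOT 2 Port 1' | Format-List NumaNode, PcieLinkSpeed, PcieLinkWidth
    # Inspect the current RSS layout for that port.
    Get-NetAdapterRss -Name 'SLOT 2 Port 1'
    # Example only: constrain RSS to cores on the HCA's NUMA node.
    Set-NetAdapterRss -Name 'SLOT 2 Port 1' -NumaNode 0 -BaseProcessorNumber 0 -MaxProcessors 8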

Did you test omitting the SET team (i.e. a single HCA port on both bare-metal servers)?

Was the firmware aligned (updated to the matching version) when driver 2.80 was installed?

Are the HCAs installed in PCIe Gen3/Gen4 x8/x16 slots? Gen3 x16 or Gen4 x16 will provide better performance.

Are you seeing drop counters on the switches and/or servers?

Sophie.

Hello Sophie, thanks for the reply. I'll answer point by point with the tests I have performed.
Yes, I have run the tests with the specific tools from the WinOF-2 2.80 driver.
This is a test with nd_send_bw; the others are analogous to this result.
#qp  #bytes  #iterations  MR [Mmps]  Gb/s   CPU Util.
0    65536   100000       0.186      97.73  100.00

Yes, I'm also monitoring the results through Windows Perfmon.
In the BIOS settings I selected the "Virtualization - Max Performance" workload profile; I think it is the most appropriate one since these are Hyper-V hosts. By default this profile already sets the parameters called out in the WinOF driver documentation, as suggested by Mellanox.

I looked at the performance-tuning section. I wasn't sure whether to increase the Receive Buffers and Send Buffers entries; or rather, I tried, but found no benefit. As for the registry keys, they are not well documented, and I don't understand whether they also apply to Windows Server 2019.
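For what it's worth, those entries can also be inspected and changed through the adapter's advanced properties instead of the registry; the display name and value below are only examples, and the authoritative names are whatever Get-NetAdapterAdvancedProperty actually lists for the adapter:

    # List every tunable the driver exposes for this port (adapter name is a placeholder).
    Get-NetAdapterAdvancedProperty -Name 'SLOT 2 Port 1'
    # Example of raising one of them; the display name and value must match what the listing shows.
    Set-NetAdapterAdvancedProperty -Name 'SLOT 2 Port 1' -DisplayName 'Receive Buffers' -DisplayValue 4096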

The MTU is correct and is set to 9000 across the whole network, ports and switches.
If jumbo frames are disabled, you really notice the difference.
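A simple end-to-end check for this, assuming a 9000-byte MTU, is a non-fragmenting ping with an 8972-byte payload (8972 bytes of data plus 28 bytes of IP/ICMP headers equals 9000); the address is a placeholder:

    ping -f -l 8972 192.168.100.12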

RSS: I checked and everything seems OK. Do you have any particular advice on how to verify it?

Yes, I also tried bypassing the team and using single ports, but the results are identical.

Yes, the firmware of the cards is updated to the latest version and matches driver 2.80 (also verified with HPE support).

The cards are connected at PCIe x16:
Bus Type: PCI-E 8.0 GT/s x16 (i.e. PCIe Gen3 x16)

I don't see any drops in the switch counters, nor in the Windows performance counters.

Since there isn't much to be found in the forums or on the web, I'd like to know how a configuration like this should perform;
otherwise I can't tell whether the infrastructure is really running at its maximum or not.

Hello,

You are getting 97.73 Gb/s, so you are reaching line rate with our tool, which implies the HCA is working as expected.
RoCE only supports active-standby across the team (RDMA traffic is not aggregated over both members), so the bandwidth measurement is against a single 100 Gb/s port only.

Have you consulted with the vendor that blessed, supports, and tested this deployment?

Sophie.

Yes, of course: the whole hardware configuration was certified by our reseller at the time of purchase, and also certified by HPE.

The problem is that I can't get a comparison with another configuration. I could run a thousand tests, but without a reference I wouldn't know what to expect.
For example, the Mellanox tools measured the network at 97.73 Gb/s.

A trivial test: copying one large file, such as a 50 GB VHDX, from the C:\ of one server to the C:\ of another runs at about 800 MB/s...
Is that possible? Is the measurement misleading, or am I doing something wrong?
Server storage is on NVMe (mixed use).
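For completeness, a plain file copy goes through SMB rather than the raw RDMA path, so it is worth confirming the transfer really rides on SMB Direct and does not fall back to TCP; a minimal client-side check while a copy is running (the server name is a placeholder):

    # Are the interfaces advertised to SMB as RDMA-capable?
    Get-SmbClientNetworkInterface
    # Do the active connections to the peer show RDMA capability on both ends?
    Get-SmbMultichannelConnection -ServerName NODE2
    # The "SMB Direct Connection" counters only move when RDMA is actually in use.
    Get-Counter -ListSet 'SMB Direct Connection'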

In parallel, I would recommend engaging the certified vendor(s) for QA inquiries: how do they test, and what overall bandwidth is expected? They should have benchmarking numbers and the applicable tool(s) they certified for this deployment. From our HCA's perspective, line rate is reached.

Sophie.