Using rperf and mlxndPerf I am able to confirm 98 Gb/s read and write, but actual file transfers, read/write benchmarks, and iperf tests only show up to 50 Gb/s write and 10-14 Gb/s read. I have been working at this for over a month, have consulted an IT tech and multiple tech-support staff from different vendors, and have spent many hours on it, and I cannot figure this out.
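In case anyone wants to reproduce the non-RDMA numbers, here is the kind of bare-bones test I mean: a single-stream TCP push over plain Python sockets (a minimal sketch; the port number and 4 MiB buffer size are arbitrary choices), which takes SMB, iperf, and the RDMA tools out of the picture entirely.

```python
# tcp_bw.py -- minimal single-stream TCP throughput check (sketch only).
# Run "python tcp_bw.py server" on one workstation and
# "python tcp_bw.py client <server-ip>" on the other.
import socket
import sys
import time

PORT = 50123               # arbitrary test port (assumption)
CHUNK = 4 * 1024 * 1024    # 4 MiB per send/recv call
SECONDS = 10               # client send duration

def server() -> None:
    with socket.create_server(("", PORT)) as srv:
        conn, _addr = srv.accept()
        with conn:
            total = 0
            start = time.perf_counter()
            while True:
                data = conn.recv(CHUNK)
                if not data:          # client closed the connection
                    break
                total += len(data)
            elapsed = time.perf_counter() - start
            print(f"received {total / 1e9:.2f} GB in {elapsed:.1f} s "
                  f"= {total * 8 / elapsed / 1e9:.2f} Gb/s")

def client(host: str) -> None:
    payload = b"\x00" * CHUNK
    with socket.create_connection((host, PORT)) as conn:
        deadline = time.monotonic() + SECONDS
        while time.monotonic() < deadline:
            conn.sendall(payload)

if __name__ == "__main__":
    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[2])
```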
I have two workstations I'm connecting to a NAS (TS-h1290FX) that has six 7.68 TB Seagate NVMe drives configured as RAID 0 (each drive individually capable of 6700 MB/s read AND write). The workstations and the NAS are all equipped with the same 100G NIC (QXG-100G2SF-CX6), which uses a Mellanox ConnectX-6 Dx controller. I am connecting all the NICs with Mellanox transceivers and MTP OM5 fiber optic cables from FS.com. I have installed Windows 11 Pro for Workstations on both PCs (to enable SMB 3.1.1, etc.) and have confirmed in the BIOS and in Windows that the NICs are indeed running at x16 on the PCIe Gen 4 bus on both systems. Both workstations run AMD Threadripper CPUs, and the NAS has an AMD EPYC CPU; one machine has 128 GB of RAM and the other 256 GB. All drives in these PCs are M.2 NVMe (newer Sabrent models), capable of 3000 MB/s in one machine and 6000 MB/s in the other.
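For reference, here is the quick arithmetic I keep coming back to for what this hardware should allow (a rough sketch using only the numbers above; the 128/130 PCIe encoding factor is an approximation):

```python
# Rough throughput ceilings implied by the hardware described above.
link_GBps        = 100 / 8                    # 100GbE line rate ~= 12.5 GB/s
pcie4_x16_GBps   = 16 * 16 / 8 * (128 / 130)  # ~31.5 GB/s usable, approx.
nas_raid0_GBps   = 6 * 6.7                    # six drives x 6700 MB/s ~= 40 GB/s
workstation_GBps = (3.0, 6.0)                 # local M.2 NVMe drives

print(f"100GbE wire      : {link_GBps:.1f} GB/s (~100 Gb/s)")
print(f"PCIe 4.0 x16 NIC : {pcie4_x16_GBps:.1f} GB/s")
print(f"NAS RAID 0       : {nas_raid0_GBps:.1f} GB/s")
print(f"workstation NVMe : {workstation_GBps} GB/s")
# The wire (12.5 GB/s) is the tightest NAS-side limit, and the 3 GB/s
# workstation drive caps any real file copy at about 24 Gb/s on that machine.
```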
Things I have tried to increase speeds on both PCs, all to no avail:
- Set jumbo frames to 9000.
- Enabled RoCEv2.
- Tried different DAC cables.
- Tried connecting the PCs to each other (resulted in slower speeds, around 10-12 Gb/s read and write).
- Installed updated Mellanox WinOF-2 client drivers.
- Tried setting Interrupt Moderation to low latency.
- Tried disabling RoCE.
- Tried a Mellanox brand NIC instead of the QNAP card. Same results.
- Tried installing and running Windows Server 2022 on one of the workstations. Same speeds to the NAS.
- Tried different block sizes on the NAS, etc. Given that two workstations connected directly to each other cannot exceed 10-14 Gb/s, I don't think the bottleneck is the NAS.
- Tried a loopback test with iperf. Wasn't able to achieve faster than 10-14 Gb/s (see the iperf3 fan-out sketch after this list).
- Verified SMB Direct is enabled.
- Verified Windows is using SMB 3.1.1.
- Tried probably another ten things I cannot even remember.
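One note on those iperf results: as far as I know, classic iperf3 runs all of its streams on a single thread, so a single process (even with -P) can bottleneck on one CPU core long before a 100G NIC saturates. The sketch below is a hypothetical helper, not something from any of the vendors above; it assumes iperf3 is on PATH, that you substitute your own target address for the placeholder, and that an "iperf3 -s -p <port>" server is already running on the far end for each port. It launches independent client processes and sums their throughput:

```python
# run_iperf_fanout.py -- launch several independent iperf3 clients in
# parallel and report the aggregate throughput (hypothetical helper).
import json
import subprocess

TARGET = "192.168.1.50"        # placeholder NAS address -- substitute your own
PORTS = [5201, 5202, 5203, 5204]
SECONDS = 10

# Start one iperf3 client process per port; -J requests JSON output.
procs = [
    subprocess.Popen(
        ["iperf3", "-c", TARGET, "-p", str(port), "-t", str(SECONDS), "-J"],
        stdout=subprocess.PIPE,
    )
    for port in PORTS
]

# Collect each process's JSON report and sum the received bit rates.
total_bps = 0.0
for proc in procs:
    out, _ = proc.communicate()
    result = json.loads(out)
    total_bps += result["end"]["sum_received"]["bits_per_second"]

print(f"aggregate: {total_bps / 1e9:.1f} Gb/s across {len(PORTS)} processes")
```

If several processes together land well above the 10-14 Gb/s single-stream number, that would point at a per-stream or per-core limit rather than the NICs or cabling.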
I can see no logical reason why either the hardware or the software should be a bottleneck. I've tried everything I can think of over the past month, and I have $14,000 in NAS storage I need to use but can't because the speeds are so poor. What am I missing?