How do I achieve full bandwidth from both 100Gb ports on the MCX516A-CDAT adapter?

We have an MCX516A-CDAT adapter installed in a PCIe Gen4 x16 slot (AMD CPU, Windows 10 Pro). Each port is connected with a 3-meter QSFP28-compatible DAC to a 100Gb port on a Mellanox MSN2010 switch (Onyx). When our data sources are connected to the switch and running, we cannot reach the full 100 Gb/s on each port. We run data at 65.4 Gb/s from the switch through each 100Gb port, yet no matter how we configure the data sources (one 100Gb port or two), total throughput on the adapter never exceeds 100 Gb/s. The specifications for this adapter clearly state that, when it is installed in a PCIe Gen4 x16 slot, each port can run at up to 100 Gb/s. We are not trying to aggregate the ports with LAG.

We have not found anything in the documentation or the device settings that lets us maximize the adapter's throughput. As a workaround we are currently using two NICs, which is not optimal for our application.
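As a rough sanity check on the claim that a Gen4 x16 slot can feed both ports, the sketch below compares raw PCIe bandwidth per direction with the 200 Gb/s that two ports at line rate need. It assumes only 128b/130b line encoding and ignores TLP/DLLP protocol overhead, so real usable bandwidth is lower than these figures. It also illustrates why it may be worth confirming that the link actually trained at Gen4 x16: at Gen3 x16 the raw budget is only about 126 Gb/s, which would be one possible explanation for a ceiling near 100 Gb/s.

```python
# Rough PCIe bandwidth budget check for a dual-port 100GbE adapter.
# Assumption: only 128b/130b line encoding is accounted for; TLP/DLLP
# protocol overhead (which depends on e.g. max payload size) is ignored,
# so usable bandwidth is lower than the numbers printed here.

# Per-lane transfer rate in GT/s for PCIe generations using 128b/130b.
GT_PER_LANE = {3: 8.0, 4: 16.0}
ENCODING = 128 / 130  # 128b/130b line-encoding efficiency


def pcie_raw_gbps(gen: int, lanes: int) -> float:
    """Raw per-direction bandwidth in Gb/s after line encoding."""
    return GT_PER_LANE[gen] * lanes * ENCODING


REQUIRED_GBPS = 2 * 100  # both 100GbE ports at line rate, one direction

for gen in (3, 4):
    bw = pcie_raw_gbps(gen, 16)
    verdict = "enough" if bw > REQUIRED_GBPS else "NOT enough"
    print(f"Gen{gen} x16: {bw:6.1f} Gb/s raw -> {verdict} for {REQUIRED_GBPS} Gb/s")
```

Gen4 x16 comes out at roughly 252 Gb/s raw, which leaves headroom for two ports; Gen3 x16 comes out at roughly 126 Gb/s, which does not.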

Many factors can degrade performance or prevent the adapter from reaching line rate.
Have you verified that the following are configured for maximum performance?

Server BIOS performance settings

OS performance settings (MTU, RSS, RX/TX buffers, etc.)

Latest WinOF-2 driver and firmware (see the Performance Tuning section of the user manual: https://docs.nvidia.com/networking/display/winof2v30)
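The MTU setting matters because, at a fixed line rate, larger frames mean far fewer packets for the host to process. The sketch below computes the full-size frame rate needed to saturate one 100 Gb/s port, assuming standard Ethernet framing with no VLAN tag; 9000 is used only as a typical jumbo-frame value, and jumbo frames help only if the switch and the data sources use a matching MTU.

```python
# Packet rate a 100 Gb/s port must sustain at different MTUs.
# Assumed per-frame wire overhead beyond the IP MTU (no VLAN tag):
#   14 B Ethernet header + 4 B FCS + 8 B preamble/SFD + 12 B inter-frame gap
WIRE_OVERHEAD_BYTES = 14 + 4 + 8 + 12


def packets_per_second(line_rate_gbps: float, mtu: int) -> float:
    """Full-size frames per second needed to fill the link at this MTU."""
    bits_per_frame = (mtu + WIRE_OVERHEAD_BYTES) * 8
    return line_rate_gbps * 1e9 / bits_per_frame


for mtu in (1500, 9000):
    mpps = packets_per_second(100, mtu) / 1e6
    print(f"MTU {mtu:>5}: {mpps:5.2f} Mpps per 100 Gb/s port")
```

At MTU 1500 this is about 8.13 Mpps per port versus about 1.38 Mpps at MTU 9000; running both ports at line rate doubles each figure.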

The best tool for measuring TCP/UDP throughput on Windows is NTttcp (ntttcp).
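To check whether the ~100 Gb/s ceiling applies per adapter rather than per port, one approach is to measure each port on its own, then both ports at the same time, and compare the sums. The helper below only builds command lines using the documented `-r`/`-s`, `-m threads,processor,receiver_ip` and `-t seconds` options; it assumes Windows on both ends, and the IP addresses, thread count and duration are hypothetical placeholders. If two concurrent instances conflict, consult `ntttcp.exe -h`.

```python
# Build NTttcp command lines for a per-port test plan.
# Assumptions: IPs are hypothetical addresses of each adapter port;
# 8 threads and 60 s are arbitrary illustrative values.
from dataclasses import dataclass


@dataclass
class PortTest:
    name: str
    receiver_ip: str  # IP bound to this adapter port (hypothetical)
    threads: int = 8
    seconds: int = 60

    def receiver_cmd(self) -> str:
        # Run on the host with the MCX516A-CDAT (receiving side).
        return f"ntttcp.exe -r -m {self.threads},*,{self.receiver_ip} -t {self.seconds}"

    def sender_cmd(self) -> str:
        # Run on the traffic source, targeting the same receiver IP.
        return f"ntttcp.exe -s -m {self.threads},*,{self.receiver_ip} -t {self.seconds}"


ports = [
    PortTest("port1", "192.168.1.10"),  # hypothetical address
    PortTest("port2", "192.168.2.10"),  # hypothetical address
]

for p in ports:
    print(f"[{p.name}] receiver: {p.receiver_cmd()}")
    print(f"[{p.name}] sender:   {p.sender_cmd()}")
```

Run one port pair first, then both pairs concurrently; if the concurrent total stays near 100 Gb/s while a single port alone also reaches about the same figure, the bottleneck is shared rather than per port.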

For RoCE traffic, our driver includes the “MlxNdPerf Utility” for basic RDMA tests; in older drivers this was provided by “nd_write/read/send_bw”.

Finally, if the performance issue persists once the server has been optimized for maximum performance, I suggest opening a support case with NVIDIA (this requires a valid support contract).

Sophie.