What settings maximize throughput for performance testing?

I have ConnectX-4 NICs and a Juniper Networks QFX-5100 switch. A couple of years ago, when I was first comparing this switch with another, I was consistently seeing 39.5 Gbps between two hosts using iperf3. Today, running the same test, I occasionally see something near 39 Gbps, but it is usually much slower, more consistently around 32 Gbps.
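
To put numbers on "occasionally" versus "usually", several runs can be summarized from iperf3's JSON output (`iperf3 -J`). This is only a minimal sketch, assuming each run's JSON text has been captured as a string. The helper names are hypothetical, and the sample values are made up for illustration, not measurements from these hosts:

```python
import json
import statistics

def gbps_from_iperf3(json_text):
    """Return receiver-side TCP throughput in Gbps from one `iperf3 -J` result."""
    result = json.loads(json_text)
    return result["end"]["sum_received"]["bits_per_second"] / 1e9

def summarize(json_texts):
    """Return min, median, and max throughput (Gbps) over several runs."""
    rates = sorted(gbps_from_iperf3(t) for t in json_texts)
    return {"min": rates[0], "median": statistics.median(rates), "max": rates[-1]}

# Illustrative, made-up results standing in for three saved runs.
sample_runs = [
    json.dumps({"end": {"sum_received": {"bits_per_second": bps}}})
    for bps in (39.1e9, 32.0e9, 31.8e9)
]
print(summarize(sample_runs))
```

Reporting the median and spread over many runs makes it easier to tell whether a settings change really helped or a single run just happened to be fast.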

What has changed? Long story short: after putting these NICs and switches into service, we saw the bufferbloat phenomenon. After reading a lot of networking articles about it, I found a recipe for alleviating it. I’ve suspended those modifications for testing, but I am still not seeing consistently high TCP throughput.

Other factors that have been issues in the past include:

  • Improper location on the PCIe bus: the NIC was plugged into a slot that was only x4. I’ve verified this is not the case here: both hosts have their NICs in x8 slots.
  • Outdated drivers. I’ve loaded 4.6 onto these systems (which I downloaded last week from Mellanox).
  • Poor quality QSFPs, but I’ve verified that these systems are using the prescribed Mellanox QSFPs.
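
The slot-width item above comes down to simple arithmetic. A rough sketch, assuming the NICs negotiate a PCIe Gen3 link (the generation is not stated in the post) and ignoring packet-level protocol overhead. The function name is hypothetical:

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
}

def pcie_link_gbps(generation, lanes):
    """Usable link bandwidth in Gbps after line encoding, before protocol overhead."""
    rate, efficiency = PCIE_GENERATIONS[generation]
    return rate * efficiency * lanes

for lanes in (4, 8):
    print(f"Gen3 x{lanes}: {pcie_link_gbps(3, lanes):.1f} Gbps")
# Gen3 x4: 31.5 Gbps
# Gen3 x8: 63.0 Gbps
```

Under that assumption an x4 link tops out below 40 Gbps while an x8 link has headroom, which is why the negotiated width matters as much as the physical slot size.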

The host OS is Linux: CentOS 7.6. I’ve changed the qdisc for the NIC back to mq/pfifo_fast for the duration of the performance testing. I’ve played around with other congestion control algorithms too. To alleviate bufferbloat, we use agilesd, and I’ve noticed that changing back to the default, cubic, usually results in a single iperf3 run being much closer to the anticipated 39.5 Gbps.
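
When switching settings back and forth like this, it helps to record what was active for each run. A minimal sketch that reads the standard Linux procfs entries. The function names and the `proc_root` parameter are hypothetical, and a missing entry is reported as `None` rather than assumed to exist:

```python
from pathlib import Path

def read_setting(path):
    """Return the stripped contents of a procfs file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None

def tcp_test_settings(proc_root="/proc/sys"):
    """Collect the TCP settings most relevant to a throughput test."""
    root = Path(proc_root)
    return {
        "congestion_control": read_setting(root / "net/ipv4/tcp_congestion_control"),
        "available_cc": read_setting(root / "net/ipv4/tcp_available_congestion_control"),
        "default_qdisc": read_setting(root / "net/core/default_qdisc"),
    }

if __name__ == "__main__":
    for name, value in tcp_test_settings().items():
        print(f"{name}: {value}")
```

Note that `default_qdisc` is only the system-wide default. The qdisc actually attached to the NIC is what `tc qdisc show dev <interface>` reports.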

Would anyone have any suggestions for items I may have overlooked, or better values for the items I have?


Go through the Performance Tuning Guides here:


You will find everything you need regarding your question.

Good luck!