I'm running performance benchmarks against an NVMe/RoCE array and have noticed that while throughput scales linearly with NIC port count for large IO sizes (128KB), small-IO performance decreases when I use the second port of a dual-port CX6 (or CX7) device. On the same host, if I instead use a single port of a second NIC, I achieve nearly double the throughput.
The Linux host has two dual-port CX6 NICs (I have also seen this on CX7), connected at 100G to an Arista switch, which in turn connects to 16 x 100G ports on the array.
Throughput increases linearly as ports are added for large IO, regardless of which NIC the ports are on (as expected):
IO size   Ports   NICs   Throughput (MiB/s)
128KB     1       1      11,669
128KB     2       1      23,335
128KB     2       2      23,335
128KB     4       2      46,674
However, for 4K IO, using both ports of the same NIC is detrimental:
IO size   Ports   NICs   Throughput (MiB/s)
4KB       1       1      11,226
4KB       2       1      10,996
4KB       2       2      22,036
4KB       4       2      20,170
Given this, if the workload is mostly small IOs, the usefulness of the second port is greatly diminished.
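To put a number on the contrast, here is the per-port scaling efficiency (measured throughput divided by ports times the single-port baseline) computed from the two tables above:

```python
# Per-port scaling efficiency relative to the single-port baseline,
# using the throughput numbers (MiB/s) from the tables above.
baseline = {"128KB": 11_669, "4KB": 11_226}

# (io_size, ports, nics) -> measured throughput
results = {
    ("128KB", 2, 1): 23_335,
    ("128KB", 2, 2): 23_335,
    ("128KB", 4, 2): 46_674,
    ("4KB", 2, 1): 10_996,
    ("4KB", 2, 2): 22_036,
    ("4KB", 4, 2): 20_170,
}

for (io, ports, nics), tput in results.items():
    eff = tput / (ports * baseline[io])
    print(f"{io}, {ports} ports on {nics} NIC(s): {eff:.2f}")
# Every 128KB row scales at ~1.00, and 4KB across two NICs at ~0.98,
# but 4KB on both ports of one NIC drops to ~0.49 per port.
```

So the penalty only appears when a NIC's second port carries small IO; two single ports on separate NICs scale almost perfectly.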
Has anyone else experienced this, and if so, any tuning that can overcome it?