Hello, I’m using Mellanox ConnectX-2 40Gb/s adapters to filter IP traffic on an HP 4xOpteron server. The server receives traffic via an HP InfiniBand-only switch, so using mlx4_en is not an option.
I’m aiming to handle up to 10Gb/s of traffic. Currently, throughput hits a wall at ~6Gb/s as soon as I enable a netfilter rule that returns NF_ACCEPT immediately (with IPoIB in ‘connected’ mode). It drops further as soon as I do more processing (inspecting payloads, distributing to workqueues, queueing to user mode, etc.), and ‘top’ shows only 1-2 CPUs in use.
When just routing IP traffic (without filters), reaching 10Gb/s is not a problem (top shows 2-3% CPU load). I used the standard MLNX_OFED 1.5.3 (rhel-6.2-amd64).
To find out whether there is additional bandwidth that Linux isn’t handling, I ran a quick test on Windows Server 2008 R2: the Windows drivers (MLNX_VPI_WinOF-4.2 from the HP website) handled around 14Gb/s, and processing was distributed across at least 16 CPUs (according to taskmgr).
My questions are:
Can the mlx4_ib driver take advantage of the hardware queue support (RSS)? Apparently that is what makes the difference on Windows.
If RSS is available, how can I enable it? I’ve tried setting the interrupt affinity, disabling CPU scaling, and enabling RSS, RPS and RFS, with no success so far (following the steps in the performance tuning PDF from Mellanox and linux/Documentation/networking/scaling.txt).
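
For reference, the RPS step described in scaling.txt comes down to writing a hex CPU bitmap into each receive queue's rps_cpus file under sysfs. Below is a minimal Python sketch of that step only. The interface name ib0 in the usage comment is an assumption, and cpu_mask and enable_rps are illustrative helper names, not part of any Mellanox or kernel tool. It does not touch RSS or interrupt affinity.

```python
from pathlib import Path


def cpu_mask(cpus):
    """Build the hex bitmap the kernel parses for rps_cpus / smp_affinity.

    Bit N set means CPU N is included. Masks wider than 32 bits are
    written as comma-separated 32-bit words, most significant first.
    """
    bits = 0
    for cpu in cpus:
        bits |= 1 << cpu

    # Split the integer into 32-bit words, least significant first.
    words = []
    while True:
        words.append(bits & 0xFFFFFFFF)
        bits >>= 32
        if bits == 0:
            break
    words.reverse()

    # Leading word is unpadded, the remaining words are zero-padded to 8 digits.
    return ",".join([format(words[0], "x")] + [format(w, "08x") for w in words[1:]])


def enable_rps(iface, cpus, sysfs_root="/sys/class/net"):
    """Write the CPU mask to every rx queue of `iface`; return the queue names.

    sysfs_root is a parameter only so the function can be tried against a
    scratch directory; on a real system it must run as root.
    """
    mask = cpu_mask(cpus)
    queues = sorted(Path(sysfs_root, iface, "queues").glob("rx-*"))
    for queue in queues:
        (queue / "rps_cpus").write_text(mask)
    return [queue.name for queue in queues]


# Hypothetical usage (as root), spreading receive processing over CPUs 0-15:
#   enable_rps("ib0", range(16))
```

The number of rx-* directories the function finds also shows how many receive queues the driver actually exposes for the interface.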