100Gb CX4 + MSN2100 switch -> slow speed (2-9 Gb/s) (solved)

Hello,

I’m new to Mellanox hardware, but I was able to get the card and the switch working, in the sense that the network “just works” :-)

We have per node:

  • Dual-port MT27700 Family [ConnectX-4] (PCIe Gen3 x16)
  • MSN2100 switch
  • Connected with QSFP28 (100Gb DAC cables)
  • 2 x E5-2620 v4 @ 2.10GHz
  • 2x 16GB Ram DDR4
  • Supermicro X10DRi
  • Debian Jessie (Proxmox 4.x with 4.4.59-1-pve)
  • Latest firmware for the CX4 and the switch
  • Module version 4.0-2.0.0
  • Installed packages: mlnx-en-dkms / mlnx-en-eth-only / mlnx-en-utils
  • Basic network configuration with MTU 9000 (NIC + switch ports)

I tested the raw throughput with iperf 2.x and iperf 3.x:

  • iperf2
    [ 4] 0.0-10.2 sec 16.1 GBytes 13.6 Gbits/sec
    [ 11] 0.0-10.2 sec 7.68 GBytes 6.49 Gbits/sec
    [ 10] 0.0-12.9 sec 3.50 MBytes 2.28 Mbits/sec
    [ 6] 0.0-14.3 sec 7.25 MBytes 4.25 Mbits/sec
    [ 9] 0.0-15.4 sec 8.49 GBytes 4.75 Gbits/sec
    [ 8] 0.0-19.2 sec 8.00 MBytes 3.49 Mbits/sec
    [ 5] 0.0-26.0 sec 4.12 MBytes 1.33 Mbits/sec
    [ 7] 0.0-26.0 sec 3.12 MBytes 1.01 Mbits/sec
    [SUM] 0.0-26.0 sec 32.3 GBytes 10.7 Gbits/sec
  • iperf3
    [ ID] Interval Transfer Bandwidth Retr
    [ 4] 0.00-10.00 sec 5.28 GBytes 4.53 Gbits/sec 1213 sender
    [ 4] 0.00-10.00 sec 5.26 GBytes 4.52 Gbits/sec receiver
    [ 6] 0.00-10.00 sec 5.47 GBytes 4.70 Gbits/sec 1029 sender
    [ 6] 0.00-10.00 sec 5.46 GBytes 4.69 Gbits/sec receiver
    [ 8] 0.00-10.00 sec 4.70 GBytes 4.04 Gbits/sec 1064 sender
    [ 8] 0.00-10.00 sec 4.70 GBytes 4.04 Gbits/sec receiver
    [ 10] 0.00-10.00 sec 5.64 GBytes 4.85 Gbits/sec 927 sender
    [ 10] 0.00-10.00 sec 5.62 GBytes 4.83 Gbits/sec receiver
    [ 12] 0.00-10.00 sec 3.60 GBytes 3.10 Gbits/sec 716 sender
    [ 12] 0.00-10.00 sec 3.59 GBytes 3.08 Gbits/sec receiver
    [ 14] 0.00-10.00 sec 4.80 GBytes 4.12 Gbits/sec 1240 sender
    [ 14] 0.00-10.00 sec 4.78 GBytes 4.11 Gbits/sec receiver
    [ 16] 0.00-10.00 sec 5.26 GBytes 4.52 Gbits/sec 1154 sender
    [ 16] 0.00-10.00 sec 5.26 GBytes 4.52 Gbits/sec receiver
    [ 18] 0.00-10.00 sec 5.98 GBytes 5.14 Gbits/sec 969 sender
    [ 18] 0.00-10.00 sec 5.98 GBytes 5.14 Gbits/sec receiver
    [SUM] 0.00-10.00 sec 40.7 GBytes 35.0 Gbits/sec 8312 sender
    [SUM] 0.00-10.00 sec 40.7 GBytes 34.9 Gbits/sec receiver

iperf was started with iperf -c -P8.
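
A side note on reading these numbers: the Transfer and Bandwidth columns above are consistent with iperf counting GBytes in units of 2^30 bytes and Gbits in units of 10^9 bits. A minimal sketch of that conversion follows; the helper name gbytes_to_gbits is made up for illustration and is not part of iperf.

```shell
#!/bin/sh
# Convert an iperf "Transfer" value in GBytes plus the interval length in
# seconds into Gbits/sec.
# Assumption: GBytes means 2^30 bytes and Gbits means 10^9 bits, which
# matches the figures posted in this thread.
gbytes_to_gbits() {
    awk -v gb="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", gb * 1024 * 1024 * 1024 * 8 / secs / 1e9 }'
}

# Stream [ 4] of the iperf2 run above: 16.1 GBytes in 10.2 seconds
gbytes_to_gbits 16.1 10.2   # prints 13.6
```

The result matches the 13.6 Gbits/sec that iperf2 reported for that stream.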

So it is much slower than the Mellanox examples, which reach over 11Gb/s. When I apply the sysctl examples, the speed mostly goes down. So I’m looking for the handbrake …

~# ethtool eth4
Settings for eth4:
    Supported ports: [ FIBRE Backplane ]
    Supported link modes: 1000baseKX/Full
                          10000baseKR/Full
                          40000baseKR4/Full
                          40000baseCR4/Full
                          40000baseSR4/Full
                          40000baseLR4/Full
    Supported pause frame use: Symmetric Receive-only
    Supports auto-negotiation: Yes
    Advertised link modes: 1000baseKX/Full
                           10000baseKR/Full
                           40000baseKR4/Full
                           40000baseCR4/Full
                           40000baseSR4/Full
                           40000baseLR4/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Link partner advertised link modes: Not reported
    Link partner advertised pause frame use: No
    Link partner advertised auto-negotiation: Yes
    Speed: 100000Mb/s
    Duplex: Full
    Port: Direct Attach Copper
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: d
    Wake-on: d
    Current message level: 0x00000004 (4)
                           link
    Link detected: yes

One thing that is a bit strange: I used only the DEB packages and found mstconfig, but not mlxconfig. With that tool, I switched the port to the Ethernet protocol:

mstconfig -y -d 02:00.0 set LINK_TYPE_P1=2

It would be nice if someone could help me get over 10Gb/s :-)

cu denny

Hello,

I solved the problem by using two DIMMs per CPU socket. The throughput jumped from ~6GB to 13GB per stream (~11 Gb/s). Next, I rearranged the PCI slots so that the CX4 is attached to CPU2 and some other cards are now handled by CPU1. In the first test runs (all iperf2) the throughput jumped between ~9 and 14GB; after changing the PCI slots, the values stay more consistently between 11 and 13GB (~10-11 Gb/s).
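
To see which CPU socket a NIC is attached to before and after moving cards around, Linux exposes the NUMA node of a PCI device in sysfs. Here is a minimal sketch, assuming the interface is eth4 as in the ethtool output earlier in this thread. The helper name nic_numa_node and its optional sysfs-root argument are made up for illustration.

```shell
#!/bin/sh
# Print the NUMA node that a network interface's PCI device is attached to.
#   $1 = interface name (eth4 in this thread)
#   $2 = sysfs root, defaults to /sys (the override is a hypothetical
#        convenience so the function can be tried without the hardware)
# A value of -1 means the kernel has no NUMA information for the device.
nic_numa_node() {
    iface=$1
    sysfs=${2:-/sys}
    node_file="$sysfs/class/net/$iface/device/numa_node"
    if [ -r "$node_file" ]; then
        cat "$node_file"
    else
        echo "no numa_node entry for $iface" >&2
        return 1
    fi
}

# Usage on the hosts described above (not run here):
#   nic_numa_node eth4
```

On a dual-socket board the printed node number normally corresponds to one of the two CPUs, so it should change when the card moves to a slot wired to the other socket.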

With the settings:

# MLXNET tuning parameters
net.core.rmem_max = 2147483647
net.core.wmem_max = 2147483647
net.ipv4.tcp_rmem = 4096 87380 2147483647
net.ipv4.tcp_wmem = 4096 87380 2147483647
# END MLXNET

I get:

[ 10] 0.0-10.0 sec 10.6 GBytes 9.11 Gbits/sec
[ 4] 0.0-10.0 sec 12.1 GBytes 10.4 Gbits/sec
[ 5] 0.0-10.0 sec 12.5 GBytes 10.8 Gbits/sec
[ 6] 0.0-10.0 sec 13.2 GBytes 11.4 Gbits/sec
[ 3] 0.0-10.0 sec 15.0 GBytes 12.9 Gbits/sec
[ 7] 0.0-10.0 sec 15.0 GBytes 12.9 Gbits/sec
[ 8] 0.0-10.0 sec 12.0 GBytes 10.3 Gbits/sec
[ 9] 0.0-10.0 sec 12.9 GBytes 11.1 Gbits/sec
[SUM] 0.0-10.0 sec 103 GBytes 88.8 Gbits/sec
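
To keep the four buffer settings listed above across reboots, one option is a drop-in file under /etc/sysctl.d. This is a minimal sketch only: the function name write_mlx_sysctl and the file name 90-mlxnet.conf are assumptions made for illustration, not something taken from the Mellanox documentation.

```shell
#!/bin/sh
# Write the four TCP buffer settings from this post to a sysctl drop-in file.
#   $1 = target path; the default file name is a made-up example.
write_mlx_sysctl() {
    target=${1:-/etc/sysctl.d/90-mlxnet.conf}
    cat > "$target" <<'EOF'
# TCP buffer limits used for the 100Gb iperf test
net.core.rmem_max = 2147483647
net.core.wmem_max = 2147483647
net.ipv4.tcp_rmem = 4096 87380 2147483647
net.ipv4.tcp_wmem = 4096 87380 2147483647
EOF
}

# Usage (as root, not run here):
#   write_mlx_sysctl
#   sysctl -p /etc/sysctl.d/90-mlxnet.conf
#   sysctl net.core.rmem_max net.ipv4.tcp_rmem   # read back to verify
```

Reading the values back after loading the file shows whether the kernel actually accepted them.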

cu denny