MCX556A-EDAT: Direct Connection via Ethernet unable to reach more than 73 Gbit/s

Hello,

I’m trying to reach 100 Gbit/s over two directly connected MCX556A cards. I am using OFED 5.4.1 on CentOS 7.9 with the stock kernel (3.10.0-1160).

I have executed mlnx_tune and set additional parameters:

sysctl net.core.rmem_max=2147483647

sysctl net.core.wmem_max=2147483647

sysctl net.ipv4.tcp_rmem="4096 87380 2147483647"

sysctl net.ipv4.tcp_wmem="4096 65536 2147483647"

sysctl net.core.netdev_max_backlog=250000

# don't cache ssthresh from previous connections
sysctl net.ipv4.tcp_no_metrics_save=1

# explicitly set htcp as the congestion control: cubic is buggy in older 2.6 kernels
sysctl net.ipv4.tcp_congestion_control=htcp

# if you are using jumbo frames, also set this
sysctl net.ipv4.tcp_mtu_probing=1

# recommended for CentOS 7 / Debian 8 hosts
sysctl net.core.default_qdisc=fq
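
For completeness, a minimal sketch of how these settings can be made persistent across reboots on CentOS 7 (the file name is arbitrary):

# /etc/sysctl.d/90-network-tuning.conf
net.core.rmem_max = 2147483647
net.core.wmem_max = 2147483647
net.ipv4.tcp_rmem = 4096 87380 2147483647
net.ipv4.tcp_wmem = 4096 65536 2147483647
net.core.netdev_max_backlog = 250000
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_congestion_control = htcp
net.ipv4.tcp_mtu_probing = 1
net.core.default_qdisc = fq

# reload all sysctl configuration without a reboot
sysctl --system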

The hosts are 2x AMD EPYC 7542 with 1 TB of memory; htop and top show 1-2% utilization during the tests. The CPU is configured for 4 NUMA nodes, and the adapter is bound to the corresponding one. The adapter is connected via PCIe Gen4 x16. RPS and XPS CPUs are pinned.
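
For reference, a sketch of the NUMA checks and pinning I mean (eth2 is an example interface name; the set_irq_affinity_bynode.sh helper ships with MLNX_OFED / mlnx-tools):

# NUMA node the adapter is attached to
cat /sys/class/net/eth2/device/numa_node

# cores belonging to each node
lscpu | grep NUMA

# steer the adapter's IRQs to that node (MLNX_OFED helper)
set_irq_affinity_bynode.sh <numa_node> eth2

# run the benchmark on cores and memory of the same node
numactl --cpunodebind=<numa_node> --membind=<numa_node> iperf3 -c <server_ip> -t 30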

The eth interfaces are set to mtu 9000.

I’m testing with iperf, iperf3 and raw_ethernet_bw. The maximum I was able to achieve was 73 Gbit/s. iperf and iperf3 are run as separate processes; I have tried from 2 to 8 processes, with the same result every time. iperf3 does report some retransmits, but only around 200-500 for a 30-second test.
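
A minimal sketch of what such a multi-process run looks like (ports, core numbers and <server_ip> are placeholders; iperf3 itself is single-threaded, so separate processes are needed rather than just -P):

# server side: one iperf3 instance per port
for p in 5201 5202 5203 5204; do iperf3 -s -p $p -D; done

# client side: one instance per port, each pinned to its own core
iperf3 -c <server_ip> -p 5201 -A 0 -t 30 &
iperf3 -c <server_ip> -p 5202 -A 1 -t 30 &
iperf3 -c <server_ip> -p 5203 -A 2 -t 30 &
iperf3 -c <server_ip> -p 5204 -A 3 -t 30 &
wait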

I did the same test with a switch in between (Dell S5232F-ON); there I had much higher retransmit counts, around 50k.

I have tested by reducing the link speed to 50G and 25G, and both times I can reach the maximum (46.3 Gbit/s and 23.2 Gbit/s respectively) - so at 100G I would expect 4x 23.2 Gbit/s, around 92.8 Gbit/s.
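
For anyone reproducing this: one way to force a lower link speed is via ethtool (eth2 is an example interface name, speeds are in Mbit/s; behaviour can depend on the cable/transceiver):

# force 50G / 25G, then back to 100G
ethtool -s eth2 speed 50000 autoneg off
ethtool -s eth2 speed 25000 autoneg off
ethtool -s eth2 speed 100000 autoneg off

# confirm the negotiated speed
ethtool eth2 | grep Speed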

Locally (lo interface) I can easily reach 190 Gbit/s send/receive.

I have followed the tuning guidelines:

https://community.mellanox.com/s/article/performance-tuning-for-mellanox-adapters

https://community.mellanox.com/s/article/how-to-tune-an-amd-server--eypc-cpu--for-maximum-performance

I will still test with a different cable, but mlxlink doesn’t report any issues.

What else can be checked? How can I find out WHAT is limiting the performance here?

How can I test a loopback configuration?

Hi Rosenstein,

Thank you for posting your question on our community.

As you mentioned these are AMD CPU-based hosts, can you please confirm that the two requirements below are met, as they help improve performance on AMD-based CPUs:

a. The GRUB command line in use has "iommu=pt". Please share the output of # cat /proc/cmdline
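
If it is missing, a sketch of how it can be added on CentOS 7 (the path assumes a BIOS/grub2 install; UEFI systems use /boot/efi/EFI/centos/grub.cfg instead):

# append iommu=pt to GRUB_CMDLINE_LINUX in /etc/default/grub, then:
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot

# verify after reboot
grep -o iommu=pt /proc/cmdline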

b. Are all DIMMs populated?

In addition, as you are using OFED 5.4, I believe you have the latest firmware installed, unless you installed the driver using the "--without-fw-update" flag.
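
The installed firmware version can be cross-checked with, for example (the interface name is a placeholder; mlxfwmanager requires the MFT tools):

# version reported by the driver
ethtool -i <interface> | grep firmware

# or query the device directly
mlxfwmanager --query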

In case you have the above parameters in place and still see reduced performance, we will open a support ticket as I see your account holds a valid support contract.

Thanks,

Namrata.

Edit: it seems to have actually worked; I can now reach 92.4 Gbit/s via Ethernet, same as via RDMA.

@Namrata Motihar

Hi, very sorry I have not responded earlier, I actually did not see your post!

I have added iommu=pt, but it did not change anything - we are not using SR-IOV, just plain bare-metal hardware.

cmdline:

BOOT_IMAGE=/vmlinuz-5.10.37 root=/dev/mapper/cl-root ro crashkernel=896M rd.lvm.lv=cl/root net.ifnames=0 biosdevname=0 scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=y mitigations=off console=tty0 console=ttyS1,115200 iommu=pt

Please disregard the 5.10.37 kernel here; I have rebooted into the up-to-date kernel, and the cmdline is the same.

b) 8 DIMMs are populated per CPU:

description: DIMM DDR4 Synchronous Registered (Buffered) 3200 MHz (0.3 ns)

product: HMAA8GR7MJR4N-XN

vendor: Hynix Semiconductor (Hyundai Electronics)

physical id: 17

serial: 933237DF

slot: B8

size: 64GiB

width: 64 bits

clock: 3200MHz (0.3ns)
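
(For completeness, population across all slots can also be checked quickly with something like the line below; empty slots show up as "No Module Installed".)

dmidecode -t memory | grep -E 'Size|Locator'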

I can reach 99 Gbit/s via InfiniBand (ib0) and 91 Gbit/s via Ethernet (eth2) when using ib_read_bw / ib_write_bw.
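
For reference, the kind of invocation meant here (mlx5_0 and <server_ip> are placeholders; --report_gbits reports bandwidth in Gbit/s):

# server
ib_write_bw -d mlx5_0 -F --report_gbits

# client
ib_write_bw -d mlx5_0 -F --report_gbits <server_ip>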

Using iperf3 or iperf I max out at around 60-70 Gbit/s (MTU 9000).

We do have an active support contract, currently until next week.