We have a few servers with MCX623106AS-CDAT Ethernet 100Gb 2-port QSFP56 cards. Is there a published performance baseline for these cards? What should I expect to see when running a raw_ethernet_bw test between two of them?

This is what I see, testing on two new HPE DL385 Gen10 Plus v2 servers with latest-generation EPYC CPUs, and one Arista 7800 switch between them.

Ubuntu 20.04.1, kernel 5.4.0-81-generic

server: raw_ethernet_bw --server -d mlx5_0 -B 88:e9:a4:33:48:b1 -F --duration 20

client: raw_ethernet_bw --client -d mlx5_0 -B 88:e9:a4:20:20:d3 -E 88:e9:a4:33:48:b1 -F --duration 20
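(Side note for anyone reproducing this: the mlx5 device name and the port MACs passed to -B / -E can be looked up with the commands below; ibdev2netdev ships with MLNX_OFED.)

ibdev2netdev # maps mlx5_0 to its Ethernet interface
ip link show # shows each port's MAC address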

results:

Max msg size in RawEth is MTU 1518
Changing msg size to this MTU

                    Send BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RawEth       Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 1518[B]
 Link type       : Ethernet
 GID index       : 0
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet

raw ethernet header**************************************

| Dest MAC          | Src MAC           | Packet Type |
|------------------------------------------------------------|
| 88:E9:A4:33:48:B1 | 88:E9:A4:20:20:D3 | DEFAULT     |
|------------------------------------------------------------|

 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 1518       42973376       0.00               6221.15              4.297334
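Back-of-the-envelope (my own arithmetic): 4.297 Mpps x 1518 B x 8 is roughly 52 Gb/s of frame data. With 1538 bytes per frame on the wire (frame plus preamble and inter-frame gap), 100 Gb/s line rate allows roughly 8.1 Mpps at this frame size, so this looks like only about half of what the link can carry at MTU 1518.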


Is this result expected?

Any suggestions?

A typical iperf run (iperf -c 192.168.1.1 -w 2m -P 32) between two nodes shows bandwidth fluctuating between 40 and 60 Gbps, well below what we would expect.

Thanks,

Hi Paolo,

Yes, we suggest reviewing the performance guides below:

https://community.mellanox.com/s/article/getting-started-with-performance-tuning-of-mellanox-adapters

and

https://community.mellanox.com/s/article/performance-tuning-for-mellanox-adapters

Once you have tuned the system, run the tests again. If the results are still not as expected, we suggest opening a new support ticket for further investigation by emailing Networking-support@nvidia.com.
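For a quick first pass, the mlnx_tune helper that ships with MLNX_OFED can apply a tuning profile (a sketch; HIGH_THROUGHPUT is one of the tool's built-in profiles):

mlnx_tune -p HIGH_THROUGHPUT # apply the high-throughput tuning profile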

Thanks,

Samer

Thanks. We did, together with a long list of tests per your and AMD's tuning whitepapers, cross-checking them against Red Hat's and SUSE's tuning whitepapers.

At the moment the only way we can get consistently good performance is by setting the IRQ affinity as per your guide, pinning the card's interrupts to the right NUMA node (using your configuration script), AND pinning the iperf server process to the same NUMA node the card is attached to. With that, I get a consistent 95 Gbps.
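For reference, the recipe that works looks roughly like this (the interface name is a placeholder for our ConnectX-6 Dx port; the IRQ script ships with NVIDIA's mlnx-tools):

cat /sys/class/net/ens1f0/device/numa_node # -> 2, the node the NIC hangs off
set_irq_affinity_bynode.sh 2 ens1f0 # steer the NIC's IRQs to node 2
numactl --cpunodebind=2 --membind=2 iperf -s # pin the iperf server to node 2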

If the iperf server process is not pinned to anything, performance varies quite a bit, averaging around 50 Gbps. If I force it onto NUMA node 7 (the card is on node 2), it drops to 35 Gbps.

Pinning the server process to a specific set of CPUs is not a workable solution for the production environment; we need to be able to use all the cores we have.

Things tested:

- assorted kernels (5.4.xx, 5.11.xx)
- various drivers (your latest, the inbox driver)
- assorted NUMA-per-socket (NPS) settings
- all sorts of OS network stack optimizations
- all sorts of power governor settings
- various BIOS parameters
- hardware configuration changes
- etc.

What really bugs me is that if I move one of these cards to an old spare server, I immediately get excellent performance. No tuning whatsoever.

I tried opening a support ticket, but it was closed right away because I do not have a direct support contract with Mellanox; all the hardware was bought through HPE (I have a case open with them). We also have a case open with AMD.

I’m really just looking for ideas on how to triage further.

Thanks,

PP