Experiencing low performance on Mellanox ConnectX-6DX

mohmmad.raouf11 · March 31, 2024, 12:04pm

After optimizing the OpenShift cluster, I used PROX version 22.11 for performance evaluation and found that I’m unable to utilize more than 6GB of bandwidth. With a 64-byte frame size, the maximum I achieved was 6.99 Mpps.
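For context, a packet rate can be converted to wire bandwidth with a short calculation. The sketch below assumes standard Ethernet framing, where each frame carries 20 extra bytes on the wire (preamble, start-of-frame delimiter, and inter-frame gap). The 100 Gbit/s link speed is only a placeholder, since the port speed is not stated in this post; the function names are illustrative.

```python
# Bytes on the wire per frame that are not counted in the frame size:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def wire_rate_gbps(mpps: float, frame_bytes: int) -> float:
    """Layer-1 bandwidth in Gbit/s consumed by `mpps` million frames per second."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return mpps * 1e6 * bits_per_frame / 1e9


def line_rate_mpps(link_gbps: float, frame_bytes: int) -> float:
    """Theoretical maximum frame rate in Mpps on a link of `link_gbps` Gbit/s."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_per_frame / 1e6


if __name__ == "__main__":
    # Measured figure from this post: 6.99 Mpps with 64-byte frames.
    print(f"measured: {wire_rate_gbps(6.99, 64):.2f} Gbit/s on the wire")

    # ASSUMPTION: 100 Gbit/s is a placeholder, not the poster's actual port speed.
    assumed_link_gbps = 100.0
    print(f"line rate at {assumed_link_gbps:.0f}G: "
          f"{line_rate_mpps(assumed_link_gbps, 64):.2f} Mpps")
```

With 64-byte frames, 6.99 Mpps works out to roughly 4.7 Gbit/s on the wire, which helps when comparing a packet-rate result against a bandwidth figure.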

I’ve attempted to address this issue by adhering to the recommendations outlined in the DPDK 22.03 NVIDIA Mellanox NIC performance report available at https://fast.dpdk.org/doc/perf/DPDK_22_03_NVIDIA_Mellanox_NIC_performance_report.pdf. However, the problem persists.

Additionally, I’ve investigated packet loss at the NIC interface level and found no anomalies. The bottleneck appears to be related to packet generation, but I’m uncertain about the underlying cause.
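A check like the one described above can be sketched as a small filter over `ethtool -S` output. This is a minimal illustration and not the exact procedure used here: the keyword list is an assumption, the counter names in the sample are only examples of the kind mlx5 typically reports, and the sample values are invented for demonstration.

```python
import re

# ASSUMPTION: substrings that usually indicate loss-related counters.
LOSS_KEYWORDS = ("discard", "drop", "out_of_buffer", "error")

# Matches lines of the form "     counter_name: 123".
COUNTER_LINE = re.compile(r"^\s*(\S+):\s*(\d+)\s*$")


def nonzero_loss_counters(ethtool_stats: str) -> dict:
    """Return loss-related counters with a nonzero value from `ethtool -S` text."""
    found = {}
    for line in ethtool_stats.splitlines():
        match = COUNTER_LINE.match(line)
        if not match:
            continue  # skips headers such as "NIC statistics:"
        name, value = match.group(1), int(match.group(2))
        if value and any(key in name for key in LOSS_KEYWORDS):
            found[name] = value
    return found


# Illustrative sample only; these values were not taken from the system under test.
SAMPLE_OUTPUT = """NIC statistics:
     rx_packets: 1000
     rx_out_of_buffer: 0
     rx_discards_phy: 42
     tx_packets: 999
"""

if __name__ == "__main__":
    print(nonzero_loss_counters(SAMPLE_OUTPUT))
```

On a real host, the input text would come from running `ethtool -S enp216s0f0np0` before and after a test run, so that the two results can be compared.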

I’m seeking advice or references on potential solutions. Should I consider updating the firmware or driver? Any insights or recommendations would be greatly appreciated.

Below are the SUT details:

NIC model: Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]

uname -r
5.14.0-284.54.1.rt14.339.el9_2.x86_64

ethtool -i enp216s0f0np0
driver: mlx5_core
version: 5.14.0-284.54.1.rt14.339.el9_2.
firmware-version: 22.35.2000 (MT_0000000359)
expansion-rom-version:
bus-info: 0000:d8:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

## CPU
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  104
  On-line CPU(s) list:   0-103
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel
  Model name:            Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
    BIOS Model name:     Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz

Operating System:

cat /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="415.92.202402201450-0"
VERSION_ID="4.15"
VARIANT="CoreOS"
VARIANT_ID=coreos
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 415.92.202402201450-0 (Plow)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::coreos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.15/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.15"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.15"
OPENSHIFT_VERSION="4.15"
RHEL_VERSION="9.2"
OSTREE_VERSION="415.92.202402201450-0"

OCP Cluster

oc version
Client Version: 4.15.0-202402070507.p0.g48dcf59.assembly.stream-48dcf59
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3

jtal · April 1, 2024, 5:36am

Hello,

Thank you for approaching us. I have reviewed your queries - please see my recommendations below.

First, even though the ConnectX-6 Dx Firmware version is an LTS release, I would recommend upgrading it to the latest stable - 22.35.3502-LTS.
You can download this version from the "Firmware for ConnectX®-6 Dx" page on the NVIDIA website.
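To confirm whether a card is behind the recommended release, the `firmware-version` field from `ethtool -i` can be compared against the target version. The following is a minimal sketch; the helper names are hypothetical, and it assumes the version is three dot-separated integers, optionally followed by a suffix such as `-LTS` or a PSID in parentheses.

```python
def parse_fw_version(text: str) -> tuple:
    """Turn '22.35.2000 (MT_0000000359)' or '22.35.3502-LTS' into (22, 35, 2000)-style tuples."""
    version = text.split()[0]          # drop a trailing PSID such as "(MT_...)"
    version = version.split("-")[0]    # drop a suffix such as "-LTS"
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(installed: str, target: str) -> bool:
    """True if the installed firmware is older than the target release."""
    return parse_fw_version(installed) < parse_fw_version(target)


if __name__ == "__main__":
    installed = "22.35.2000 (MT_0000000359)"  # from the ethtool -i output in this thread
    target = "22.35.3502-LTS"                 # release recommended in this reply
    print(needs_upgrade(installed, target))
```

Comparing integer tuples avoids the pitfalls of plain string comparison, where for example "22.4.0" would sort after "22.35.0".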

Additionally, since you are referring to the DPDK 22.03 performance report, I assume this is the DPDK version you are using.
This version is rather old (released March 17, 2022).
I recommend upgrading DPDK as well, to the latest LTS version, 23.11, which can be found here: https://core.dpdk.org/download/

Once the versions are aligned to the latest, please try to evaluate the performance again.
If the issue still persists or any other issues arise, please open a case at: enterprisesupport@nvidia.com, and it will be handled according to entitlement.

Best Regards,
Jonathan.

mohmmad.raouf11 · April 1, 2024, 7:20am

Thank you for reaching out.

While upgrading the firmware to the latest stable release sounds like a viable option, I’d like to understand the rationale behind moving to the new version. Are there any reported performance issues with the current firmware/driver/kernel version I’m using? If not, is it possible to achieve optimal performance with the existing setup? If you require further information to assess this, please don’t hesitate to let me know.

Regarding the DPDK version, I’ve noticed similar tuning parameters being used across different versions, albeit with variations in the parameters passed. Based on my understanding, I believe my current version should also deliver good performance.

Please advise on the next steps to proceed.

Attachment: 22_11 similar tunings
