Degraded performance with driver version 5 (vs. version 4)

Hello,

We have a few Linux (CentOS 7) machines interconnected with ConnectX-5 NICs (MCX515A-CCAT) on a 100GbE network (Juniper switches) with RoCE support. We have been using driver version 4 for a while with no performance issues: ib_send_bw reports a very steady 97 Gb/s (MTU is 9000).

The drivers are installed with kernel support from the MLNX_OFED packages. The OS is up to date: currently CentOS 7.9, with Linux kernel version 3.10.0-1160.6.1.el7.x86_64.

We have tried each new release of driver version 5, and every one shows degraded performance:

ib_send_bw -d mlx5_0 -F --report_gbits --run_infinitely -D 1 reports throughput oscillating between 50 and 70 Gb/s.
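
To put numbers on the oscillation, the per-second readings could be summarized with something like the sketch below. This is only an illustration: the helper name summarize_bw is made up for this post, and the sample values are hypothetical placeholders, not measurements from our machines. The readings would have to be copied from the ib_send_bw output by hand or by a parser of your own.

```python
import statistics


def summarize_bw(samples_gbps):
    """Summarize throughput samples (in Gb/s): min, max, mean, population stdev."""
    if not samples_gbps:
        raise ValueError("no samples given")
    return {
        "min": min(samples_gbps),
        "max": max(samples_gbps),
        "mean": statistics.fmean(samples_gbps),
        "stdev": statistics.pstdev(samples_gbps),
    }


if __name__ == "__main__":
    # Hypothetical per-second readings in Gb/s, for illustration only.
    samples = [52.0, 68.5, 55.1, 70.0, 61.3]
    summary = summarize_bw(samples)
    for key, value in summary.items():
        print(f"{key}: {value:.1f} Gb/s")
```

Running the same summary on a version 4 and a version 5 installation would give a like-for-like comparison of both the average throughput and its spread.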

Has anyone experienced such degradation?

Does anyone have a hint as to a possible cause?

Or a good way to investigate what could be the problem?

Any help would be appreciated.

Cheers,

Fabrice

Hi Fabrice,

Please note that we are not aware of any such performance degradation in the new OFED version.

According to our records, your account has a valid support contract, so we suggest opening a support ticket at Networking-support@nvidia.com to investigate this issue.

Thanks,

Samer

Hi Samer,

Thank you, I will.