Hello,
We have a few Linux (CentOS 7) machines interconnected with ConnectX-5 NICs (MCX515A-CCAT) on a 100GbE network (Juniper switches) with RoCE support. We have been using the version 4 drivers for a while with no performance issues: ib_send_bw reports a very steady 97 Gb/s (MTU is 9000).
The drivers are installed with kernel support from the MLNX_OFED packages. The OS is up to date: currently CentOS 7.9, with Linux kernel version 3.10.0-1160.6.1.el7.x86_64.
We have tried each new release of the version 5 drivers, and every one of them shows degraded performance:
ib_send_bw -d mlx5_0 -F --report_gbits --run_infinitely -D 1 reports throughput oscillating between 50 and 70 Gb/s.
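To put numbers on the oscillation, a rough sketch like the one below can summarize the per-second samples. It is only a sketch under an assumption: that each result row printed by ib_send_bw has five whitespace-separated numeric fields (#bytes, #iterations, BW peak, BW average, MsgRate) and that the average bandwidth is the fourth one. The function name summarize_bw is mine, not part of perftest.

```python
import statistics


def summarize_bw(lines, column=3):
    """Summarize one numeric column of perftest-style result rows.

    Assumption: a result row has exactly 5 numeric fields and the
    'BW average' value is the 4th field (index 3). Header, separator
    and any other non-numeric lines are skipped.
    """
    samples = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # not a result row under the assumed layout
        try:
            values = [float(field) for field in fields]
        except ValueError:
            continue  # contains non-numeric text, so not a result row
        samples.append(values[column])

    if not samples:
        return None  # nothing that looked like a result row

    return {
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.pstdev(samples),  # population std deviation
    }
```

The idea would be to save the ib_send_bw output to a file for each driver version, pass the file's lines to summarize_bw, and compare min, max and stdev between version 4 and version 5.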
Has anyone experienced such degradation?
Does anyone have a hint as to the possible cause?
Or can anyone suggest a good way to investigate the problem?
Any help would be appreciated.
Cheers,
Fabrice