ConnectX-4 LX 10GbE MCX4121A-XCAT copies very slowly between servers

The ConnectX-4 LX 10GbE MCX4121A-XCAT card is definitely not delivering the throughput we expect. iperf3 tests look very good, but copying files between two servers runs at about 150 MB/s, which is almost 1GbE line rate. What could I be missing? I only see this problem on a Dell PowerEdge R640 running Red Hat 7.9; on Red Hat 8.8 I usually get between 500 MB/s and 850 MB/s between servers. I need support on this issue.
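For context on why 150 MB/s looks suspicious: the observed number sits right at 1GbE line rate, while 10GbE should allow roughly ten times that. A minimal sketch of the arithmetic (the helper function name is my own, not from any tool):

```shell
# Hypothetical helper: convert a link speed in Gbit/s to its rough
# payload ceiling in MB/s (decimal units, ignoring protocol overhead).
line_rate_mbytes() {
  # g Gbit/s = g * 1000 Mbit/s; divide by 8 bits per byte -> MB/s
  awk -v g="$1" 'BEGIN { printf "%.0f\n", g * 1000 / 8 }'
}

line_rate_mbytes 10   # 10GbE ceiling: 1250 MB/s
line_rate_mbytes 1    # 1GbE ceiling: 125 MB/s, close to the observed 150 MB/s
```

Since iperf3 is fast but the copy is slow, the bottleneck is likely in the copy path (for example a single-threaded, cipher-bound scp on the older RHEL 7.9 OpenSSL) rather than the NIC itself; comparing the same copy over a plain TCP stream would help isolate that.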

hca_id: mlx5_0
transport: InfiniBand (0)
fw_ver: 14.32.1010
node_guid: e8eb:d303:004d:cf2a
sys_image_guid: e8eb:d303:004d:cf2a
vendor_id: 0x02c9
vendor_part_id: 4117
hw_ver: 0x0
board_id: MT_2420110004
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet

hca_id: mlx5_1
transport: InfiniBand (0)
fw_ver: 14.32.1010
node_guid: e8eb:d303:004d:cf2b
sys_image_guid: e8eb:d303:004d:cf2a
vendor_id: 0x02c9
vendor_part_id: 4117
hw_ver: 0x0
board_id: MT_2420110004
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet

Mellanox Technologies - System Report

Operation System Status
RHEL7.9
3.10.0-1160.99.1.el7.x86_64

CPU Status
GenuineIntel Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz Skylake
Warning: Frequency 1687.518MHz >>> CPU frequency is below maximum. Install cpupowerutils and run x86_energy_perf_policy performance.

Memory Status
Total: 251.09 GB
Free: 240.76 GB

Hugepages Status
On NUMA 1:
Transparent enabled: always
Transparent defrag: always

Hyper Threading Status
ACTIVE

IRQ Balancer Status
INACTIVE

Firewall Status
ACTIVE
IP table Status
INACTIVE
IPv6 table Status
INACTIVE

Driver Status
OK: MLNX_OFED_LINUX-5.8-3.0.7.0 (OFED-5.8-3.0.7)

ConnectX-4LX Device Status on PCI 3b:00.0
FW version 14.32.1010
OK: PCI Width x8
OK: PCI Speed 8GT/s
PCI Max Payload Size 256
PCI Max Read Request 512
Local CPUs list [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46]

p1p1 (Port 1) Status
Link Type eth
Warning: Link status Down >>> Check your port configuration (Physical connection, SM, IP).
Speed N/A
MTU 1500
OK: TX nocache copy 'off'

ConnectX-4LX Device Status on PCI 3b:00.1
FW version 14.32.1010
OK: PCI Width x8
OK: PCI Speed 8GT/s
PCI Max Payload Size 256
PCI Max Read Request 512
Local CPUs list [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46]

p1p2 (Port 1) Status
Link Type eth
OK: Link status Up
Speed 10GbE
MTU 1500
OK: TX nocache copy 'off'

I set the MTU to 9000 on both servers, but nothing changed.
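One more thing worth checking before opening a case: the report above warns that the CPU frequency is below maximum, which usually points at the scaling governor. A minimal sketch for inspecting it, assuming the standard Linux sysfs path (the cpufreq directory may be absent in VMs):

```shell
# Read the scaling governor for CPU0 from sysfs.
gov_file=/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
if [ -r "$gov_file" ]; then
  governor=$(cat "$gov_file")
else
  governor="unavailable"
fi
echo "governor: $governor"
# If it reports "powersave", switching to "performance" is worth trying
# (requires the cpupower tool and root):
#   cpupower frequency-set -g performance
```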

Hello,

Thanks for writing to us!
This issue is a bit too complex to debug on this community page.
I would ask you to open a case with NVIDIA Enterprise Support, where we can collect logs and debug this issue.

Thanks and have a wonderful day!
Ilan.