Performance test with RoCEv2

Hi,

CPU : Intel
Card : ConnectX-5 EN
O/S : Ubuntu 22.04 (64-bit)
Driver : MLNX_OFED_LINUX-5.8-1.1.2.1-ubuntu22.04-x86_64

I am testing performance with RoCEv2. The link speed is 100 Gbps, as shown in the ethtool output below.

Settings for enp202s0f0np0:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseKX/Full
                                10000baseKR/Full
                                40000baseKR4/Full
                                40000baseCR4/Full
                                40000baseSR4/Full
                                40000baseLR4/Full
                                25000baseCR/Full
                                25000baseKR/Full
                                25000baseSR/Full
                                50000baseCR2/Full
                                50000baseKR2/Full
                                100000baseKR4/Full
                                100000baseSR4/Full
                                100000baseCR4/Full
                                100000baseLR4_ER4/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Supported FEC modes: None RS BASER
        Advertised link modes:  100000baseKR4/Full
                                100000baseSR4/Full
                                100000baseCR4/Full
                                100000baseLR4_ER4/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: None
        Speed: 100000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: FIBRE
        PHYAD: 0
        Transceiver: internal
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000004 (4)
                               link
        Link detected: yes

This is the output of the “show_gids” command.

=== PC1 ===
mlx5_2 1 2 0000:0000:0000:0000:0000:ffff:c0a8:6714 192.168.103.20 v1 enp202s0f0np0
mlx5_2 1 3 0000:0000:0000:0000:0000:ffff:c0a8:6714 192.168.103.20 v2 enp202s0f0np0

=== PC2 ===
mlx5_2 1 2 0000:0000:0000:0000:0000:ffff:c0a8:671e 192.168.103.30 v1 enp202s0f0np0
mlx5_2 1 3 0000:0000:0000:0000:0000:ffff:c0a8:671e 192.168.103.30 v2 enp202s0f0np0

I tested the performance by running “ibv_rc_pingpong” with -g 3 (the v2 GID index from the show_gids output above), as shown below, and got only about 13 Gbps. Do I need other test options to reach 100 Gbps?

root@pc1:~# ibv_rc_pingpong -d mlx5_2 -g 3
local address: LID 0x0000, QPN 0x00010d, PSN 0x45844e, GID ::ffff:192.168.103.20
remote address: LID 0x0000, QPN 0x00004b, PSN 0x58f27e, GID ::ffff:192.168.103.30
8192000 bytes in 0.00 seconds = 13487.55 Mbit/sec
1000 iters in 0.00 seconds = 4.86 usec/iter

root@pc2:~# ibv_rc_pingpong -d mlx5_2 -g 3 192.168.103.20
local address: LID 0x0000, QPN 0x00004b, PSN 0x58f27e, GID ::ffff:192.168.103.30
remote address: LID 0x0000, QPN 0x00010d, PSN 0x45844e, GID ::ffff:192.168.103.20
8192000 bytes in 0.00 seconds = 14362.48 Mbit/sec
1000 iters in 0.00 seconds = 4.56 usec/iter
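
As far as I can tell, ibv_rc_pingpong is a latency test: it keeps only one message in flight and defaults to a 4096-byte message (1000 iterations × 2 × 4096 bytes = 8,192,000 bytes reported), so at ~4.86 usec per round trip the link is mostly idle. Rerunning with a larger message via the -s option raises the number somewhat:

root@pc1:~# ibv_rc_pingpong -d mlx5_2 -g 3 -s 1048576
root@pc2:~# ibv_rc_pingpong -d mlx5_2 -g 3 -s 1048576 192.168.103.20

but this still measures ping-pong round trips rather than sustained bandwidth.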

Ultimately, my goal is to get the maximum transmit/receive throughput between PC1 and PC2 using RoCE.
My understanding is that fully utilizing the 100 Gbps bandwidth normally requires multi-threaded sending and receiving, but I would like to implement it with a single thread.
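
My assumption is that a single thread can keep the link busy as long as enough work requests are posted before polling for completions, so the limit above is the one-message-at-a-time pattern rather than the thread count. For example, perftest's bandwidth tests are single-threaded and expose the send-queue depth via -t (tx-depth):

root@pc1:~# ib_write_bw -d mlx5_2 -x 3 -s 1048576 -t 512 -F --report_gbits

(512 is only an illustrative depth; the default of 128 is usually already enough for large messages.)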

Use the perftest package (ib_write_bw, ib_send_bw, etc.) instead of ibv_rc_pingpong; see the example below.
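
A minimal run, assuming the device name (mlx5_2) and RoCEv2 GID index (3) from your show_gids output; -x selects the GID index, -F skips the CPU-frequency check, and --report_gbits reports results in Gb/s:

root@pc1:~# ib_write_bw -d mlx5_2 -x 3 -s 1048576 -F --report_gbits
root@pc2:~# ib_write_bw -d mlx5_2 -x 3 -s 1048576 -F --report_gbits 192.168.103.20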

Pin the application threads to the NUMA node closest to the card; see the example below.
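
The node number can be read from sysfs and passed to numactl (interface and device names taken from your post):

root@pc1:~# cat /sys/class/net/enp202s0f0np0/device/numa_node
root@pc1:~# numactl --cpunodebind=<node> --membind=<node> ib_write_bw -d mlx5_2 -x 3 -s 1048576 -F --report_gbits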

Disable C-states to prevent the CPU from dropping into deep idle states; see the example below.
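
For example, with the cpupower utility (takes effect until reboot):

root@pc1:~# cpupower idle-set -D 0
root@pc1:~# cpupower frequency-set -g performance

or persistently via the kernel command line with processor.max_cstate=0 intel_idle.max_cstate=0.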

Thank you for your comment.

Hi

Please go through the performance tuning guide:

https://enterprise-support.nvidia.com/s/article/performance-tuning-for-mellanox-adapters
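
The mlnx_tune utility that ships with MLNX_OFED applies the tuning profiles described there, e.g.:

root@pc1:~# mlnx_tune -p HIGH_THROUGHPUT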