ib_write_bw does not work between NICs in different CIDRs

Question

I have two machines, each with two ConnectX-6 NICs.
NICs in the same network segment can communicate, but NICs in different network segments cannot.

Environment

machine1


root@:~$ rdma link show
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev ens105f0np0
link mlx5_1/1 state ACTIVE physical_state LINK_UP netdev ens105f1np1

core@:~$  ip a | grep -e ens105f0np0 -e ens105f1np1
8: ens105f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 10.16.255.10/24 brd 10.16.255.255 scope global ens105f0np0
9: ens105f1np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 10.22.1.10/24 brd 10.22.1.255 scope global ens105f1np1

machine2

root@:~$ rdma link show
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev ens105f0np0
link mlx5_1/1 state ACTIVE physical_state LINK_UP netdev ens105f1np1

core@:~$ ip a | grep -e ens105f0np0 -e ens105f1np1
8: ens105f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 10.16.255.11/24 brd 10.16.255.255 scope global ens105f0np0
9: ens105f1np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 10.22.1.11/24 brd 10.22.1.255 scope global ens105f1np1
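(Editor's note: the addressing above puts ens105f0np0 on both machines in 10.16.255.0/24 and ens105f1np1 in 10.22.1.0/24, i.e. two distinct subnets. This can be double-checked with a short sketch using Python's ipaddress module; the variable names are illustrative, not from the original post.)

```python
import ipaddress

# Interface addresses taken from the "ip a" output above
m1_f0 = ipaddress.ip_interface("10.16.255.10/24")  # machine1 ens105f0np0
m2_f0 = ipaddress.ip_interface("10.16.255.11/24")  # machine2 ens105f0np0
m1_f1 = ipaddress.ip_interface("10.22.1.10/24")    # machine1 ens105f1np1

print(m1_f0.network == m2_f0.network)  # True  -> same segment (10.16.255.0/24)
print(m1_f1.network == m2_f0.network)  # False -> different segments
```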

Scenario

Same CIDR: works as expected

core@:~$ ib_write_bw -d mlx5_0

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON           Lock-free      : OFF
 ibv_wr* API     : ON           Using DDP      : OFF
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x00c7 PSN 0x627216 RKey 0x203d00 VAddr 0x007f6f820ac000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:255:11
 remote address: LID 0000 QPN 0x024c PSN 0x50d5db RKey 0x203d00 VAddr 0x007f9f133c7000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:255:10
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MiB/sec]    BW average[MiB/sec]   MsgRate[Mpps]
 65536      5000             2758.12            2758.08              0.044129
---------------------------------------------------------------------------------------

core@:~$ ib_write_bw 10.16.255.11 -d mlx5_0
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON           Lock-free      : OFF
 ibv_wr* API     : ON           Using DDP      : OFF
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x024c PSN 0x50d5db RKey 0x203d00 VAddr 0x007f9f133c7000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:255:10
 remote address: LID 0000 QPN 0x00c7 PSN 0x627216 RKey 0x203d00 VAddr 0x007f6f820ac000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:255:11
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MiB/sec]    BW average[MiB/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 800.000000 != 3000.000000. CPU Frequency is not max.
 65536      5000             2758.12            2758.08              0.044129
---------------------------------------------------------------------------------------

Different CIDR: fails

core@:~$ ib_write_bw -d mlx5_0

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON           Lock-free      : OFF
 ibv_wr* API     : ON           Using DDP      : OFF
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x00c8 PSN 0xf8ea40 RKey 0x203d00 VAddr 0x007fc295bcc000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:255:11
 remote address: LID 0000 QPN 0x01bf PSN 0xa7999c RKey 0x23fd00 VAddr 0x007f33b8993000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:22:01:10
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MiB/sec]    BW average[MiB/sec]   MsgRate[Mpps]
ethernet_read_keys: Couldn't read remote address
 Unable to read to socket/rdma_cm
 Failed to exchange data between server and clients

core@:~$ ib_write_bw 10.16.255.11 -d mlx5_1
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_1
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON           Lock-free      : OFF
 ibv_wr* API     : ON           Using DDP      : OFF
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x01bf PSN 0xa7999c RKey 0x23fd00 VAddr 0x007f33b8993000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:22:01:10
 remote address: LID 0000 QPN 0x00c8 PSN 0xf8ea40 RKey 0x203d00 VAddr 0x007fc295bcc000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:255:11
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MiB/sec]    BW average[MiB/sec]   MsgRate[Mpps]
 Completion with error at client
 Failed status 12: wr_id 0 syndrom 0x81
scnt=128, ccnt=0
 Failed to complete run_iter_bw function successfully

Different CIDR with -R: works

core@:~$ ib_write_bw -d mlx5_0 -R

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON           Lock-free      : OFF
 ibv_wr* API     : ON           Using DDP      : OFF
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : ON
 Data ex. method : rdma_cm
---------------------------------------------------------------------------------------
 Waiting for client rdma_cm QP to connect
 Please run the same command with the IB/RoCE interface IP
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x00ca PSN 0x7daa14
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:255:11
 remote address: LID 0000 QPN 0x024e PSN 0x134574
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:255:10
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MiB/sec]    BW average[MiB/sec]   MsgRate[Mpps]
 65536      5000             2745.92            2745.92              0.043935
---------------------------------------------------------------------------------------

core@:~$ ib_write_bw 10.16.255.11 -d mlx5_1 -R
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_1
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON           Lock-free      : OFF
 ibv_wr* API     : ON           Using DDP      : OFF
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : ON
 Data ex. method : rdma_cm
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x024e PSN 0x134574
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:255:10
 remote address: LID 0000 QPN 0x00ca PSN 0x7daa14
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:255:11
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MiB/sec]    BW average[MiB/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 800.000000 != 2001.000000. CPU Frequency is not max.
 65536      5000             2745.92            2745.92              0.043935
---------------------------------------------------------------------------------------

Hello zejia.lu,

Welcome, and thank you for posting your inquiry to the NVIDIA Developer Forums.

The behavior you’re seeing is expected, and it comes down to how GID (Global Identifier) resolution works in RoCE:

  • Without -R (i.e. without rdma_cm), the tool exchanges connection data over a plain Ethernet/TCP socket, which requires the NICs to be in the same subnet.

  • The GID index in use (3) contains subnet-specific information (run the show_gids script to list the available indices).

  • When communicating across different subnets without RDMA CM, GID resolution fails.

  1. When using ib_write_bw between NICs in different subnets:

    • Without -R: communication fails because the basic Ethernet-based data exchange cannot resolve GIDs across subnets.
    • With -R: communication works because RDMA CM is used for connection establishment.
  2. For your use case, always pass the -R flag when:

    • communicating between NICs in different subnets,
    • using RoCE across different network segments, or
    • you need proper routing between different CIDRs.
  3. This is expected behavior because RDMA CM (the -R flag):

    • provides proper connection management across subnets,
    • handles the necessary address resolution and routing, and
    • is the recommended method for production environments with complex networking.
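(Editor's note: a side note on reading the output above — the GID strings that ib_write_bw prints are IPv4-mapped IPv6 addresses, so the failing run's local/remote GIDs can be decoded to see directly that the pair sits in different subnets. A small illustrative decoder; the function name is mine, not part of perftest.)

```python
import ipaddress

def gid_to_ip(gid: str) -> str:
    """Decode the colon-separated decimal byte string printed after
    "GID:" by ib_write_bw; IPv4-mapped GIDs (::ffff:a.b.c.d) come
    back as plain IPv4 addresses."""
    packed = bytes(int(byte) for byte in gid.split(":"))
    addr = ipaddress.IPv6Address(packed)
    return str(addr.ipv4_mapped or addr)

# GIDs from the failing cross-subnet run above
print(gid_to_ip("00:00:00:00:00:00:00:00:00:00:255:255:10:16:255:11"))  # 10.16.255.11
print(gid_to_ip("00:00:00:00:00:00:00:00:00:00:255:255:10:22:01:10"))   # 10.22.1.10
```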

Your test results showing successful communication with -R confirm that everything is indeed working correctly.

Best,
NVIDIA Enterprise Experience
