Is this the best our FDR adapters can do?

We have a small test setup, described below. I have run some ib_write_bw tests and got “decent” numbers, but not as fast as I anticipated. First, some background on the setup:

Two 1U storage servers each have an EDR HCA (MCX455A-ECAT). The other four servers each have a ConnectX-3 VPI FDR 40/56Gb/s mezzanine HCA (http://www.mellanox.com/related-docs/prod_adapter_cards/PB_ConnectX3_VPI_Card_Dell.pdf), OEMed by Mellanox for Dell. Their firmware version is 2.33.5040. This is not the latest (2.36.5000, according to hca_self_test.ofed), but I am new to IB and still getting up to speed with Mellanox’s firmware update tools. The EDR HCA firmware was updated when MLNX_OFED was installed.

All servers:

CPU: 2 x Intel E5-2620 v3, 2.4 GHz, 6 cores/12 HT

RAM: 8 x 16 GiB DDR4 1866 MHz DIMMs

OS: CentOS 7.2 Linux … 3.10.0-327.28.2.el7.x86_64 #1 SMP Wed Aug 3 11:11:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

OFED: MLNX_OFED_LINUX-3.3-1.0.4.0 (OFED-3.3-1.0.4)

A typical ib_write_bw test:

Server:

[root@fs00 ~]# ib_write_bw -R


* Waiting for client to connect... *

RDMA_Write BW Test
Dual-port       : OFF      Device         : mlx5_0
Number of qps   : 1        Transport type : IB
Connection type : RC       Using SRQ      : OFF
CQ Moderation   : 100
Mtu             : 2048[B]
Link type       : IB
Max inline data : 0[B]
rdma_cm QPs     : ON
Data ex. method : rdma_cm

Waiting for client rdma_cm QP to connect
Please run the same command with the IB/RoCE interface IP

local address:  LID 0x03 QPN 0x01aa PSN 0x23156
remote address: LID 0x05 QPN 0x4024a PSN 0x28cd2e

#bytes  #iterations  BW peak[MB/sec]  BW average[MB/sec]  MsgRate[Mpps]
65536   5000         6082.15          6081.07             0.097297


Client:

[root@sc2u0n0 ~]# ib_write_bw -d mlx4_0 -R 192.168.111.150


RDMA_Write BW Test
Dual-port       : OFF      Device         : mlx4_0
Number of qps   : 1        Transport type : IB
Connection type : RC       Using SRQ      : OFF
TX depth        : 128
CQ Moderation   : 100
Mtu             : 2048[B]
Link type       : IB
Max inline data : 0[B]
rdma_cm QPs     : ON
Data ex. method : rdma_cm

local address:  LID 0x05 QPN 0x4024a PSN 0x28cd2e
remote address: LID 0x03 QPN 0x01aa PSN 0x23156

#bytes  #iterations  BW peak[MB/sec]  BW average[MB/sec]  MsgRate[Mpps]
65536   5000         6082.15          6081.07             0.097297


Now, 6082 MB/s ≈ 48.65 Gbps. Even taking into account the 64b/66b encoding overhead, I would expect over 50 Gbps. Is this the best this setup can do, or is there anything I can do to push the speed up further?
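For reference, a rough sketch of that arithmetic (assuming 4x FDR signaling at 14.0625 Gb/s per lane with 64b/66b encoding, and taking the reported MB/sec column at face value):

    # Expected FDR payload rate vs. the ib_write_bw number above.
    fdr_signal_gbps = 14.0625 * 4                  # 4x FDR: 56.25 Gb/s on the wire
    fdr_data_gbps = fdr_signal_gbps * 64.0 / 66.0  # 64b/66b encoding -> ~54.5 Gb/s payload
    measured_gbps = 6082 * 8 / 1000.0              # 6082 MB/s taken literally -> ~48.7 Gb/s
    print("expected %.1f Gb/s vs measured %.1f Gb/s" % (fdr_data_gbps, measured_gbps))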

I look forward to hearing experiences and observations from the more experienced camp. Thanks!

Thanks for sharing your experience. I did the following:

[root@sc2u0n0 ~]# dmidecode |grep PCI
        Designation: PCIe Slot 1
        Type: x8 PCI Express 3 x16
        Designation: PCIe Slot 3
        Type: x8 PCI Express 3

lspci -vv
[...]
02:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
[...]
        LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
                ClockPM- Surprise- LLActRep- BwNot-

So the theoretical speed should be 8 GT/s per lane x 8 lanes x 128b/130b (https://en.wikipedia.org/wiki/PCI_Express#PCI_Express_3.0) ≈ 63 Gbps. In fact, we just did an fio sweep using fio-2.12 (a sketch of the invocation follows the results below). The reads are quite reasonable; we are now investigating why the writes are so low.
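The same arithmetic in a couple of lines (assuming PCIe 3.0 at 8 GT/s per lane, an x8 link, and 128b/130b encoding):

    # Theoretical PCIe 3.0 x8 throughput per direction.
    pcie_payload_gbps = 8.0 * 8 * 128.0 / 130.0   # 8 GT/s x 8 lanes x 128b/130b
    print("PCIe 3.0 x8: %.1f Gb/s" % pcie_payload_gbps)   # ~63.0 Gb/s, roughly 7.9 GB/s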

A. Read test results

  • Chunk size = 2 MiB
  • Num. Jobs = 32
  • IO Depth = 128
  • File size = 500 GiB
  • Test time = 360 seconds
    Mode              Speed, Gbps   IOPS
    psync, direct     47.77         2986
    psync, buffered   24.49         1530
    libaio, direct    49.17         3073

B. Write test results

  • Chunk size = 2 MiB
  • Num. Jobs = 32
  • IO Depth = 128
  • File size = 500 GiB
  • Test time = 360 seconds
    Mode              Speed, Gbps   IOPS
    psync, direct     24.14         1509
    psync, buffered   9.32          583
    libaio, direct    22.51         1407
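For completeness, here is a rough sketch of how such a sweep might be driven. These are not our exact commands; the target path and job names are hypothetical, and only the parameters listed above (2 MiB blocks, 32 jobs, iodepth 128, 500 GiB file, 360 s per run, psync/libaio, direct/buffered) come from the tables:

    #!/usr/bin/env python
    # Hypothetical fio sweep driver -- a sketch only, not the exact commands used.
    import subprocess

    TARGET = "/mnt/test/fio.dat"  # hypothetical test file on the storage under test

    def run_case(rw, ioengine, direct):
        name = "%s-%s-%s" % (rw, ioengine, "direct" if direct else "buffered")
        cmd = [
            "fio",
            "--name=" + name,
            "--filename=" + TARGET,
            "--rw=" + rw,                # read or write
            "--bs=2M",                   # chunk size = 2 MiB
            "--numjobs=32",              # num. jobs = 32
            "--iodepth=128",             # IO depth = 128 (only meaningful for libaio)
            "--ioengine=" + ioengine,    # psync or libaio
            "--direct=%d" % direct,      # 1 = O_DIRECT, 0 = buffered page cache
            "--size=500G",               # file size = 500 GiB
            "--runtime=360",             # test time = 360 seconds
            "--time_based",
            "--group_reporting",
        ]
        subprocess.check_call(cmd)

    for rw in ("read", "write"):
        for ioengine, direct in (("psync", 1), ("psync", 0), ("libaio", 1)):
            run_case(rw, ioengine, direct)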

I think I have the answer now. It comes down to the prevalent and inconsistent use of MB vs. MiB across different software applications.

When I ran ib_write_bw with the --report_gbits flag, I did see over 50 Gbps. That got me curious, so I assumed the MB/sec output was actually MiB/s; then 6082 MiB/s ≈ 51.02 Gbps, as anticipated.
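The conversion that makes the numbers line up, assuming the perftest “MB/sec” column is really MiB/s as concluded above:

    # 6082 "MB/sec" interpreted as MiB/s.
    gbps = 6082 * 1024 * 1024 * 8 / 1e9
    print("%.2f Gb/s" % gbps)   # ~51.02 Gb/s, matching the --report_gbits output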

One thing to keep in mind is that you’ll hit the bandwidth of the PCIe bus.

I’ve not used the ib_write_bw test myself, but I’m fairly sure it’s not actually processing data, just accepting it and tossing it away, so it’s going to give a theoretical maximum.

In real-life situations that bus is going to be handling all the data in and out of the CPU, and on my oldest motherboards that maxes out at 25 Gb/s, which is what I hit with fio tests on QDR links. I’ve heard that with PCIe gen 3 you’ll get up to 35 Gb/s.

Generally, whenever newer networking tech rolls out, there is nothing a single computer can do to saturate the link unless it’s pushing junk data; the only way to really max it out is switch-to-switch (hardware-to-hardware) traffic.

Of course, using IPoIB or anything other than native IB traffic is going to cost you performance. In my case of NFS over IPoIB (with or without RDMA) I quickly slam into the bandwidth limit of my SSDs. The only exception I’ll have is the Oracle DB, where low latency is what I’m after, as the database is small enough to fit in RAM.