Extremely slow IPoIB performance, nearly as slow as Gbit Ethernet

Hello there,

I have been using Mellanox InfiniBand for many years now and have always had great performance.

For a small but not completely unimportant system I installed two InfiniBand cards, connected them directly, and set up IPoIB (RDMA is not an option in this case).

Unfortunately the performance is very low. I know that IPoIB is not the most efficient protocol, but iperf only reaches 1.2 Gbit/s, which is far below anything I would expect from a 20 Gbit/s (DDR) link.
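For reference, a sketch of how a number like that would typically be measured over the IPoIB interface. The address 10.0.0.1 is a hypothetical example, not taken from my setup:

```shell
# Hedged sketch: reproducing the throughput measurement over IPoIB.
# The address 10.0.0.1 is a hypothetical example for the IPoIB interface.
if command -v iperf >/dev/null 2>&1; then
    # On the server (e.g. "space"):   iperf -s
    # On the client (e.g. "desktop"): iperf -c 10.0.0.1 -t 30 -P 4
    msg="iperf available"
else
    msg="iperf not installed"
fi
echo "$msg"
```

The `-P 4` flag runs parallel streams, which can matter on IPoIB since a single TCP stream often cannot saturate the link.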

Some information about the systems used:

System 1, running Debian:

```
uname -a
Linux space 4.15.18-12-pve #1 SMP PVE 4.15.18-35 (Wed, 13 Mar 2019 08:24:42 +0100) x86_64 GNU/Linux

ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.35.5100
        Hardware version: 1
        Node GUID: 0x7cfe900300b1c470
        System image GUID: 0x7cfe900300b1c473
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 2
                LMC: 0
                SM lid: 1
                Capability mask: 0x02514868
                Port GUID: 0x7cfe900300b1c471
                Link layer: InfiniBand

lspci | grep Mellanox
83:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
```

System 2, running Arch:

```
uname -a
Linux desktop 5.0.7-arch1-1-ARCH #1 SMP PREEMPT Mon Apr 8 10:37:08 UTC 2019 x86_64 GNU/Linux

ibstat
CA 'mthca0'
        CA type: MT25204
        Number of ports: 1
        Firmware version: 1.2.0
        Hardware version: a0
        Node GUID: 0x0002c9020020d7e0
        System image GUID: 0x0002c9020020d7e3
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x02590a6a
                Port GUID: 0x0002c9020020d7e1
                Link layer: InfiniBand

lspci | grep Mellanox
02:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
```

I know that system 2 uses an old InfiniBand card, but is this performance really all I can expect?

```
ibping -c 5 -G 0x7cfe900300b1c471
Pong from space.(none) (Lid 2): time 0.250 ms
Pong from space.(none) (Lid 2): time 0.334 ms
Pong from space.(none) (Lid 2): time 0.355 ms
Pong from space.(none) (Lid 2): time 0.358 ms
Pong from space.(none) (Lid 2): time 0.331 ms
```

The ibping latencies look pretty bad too.

```
ibdiagnet -lw 4x -ls 10 -pm -pc
Loading IBDIAGNET from: /usr/lib/ibdiagnet1.5.7
-W- Topology file is not specified.
    Reports regarding cluster links will use direct routes.
Loading IBDM from: /usr/lib/ibdm1.5.7
-I- Using port 1 as the local port.
-I- Discovering ... 2 nodes (0 Switches & 2 CA-s) discovered.

-I---------------------------------------------------
-I- Bad Guids/LIDs Info
-I---------------------------------------------------
-I- No bad Guids were found

-I---------------------------------------------------
-I- Links With Logical State = INIT
-I---------------------------------------------------
-I- No bad Links (with logical state = INIT) were found

-I---------------------------------------------------
-I- General Device Info
-I---------------------------------------------------

-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------
-I- No illegal PM counters values were found

-I---------------------------------------------------
-I- Links With links width != 4x (as set by -lw option)
-I---------------------------------------------------
-I- No unmatched Links (with width != 4x) were found

-I---------------------------------------------------
-I- Links With links speed != 10 (as set by -ls option)
-I---------------------------------------------------
-W- link with SPD=5 found at direct path "1"
    From: a HCA PortGUID=0x0002c9020020d7e1 Port="desktop/P1"
    To:   a HCA PortGUID=0x7cfe900300b1c471 Port="MT25408/P1"

-I---------------------------------------------------
-I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
-I---------------------------------------------------
-I- PKey:0x7fff Hosts:2 full:2 limited:0

-I---------------------------------------------------
-I- IPoIB Subnets Check
-I---------------------------------------------------
-I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Suboptimal rate for group. Lowest member rate:20Gbps > group-rate:10Gbps

-I---------------------------------------------------
-I- Bad Links Info
-I- No bad link were found
-I---------------------------------------------------

-I- Stages Status Report:
    STAGE                          Errors  Warnings
    Bad GUIDs/LIDs Check           0       0
    Link State Active Check        0       0
    General Devices Info Report    0       0
    Performance Counters Report    0       0
    Specific Link Width Check      0       0
    Specific Link Speed Check      0       1
    Partitions Check               0       0
    IPoIB Subnets Check            0       1

Please see /tmp/ibdiagnet.log for complete log
-I- Done. Run time was 0 seconds.
```

Well, those warnings have to be part of the answer. I have already swapped out the cable, with no change.
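In case it helps anyone hitting the same wall: a first thing worth checking on both hosts is whether IPoIB is running in datagram mode with the default small MTU, since that alone caps throughput hard. A minimal sketch, assuming the interface is named `ib0` (verify the actual name with `ip link`):

```shell
# Hedged sketch: first IPoIB checks on each host. Assumes the IPoIB
# interface is named ib0; adjust to the actual name from `ip link`.
IB_IF=ib0
if [ -r "/sys/class/net/$IB_IF/mode" ]; then
    # "datagram" mode limits the MTU to 2044 bytes; "connected" mode
    # (if the driver supports it) allows an MTU of up to 65520:
    #   echo connected > /sys/class/net/ib0/mode
    #   ip link set ib0 mtu 65520
    result="mode=$(cat /sys/class/net/$IB_IF/mode) mtu=$(cat /sys/class/net/$IB_IF/mtu)"
else
    result="no IPoIB interface $IB_IF on this machine"
fi
echo "$result"
```

Note also that the `-W- Suboptimal rate for group` warning refers to the IPoIB multicast group rate rather than unicast throughput, and that the InfiniHost III Lx in system 2 predates most stateless offloads, so even a well-tuned link may top out well below line rate on that card.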

Hi Marc,

Please note that we cannot support this adapter, as the product has reached its End of Life date. Mellanox does not offer support or repair services for products that have reached their End of Life date.

You can refer to the following page for more information:

http://www.mellanox.com/related-docs/eol/eol_hca_general_offering_8_25_2011_B.pdf

Thanks,

Samer