Issues with SRP and unexplainable "ibping" behaviour.

Hello everyone,

I’m after a bit of assistance troubleshooting SRP over an InfiniBand setup I have at home. Essentially, I’m not seeing the disk I/O performance I was expecting between my SRP initiator and target, and I want to work out where the problem lies. My plan is to start at the InfiniBand infrastructure and work up from there: if I can verify that the InfiniBand fabric is set up correctly and performing as it should, I can move on to troubleshooting the additional technologies and protocols layered on top.
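
Longer term, and assuming I can even get the perftest tools onto both hosts (not something I have confirmed is possible on ESXi or Solaris), the idea is to measure raw fabric bandwidth with something like ib_send_bw, so that any shortfall can be pinned on the fabric rather than on SRP or ZFS. Roughly: run it with no arguments on one side, and point the other side at it over an IP path such as IPoIB (the address below is a placeholder):

ib_send_bw                  (on the target: waits for a client to connect)
ib_send_bw <target-ip>      (on the initiator)

For now, though, I want to get the basics like ibping working first.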

Some basic information first:

SRP Target: Oracle Solaris v11.1 Server with ZFS pools as LU (Logical Units).

SRP Initiator: VMware ESXi v5.5.

Mellanox MHGH28-XTC (MT25418) cards are used in both of the InfiniBand devices above, and a CX4 cable connects the two directly.

Now, to the best of my knowledge the drivers, VIBs and configuration have all been done correctly, and I’m at the point where ESXi v5.5 can see the LU, mount it and store data on it. At this stage it appears to be purely a performance issue that I’m trying to resolve.
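
For what it’s worth, once the fabric checks out I plan to re-test the datastore with something simple from the ESXi shell, along these lines (the datastore name is a placeholder, and /dev/zero compresses away on ZFS, so it’s only a rough indication):

time dd if=/dev/zero of=/vmfs/volumes/<srp-datastore>/ddtest bs=1M count=1024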

Some CLI outputs below:

STORAGE-SERVER

STORAGE-SERVER:/# ibstat

CA 'mlx4_0'

CA type: 0

Number of ports: 2

Firmware version: 2.9.1000

Hardware version: 160

Node GUID: 0x001a4bffff0c6214

System image GUID: 0x001a4bffff0c6217

Port 1:

State: Active

Physical state: LinkUp

Rate: 20

Base lid: 2

LMC: 0

SM lid: 1

Capability mask: 0x00000038

Port GUID: 0x001a4bffff0c6215

Link layer: IB

Port 2:

State: Down

Physical state: Polling

Rate: 10

Base lid: 0

LMC: 0

SM lid: 0

Capability mask: 0x00000038

Port GUID: 0x001a4bffff0c6216

Link layer: IB

VM-HYPER:

/opt/opensm/bin # ./ibstat

CA 'mlx4_0'

CA type: MT25418

Number of ports: 2

Firmware version: 2.7.0

Hardware version: a0

Node GUID: 0x001a4bffff0cb178

System image GUID: 0x001a4bffff0cb17b

Port 1:

State: Active

Physical state: LinkUp

Rate: 20

Base lid: 1

LMC: 0

SM lid: 1

Capability mask: 0x0251086a

Port GUID: 0x001a4bffff0cb179

Link layer: InfiniBand

Port 2:

State: Down

Physical state: Polling

Rate: 8

Base lid: 0

LMC: 0

SM lid: 0

Capability mask: 0x0251086a

Port GUID: 0x001a4bffff0cb17a

Link layer: InfiniBand

As far as I’m aware, the fact that LIDs have been assigned in the outputs above indicates that the SM (Subnet Manager) is doing its job.
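
Assuming the standard infiniband-diags are available on the Solaris side (I haven’t checked exactly what Oracle ships), sminfo would be a quick way to double-check this, since it queries the master SM directly and reports its LID and state (presumably the OpenSM running on the ESXi box, at LID 1):

sminfo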

From the SRP target, I can see the other InfiniBand host:

STORAGE-SERVER:/# ibhosts

Ca : 0x001a4bffff0cb178 ports 2 "****************** HCA-1"

Ca : 0x001a4bffff0c6214 ports 2 "MT25408 ConnectX Mellanox Technologies"
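
If a fuller picture is ever needed, ibnetdiscover (again assuming it’s present in the Solaris userland) walks the whole subnet and dumps every node and how it is cabled, which for a back-to-back setup like this should simply show the two HCAs connected port 1 to port 1:

ibnetdiscover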

I thought I’d start by using the “ibping” utility to verify InfiniBand connectivity, and this is where I got some really strange results:

Firstly, I could not get the ibping daemon running on the SRP initiator (ESXi) at all. The command would execute, but then just return to the shell:

/opt/opensm/bin # ./ibping -S

/opt/opensm/bin #
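
To rule out the daemon simply detaching rather than dying, I’ll also check whether an ibping process is still alive after that command returns (assuming the ESXi shell’s ps behaves the way I expect here):

ps | grep ibping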

So I switched to running the ibping daemon on the SRP target (Oracle Solaris) instead, which seemed to work as it should: it sat there awaiting incoming pings. Great! Going back to the SRP initiator, I then ran the ibping utility against the LID of the SRP target, but it was unsuccessful:

/opt/opensm/bin # ./ibping -L 2

ibwarn: [3502756] _do_madrpc: recv failed: Resource temporarily unavailable

ibwarn: [3502756] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 2)

ibwarn: [3502756] _do_madrpc: recv failed: Resource temporarily unavailable

ibwarn: [3502756] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 2)

ibwarn: [3502756] _do_madrpc: recv failed: Resource temporarily unavailable

.

--- (Lid 2) ibping statistics ---

10 packets transmitted, 0 received, 100% packet loss, time 9360 ms

rtt min/avg/max = 0.000/0.000/0.000 ms

OK, let’s try the Port GUID of the SRP target instead of the LID:

/opt/opensm/bin # ./ibping -G 0x001a4bffff0c6215

ibwarn: [3504924] _do_madrpc: recv failed: Resource temporarily unavailable

ibwarn: [3504924] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 1)

ibwarn: [3504924] ib_path_query_via: sa call path_query failed

./ibping: iberror: failed: can't resolve destination port 0x001a4bffff0c6215
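
The -G form resolves the GUID through a path query to the SA (the subnet manager), which is exactly the step that fails above, so another thing I plan to try is querying the SA on its own. Assuming saquery is included in this OFED build, running it with no arguments should dump the SA’s node records for both HCAs:

saquery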

I restarted the ibping daemon on the SRP target with one level of debugging and re-ran the pings from the client (SRP initiator). I can see that the pings are actually reaching the SRP target and that replies are being sent:

STORAGE-SERVER:/# ibping -S -d

ibdebug: [11188] ibping_serv: starting to serve…

ibdebug: [11188] ibping_serv: Pong: STORAGE-SERVER

ibwarn: [11188] mad_respond_via: dest Lid 1

ibwarn: [11188] mad_respond_via: qp 0x1 class 0x32 method 129 attr 0x0 mod 0x0 datasz 0 off 0 qkey 80010000

ibdebug: [11188] ibping_serv: Pong: STORAGE-SERVER

ibwarn: [11188] mad_respond_via: dest Lid 1

ibwarn: [11188] mad_respond_via: qp 0x1 class 0x32 method 129 attr 0x0 mod 0x0 datasz 0 off 0 qkey 80010000

ibdebug: [11188] ibping_serv: Pong: STORAGE-SERVER

ibwarn: [11188] mad_respond_via: dest Lid 1

ibwarn: [11188] mad_respond_via: qp 0x1 class 0x32 method 129 attr 0x0 mod 0x0 datasz 0 off 0 qkey 80010000

ibdebug: [11188] ibping_serv: Pong: STORAGE-SERVER

ibwarn: [11188] mad_respond_via: dest Lid 1

ibwarn: [11188] mad_respond_via: qp 0x1 class 0x32 method 129 attr 0x0 mod 0x0 datasz 0 off 0 qkey 80010000

ibdebug: [11188] ibping_serv: Pong: STORAGE-SERVER

ibwarn: [11188] mad_respond_via: dest Lid 1

ibwarn: [11188] mad_respond_via: qp 0x1 class 0x32 method 129 attr 0x0 mod 0x0 datasz 0 off 0 qkey 80010000
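
Another check I have in mind, assuming perfquery from infiniband-diags works on this ESXi build, is to read the port counters on both ends before and after a ping run and see whether the transmit/receive packet counters (and any error counters) actually move:

perfquery        (local port on the ESXi host)
perfquery 2 1    (LID 2 port 1, i.e. the Solaris target's port)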

The strangest observation is yet to come, however. If I run ibping on the client with two levels of debugging, a few replies actually show up in the final statistics when ibping is terminated (in my experience this does not happen with a single level of debugging):

/opt/opensm/bin # ./ibping -L -dd 2

.

ibdebug: [3508744] ibping: Ping…

ibwarn: [3508744] ib_vendor_call_via: route Lid 2 data 0x3ffcebc7aa0

ibwarn: [3508744] ib_vendor_call_via: class 0x132 method 0x1 attr 0x0 mod 0x0 datasz 216 off 40 res_ex 1

ibwarn: [3508744] mad_rpc_rmpp: rmpp (nil) data 0x3ffcebc7aa0

ibwarn: [3508744] umad_set_addr: umad 0x3ffcebc7570 dlid 2 dqp 1 sl 0, qkey 80010000

ibwarn: [3508744] _do_madrpc: >>> sending: len 256 pktsz 320

send buf

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0001 8001 0000 0002 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0132 0101 0000 0000 0000 0000 4343 c235

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 1405 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

ibwarn: [3508744] umad_send: fd 3 agentid 1 umad 0x3ffcebc7570 timeout 1000

ibwarn: [3508744] umad_recv: fd 3 umad 0x3ffcebc7170 timeout 1000

ibwarn: [3508744] umad_recv: mad received by agent 1 length 320

ibwarn: [3508744] _do_madrpc: rcv buf:

rcv buf

0132 0181 0000 0000 0000 00ac 4343 c234

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 1405 6763 2d73 746f 7261

6765 312e 6461 726b 7265 616c 6d2e 696e

7465 726e 616c 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

ibwarn: [3508744] umad_recv: fd 3 umad 0x3ffcebc7170 timeout 1000

ibwarn: [3508744] umad_recv: mad received by agent 1 length 320

ibwarn: [3508744] _do_madrpc: rcv buf:

rcv buf

0132 0181 0000 0000 0000 00ac 4343 c235

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 1405 6763 2d73 746f 7261

6765 312e 6461 726b 7265 616c 6d2e 696e

7465 726e 616c 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

ibwarn: [3508744] mad_rpc_rmpp: data offs 40 sz 216

rmpp mad data

6763 2d73 746f 7261 6765 312e 6461 726b

7265 616c 6d2e 696e 7465 726e 616c 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000

Pong from STORAGE-SERVER (Lid 2): time 7.394 ms

ibdebug: [3508744] report: out due signal 2

--- STORAGE-SERVER (Lid 2) ibping statistics ---

10 packets transmitted, 3 received, 70% packet loss, time 9556 ms

rtt min/avg/max = 7.394/12.335/15.344 ms

I’m stumped. Anyone have any ideas on what is going on or how to troubleshoot further?

Re: Issues with SRP and unexplainable “ibping” behaviour.

Actually, looking at the level 2 debugs a bit further, it seems that the replies are indeed making their way back to the ibping client (ESXi); you can see this in the receive buffers and the hex dump. However, the following message seems to indicate that something is amiss on the ESXi side:

ibwarn: [3511788] _do_madrpc: recv failed: Resource temporarily unavailable
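
That message is just EAGAIN from the userspace MAD (umad) receive, i.e. no reply was handed to ibping’s agent before the MAD timeout expired (the debug above shows a 1000 ms timeout). One cheap experiment, assuming this 1.8.2 build takes the standard -t/--timeout option, is to allow far longer and see whether more replies get counted:

./ibping -L -t 5000 2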

On a side note, I’m seeing a lot of references to the word “mad” in all the debugging information. I wonder if someone is hinting at something.

Re: Issues with SRP and unexplainable “ibping” behaviour.

And some additional information on the Mellanox VIBs installed on the ESXi 5.5 Server:

~ # esxcli software vib list | egrep Mellanox

net-ib-cm 1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported 2013-12-22

net-ib-core 1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported 2013-12-22

net-ib-ipoib 1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported 2013-12-22

net-ib-mad 1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported 2013-12-22

net-ib-sa 1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported 2013-12-22

net-ib-umad 1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported 2013-12-22

net-mlx4-core 1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported 2013-12-22

net-mlx4-ib 1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported 2013-12-22

scsi-ib-srp 1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported 2013-12-22

~ # esxcli software vib list | egrep opensm

ib-opensm 3.3.15 Intel VMwareAccepted 2013-12-22
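
For completeness, I also want to confirm that the relevant modules are actually loaded, not just installed as VIBs. I’m not sure of the exact module names the 1.8.2 bundle registers, hence the broad greps:

vmkload_mod -l | grep -i mlx
vmkload_mod -l | grep -i ib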

Re: Issues with SRP and unexplainable “ibping” behaviour.

Hi!

What firmware are your Mellanox MHGH28-XTC (MT25418) cards running?

I think you need to use firmware 2.9.1200.

I’m using the latest firmware on the Solaris Server, but a slightly older one (compatibility reasons?) on the ESXi v5.5 Server:

STORAGE-SERVER:

STORAGE-SERVER:/# ibstat

Firmware version: 2.9.1000

VM-HYPER:

/opt/opensm/bin # ./ibstat

Firmware version: 2.7.0

Hey,

Have you tried upgrading the Firmware version?

No, it’s not a problem with your storage server’s HCA firmware 2.9.1000.

If your hypervisor is ESXi 5.5 or above, I’ll build firmware 2.9.1200 for the MHGH28 or MHGH29 ConnectX-1 families; I have successfully confirmed this before with that old HCA.

I’m sorry for the late reply, I’ve been very busy with work. :)

I’ll link you to firmware 2.9.1200 for the MHGH28 ConnectX-1 HCA. This firmware also supports systems with Intel VT-d enabled. :)
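
When you have the image, the usual way to burn it is with mstflint from a Linux host. Roughly like this, although the PCI address and file names here are only examples (check yours with lspci, and read back the existing image first as a backup):

mstflint -d 04:00.0 query
mstflint -d 04:00.0 ri backup-2.7.0.bin
mstflint -d 04:00.0 -i fw-MHGH28-2_9_1200.bin burn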

Super old thread to the top…

@Jaehoon Choi - I’m looking for 2.9.1200 for the MHGH29 CX-1… you said you would link to it. Hoping you still have a copy after 5 years!

Or, if not, can you tell me where I can get the .mlx file so I can build my own? I’m just not getting anywhere…

I have 2.9.1000, and ESXi 6.7, FreeNAS 11 and Windows 10 all see the card and it works; I’ve just heard that speed is better with 2.9.1200…
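
In case anyone else tries the DIY route: my understanding (not something I have done on this exact card) is that you need Mellanox’s .mlx firmware source for the ConnectX/MT25408 chip plus the board-specific .ini, which can be dumped from the card you already have, and then mlxburn compiles them into a burnable image. The device address and file names below are placeholders:

flint -d 04:00.0 dc > mhgh29.ini
mlxburn -fw fw-25408-rel.mlx -conf mhgh29.ini -wrimage custom-2_9_1200.bin

The .mlx itself only ever came from Mellanox though, which is the part I can’t find either…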