Mixing OFED 1.5.3 and 2.2 in the same network?

Hi all,

We have 2 clusters using infiniband, as follows:

Computing cluster 1:

  • 27 nodes
  • IBM bladecenter
  • CentOS 6.2
  • Each node has 1x MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
  • Firmware version: 2.9.1100
  • OFED 1.5.3-3.0.0 (from the installer MLNX_OFED_LINUX-1.5.3-3.0.0-rhel6.2-x86_64, with the add-kernel-support rebuild and all that)
  • ib_ipoib

GPFS server cluster:

  • 4 nodes
  • Dell PowerEdge R720
  • RHEL 6.3
  • Each node has 1x MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
  • Firmware version: 2.9.1000
  • OFED 1.5.3-3.1.0 (from the installer MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64)
  • ib_ipoib

Our network topology consists of:

  • 1x 36-port Mellanox FabricIT IS5030/U1, connected to:
    • 4x GPFS servers - 1x port each
    • 2x Voltaire 40Gb InfiniBand Switch Modules (the BladeCenter’s switches) - 2x ports each
    • 4x other GPFS clients
  • 2x Voltaire 40Gb InfiniBand Switch Modules:
    • 13x nodes each, connected internally to the HCAs through the chassis backplane

Both clusters have been running for more than a year. The original GPFS + Infiniband installation was done by IBM techs. We merely copied/adapted it when we moved the servers to Dell machines. We never did much research on Infiniband. Mostly went with what we found already installed/configured.

We run GPFS over InfiniBand (which I think is why we have ib_ipoib there, so we can use IP addresses to name nodes). The only InfiniBand parameter we ever modified was adding this to /etc/modprobe.d/mlx4_en.conf on all nodes of both clusters:

options mlx4_core pfctx=0 pfcrx=0 log_num_mtt=20 log_mtts_per_seg=4
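
A quick way to confirm those parameters actually took effect after reloading the module is to read them back from sysfs (a minimal sketch; it only checks the two MTT parameters):

# values the currently loaded mlx4_core module is using
cat /sys/module/mlx4_core/parameters/log_num_mtt
cat /sys/module/mlx4_core/parameters/log_mtts_per_seg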

Performance tests (ib_read_bw/ib_write_bw and ib_read_lat/ib_write_lat) report ~3250 MB/s and ~2.5 usec (I cannot show the actual numbers right now because heavy traffic is skewing them).
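
For reference, these tests are plain perftest runs along these lines (a sketch; the device name mlx4_0 and the hostname gpfs01 are assumptions for illustration):

# bandwidth: start the server side, then point the client at it
ib_read_bw -d mlx4_0              # on the server node
ib_read_bw -d mlx4_0 gpfs01       # on the client node
# latency: same pattern
ib_write_lat -d mlx4_0            # on the server node
ib_write_lat -d mlx4_0 gpfs01     # on the client node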

GPFS performance showed single-stream read/write performance (dd) of 2.0-2.5 GB/s, and an aggregate multi-node bandwidth of 6~10 GB/s.
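
The single-stream figures come from simple dd runs against the GPFS mount, roughly like this (a sketch; the path /gpfs/fs0/ddtest and the sizes are assumptions):

# sequential write of a 16 GiB file, bypassing the page cache
dd if=/dev/zero of=/gpfs/fs0/ddtest bs=1M count=16384 oflag=direct
# sequential read of the same file
dd if=/gpfs/fs0/ddtest of=/dev/null bs=1M iflag=direct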

With these numbers in mind, we consider that the system (and the InfiniBand network) is working fine. (Isn’t it?)

Now, we are planning to build a new computing cluster (and/or rebuild the current one) and we started doing some tests with a couple of computing nodes.

We are moving to CentOS 6.5 and we are forced to move to OFED 2.2 (MLNX_OFED_LINUX-2.2-1.0.1-rhel6.5): we tried to install 1.5.3 (MLNX_OFED_LINUX-1.5.3-4.0.42-rhel6.3), but the mlnx_add_kernel_support script only supports up to rhel6.4, and even after working around that check the compilation fails due to some missing includes. So we moved on and installed the test cluster with CentOS 6.5 and MLNX_OFED_LINUX-2.2-1.0.1.
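
The rebuild we attempted looks roughly like this (a sketch; the unpacked installer path is an assumption and the exact flags may differ between OFED releases):

# rebuild the MLNX_OFED packages against the running kernel
./mlnx_add_kernel_support.sh -m /root/MLNX_OFED_LINUX-1.5.3-4.0.42-rhel6.3-x86_64 --make-tgz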

We have had a couple of problems, which we more or less diagnosed after asking on the OpenFabrics discussion list:

  • The perftest handshake mechanism changed between 1.5 and 2.2, so we cannot run tests between the new and the old cluster.

We can deal with this. The performance tests between the two ofed-2.2 nodes seemed ok.

  • Loading the ib_ipoib module under OFED 2.2 changes the MAC address of ib0.

This wouldn’t be a problem if the OFED installer didn’t delete CentOS’s rdma-3.10-3.el6.noarch RPM. That RPM contains the ifup-ib and ifdown-ib scripts, which can initialize the InfiniBand interfaces while ignoring MAC address changes. We can get around this by copying the two “old” scripts back after OFED’s installation in the kickstart.
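
In the kickstart %post the workaround is essentially a backup/restore of the two scripts around the OFED installation (a sketch; the paths follow the stock rdma package layout and /root is just a convenient staging directory):

# save the distro scripts before the OFED installer removes the rdma RPM
cp -a /etc/sysconfig/network-scripts/ifup-ib /etc/sysconfig/network-scripts/ifdown-ib /root/
# ... run the MLNX_OFED installer here ...
# restore them so ib0 still comes up despite the MAC address change
cp -a /root/ifup-ib /root/ifdown-ib /etc/sysconfig/network-scripts/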

Even after all these problems, we have been able to join these nodes as GPFS clients with “normal” performance. But, after dealing with this, we are wondering what to do with our infiniband-gpfs network.

Can we keep the GPFS servers on OFED 1.5.3 and move the (new) clients to 2.2? Should we try to update everything to 2.2? Will new problems appear when we upgrade the servers in production? Should we keep everything on 1.5.3? Should we use the community OFED? (CentOS 6.5’s default InfiniBand installation “works”.)

Any other criticism of our setup is welcome.

Thanks in advance,

Txema

PS: All these doubts come from one of our techs adding a client with a “poorly installed” InfiniBand stack to the GPFS, which stalled the whole InfiniBand and GPFS traffic until we removed the node. So we are afraid of touching anything on that network.

So I have been delving into this and I’ve found the following:

I have managed to compile OFED 1.5.3 on CentOS 6.5 after following the instructions in this post: el5.10 ofed build problem (Infrastructure & Networking - NVIDIA Developer Forums).

My infiniband nodes have these cards:

  • computing nodes:
    • Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
    • CA type: MT26428
    • Firmware version: 2.9.1100
  • GPFS servers:
    • Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
    • CA type: MT26428
    • Firmware version: 2.9.1000
  • Other GPFS clients:
    • Mellanox Technologies MT27500 Family [ConnectX-3]
    • CA type: MT4099
    • Firmware version: 2.10.300 (upgradable up to 2.11.500)

And, according to the release notes:

2.2

MLNX_OFED Rev 2.2-1.0.1 supports the following Mellanox network adapter cards:

• Connect-IB® (Rev 10.10.3000 and above)

• ConnectX®-3 Pro (Rev 2.31.5050 and above)

• ConnectX®-3 (Rev 2.31.5050 and above)

• ConnectX®-2 (Rev 2.9.1200 and above)

ConnectX®-2 does not support all the new functionality of MLNX_OFED 2.0.3-XXX. For the complete list of the supported features per HCA, please refer to the MLNX_OFED User Manual.

1.5.3

Mellanox supports the following adapters with MLNX_OFED_LINUX 1.5.3:

  • Mellanox Technologies HCAs (SDR and DDR Modes are Supported):

  • ConnectX-2 / ConnectX-2 EN (fw-ConnectX2 Rev 2.9.1000)

  • ConnectX-3 VPI / ConnectX-3 EN (fw-4099 Rev 2.11.0500)

So, it seems that none of my current cards are compatible with ofed 2.2, and only some of my servers are compatible with the latest 1.5.3.

What should I do? Should I keep everything at 1.5.3? Should I update everything to 2.2? Would I gain anything by moving to 2.2?

And what about when we buy new nodes (with new cards)? Should we install them with 2.2? Should we downgrade them to 1.5.3? Can 1.5.3 and 2.2 coexist in the same network? Will 1.5.3 last, or is it already “dead”?

Thanks in advance,

Txema

MLNX OFED 2.2 is preferred; it has significant improvements in both stability and performance of IPoIB. However, if you have older Linux versions that are not supported by 2.2 (like RHEL 5.x kernels), then you have to use 1.5.3 on those systems.

Both versions can coexist; just be aware that they have different default settings for the IPoIB mode: CM for 1.5.3 and UD for 2.2. Links between 1.5.3 and 2.2 systems (with default settings) will negotiate to UD, so performance will be limited by the UD performance of 1.5.3, which is lower than CM on 1.5.3. Setting the 2.2 systems to CM will give you better performance on links with 1.5.3 systems. If both ends are 2.2, the preferred mode is UD (the default): although both UD and CM perform well on 2.2, UD has slightly higher performance and scales better with a large number of connections.

On firmware: the MLNX OFED installation will by default upgrade the firmware on your Mellanox-branded cards (OEM cards from HP or others will not be upgraded). You can check the card PSID with ibv_devinfo (see Firmware Support and Downloads - Identifying Adapter Cards). For Mellanox cards the PSID always starts with “MT”, for example MT_1100110019. With the actual PSID you can always find the proper firmware at NVIDIA Networking Firmware Downloads, but perhaps the easiest route is to let the MLNX OFED installation script do the upgrades to the required versions. Once the firmware is upgraded, your ConnectX-2 and ConnectX-3 cards will work just fine.
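
For anyone following along, checking the PSID and forcing connected mode look roughly like this (a sketch; ib0 and the ifcfg path follow the usual RHEL/CentOS conventions):

# the PSID is reported by ibv_devinfo -v as board_id
ibv_devinfo -v | grep board_id
# switch a running IPoIB interface to connected mode
echo connected > /sys/class/net/ib0/mode
# to make it persistent, set CONNECTED_MODE=yes in /etc/sysconfig/network-scripts/ifcfg-ib0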

Thanks Andre,

My kernels are all pretty new and should all accept OFED 2.2 (CentOS 6.3 and 6.5, and also CentOS 6.2, which will soon be updated to 6.5). All are supported according to the 2.2-1.0.1 release notes.

My problem comes with my cards. They are all 2~3 years old and there seem to be no more firmware updates for them. The Mellanox OFED installer says they are either up to date or cannot be updated with newer firmware (see the query sketch after the list):

  • Compute nodes:
    • Image type: ConnectX
    • FW Version: 2.9.1100
    • Device ID: 26428
    • PSID: IBM08B0130009
    • Mellanox equivalent model: No match
    • Max firmware available (ConnectX-2): 2.9.1000
    • http://old.mellanox.com/content/pages.php?pg=firmware_table_IBM
    • Latest release: 08 Aug 11

  • Dell GPFS servers:
    • Image type: ConnectX
    • FW Version: 2.9.1000
    • Device ID: 26428
    • PSID: MT_0D90110009
    • Mellanox equivalent model: Single 4X IB QDR Port, PCIe Gen2 x8, Tall Bracket, RoHS-R6 HCA Card, QSFP Connector
    • Max firmware available: 2.9.1000
    • Latest release: 09 June 11

  • IBM spare nodes:
    • Image type: FS2
    • FW Version: 2.11.500
    • Device ID: 4099
    • PSID: IBM1020110028
    • Mellanox equivalent model: MT_1020110028 - ConnectX-3 VPI adapter card; dual-port QSFP; FDR10 IB (40Gb/s) and 10GigE; PCIe 3.0 x8 8GT/s; RoHS R6; PCI DevID: 4099
    • Max firmware available: 2.31.5050
    • Max firmware available for the IBM OEM version: 2.11.0500
    • http://old.mellanox.com/content/pages.php?pg=firmware_table_IBM
    • Latest release: 15 Nov 12
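
(For reference, the per-card details above can be read straight off the HCA with mstflint; the PCI address below is an assumption, take it from lspci:)

# find the HCA's PCI address
lspci | grep -i mellanox
# query the firmware image on the card (Image type, FW Version, Device ID, PSID)
mstflint -d 04:00.0 query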

While the 2.2 release notes require:

  • ConnectX-3: 2.31.5050 or above

  • ConnectX-2: 2.9.1200 or above

But 1.5.3 requirements are fine:

  • ConnectX-2 / ConnectX-2 EN (fw-ConnectX2 Rev 2.9.1000)

  • ConnectX-3 VPI / ConnectX-3 EN (fw-4099 Rev 2.11.0500)

So, I suppose I should keep everything on 1.5.3, and install 2.2 on new nodes but set them to CM for as long as the old nodes exist.

Thanks,

Txema

I came across this old thread and have some questions.

Is anyone currently using an IBM BladeCenter H with the InfiniBand 40Gb switch 46M6005?

What firmware level do you have on your cards and on your BladeCenter 40Gb InfiniBand switch?