Designing for high availability

We are in the early planning stages for a high-bandwidth interconnect between servers. The software stack will be OFED and GPFS on Linux. GPFS can use RDMA.

The idea is to have two IB switches, with each server connected to both. I’ve seen such configurations with normal Linux bonding (mode=1, active/passive).

  • How does this work out if RDMA is used?

  • Is it possible to have an active/active configuration, at least for RDMA?

Hi everyone. First, IPoIB isn’t RDMA, so it’s not the best choice. Also, as has been pointed out, IPoIB bonding is active/standby only. The multi-port behavior of InfiniBand is determined by the Upper Layer Protocol, so there’s no general answer about how a given protocol will utilize multiple IB ports from one server.

GPFS over RDMA can run active/active using multiple IB interfaces (multiple HCAs, and/or multiple ports per HCA). As a GPFS customer, you should have access to documentation and support. I’m not a GPFS expert, but I found GPFS guidance from IBM online. For example, searching for ‘gpfs rdma config’ turned up the page “How to verify IB RDMA is working” in the General Parallel File System (GPFS) wiki.
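To make the multi-port idea a bit more concrete, here is a rough Python sketch that lists the IB ports a node reports as ACTIVE in sysfs and prints the GPFS settings one might apply for them. Treat it as an illustration only: the attribute names `verbsRdma` and `verbsPorts` and the `device/port` value format are how I recall them from the GPFS documentation, so please check them against the documentation for your release. The device names in the comments (such as `mlx4_0`) and the helper function names are made up for the example. The script only prints commands and never runs them.

```python
from pathlib import Path


def active_ib_ports(sysfs_root="/sys/class/infiniband"):
    """Return (device, port) pairs whose link state is ACTIVE.

    Each port exposes a 'state' file under
    <sysfs_root>/<device>/ports/<port>/ with text such as "4: ACTIVE".
    """
    ports = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return ports
    for state_file in sorted(root.glob("*/ports/*/state")):
        # Keep only the name after the numeric code, e.g. "ACTIVE".
        state = state_file.read_text().split(":", 1)[-1].strip()
        if state == "ACTIVE":
            device = state_file.parts[-4]     # e.g. "mlx4_0" (example name)
            port = int(state_file.parts[-2])  # e.g. 1
            ports.append((device, port))
    return ports


def verbs_ports_value(ports):
    """Format ports as a space-separated 'device/port' list.

    Assumption: this matches the value format of the GPFS verbsPorts
    attribute; verify against the GPFS documentation.
    """
    return " ".join(f"{dev}/{port}" for dev, port in ports)


def mmchconfig_commands(ports):
    """Build (but do not run) the commands that would enable RDMA."""
    return [
        "mmchconfig verbsRdma=enable",
        f'mmchconfig verbsPorts="{verbs_ports_value(ports)}"',
    ]


if __name__ == "__main__":
    found = active_ib_ports()
    if not found:
        print("No ACTIVE InfiniBand ports found.")
    else:
        for command in mmchconfig_commands(found):
            print(command)
```

Listing both ports of a dual-port HCA (or ports on two HCAs, one cabled to each switch) is what would let GPFS use them side by side, rather than relying on IPoIB bonding for the data path.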

If for some reason you can’t find what you need, let me know. I have some colleagues who can assist.


If you’re planning to use GPFS over RDMA, you’ll most likely have to use IPoIB as the upper layer interface. IPoIB can use the standard Linux bonding driver in an active/passive configuration. As of now, I’m not aware of IPoIB working as active/active.

This is a very common configuration with GPFS.
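If it helps, here is a small Python sketch for checking which IPoIB interface a bond is currently using. It parses the status text the Linux bonding driver exposes under `/proc/net/bonding/`. The bond name `bond0` is an assumption, as are the function names; adjust them to your setup.

```python
import sys


def parse_bond_status(text):
    """Extract the mode, active slave, and slave list from bonding status text."""
    status = {"mode": None, "active_slave": None, "slaves": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Currently Active Slave":
            status["active_slave"] = value
        elif key == "Slave Interface":
            status["slaves"].append(value)
    return status


def is_active_backup(status):
    """True if the bond runs in active-backup mode (mode=1)."""
    return "active-backup" in (status["mode"] or "")


if __name__ == "__main__":
    # 'bond0' is an assumed bond name; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/net/bonding/bond0"
    with open(path) as handle:
        info = parse_bond_status(handle.read())
    print("mode:         ", info["mode"])
    print("active slave: ", info["active_slave"])
    print("slaves:       ", ", ".join(info["slaves"]))
    print("active-backup:", is_active_backup(info))
```

Running it before and after pulling a cable (or disabling a switch port) is a quick way to confirm that failover moves the active slave to the other IPoIB interface.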


Did you ever find a solution like this for native InfiniBand switches?

I’m familiar with InfiniBand, GPFS, and RDMA in a single-switch environment. This is straightforward, and you can define your IPoIB interfaces as a bond that fails over when individual links fail. I have also seen configurations where two IB switches are used with bonding to fail over between the two, but in an active/passive arrangement. In both cases the resulting configuration is similar to the analogous Ethernet configuration.

I’m looking for a configuration similar to two Ethernet switches, stacked together, with the hosts using LACP to manage the links. This makes the full bandwidth of the links available (link aggregation) and provides failover in case of a switch failure.