Upgrading the OFED driver on a Linux cluster

Hello, I have a hopefully simple question regarding OFED on Linux systems. On our cluster we are using OFED v4.1* and we would like to upgrade to a later version. We have a lot of applications built with OpenMPI and Intel MPI. In the case of OpenMPI, many of our applications are built against OpenMPI v3*, that is, built with the --with-verbs option.

I have experimented with OFED 5.1, but find that our legacy OpenMPI applications complain about an undefined protocol. A fresh build based on OpenMPI v4 with the --with-ucx option works okay, but that is by the by. The situation with Intel MPI is even worse: the processes start, but run forever, or at least until I kill the job.

When we install OFED v5.1, is there a special option that we can specify to enable our legacy OpenMPI applications to work, or do we need to download and install an LTS version of OFED? Please could someone advise me what to do.

Regards, David

Hello David,

Thank you for posting your question on the Mellanox Community.

The reason you are getting these errors from OpenMPI is that OFED 4.1 and OFED 5.1 use different libraries.

In newer versions of Mellanox OFED, we have switched from our own Mellanox verbs to the rdma-core verbs. You can read about this change here: https://docs.mellanox.com/display/rdmacore50/Migration+to+RDMA-Core
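
If it helps to see which libraries an existing build links against directly, something like the sketch below could be a starting point. It is only an illustration and not an official Mellanox tool: the function names and the list of library name prefixes are assumptions made for this example. Note also that OpenMPI often loads its transports as plugins, so the verbs library may appear on a plugin file rather than on the application binary itself.

```python
import subprocess

# Assumed list of library name prefixes worth flagging (verbs, providers, UCX).
RDMA_PREFIXES = (
    "libibverbs", "libmlx4", "libmlx5", "librdmacm",
    "libucp", "libucs", "libuct",
)


def rdma_libs_from_ldd(ldd_output):
    """Return the RDMA-related shared libraries named in `ldd` output text."""
    found = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The first field is the library name; drop any directory part.
        name = fields[0].rsplit("/", 1)[-1]
        if name.startswith(RDMA_PREFIXES):
            found.append(name)
    return found


def rdma_libs_for(path):
    """Run `ldd` on a binary or plugin file and filter its output."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return rdma_libs_from_ldd(result.stdout)
```

Calling rdma_libs_for() on one of your legacy executables (or on an OpenMPI plugin file) lists any of the names above that it links to directly; an empty list only means there is no direct link, not that verbs is unused.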

If you wish to continue using the legacy verbs, you can use the LTS version of Mellanox OFED, 4.9, which can be found under the LTS download tab on this page:

https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed
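
After installing either release, the version on each node can be confirmed with the ofed_info utility that ships with Mellanox OFED. The sketch below is a minimal example under stated assumptions: it assumes `ofed_info -s` prints a string of the form MLNX_OFED_LINUX-5.1-0.6.6.0:, the helper names are hypothetical, and the rule "4.x means legacy verbs, 5.x means rdma-core" is a simplification of the description above rather than an official check.

```python
import re
import subprocess


def parse_ofed_version(text):
    """Extract (major, minor) from a string like 'MLNX_OFED_LINUX-5.1-0.6.6.0:'.

    Returns None when the text does not match the assumed format.
    """
    match = re.search(r"-(\d+)\.(\d+)-", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_ofed_version():
    """Ask the local node for its OFED version via `ofed_info -s`."""
    result = subprocess.run(
        ["ofed_info", "-s"], capture_output=True, text=True, check=True
    )
    return parse_ofed_version(result.stdout)


def uses_legacy_verbs(version):
    """Assumption: 4.x releases (including 4.9 LTS) keep the legacy verbs."""
    major, _minor = version
    return major < 5
```

Running installed_ofed_version() across the cluster is one way to spot nodes that were missed during an upgrade or a rollback to the LTS release.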

Thanks and regards,

Mellanox Technical Support