HPCX-MPI runtime error

I am trying to use HPCX-OpenMPI on the new ECMWF ATOS supercomputer.

We use a Fortran-based climate model which runs fine with Intel MPI. However, when I try to use HPCX-OpenMPI, I get segfaults in the most benign part of the code. The code simply does not run on multiple nodes; an MPI_Bcast operation fails with a non-zero exit code. I have checked our code multiple times and with different compilers, and it works fine. Only with HPCX-OpenMPI does it have an issue.
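For reference, the failure can be isolated from the rest of the model with a minimal reproducer along these lines (a sketch only; the buffer size and contents are placeholders, not taken from our model):

```fortran
! Minimal multi-node MPI_Bcast test.
! Build with the MPI Fortran wrapper (e.g. mpifort) and run on more than one node.
program bcast_test
  use mpi
  implicit none
  integer :: ierr, rank, nprocs
  integer :: buf(4)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  buf = 0
  if (rank == 0) buf = (/1, 2, 3, 4/)

  ! The kind of call that fails for us under HPCX-OpenMPI across nodes.
  call MPI_Bcast(buf, 4, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

  print *, 'rank', rank, 'of', nprocs, 'received', buf
  call MPI_Finalize(ierr)
end program bcast_test
```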

I saw on the forum that there are other things that could be done, for example: Mellanox Interconnect Community

However, this option is not available in HPCX-OpenMPI version 2.10.0, which is the version available on the supercomputer. The Intel compiler that HPC-X is built with is 2021.4.0.

I just want to know: are there any basic things that need to be done when compiling the code with HPCX-MPI? Or are there any runtime arguments that need to be passed to HPCX-MPI?

To build your app with HPC-X, you need to install NVIDIA's full OFED stack and then install HPC-X. Then use the MPI compiler wrappers shipped with HPC-X (mpicc for C, mpifort for Fortran).
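As a rough sketch, assuming HPC-X is unpacked under $HPCX_HOME as in the docs below, and using model.f90 as a stand-in for your actual build:

```bash
# Load the HPC-X environment into the current shell
source $HPCX_HOME/hpcx-init.sh
hpcx_load

# Compile the Fortran model with the HPC-X wrapper
mpifort -o model model.f90

# Launch with the mpirun that hpcx_load put on the PATH
mpirun -np 2 ./model
```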

Please follow the documentation below:

https://docs.nvidia.com/networking/display/hpcxv212/Installing+and+Loading+HPC-X#InstallingandLoadingHPCX-InstallingHPC-X

https://docs.nvidia.com/networking/display/hpcxv212/Installing+and+Loading+HPC-X#InstallingandLoadingHPCX-BuildingandRunningApplicationswithHPC-X
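The second page also includes a quick sanity check using the hello-world example bundled with HPC-X, roughly:

```bash
# Build and run the bundled MPI hello-world to verify the HPC-X stack itself
mpicc "$HPCX_MPI_TESTS_DIR/examples/hello_c.c" -o /tmp/hello_c
mpirun -np 2 /tmp/hello_c
```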
