Building OpenMPI 5 with NVHPC

I am attempting to build CUDA-aware OpenMPI 5.0.6 with NVHPC 25.1, using the configure line below:

…/configure CC=nvc CXX=nvc++ FC=nvfortran \
  CFLAGS="-fPIC -O" CXXFLAGS="-fPIC -O" FCFLAGS="-fPIC -O" \
  --prefix=/project/6024112/ntandon/software/StdEnv2023_nvhpc25.1/openmpi-5.0.6 \
  --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \
  --with-slurm --disable-wrapper-runpath --disable-wrapper-rpath \
  --with-show-load-errors=no --enable-mpi1-compatibility \
  --enable-mca-dso=common-ofi,common-ucx,accelerator-cuda,atomic-ucx,btl-ofi,btl-smcuda,btl-uct,coll-ucc,fs-lustre,mtl-ofi,mtl-psm2,osc-ucx,pml-ucx,rcache-gpusm,rcache-rgpusm,scoll-ucc,spml-ucx,sshmem-ucx \
  --with-cuda=/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Core/cudacore/12.6.2 \
  --with-cuda-libdir=/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Core/cudacore/12.6.2/lib64/stubs \
  --enable-shared \
  --with-hwloc=/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcccore/hwloc/2.10.0 \
  --with-libevent=/cvmfs/soft.computecanada.ca/gentoo/2023/x86-64-v3/usr \
  --with-ofi=/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcccore/libfabric/1.21.0 \
  --with-pmix=/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcccore/pmix/5.0.2 \
  --with-ucx=/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcccore/ucx/1.16.0 \
  --with-ucc=/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcccore/ucc/1.3.0 \
  --with-prrte=/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcccore/prrte/3.0.5

Configure completes successfully, but compilation then fails with a flood of errors such as "identifier X is undefined" and "struct X has no field Y".

I was able to build OpenMPI 4.x successfully using essentially the same options as above, so I'm not sure what the issue is with OpenMPI 5, and I would appreciate guidance. Thanks!

Hi Neil,

Could you paste the exact errors you are seeing here? I'd like to show them to another engineer within NVIDIA to see whether this is a known issue.

Thanks in advance,

+chris

Nevermind, I was able to build OpenMPI 5.0.7 with NVHPC 25.1 using easybuild, so that should be fine for my needs. Thanks!

Hello @neil987, I am trying the same, but with Spack. If you tested it, have you been able to use UCC's NCCL backend over InfiniBand with that installation?

@l.bellentani yes, the system I am using has InfiniBand, and I have completed GPU test runs successfully with OpenMPI 5.0.7. However, the runs are quite slow, for reasons I am still investigating.
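In case it helps with the investigation: a quick first check when runs are unexpectedly slow is to confirm the build is actually CUDA-aware. Open MPI exposes this through `ompi_info` (the exact module/path setup below is site-specific):

```shell
# Query whether this Open MPI build was compiled with CUDA support.
# A CUDA-aware build reports a value of "true" for this parameter.
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
```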

In my case, the default Spack installation was missing some libraries for IB, so it fell back to TCP/sockets by default. Even after fixing the installation, I had to set some environment variables manually to get CUDA-aware transport via UCX/UCC. However, I have not been able to activate NCCL via UCC. I am following this link: https://x-dev.pages.jsc.fz-juelich.de/2023/07/18/mpi-ucc-nccl.html#enabling-ucc-in-openmpi. May I ask which EasyBuild recipe you are using?
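For reference, the kind of variables I mean are sketched below. This is only an illustration based on the linked post and the UCX/UCC documentation; the exact transport lists depend on your fabric, and forcing `UCX_TLS` can hurt performance if misapplied:

```shell
# Enable Open MPI's UCC collective component and raise its priority
# so it wins over the default collectives.
export OMPI_MCA_coll_ucc_enable=1
export OMPI_MCA_coll_ucc_priority=100

# Ask UCX for CUDA-capable transports over InfiniBand
# (adjust the list for your fabric; omitting UCX_TLS lets UCX auto-select).
export UCX_TLS=rc,sm,cuda_copy,cuda_ipc,gdr_copy

# Ask UCC's basic CL to use the NCCL team layer
# (requires a UCC build configured with NCCL support).
export UCC_CL_BASIC_TLS=nccl
```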

I have attached my EasyBuild recipe, but it may need tailoring for your particular system. Installing should just be a matter of unpacking the attached files into your home directory and executing eb ./OpenMPI-5.0.7-NVHPC-25.1.eb

openmpi5_nvhpc25_eb.zip (3.8 KB)