I’m trying to install the HPC SDK 22.11 with psm2 support. I tried “./install --with-psm2”, but this doesn’t seem to do the trick: once installed, there is no “MCA mtl: psm2” in the output of ompi_info. Thanks for any help.
Hi,
Your issue is not related to Mellanox Networking as far as I can tell. Can you please supply more information? I can then move this to the proper category.
Sorry, there were a lot of options; I’m not exactly sure how I ended up in Mellanox Networking. Basically, I’m trying to install the NVIDIA HPC SDK with “MCA mtl: psm2” internode communication. I believe the default installation procedure does not include “MCA mtl: psm2” communication, and I’m not sure how to enable it during install.
I am not really sure where this topic should live. I am moving it to the HPC Compilers category for now. Let me see if I can find someone to jump in.
Thanks; I think I just need to add “--enable-mca-static=mtl” somewhere when installing. I have tried installing the HPC SDK with “./install --enable-mca-static=mtl”, but it doesn’t seem to pass the option to comm_libs / openmpi.
After a little more digging, I believe the problem is that the openmpi within the HPC SDK is precompiled and does not have the proper internode communication enabled (mtl psm2). Not sure how to fix the problem…
Hi,
We do not have the ability to support PSM2 in the Open MPI 3.1.5 build of the HPC SDK, due to not having access to the requisite hardware here for build and test purposes. However, you are welcome to try building Open MPI yourself from source with the configuration you need. The nvhpc-nompi module files will bring the compilers into your environment without MPI, and then you can simply add your own MPI build to your environment on top of that by setting $PATH and related variables.
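The environment setup described above could look roughly like the sketch below. The install prefix MY_MPI_PREFIX is a hypothetical location for a self-built Open MPI, and the module name is the one used in this thread; adjust both to your system.

```shell
# Hypothetical install prefix of a self-built Open MPI (assumption; adjust as needed).
MY_MPI_PREFIX="${MY_MPI_PREFIX:-$HOME/opt/openmpi-psm2}"

# Bring the NVHPC compilers into the environment without the bundled MPI.
# Skipped when no environment-modules command is available in this shell.
if command -v module >/dev/null 2>&1; then
    module load nvhpc-nompi/22.11 || echo "could not load nvhpc-nompi/22.11"
fi

# Layer the self-built MPI on top by prepending its directories.
export PATH="$MY_MPI_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$MY_MPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# If ompi_info is now on PATH, list the MTL components it reports.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep "MCA mtl" || echo "no MTL components reported"
fi
```

If the self-built Open MPI includes PSM2 support, the last command should list an “MCA mtl: psm2” line.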
Hope this helps.
+chris
Ok, thanks. I already have a working version of openmpi. I guess I just need to figure out how to use nvhpc-nompi.
Well, if I load my version of openmpi (4.1) and nvhpc-nompi/22.11 and try compiling I get errors like:
unrecognized command-line option ‘-acc’
unrecognized command-line option ‘-gpu=cc60,cc70,cc80,cuda11.8’
Do I need to rebuild openmpi with additional options?
Those look like errors coming from gcc or gfortran which don’t know these flags.
Likely when you built OpenMPI, you configured it to use gcc/gfortran by default. I believe you can override the default compilers via the OMPI_CC, OMPI_CXX, and OMPI_FC environment variables.
If the default config uses flags the NVHPC compilers don’t recognize, add the flag “-noswitcherror” and we’ll ignore them.
Of course, rebuilding OpenMPI configured with NVHPC works as well.
If you’re using the Fortran “mpi” module, then you must rebuild OpenMPI using nvfortran, given that Fortran module files are not compatible between compilers.
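As a minimal sketch of the compiler override mentioned above, assuming an Open MPI build whose wrappers honor these environment variables and that nvc, nvc++, and nvfortran are already on PATH:

```shell
# Point the Open MPI wrapper compilers at the NVHPC compilers
# instead of the gcc/gfortran defaults they were configured with.
export OMPI_CC=nvc
export OMPI_CXX=nvc++
export OMPI_FC=nvfortran

# If an Open MPI mpicc is on PATH, show which underlying compiler it will call.
if command -v mpicc >/dev/null 2>&1; then
    mpicc --showme:command || echo "mpicc did not accept --showme:command"
fi

# Example compile line (hello.c is a hypothetical source file):
#   mpicc -acc -gpu=cc80 -noswitcherror hello.c -o hello
```

This only changes which compiler the wrappers invoke; it does not make Fortran “mpi” module files built with gfortran usable from nvfortran.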
-Mat
Thanks for trying to help, but I’m still having issues… I have installed and loaded nvhpc (module load nvhpc-nompi), and now I’m trying to install OpenMPI (3.1.5) with nvhpc (22.11) as follows:
./configure CC=nvcc CXX=nvc++ FC=nvfortran
Configure seems to complete without issues; however, “make install” results in:
In file included from /opt/source/openmpi-3.1.5/opal/mca/event/libevent2022/libevent/evutil.h:37,
from ../../opal/mca/event/libevent2022/libevent/event.h:57,
from ../../opal/mca/event/libevent2022/libevent2022.h:58,
from ../../opal/mca/event/event.h:76,
from ../../opal/mca/pmix/pmix.h:24,
from proc.c:22:
/opt/source/openmpi-3.1.5/opal/mca/event/libevent2022/libevent/include/event2/util.h:126:2: error: #error "No way to define ev_uint64_t"
126 | #error "No way to define ev_uint64_t"
| ^~~~~
/opt/source/openmpi-3.1.5/opal/mca/event/libevent2022/libevent/include/event2/util.h:145:2: error: #error "No way to define ev_uint32_t"
145 | #error "No way to define ev_uint32_t"
| ^~~~~
/opt/source/openmpi-3.1.5/opal/mca/event/libevent2022/libevent/include/event2/util.h:164:2: error: #error "No way to define ev_uint16_t"
164 | #error "No way to define ev_uint16_t"
| ^~~~~
/opt/source/openmpi-3.1.5/opal/mca/event/libevent2022/libevent/include/event2/util.h:251:2: error: #error "No way to define SIZE_MAX"
251 | #error "No way to define SIZE_MAX"
| ^~~~~
make[3]: *** [Makefile:1987: proc.lo] Error 1