OpenMPI+PBS

Hi,

I am having some trouble running multi-GPU OpenACC codes on the new V100 nodes on NASA’s Pleiades system.

If I use the OpenMPI included with the PGI compiler, I can only run jobs that use multiple GPUs on a single node. If I try to run on multiple nodes, OpenMPI does not “see” the correct topology, even if I pass it a hostfile directly. I found this FAQ: https://www.open-mpi.org/faq/?category=tm
When I run the command it describes, “tm” (which is needed for PBS) is not listed.
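
For anyone who wants to repeat that check, here is a minimal sketch. It assumes the command in question is `ompi_info`, and that it prints one line per MCA component in the form `MCA <framework>: <component> (...)`. The helper names (`tm_components`, `has_pbs_support`, `query_ompi_info`) are hypothetical, made up for this illustration.

```python
import re
import subprocess

# Frameworks where a PBS/Torque-enabled build is expected to list a "tm"
# component (assumption: process launch "plm" and resource allocation "ras").
TM_FRAMEWORKS = {"plm", "ras"}


def tm_components(ompi_info_text):
    """Return the set of MCA frameworks that list a 'tm' component."""
    found = set()
    for line in ompi_info_text.splitlines():
        # Assumed line shape: "MCA ras: tm (MCA v2.1.0, API v2.0.0, ...)"
        match = re.match(r"\s*MCA\s+(\w+):\s+(\w+)\b", line)
        if match and match.group(2) == "tm":
            found.add(match.group(1))
    return found


def has_pbs_support(ompi_info_text):
    """True if at least one launch/allocation framework has a 'tm' component."""
    return bool(tm_components(ompi_info_text) & TM_FRAMEWORKS)


def query_ompi_info():
    """Run ompi_info from PATH and return its text output."""
    result = subprocess.run(
        ["ompi_info"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a machine with the PGI-bundled OpenMPI first in the PATH, `has_pbs_support(query_ompi_info())` should come back `False` if the build really lacks “tm”, which matches what I see.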

The included OpenMPI does have SLURM support (which is what all the other HPC systems I use run).

Would it be possible for future PGI releases to configure OpenMPI with PBS (“tm”) support by default?

Alternatively, is there a flag I can set or a config file I can edit when installing PGI with the included OpenMPI to enable PBS support?

Thanks!

- Ron

Hi Ron,

I talked with the person who does our OpenMPI builds. He said that he had explicitly disabled PBS support because we did not want to introduce libtm as an external dependency in those builds. He has opened an RFE (TPR#27355), though, and will see what can be done for a future release.

In the meantime, can you work with the Pleiades system admins to build OpenMPI for their system?

-Mat

Hi,

Thanks for the update! I didn’t realize PBS needed a runtime library.

The folks at Pleiades prefer their vendor MPI library, MPT, so they are working with me to get it working with PGI (although I am more used to the included OpenMPI libraries and would prefer those).

- Ron