Hi,
I am having some trouble running multi-GPU OpenACC codes on the new V100 nodes on NASA’s Pleiades system.
If I use the Open MPI included with the PGI compiler, I can only run jobs that use multiple GPUs on a single node. If I try to run on multiple nodes, Open MPI does not “see” the correct topology, even if I pass it a hostfile directly. I found this FAQ entry: https://www.open-mpi.org/faq/?category=tm
When I run the command it gives, no “tm” component is listed (“tm” is what is needed for PBS).
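For reference, the check can be scripted roughly as below. This is only a minimal sketch, assuming `ompi_info` prints one `MCA <framework>: <component> (...)` line per component, as in the FAQ's example output; the helper name `tm_frameworks` is made up for illustration.

```python
import re
import shutil
import subprocess

# Assumed line format, e.g. "    MCA plm: tm (MCA v2.1.0, API v2.0.0, ...)".
TM_LINE = re.compile(r"^\s*MCA\s+(\w+):\s+tm\b", re.MULTILINE)


def tm_frameworks(ompi_info_output):
    """Return the sorted MCA framework names that list a 'tm' component."""
    return sorted(set(TM_LINE.findall(ompi_info_output)))


if __name__ == "__main__":
    exe = shutil.which("ompi_info")
    if exe is None:
        print("ompi_info not found on PATH")
    else:
        # Run ompi_info with no arguments and scan its component listing.
        out = subprocess.run([exe], capture_output=True, text=True).stdout
        found = tm_frameworks(out)
        if found:
            print("tm components found in:", ", ".join(found))
        else:
            print("no tm components listed (no PBS support in this build)")
```

With the Open MPI bundled with PGI, a check like this finds no tm components, whereas I would expect a PBS-enabled build to list at least one.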
The included Open MPI does have SLURM support (which all the other HPC systems I use rely on).
Would it be possible for future PGI releases to configure Open MPI with PBS (“tm”) support by default?
Or alternatively, is there a flag or config file edit I can make when installing PGI with the included Open MPI to enable PBS support?
Thanks!
- Ron