HPC-X does not allow running apps without `mpirun`

(I could not find a forum category for HPC-X, but since it’s part of the HPC SDK, I’ve posted this here.)

With upstream Open MPI, it’s possible to run MPI apps in singleton mode by invoking the executable directly, e.g. ./my_executable, without using the mpirun launcher.

With HPC-X, this fails immediately with an error. However, I can still run with mpirun -np 1 ./my_executable.
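
For reference, the two invocations are (my_executable is just the placeholder name from above):

./my_executable                 # singleton launch: fails immediately under HPC-X
mpirun -np 1 ./my_executable    # launched through mpirun: works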

Is HPC-X intended to support singleton execution without the mpirun launcher?

Thanks,
Ben

HPC-X relies on certain environment variables being set. The trampolines and wrapper scripts normally set these for you, but if you try to run a singleton outside of the MPI wrappers, those variables may not be set:

cparrott@ice4 ~ $ module load nvhpc/24.9
cparrott@ice4 ~ $ mpicc -o hello_mpi hello_mpi.c
"hello_mpi.c", line 52: warning: variable "ierr" was set but never used [set_but_not_used]
    int ierr;
        ^

Remark: individual warnings can be suppressed with "--diag_suppress <warning-name>"

cparrott@ice4 ~ $ OMPI_MCA_ess_singleton_isolated=1 ./hello_mpi
--------------------------------------------------------------------------
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      ice4
  Framework: pml
--------------------------------------------------------------------------
[ice4:598736] PML ob1 cannot be selected
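
As a quick sanity check (just a sketch, not an official diagnostic), you can confirm that the HPC-X environment is missing by looking for any related variables in the shell before launching:

# List Open MPI / OPAL / UCX / HPC-X related variables in the current shell;
# an empty result means the HPC-X environment has not been loaded.
env | grep -iE 'hpcx|ompi_|opal_|ucx_'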

However, if you first source hpcx-mt-init.sh and then invoke hpcx_load, it should work, e.g.:

cparrott@ice4 ~ $ source /proj/nv/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/hpcx-mt-init.sh 
cparrott@ice4 ~ $ hpcx_load
cparrott@ice4 ~ $ OMPI_MCA_ess_singleton_isolated=1 ./hello_mpi

HELLO_MPI - Master process:
  C/MPI version
  An MPI example program.

  The number of processes is 1.

  Process 0 says 'Hello, world!'
  Elapsed wall clock time = 0.000013 seconds.

HELLO_MPI - Master process:
  Normal end of execution: 'Goodbye, world!'
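
To avoid repeating these steps by hand, one option (purely a sketch: run_singleton.sh is a hypothetical helper, and the paths are copied from the session above, so adjust them to your installation) is to wrap them in a small launcher script:

#!/bin/bash
# run_singleton.sh (hypothetical helper): load the HPC-X environment, then
# exec the given program as an isolated singleton.
source /proj/nv/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/hpcx-mt-init.sh
hpcx_load
export OMPI_MCA_ess_singleton_isolated=1
exec "$@"

After chmod +x run_singleton.sh, running ./run_singleton.sh ./hello_mpi should give the same result as the session above.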

I’m not sure there is a good way to fix this in the HPC SDK itself, though. At a minimum, we should probably mention this scenario in the release notes.

Just FYI: I ran into this when using the Docker container images for HPC SDK 24.9. I had expected the environment variables to be set automatically there, but I can just set them myself in a Dockerfile built on the HPC SDK base image. Thanks.
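
For anyone else hitting this in the containers, something along these lines should do it (a sketch only: the /opt/nvidia/hpc_sdk install prefix and the use of /etc/bash.bashrc are assumptions to verify against the actual 24.9 image):

# Shell command to bake into the image, e.g. from a Dockerfile RUN step:
# have every interactive bash shell in the container load the HPC-X environment.
echo 'source /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/12.6/hpcx/hpcx-2.20/hpcx-mt-init.sh && hpcx_load' >> /etc/bash.bashrc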
