With the hostfile:
geof70 slots=1
geof30 slots=1
I can run the system default Open MPI,
/usr/bin/mpirun -np 2 -hostfile hostfile hostname
with correct output:
geof70
geof30
I can also run the MPI bundled with PGI compilers successfully:
/opt/pgi/linux86-64/19.10/mpi/openmpi-3.1.3/bin/mpirun -np 2 -hostfile hostfile hostname
Locally, I can run the MPI bundled with the Nvidia HPC SDK:
/opt/nvidia-20.11/hpc_sdk/Linux_x86_64/20.11/comm_libs/mpi/bin/mpirun -np 2 hostname
However, I have no success running between machines:
/opt/nvidia-20.11/hpc_sdk/Linux_x86_64/20.11/comm_libs/mpi/bin/mpirun -np 2 -hostfile hostfile hostname
Output:
/opt/nvidia-20.11/hpc_sdk/Linux_x86_64/20.11/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/orted: error while loading shared libraries: libnvcpumath.so: cannot open shared object file: No such file or directory
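The error suggests the remote orted daemon starts without the SDK's library path, because Open MPI spawns it through a fresh ssh session that does not inherit the launching shell's environment. A generic sketch of the effect (not tied to this cluster; the variable name is made up for illustration):

```shell
# Export a variable in the current shell:
export DEMO_LIB_PATH=/opt/demo/lib

# A child started with a clean environment (like a fresh ssh session
# spawning orted) does not see it:
env -i bash -c 'echo "child sees: [$DEMO_LIB_PATH]"'
# prints: child sees: []
```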
The library libnvcpumath.so is installed on both nodes:
ll /opt/nvidia-20.11/hpc_sdk/Linux_x86_64/20.11/compilers/lib/libnvcpumath.so
-rwxr-xr-x 1 root root 2420888 Dec 4 01:35 /opt/nvidia-20.11/hpc_sdk/Linux_x86_64/20.11/compilers/lib/libnvcpumath.so*
Setting the environment as in /opt/nvidia-20.11/hpc_sdk/modulefiles/nvhpc/20.11 and forwarding it explicitly does not help either:
/opt/nvidia-20.11/hpc_sdk/Linux_x86_64/20.11/comm_libs/mpi/bin/mpirun -x LD_LIBRARY_PATH -x PATH -x OPAL_PREFIX -np 2 -hostfile hostfile hostname
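For reference, `-x` only exports variables to the launched application processes, not to orted itself, which is already running by then. One common workaround (an assumption on my side, not verified on this setup) is to add the SDK's library directory to LD_LIBRARY_PATH in the non-interactive shell startup on every node:

```shell
# Hypothetical snippet for ~/.bashrc on geof70 and geof30, placed before
# any early return for non-interactive shells; the path matches the
# libnvcpumath.so location shown above:
export LD_LIBRARY_PATH=/opt/nvidia-20.11/hpc_sdk/Linux_x86_64/20.11/compilers/lib:$LD_LIBRARY_PATH
```

Open MPI's `--prefix` option (implied when mpirun is invoked by its full path) adjusts the remote PATH and LD_LIBRARY_PATH for the Open MPI tree itself, but it may not cover the compiler's lib directory where libnvcpumath.so lives.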
To conclude: I can compile my sources with the mpif90 of HPC SDK 20.11, but I can run the resulting executables across nodes only with the mpirun of PGI 19.10. Is there a way to get the mpirun of HPC SDK 20.11 to run between nodes?