Mpirun 3.1.5 bundled with HPC SDK 20.11 does not run between nodes

lahan · December 25, 2020, 7:48pm

With the hostfile

geof70 slots=1
geof30 slots=1

I can run the system default Open MPI,

/usr/bin/mpirun -np 2 -hostfile hostfile hostname

with correct output:

geof70
geof30

I can also run the MPI bundled with PGI compilers successfully:

/opt/pgi/linux86-64/19.10/mpi/openmpi-3.1.3/bin/mpirun -np 2 -hostfile hostfile hostname

Locally, I can run the MPI bundled with the Nvidia HPC SDK:

/opt/nvidia-20.11/hpc_sdk/Linux_x86_64/20.11/comm_libs/mpi/bin/mpirun -np 2 hostname

However, the same mpirun fails between machines:

/opt/nvidia-20.11/hpc_sdk/Linux_x86_64/20.11/comm_libs/mpi/bin/mpirun -np 2 -hostfile hostfile hostname

Output:

/opt/nvidia-20.11/hpc_sdk/Linux_x86_64/20.11/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/orted: error while loading shared libraries: libnvcpumath.so: cannot open shared object file: No such file or directory

The libnvcpumath.so is installed on both nodes:

ll /opt/nvidia-20.11/hpc_sdk/Linux_x86_64/20.11/compilers/lib/libnvcpumath.so
-rwxr-xr-x 1 root root 2420888 Dec 4 01:35 /opt/nvidia-20.11/hpc_sdk/Linux_x86_64/20.11/compilers/lib/libnvcpumath.so*
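
The file being present does not by itself mean the dynamic loader on the remote node can find it when orted starts. A minimal sketch for checking that, assuming ldd is available on the nodes; the helper name missing_libs is hypothetical and not part of the HPC SDK:

```shell
# List the shared libraries that the dynamic loader cannot resolve for a binary.
# Prints nothing when every dependency is found (or when ldd cannot read the file).
missing_libs() {
    ldd "$1" 2>/dev/null | grep 'not found' || true
}

# Example against the remote daemon named in the error message
# (host name and path are the ones from this post):
# ssh geof30 ldd /opt/nvidia-20.11/hpc_sdk/Linux_x86_64/20.11/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/orted | grep 'not found'
```

Running the check through ssh matters because it uses the environment of a non-interactive remote shell, which is probably close to the one orted is started in and may differ from a login shell with the module loaded.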

Setting the environment as in /opt/nvidia-20.11/hpc_sdk/modulefiles/nvhpc/20.11 and exporting it with -x does not help:

/opt/nvidia-20.11/hpc_sdk/Linux_x86_64/20.11/comm_libs/mpi/bin/mpirun -x LD_LIBRARY_PATH -x PATH -x OPAL_PREFIX -np 2 -hostfile hostfile hostname

To conclude, I can compile my sources with mpif90 of HPC SDK 20.11, but I can run my executables between nodes only with mpirun of PGI 19.10. Is there a way to have mpirun of HPC SDK 20.11 running between nodes?

lahan · March 23, 2021, 4:12pm

SOLVED:

As root:

create a file: vi /etc/ld.so.conf.d/nvidia.conf
insert a line: /opt/nvidia/hpc_sdk/Linux_x86_64/20.11/REDIST/compilers/lib
save the file and run ldconfig

Passing the directory via mpirun -x LD_LIBRARY_PATH did not help.
Similar steps are needed in 21.2 as well.
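
The steps above can be sketched as a small script. This is only an illustration: the function name write_ld_conf is hypothetical, and the library directory is copied as written in the solution (note that it starts with /opt/nvidia, while the earlier commands use /opt/nvidia-20.11, so adjust it to the actual install prefix).

```shell
# Write a one-line ld.so.conf.d entry that points the loader at a library directory.
# $1: path of the conf file to create, $2: library directory to register.
write_ld_conf() {
    conf_file=$1
    lib_dir=$2
    printf '%s\n' "$lib_dir" > "$conf_file"
}

# As root (paths as given in the solution above):
# write_ld_conf /etc/ld.so.conf.d/nvidia.conf /opt/nvidia/hpc_sdk/Linux_x86_64/20.11/REDIST/compilers/lib
# ldconfig                          # rebuild the loader cache
# ldconfig -p | grep libnvcpumath   # check that the library is now in the cache
```

Registering the directory with ldconfig makes the library visible to every process on that machine, including orted started over ssh, without depending on LD_LIBRARY_PATH being set in the remote shell.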
