I’m trying to run a trivial MPI hello.c program over IB (MVAPICH) using PGI CDK 12.4.
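For reference, hello.c is nothing more than the canonical MPI hello world, something along these lines:

#include <mpi.h>
#include <stdio.h>

/* trivial MPI test: each rank reports itself */
int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}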
When I submit a test job I get:
/pool/cluster7/hpc/pgi/linux86-64/2012/mpi/mvapich/bin/mpirun_rsh: error while loading shared libraries: libpgmp.so: cannot open shared object file: No such file or directory
Note that
% ldd /pool/cluster7/hpc/pgi/linux86-64/2012/mpi/mvapich/bin/mpirun_rsh
linux-vdso.so.1 => (0x00007fff7fe82000)
libm.so.6 => /lib64/libm.so.6 (0x0000003e4da00000)
libpgmp.so => /pool/cluster7/hpc/pgi/linux86-64/12.4/libso/libpgmp.so (0x00002b17a05ec000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003e4e200000)
libpgc.so => /pool/cluster7/hpc/pgi/linux86-64/12.4/libso/libpgc.so (0x00002b17a075b000)
libc.so.6 => /lib64/libc.so.6 (0x0000003e4d600000)
/lib64/ld-linux-x86-64.so.2 (0x0000003e4d200000)
and
% ls -l /pool/cluster7/hpc/pgi/linux86-64/12.4/libso/libpgmp.so
-rwxr-xr-x 2 hpc hpc 237023 Apr 13 14:59 /pool/cluster7/hpc/pgi/linux86-64/12.4/libso/libpgmp.so
If I add
setenv LD_LIBRARY_PATH /pool/cluster7/hpc/pgi/linux86-64/12.4/libso:$LD_LIBRARY_PATH
the error message is now:
/pool/cluster7/hpc/pgi/linux86-64/2012/mpi/mvapich/bin/mpispawn: error while loading shared libraries: libpgmp.so: cannot open shared object file: No such file or directory
(The compute node ‘sees’ the /pool/cluster7/hpc/pgi/linux86-64/12.4/libso/libpgc.so file just fine.)
% ldd /pool/cluster7/hpc/pgi/linux86-64/2012/mpi/mvapich/bin/mpispawn
linux-vdso.so.1 => (0x00007fff871fc000)
libpgmp.so => /pool/cluster7/hpc/pgi/linux86-64/12.4/libso/libpgmp.so (0x00002b38c7fc8000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003e4e200000)
libpgc.so => /pool/cluster7/hpc/pgi/linux86-64/12.4/libso/libpgc.so (0x00002b38c8150000)
libm.so.6 => /lib64/libm.so.6 (0x0000003e4da00000)
libc.so.6 => /lib64/libc.so.6 (0x0000003e4d600000)
/lib64/ld-linux-x86-64.so.2 (0x0000003e4d200000)
What am I missing? Any hint?
Thx, S.
Hi Sylvain,
Is “/pool/cluster7/hpc/pgi/linux86-64/12.4/libso/” accessible from the remote node? Is LD_LIBRARY_PATH being set correctly on the remote node when mpiexec is invoked?
Note, you can try compiling with -Bstatic_pgi to force static linking of the PGI runtime libraries.
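For example, assuming hello.c is built with the mpicc wrapper from the CDK’s mvapich bin directory (the wrapper name is an assumption on my part), something like:

% /pool/cluster7/hpc/pgi/linux86-64/2012/mpi/mvapich/bin/mpicc -Bstatic_pgi hello.c -o hello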
Hi Mat,
The submitted script, hello.csh, contains both
setenv LD_LIBRARY_PATH /pool/cluster7/hpc/pgi/linux86-64/12.4/libso:$LD_LIBRARY_PATH
and
ls -l /pool/cluster7/hpc/pgi/linux86-64/12.4/libso/libpgc.so
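Roughly, the whole script is just those lines plus the launch (the mpirun_rsh line here is a sketch; the host file and process count are placeholders):

#!/bin/csh
setenv LD_LIBRARY_PATH /pool/cluster7/hpc/pgi/linux86-64/12.4/libso:$LD_LIBRARY_PATH
# sanity check: is the cross-mounted PGI libso directory visible?
ls -l /pool/cluster7/hpc/pgi/linux86-64/12.4/libso/libpgc.so
/pool/cluster7/hpc/pgi/linux86-64/2012/mpi/mvapich/bin/mpirun_rsh -np 4 -hostfile hosts ./hello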
So /pool/cluster7 is cross-mounted on the compute node(s), and the setenv changes the error message from
[...]/mpirun_rsh: error while loading shared libraries: [...]
(without the setenv) to
[...]/mpispawn: error while loading shared libraries: [...]
(with the setenv).
Also, -Bstatic_pgi doesn’t fix this: libpgmp.so is needed by mpirun_rsh and mpispawn, not by hello:
% ldd hello
linux-vdso.so.1 => (0x00007fffcdb89000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000003b60600000)
libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x0000003b60a00000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003e4e200000)
librt.so.1 => /lib64/librt.so.1 (0x0000003e4ea00000)
libm.so.6 => /lib64/libm.so.6 (0x0000003e4da00000)
libc.so.6 => /lib64/libc.so.6 (0x0000003e4d600000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003e4de00000)
/lib64/ld-linux-x86-64.so.2 (0x0000003e4d200000)
If mpirun_rsh runs (thanks to the setenv LD_LIBRARY_PATH), why doesn’t mpispawn see it?
Have I missed something when installing the CDK (12.4)?
Thx, S.
So the fix is to put
setenv LD_LIBRARY_PATH /pool/cluster7/hpc/pgi/linux86-64/12.4/libso:$LD_LIBRARY_PATH
in ~/.cshrc, so it is propagated to the shells that ssh spawns on the other nodes. mpirun_rsh inherits the environment of the submitted script, but mpispawn is launched on the remote nodes via ssh, which does not carry that environment along; since csh sources ~/.cshrc for every shell, including the non-interactive ones ssh starts, setting LD_LIBRARY_PATH there reaches mpispawn too.
The solution is thus much simpler… S.
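(A quick way to check the propagation, assuming password-less ssh to a compute node, here called node01 as a placeholder:

% ssh node01 'echo $LD_LIBRARY_PATH'

should now print a path list that includes /pool/cluster7/hpc/pgi/linux86-64/12.4/libso.)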
> in ~/.cshrc, so it is propagated to the shells that ssh spawns on the other nodes
This would have been my next suggestion. In general, I don’t personally like doing this but will from time to time.
> libpgmp.so is needed by mpirun_rsh and mpispawn, not by hello
Sorry, my misunderstanding. What you may want to do then is go back and rebuild MVAPICH with the “-Bstatic_pgi” flag to link in our static libraries. My guess is it was compiled with “-Bdynamic”.
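The exact build procedure depends on the MVAPICH version, but the idea is simply to fold the flag into the compiler invocations, along these lines (the prefix and configure options here are hypothetical):

% setenv CC "pgcc -Bstatic_pgi"
% setenv F77 "pgf77 -Bstatic_pgi"
% ./configure --prefix=/opt/mvapich-pgi
% make
% make install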
Mat,
I installed the PGI CDK as provided, which I assumed would be ‘optimal’… I don’t want to rebuild the distribution when the toolkit already provides one.
S.
Hi S.
No, the MPICH installs that come with the CDK are debug versions for use with the PGI debugger and profiler. They are also configured for Ethernet, so they won’t be able to take advantage of any optimized hardware you may have. For performance, you need to use the MPI implementation recommended by your hardware vendor.