mpirun mm5.mpp has error: libpgf90.so: cannot open .....

I am trying to submit an mm5.mpp job from the frontend node of a Rocks Linux cluster, using “qsub mypbsfile”.

The error shows:

error while loading shared libraries: libpgf90.so: cannot open shared object file: No such file or directory
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.

In fact, I have already set the environment in ~/.cshrc or ~/.bashrc:

setenv LD_LIBRARY_PATH /opt/pgi/linux86-64/6.0/libso

or

export LD_LIBRARY_PATH=/opt/pgi/linux86-64/6.0/libso


How can I solve this problem? Thanks.

Z. He

Hi Z.He,

Are the nodes on the cluster able to access the PGI libso directory? If so, has ‘libpgf90.so’ been installed there? You can also try compiling your code statically with “-Bstatic”, which removes the need for the shared library.
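
One way to answer the first question is to run a small check on each compute node. The sketch below is only an illustration: the function name has_lib is made up here, and the directory and library in the commented usage line are the ones from your post.

```shell
#!/bin/sh
# has_lib DIR LIB: report whether shared library LIB exists in DIR on this host.
# (has_lib is a hypothetical helper name, not part of PGI or the queue system.)
has_lib() {
    if [ -e "$1/$2" ]; then
        echo "$(uname -n): found $1/$2"
    else
        echo "$(uname -n): MISSING $1/$2"
        return 1
    fi
}

# Example call, to be run on each compute node (for instance from a test job):
# has_lib /opt/pgi/linux86-64/6.0/libso libpgf90.so
```

If any node prints MISSING, the directory is probably not mounted or not installed on that node, and setting LD_LIBRARY_PATH alone will not help there.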

  • Mat

As I understand it, -Bstatic is meant for programs with a small memory footprint. My mm5.mpp has 3 high-resolution domains, and even the input objects are larger than 2 GB. So I use -mcmodel=medium and -Mlarge_arrays together in both the compiler (FFLAGS) and linker (LDFLAGS) flags.

Therefore, I cannot statically link (-Bstatic) objects using the medium memory model, right?

Are the nodes on the cluster able to access the PGI libso directory?

At least on the “Master node”, libpgf90.so is installed and the libso directory is accessible. I am not sure whether the other nodes have it. Let me check that first.

One more question: what is the maximum size (in GB) of an output file that can be created when -mcmodel=medium and -Mlarge_arrays are used to build the executable?

Thanks.

Z. He

In fact, all nodes can access LD_LIBRARY_PATH=/opt/pgi/linux86-64/6.0/libso and all shared libraries (*.so) are available there.

(csh) setenv LD_LIBRARY_PATH /opt/pgi/linux86-64/6.0/libso
(bash) export LD_LIBRARY_PATH=/opt/pgi/linux86-64/6.0/libso

The error still exists, and I don’t know why. Has anyone run into this kind of problem? Is there a bug in pgi/linux86-64/6.0/?


==> tmp.pbs.e41771 <==
$MM5RUNPATH//mm5.mpp: error while loading shared
libraries: libpgmp.so: cannot open shared object file: No such file or
directory

==> tmp.pbs.o41771 <==

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.


==> tmp.pbs.pe41771 <==
rm: cannot remove `/tmp/41771.1.compute-0-0.q/rsh': No such file or directory

==> tmp.pbs.po41771 <==
/opt/gridengine/default/spool/compute-0-0/active_jobs/41771.1/pe_hostfile

compute-0-0
compute-0-0
compute-0-1
compute-0-1
compute-0-10
compute-0-10
compute-0-11
compute-0-11

Thanks!

Z. He


Hi Z.He,

As I understand it, -Bstatic is meant for programs with a small memory footprint. My mm5.mpp has 3 high-resolution domains, and even the input objects are larger than 2 GB. So I use -mcmodel=medium and -Mlarge_arrays together in both the compiler (FFLAGS) and linker (LDFLAGS) flags.

This is correct for the 6.0 release. With 6.1, we were able to create static libraries (those in libso) which can be linked with “-mcmodel=medium”.

One more question: what is the maximum size (in GB) of an output file that can be created when -mcmodel=medium and -Mlarge_arrays are used to build the executable?

Offhand I don’t know, but I’m guessing it’s rather large. I’ll try to find an answer for you, though.

The error still exists, and I don’t know why. Has anyone run into this kind of problem? Is there a bug in pgi/linux86-64/6.0/?

It’s not a bug. Your application simply can’t find one of its dynamic libraries. Since the actual error changed, my guess is that by setting LD_LIBRARY_PATH, your application was able to find libpgf90.so, but now can’t find a different library (libpgmp.so). From one of the nodes, run the ‘ldd’ utility on your executable to determine the application’s dependencies. Next, find where the missing libraries are located and add that path to your LD_LIBRARY_PATH.
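
As a rough sketch of that check, the ldd output can be filtered for unresolved entries. The helper name missing_libs is just an illustration, and the sketch assumes the usual glibc ldd output, where an unresolved library is printed as "name => not found".

```shell
#!/bin/sh
# missing_libs: read ldd output on stdin and print the names of the
# libraries that the dynamic loader could not resolve.
# Assumes lines of the form "<tab>libfoo.so => not found".
# (missing_libs is a hypothetical helper name.)
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example call on a compute node; replace the path with your executable:
# ldd /path/to/executable | missing_libs
```

Each name it prints is a library whose directory still needs to be on LD_LIBRARY_PATH for that node. An empty result only means that the shell in which ldd ran can resolve everything; the environment of a batch job may differ.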

  • Mat

I have exactly the same problem, and adding the path to the corresponding library does not fix it.

The problem exists only when I submit the job through qsub. If I simply run the application directly, it runs OK.

Any suggestions?

Thanks.

Hi,

What does ldd show when you run it on the binary? Does it point to the correct *.so files, or does it report “not found”?

Hongyon

Hi,

ldd shows that everything is linked and OK.

It seems the Myrinet nodes don’t see the library path that is set on the main node. That would explain why running on a single storage node works normally while running as a queued job fails.

Is it possible that this is related to a license issue? Or are there any other ideas?

The machine is actually Zaphod

Hi Alexander,

I don’t think it is a license issue. Try one of these:

  1. Add LD_LIBRARY_PATH to the file that you submit. For csh:

setenv LD_LIBRARY_PATH /opt/pgi/linux86-64/7.1-2/libso:$LD_LIBRARY_PATH

  2. Also try adding this:
    #PBS -v LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/pgi/linux86-64/7.1-2/libso"

You might also want to pass -machinefile to the mpirun command. Let me know if it works.
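
For a bash or sh job script, the first suggestion could look roughly like the sketch below. The function name prepend_ld_path is invented for this example; the directory is the one quoted above. The final echo is only there so that the job's output file shows what the job actually sees.

```shell
#!/bin/sh
# Sketch of the environment setup near the top of a job script (sh/bash form).
PGI_LIBSO=/opt/pgi/linux86-64/7.1-2/libso   # path taken from the suggestion above

# prepend_ld_path DIR: put DIR at the front of LD_LIBRARY_PATH.
# The ${VAR:+...} form adds the ":" separator only when the variable is
# already set and non-empty, so no stray trailing colon is produced.
# (prepend_ld_path is a hypothetical helper name.)
prepend_ld_path() {
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}

prepend_ld_path "$PGI_LIBSO"

# Record the value in the job's stdout file for later inspection.
echo "LD_LIBRARY_PATH on $(uname -n): $LD_LIBRARY_PATH"
```

Placing this before the mpirun line sets the variable in the job's own shell instead of relying on ~/.bashrc being read. Whether the processes started on the other nodes inherit it depends on how mpirun launches them, so treat this as something to try rather than a guaranteed fix.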

Hongyon