I am trying to submit an mm5.mpp job to a Rocks Linux cluster from the frontend node, using “qsub mypbsfile”.
The error shows:
error while loading shared libraries: libpgf90.so: cannot open shared object file: No such file or directory
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
In fact, I have already set the environment in ~/.cshrc or ~/.bashrc.
Are the nodes on the cluster able to access the PGI libso directory? If so, has ‘libpgf90.so’ been installed there? Also, you can try linking your code statically with “-Bstatic”, which removes the need for the shared library.
As I understand it, -Bstatic is for programs with a small memory footprint. My mm5.mpp has 3 high-resolution domains, and even the input objects are larger than 2 GB. So I use -mcmodel=medium and -Mlarge_arrays together in both the compiler (FFLAGS) and linker (LDFLAGS) flags.
Therefore, I cannot statically link (-Bstatic) objects using the medium memory model, right?
Are the nodes on the cluster able to access the PGI libso directory?
At least on the master node, libpgf90.so is installed and the libso directory is accessible.
I am not sure whether the other nodes have it. Let me check that first.
One more question: what is the maximum size (in GB) of an output file object that can be created when -mcmodel=medium and -Mlarge_arrays are used to build the executable?
The error still exists, and I don’t know why. Has anyone else run into this kind of problem? Is there a bug in pgi/linux86-64/6.0/?
==> tmp.pbs.e41771 <==
$MM5RUNPATH//mm5.mpp: error while loading shared
libraries: libpgmp.so: cannot open shared object file: No such file or
directory
==> tmp.pbs.o41771 <==
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
==> tmp.pbs.pe41771 <==
rm: cannot remove `/tmp/41771.1.compute-0-0.q/rsh': No such file or directory
As I understand it, -Bstatic is for programs with a small memory footprint. My mm5.mpp has 3 high-resolution domains, and even the input objects are larger than 2 GB. So I use -mcmodel=medium and -Mlarge_arrays together in both the compiler (FFLAGS) and linker (LDFLAGS) flags.
This is correct for the 6.0 release. With 6.1, we were able to create static libraries (those in libso) which can be linked with “-mcmodel=medium”.
One more question: what is the maximum size (in GB) of an output file object that can be created when -mcmodel=medium and -Mlarge_arrays are used to build the executable?
Offhand I don’t know, but I’m guessing it’s rather large. I’ll try to find an answer for you, though.
The error still exists, and I don’t know why. Has anyone else run into this kind of problem? Is there a bug in pgi/linux86-64/6.0/?
It’s not a bug. Your application simply can’t find one of its dynamic libraries. Since the actual error changed, my guess is that after you set LD_LIBRARY_PATH, your application was able to find libpgf90.so but now can’t find a different library. From one of the nodes, use the ‘ldd’ utility to determine the application’s dependencies, i.e. ‘ldd ’. Next, find where the missing libraries are located and add that path to your LD_LIBRARY_PATH.
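As a rough illustration of the ldd check described above, the sketch below filters ldd output down to just the libraries the loader could not resolve. The helper name missing_libs is made up for this example, and it assumes ldd marks unresolved entries with the text "not found", as glibc's ldd does.

```shell
#!/bin/sh
# missing_libs is a hypothetical helper, not part of PGI or Rocks.
# It reads ldd output on stdin and prints the name of every shared
# library that ldd reports as "not found".
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Possible usage on a compute node (the path is only an assumption
# based on the error message in this thread):
#   ldd "$MM5RUNPATH/mm5.mpp" | missing_libs
```

Running this on a compute node rather than on the frontend matters, because the two may not see the same directories.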
It seems the Myrinet nodes don’t see the library path that is set on the main node. That’s why the program runs normally on a single storage node but fails when submitted as a queue job.
Is it possible that this is related to licensing issues? Or are there any other ideas?
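On the point about compute nodes not seeing the library path: batch jobs do not always read ~/.cshrc or ~/.bashrc, so one thing worth trying is to set LD_LIBRARY_PATH inside the job script itself. The sketch below is only an assumption about how that could look. The function name prepend_ld_path and the PGI_LIBSO default are hypothetical, and the approach only helps if the libso directory is actually mounted or installed on the compute nodes.

```shell
#!/bin/sh
# prepend_ld_path is a hypothetical helper: it puts a directory at the
# front of LD_LIBRARY_PATH unless that directory is already listed.
prepend_ld_path() {
    dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;   # already present, leave the path unchanged
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# PGI_LIBSO is an assumed location; replace it with the real libso
# directory of the PGI installation on your cluster.
PGI_LIBSO=${PGI_LIBSO:-/usr/pgi/linux86-64/6.0/libso}
prepend_ld_path "$PGI_LIBSO"
```

Placed near the top of the job script, before the line that launches mm5.mpp, this makes the job independent of whichever shell startup files the queue system happens to source.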