Hi all,
I’m cross-posting this to the f2py, Python and Fortran developers because of the following problem.
I’m running some Fortran subroutines from a Python script (the Fortran is wrapped with f2py and compiled with PGI 6.0). My environment is an amd64_x86 Linux 2.4 cluster (Opteron) with PGI 6.0 and libc.so.6.
Sometimes a write(*,*) in my Fortran code produces the following run-time error:
PGFIO-F-202/list-directed write/unit=6/conflicting specifiers.
File name = stdout formatted, sequential access record = 1
In source file module2/module2.f, at line number 2
This works fine when building a standalone Fortran binary.
The problem appears when using pgf90 but not with pgf77.
I’ve used f2py. I will try it myself and let you know what I find out.
You say the error occurs sometimes, but not at other times? What types of data are you writing when the error occurs?
running build
running config_fc
running build_src
building extension “hello” sources
f2py:> /tmp/tmpspkPQh/src/hellomodule.c
creating /tmp/tmpspkPQh
creating /tmp/tmpspkPQh/src
Reading fortran codes…
Reading file ‘hello.f’
Post-processing…
Block: hello
Block: shello
Post-processing (stage 2)…
Building modules…
Building module “hello”…
Constructing wrapper function “shello”…
shello(a)
Wrote C/API module “hello” to file “/tmp/tmpspkPQh/src/hellomodule.c”
adding ‘/tmp/tmpspkPQh/src/fortranobject.c’ to sources.
adding ‘/tmp/tmpspkPQh/src’ to include_dirs.
copying /usr/lib64/python2.3/site-packages/f2py2e/src/fortranobject.c -> /tmp/tmpspkPQh/src
copying /usr/lib64/python2.3/site-packages/f2py2e/src/fortranobject.h -> /tmp/tmpspkPQh/src
running build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
customize PGroupFCompiler
customize PGroupFCompiler using build_ext
building ‘hello’ extension
compiling C sources
gcc options: ‘-pthread -fno-strict-aliasing -DNDEBUG -O2 -fmessage-length=0 -Wall -fPIC’
creating /tmp/tmpspkPQh/tmp
creating /tmp/tmpspkPQh/tmp/tmpspkPQh
creating /tmp/tmpspkPQh/tmp/tmpspkPQh/src
compile options: ‘-I/usr/include/python2.3 -I/tmp/tmpspkPQh/src -I/usr/include/python2.3 -c’
gcc: /tmp/tmpspkPQh/src/fortranobject.c
/tmp/tmpspkPQh/src/fortranobject.c: In function `fortran_doc’:
/tmp/tmpspkPQh/src/fortranobject.c:123: warning: int format, different type arg(arg 3)
gcc: /tmp/tmpspkPQh/src/hellomodule.c
compiling Fortran sources
pgf77(f77) options: ‘-fpic -Minform=inform -Mnosecond_underscore -fast’
pgf90(f90) options: ‘-fpic -Minform=inform -Mnosecond_underscore -fast’
pgf90(fix) options: ‘-Mfixed -fpic -Minform=inform -Mnosecond_underscore -fast’
compile options: ‘-I/usr/include/python2.3 -I/tmp/tmpspkPQh/src -I/usr/include/python2.3 -c’
pgf77:f77: hello.f
pgf90 -shared -fpic /tmp/tmpspkPQh/tmp/tmpspkPQh/src/hellomodule.o /tmp/tmpspkPQh/tmp/tmpspkPQh/src/fortranobject.o /tmp/tmpspkPQh/hello.o -o ./hello.so
Removing build directory /tmp/tmpspkPQh
brentl@leback:~/simple> python
Python 2.3.4 (#1, Feb 7 2005, 15:05:26)
[GCC 3.3.4 (pre 3.3.5 20040809)] on linux2
Type “help”, “copyright”, “credits” or “license” for more information.
>>> import hello
>>> dir(hello)
['__doc__', '__file__', '__name__', '__version__', 'as_column_major_storage', 'has_column_major_storage', 'shello']
>>> hello.shello(3)
Hello from Fortran!
a = 3
Hi * *
>>> quit
'Use Ctrl-D (i.e. EOF) to exit.'
If I just generate two .so files the normal way from the Fortran subroutines that become Python modules, and link them into a Fortran main program, I don’t get an error. I guess the next step is to try to link these two .so files into a gcc main program. Do you see anywhere in the Python linkage code where it is trying to redefine stdout?
Yes, so I can generate two .so files using our pgf90 compiler. I can then link them into a main program compiled and linked with gcc. I do not get any error. Therefore, I suspect the problem has something to do with the Python runtime. I didn’t see anything in the Python linkage code that mucks with stdout. Could there be something in the 2nd “import” statement that is causing this?
The ‘normal’ compile/link/exec mode with pgf90 also works fine for me.
The problem only occurs when the .so files are imported into the Python interpreter.
Do you see anywhere in the python linkage code where it is trying to redefine stdout?
The bug also appears with other I/O statements. For example, let’s put this in the second module:
open(1,file='test.txt')
write(1,*) 'Hello'
close(1)
and the message is in this case :
PGFIO-F-201/OPEN/unit=1/illegal value for specifier.
File name = test.txt
In source file module2.f90, at line number 2
And this works if I import only module2.
Something changes at the second import ...
I’ve also tried to debug the Python interpreter, but I don’t know what to monitor ...
Please keep in touch, Brent, because I feel lost ...
and thanks for your help.
I was able to simplify the problem down to this code segment:
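#include <dlfcn.h>
#include <stdio.h>

/* Same shape as the dl_funcptr typedef used by Python's import code */
typedef void (*dl_funcptr)(void);

int main()
{
    int i;          /* unused in this reduced test case */
    dl_funcptr p;
    void *handle;

    i = 4;

    /* First library: RTLD_NOW only, so its symbols are not made global */
    handle = dlopen("./shello.so", RTLD_NOW);
    if (handle == NULL) {
        printf("shello.so not there %s\n", dlerror());
        return -1;
    }
    p = (dl_funcptr) dlsym(handle, "shello_");
    (*p)();

    /* Second library: loaded the same way; its Fortran I/O then fails */
    handle = dlopen("./thello.so", RTLD_NOW);
    if (handle == NULL) {
        printf("thello.so not there %s\n", dlerror());
        return -1;
    }
    p = (dl_funcptr) dlsym(handle, "thello_");
    (*p)();

    return 0;
}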
When python imports the f2py created .so files, it uses the dlopen call.
The flag argument it uses is 2, for RTLD_NOW. What happens is that the 2nd call to dlopen cannot access the global symbols pulled in from the first dlopen call. Found this on the web:
flag must be either RTLD_LAZY, meaning resolve undefined symbols as code from the dynamic library is executed, or RTLD_NOW, meaning resolve all undefined symbols before dlopen returns, and fail if this cannot be done. Optionally, RTLD_GLOBAL may be or’ed with flag, in which case the external symbols defined in the library will be made available to subsequently loaded libraries.
When I change the above calls to dlopen to or in RTLD_GLOBAL, everything works fine. The place where this needs to change in the Python source is dynload_shlib.c, line 129 (In python 2.3.4).
We are still investigating why it happens, and why it works with other compilers. Given the explanation of RTLD_GLOBAL, I’m not sure why that is not always needed. So, we need to dig a little more. But at least you have a workaround for now.
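The same effect can probably be had without patching dynload_shlib.c, by changing the dlopen flags from the Python side before the imports. The following is only a minimal sketch under some assumptions: it uses a current Python, where os.RTLD_GLOBAL is available on Unix (the 2.3-era interpreters in this thread would need the numeric value instead), and the helper name global_dlopen_flags is made up for illustration.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def global_dlopen_flags():
    """Temporarily OR RTLD_GLOBAL into the flags Python passes to dlopen.

    Hypothetical helper, not part of f2py or Python itself.
    """
    saved = sys.getdlopenflags()
    sys.setdlopenflags(saved | os.RTLD_GLOBAL)
    try:
        yield
    finally:
        # Restore the previous flags so later imports are unaffected.
        sys.setdlopenflags(saved)


# Usage sketch: "hello" and "thello" stand in for the f2py-built modules.
# with global_dlopen_flags():
#     import hello
#     import thello
```

Restoring the flags afterwards keeps RTLD_GLOBAL limited to the modules that seem to need it, rather than applying it to every extension module imported later.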
If you turn on debugging, you will see that when you do the first python import, all of our pgi runtime .so files get loaded. When you do the second python import, dlopen knows that the pgi runtime does not need to be reloaded. But how does the second f90 subroutine access the pgi runtime routines? It obviously does, as we get a pgi f90 runtime error. But, something is not right, either uninitialized data or data is not shared properly.
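One way to watch this from inside the interpreter (not necessarily the debugging meant above) is to compare the shared objects mapped into the process before and after each import. This is a rough, Linux-only sketch that reads /proc/self/maps; the function names are invented for illustration.

```python
def shared_objects(maps_text):
    """Return the set of shared-library paths named in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, and optionally a pathname.
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            # Keep entries whose file name looks like a shared object.
            if ".so" in path.rsplit("/", 1)[-1]:
                paths.add(path)
    return paths


def loaded_now():
    """Read the current process's mappings (Linux only)."""
    with open("/proc/self/maps") as f:
        return shared_objects(f.read())


# Usage sketch ("hello" is a placeholder for an f2py-built module):
# before = loaded_now()
# import hello
# print(sorted(loaded_now() - before))   # libraries pulled in by the import
```

Running the same comparison around the second import should show whether any runtime libraries are loaded again or whether only the new module appears.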
So, we will continue to investigate, and pass it onto our runtime engineers.
If your f90 compiler shares its I/O runtime with the compiler used to build Python, there will most likely never be problems of this sort. That might be why at least some of the Solaris, HP-UX, IRIX etc. systems are okay.
Recently we switched to the f2py that comes with numpy and ran into lots of problems with the sys.setdlopenflags(258) workaround.
As soon as we call “sys.setdlopenflags(258)” we are no longer able to import any *.so files built with the f2py included in numpy. If we try, we get a “Segmentation fault” and Python terminates.
Here is the error in full: (fsim.so is f2py wrapped pgf90 compiled code)
[bflynt@master hiarms_pyrcas]# python
Python 2.4.2 (#1, May 2 2006, 08:28:01)
[GCC 4.1.0 (SUSE Linux)] on linux2
Type “help”, “copyright”, “credits” or “license” for more information.
If I leave out the sys.setdlopenflags(258) call, the module imports fine until the code tries to write or print something, which was the reason we all used the workaround.
Has PG tried to fix this issue with the compiler? It has been almost 2 years now and still this problem exists.
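For anyone wondering where the 258 in sys.setdlopenflags(258) comes from: on Linux with glibc it is RTLD_NOW (2) ORed with RTLD_GLOBAL (0x100 = 256), i.e. the same change described earlier in the thread. A small sketch that decodes such a value, assuming the glibc numeric values (other platforms use different numbers) and a made-up function name:

```python
# Numeric values as defined by glibc on Linux; other platforms differ.
LINUX_DLOPEN_FLAGS = [
    (0x001, "RTLD_LAZY"),
    (0x002, "RTLD_NOW"),
    (0x100, "RTLD_GLOBAL"),
]


def decode_dlopen_flags(value):
    """Return the names of the known flag bits set in value."""
    return [name for bit, name in LINUX_DLOPEN_FLAGS if value & bit]


print(decode_dlopen_flags(258))  # ['RTLD_NOW', 'RTLD_GLOBAL']
```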
It will take me a little while to recreate this testing environment. In the meantime, could you try using -Bstatic_pgi on the pgi link line when you create the .so file? This might be another temporary work-around.
The real answer is that we are working to fix this in our 7.1 compiler which should be out later this year.