Internal file I/O error in shared libraries

A user on the matlab newsgroup comp.soft-sys.matlab reported internal file errors on multiple runs of a shared library. I have isolated the code and reproduced the problem outside of matlab. The example code is below.


[chulbert@fourier test]$ uname -a
Linux fourier.isl-inc.com 2.6.15-prep #1 SMP Tue Mar 21 11:20:33 EST 2006
x86_64 x86_64 x86_64 GNU/Linux
[chulbert@fourier test]$ pgf90 -V

pgf90 7.0-1 64-bit target on x86-64 Linux
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2006, STMicroelectronics, Inc. All Rights Reserved.
[chulbert@fourier test]$ pgcc -V

pgcc 7.0-1 64-bit target on x86-64 Linux
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2006, STMicroelectronics, Inc. All Rights Reserved.
[chulbert@fourier test]$ cat test.f90
SUBROUTINE mexfunction
character(len=80) :: debugout

write(debugout,'A17')'Reading RRTM_DATA'
write(*,*) debugout
END SUBROUTINE
[chulbert@fourier test]$ pgf90 -shared -otest1.mexa64 test.f90
[chulbert@fourier test]$ pgf90 -shared -otest2.mexa64 test.f90
[chulbert@fourier test]$ cat test.c
#include <stdlib.h>
#include <dlfcn.h>

int
main(void)
{
void *mex_functions[2] = {NULL,NULL};
void (*mex_entry1)(void);
void (*mex_entry2)(void);
int zero = 0;

mex_functions[0] = dlopen("test1.mexa64",RTLD_NOW);
mex_entry1 = dlsym(mex_functions[0],"mexfunction_");
mex_entry1();
mex_functions[1] = dlopen("test2.mexa64",RTLD_NOW);
mex_entry2 = dlsym(mex_functions[1],"mexfunction_");
mex_entry2();

dlclose(mex_functions[0]);
dlclose(mex_functions[1]);
return 0;
}
[chulbert@fourier test]$ pgcc -g test.c -ldl
test.c:
[chulbert@fourier test]$ ./a.out
Reading RRTM_DATA
PGFIO-F-254/formatted write/internal file/illegal repeat count in format.
In source file test.f90, at line number 4

Hi Chris,

I’m still looking into this and need to talk with a few of our engineers before I can get at exactly what the issue is here. I have my suspicions but need to confirm.

A workaround is to add “-Bstatic_pgi” to each of the test.f90 link lines. This pulls static copies of the PGF90 runtime libraries into each shared library.

Example:

% pgf90 -shared -otest1.mexa64 test.f90 -V7.0-1 -fpic -Bstatic_pgi
% pgf90 -shared -otest2.mexa64 test.f90 -V7.0-1 -fpic -Bstatic_pgi
% pgcc -g test.c -ldl -V7.0-1
test.c:
% a.out
 Reading RRTM_DATA
 Reading RRTM_DATA
- Mat

Is there a workaround for v6.1-6? As demonstrated below, the -Bstatic_pgi option does not exist in that version, and the -Bstatic option does not work in this case.

pgf90 6.1-6 64-bit target on x86-64 Linux

%pgf90 -shared -otest_mex.mexa64 test.f90 -fpic -Bstatic_pgi
pgf90-Warning-Unknown switch: -Bstatic_pgi

%pgf90 -shared -otest_mex.mexa64 test.f90 -fpic -Bstatic
test.f90:
/usr/bin/ld: /usr/lib64/libc.a(ctype.o): relocation R_X86_64_32 against `__pthread_internal_tsd_address' can not be used when making a shared object; recompile with -fPIC
/usr/lib64/libc.a: could not read symbols: Bad value

Hi Chris, B. Reen,

I’ve been able to work on this a bit more and found a better solution. Add “RTLD_GLOBAL” to the dlopen options. This way the PGF90 runtime information can be shared between the two modules.

% cat test.f90
SUBROUTINE mexfunction
character(len=80) :: debugout
write(debugout,'A17')'Reading RRTM_DATA'
write(*,*) debugout
END SUBROUTINE

% cat test.c
#include <stdlib.h>
#include <dlfcn.h>

int
main(void)
{
void *mex_functions[2] = {NULL,NULL};
void (*mex_entry1)(void);
void (*mex_entry2)(void);
int zero = 0;

mex_functions[0] = dlopen("test1.mexa64",RTLD_NOW | RTLD_GLOBAL );
mex_entry1 = dlsym(mex_functions[0],"mexfunction_");
mex_entry1();
mex_functions[1] = dlopen("test2.mexa64",RTLD_NOW | RTLD_GLOBAL );
mex_entry2 = dlsym(mex_functions[1],"mexfunction_");
mex_entry2();

dlclose(mex_functions[0]);
dlclose(mex_functions[1]);
return 0;
}

% pgf90 -shared -otest1.mexa64 test.f90 -V6.1-5 -fpic
% pgf90 -shared -otest2.mexa64 test.f90 -V6.1-5 -fpic
% pgcc -g test.c -ldl -V6.1-5
test.c:
% a.out
 Reading RRTM_DATA
 Reading RRTM_DATA

I tested this with the 6.1, 6.2 and upcoming 7.0 compilers, in both 32-bit and 64-bit.
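
The test.c listing above calls dlopen and dlsym without checking their results, so a missing library or symbol would crash rather than report an error. As a minimal sketch only (the helper name `load_entry` and the `mex_entry_fn` typedef are hypothetical and not part of the original test case), the same RTLD_NOW | RTLD_GLOBAL load could be wrapped with basic error checking like this:

```c
#include <dlfcn.h>
#include <stdio.h>

/* Entry points in the test case take no arguments and return nothing. */
typedef void (*mex_entry_fn)(void);

/*
 * Hypothetical helper: open `path` with RTLD_NOW | RTLD_GLOBAL and look up
 * `symbol`. Returns the entry point, or NULL on failure. On return,
 * *handle_out holds the dlopen handle, or NULL if anything failed.
 */
static mex_entry_fn load_entry(const char *path, const char *symbol,
                               void **handle_out)
{
    mex_entry_fn fn = NULL;
    const char *err;

    *handle_out = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (*handle_out == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
        return NULL;
    }

    dlerror();                                   /* clear any stale error */
    *(void **)(&fn) = dlsym(*handle_out, symbol); /* POSIX-style assignment */
    err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym(%s) failed: %s\n", symbol, err);
        dlclose(*handle_out);
        *handle_out = NULL;
        return NULL;
    }
    return fn;
}
```

In main, each dlopen/dlsym pair would then become a single call such as `load_entry("test1.mexa64", "mexfunction_", &mex_functions[0])`, with the returned pointer called only if it is not NULL. Link with -ldl as in the original pgcc command.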

“-Bstatic_pgi” was added with 6.2. Prior to this, you would need to use the “-v” switch to have the driver print the ld command, then edit this command to put “-Bstatic” before and “-Bdynamic” after the PGI libraries. This is a cumbersome process, hence the addition of the flag.

- Mat

The RTLD_GLOBAL option will not work in his case because what he really has are those Matlab MEX functions. Matlab is the one loading the shared library, so the user or developer cannot change the way shared libraries are opened. Perhaps the best thing is to upgrade to the 6.2 compilers if he doesn’t already have a subscription. Otherwise I suppose he would want to edit the link line himself.

Chris

Thanks. As pointed out, the RTLD_GLOBAL option does not work in my case. By editing the link line as suggested I’ve gotten it to work for my Matlab application.

Hi B. Reen,

I’ll be looking at how we build our shared objects to see if we can do anything to make RTLD_GLOBAL the default. For now, however, using the static PGI runtime is the only way I know to get this to work properly.

Thanks,
Mat