error while loading shared libraries

Hello,

I’m attempting to compile some existing C code with the PGI CDK C compiler.

The code includes a number of shared libraries, and executables which link to those shared libraries.

This being my first attempt, I’ve not included any optimizations in the compilation step. I’ve simply replaced gcc with the pgcc command. The compilation appears to complete successfully. However, when I attempt to run the main program, it crashes abruptly on startup with: FATAL PROGRAM ERROR: Aborted on SIGSEGV.

So, I looked up the command for the PGI debugger and attempted to run the main program from there. It fails to launch and complains it cannot locate one of my shared libraries. I found the library in the directory I expect it to be in. That directory is in LD_LIBRARY_PATH and I include it in the path set by the -I flag for pgdbg:

pgdbg -I $FPA/lib/$PLATFORM/:"other stuff here" $FPA/bin/$PLATFORM/$PROG

I don’t know if the SIGSEGV is related to the failure of the debugger to locate the shared library or not.

Any guidance would be greatly appreciated.

I’m running PGI CDK 7.2-5 x86 32 bit version on an HP with two quad-core 64 bit Intel processors. Operating system is Red Hat Enterprise Linux Workstation 5.

I don’t have a license for 64 bit in case you’re wondering why I’m using the 32 bit version.

Hi Emma,

PGDBG should be able to find your shared library if it’s in your environment’s LD_LIBRARY_PATH. However, the “-I” option tells the debugger where your source files are located, not your shared libraries. What happens if you remove the -I $FPA/lib/$PLATFORM/:"other stuff here" option?

  • Mat

Hmm, that’s what I thought. Anyway I removed the -I option completely and I get the same result.

Interestingly, if I compile using gcc and then run PGDBG, it works just fine. Not sure why I didn’t try this before. So this makes me think it’s a compile issue, not a debugger issue. I guess I’ll have to take a closer look at our Makefiles. Are you aware of any differences between gcc and pgcc when it comes to compiling shared libraries or linking to them?

Hi Emma,

Chapter 7 of the PGI User’s guide (http://www.pgroup.com/doc/pgiug.pdf) discusses how to create shared objects on Linux. However, we use the same method as gcc, where you need to compile your objects with “-fpic” and then use the flag “-shared” to create the shared object.

Also, we’re object compatible with gcc so you can try creating the shared object with gcc and then link it with your application built with pgcc. Make sure that you compile the gcc portion with “-m32” if you’re on a 64-bit system, in order to create 32-bit objects.

  • Mat

Hi Mat,

Well, I tried your suggestion and compiled the library with GCC and the main app with PGCC: same result. So for fun I compiled the library with PGCC and the main app with GCC, and at least this time I got a different error message.

error while loading shared libraries: /opt/pgi/linux86/7.2-5/lib/libpgc.so: cannot restore segment prot after reloc: Permission denied

Our makefiles are ancient and convoluted, but as far as I can tell I’m using the -fPIC and -shared flags correctly.

Thank-you for your ideas

You must have SELinux enabled. This secure linux module can cause a lot of headaches and if you don’t need the enhanced security features, you might be better off disabling it. To disable SELinux altogether, add “SELINUX=disabled” to your system’s “/etc/sysconfig/selinux” file and reboot.

If you do need SELinux, then for all shared libraries you create, as well as the PGI shared libraries, you need to run the command “chcon -t textrel_shlib_t nameoflib.so” as root.

Hope this helps,
Mat

Hi Mat,

Are there default optimizations that take place in the absence of any optimization flags? How can I turn them all off? I’m using -g -O0 right now, but there is still no change: the program still crashes on startup, and I get the same message if I attempt to run it in the debugger.

Hi Emma,

“-O0” disables all optimizations. Does the problem still occur after you turn off SELinux?

  • Mat

Hi Mat,

Okay, so I disabled SELinux, rebooted, then compiled the libraries with pgcc and the main app with gcc. Running the program crashes as usual, and now running it under pgdbg fails the same way as running without it. I suppose this is an improvement of sorts.

The call immediately prior to the crash appears to be an fseek.

I tried to place a break point, but when I hit run it seems to disappear.

Well it would appear this thread has finally become about debugging.

I can’t for the life of me get a breakpoint to stick. Also, the only way I can see the library is if I compile the library with pgcc and the main app with gcc. This seems wrong to me.

Hi Emma,

Now that you’ve disabled SELinux, let’s start back at the beginning.

While PGDBG has gotten a lot better at debugging shared libraries, it still isn’t perfect. To set a breakpoint in a shared library, the library must first be loaded. So first set a breakpoint at the beginning of the program and select ‘run’. You should then be able to set a breakpoint in the shared library, provided it’s compiled with “-g” (or “-gopt”) and the debugger is able to find the library’s source files.

Personally when I have access to all the source, I compile and link the library’s source directly into my main program. Give it a try if you can.

Finally, try running your program using Valgrind (www.valgrind.org) to check for uninitialized memory references (UMR). UMRs are nasty bugs that can cause seemingly random problems and may explain why compiling the main program with gcc “passes”.
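For readers unfamiliar with UMRs, here is a minimal sketch of the kind of bug Valgrind is meant to catch. The struct and function names are hypothetical and not taken from the code discussed in this thread. Memory returned by malloc has indeterminate contents, so a program that reads a field before assigning it may behave differently from one compiler or run to the next.

```c
#include <stdlib.h>

/* Hypothetical type for illustration only; not from the original code. */
struct file_state {
    long offset;
    int  is_open;
};

/* Risky: malloc does not initialize the allocation. Reading offset or
 * is_open before assigning them is an uninitialized memory reference. */
struct file_state *state_new_uninitialized(void)
{
    return malloc(sizeof(struct file_state));
}

/* Safer: calloc zero-fills the allocation, so both fields start at 0. */
struct file_state *state_new_zeroed(void)
{
    return calloc(1, sizeof(struct file_state));
}
```

If code branches on a field obtained from state_new_uninitialized() before setting it, Valgrind's Memcheck tool would be expected to flag that use as depending on an uninitialised value.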

Hope this helps,
Mat

All the UMRs I found originated in code I don’t have access to.

The error was occurring around a construct of the form:

if ( !A | B != 0 ) {
    return C;
}

I broke this up into

if ( !A )
{
    return C;
}
else if ( B != 0 )
{
    return C;
}

That moved me on a little further into the code before I met with another SIGSEGV. Is this going to be a case of changing my code style to one acceptable to PGI?

Hi Emma,

Please clarify if you really meant to use a logical OR “||” instead of a bit-wise OR “|” in your first example? If it’s bitwise then the two statements are not equivalent and your bug is that you’re using a bit-wise operator instead of the intended logical operator.
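To make the difference concrete, here is a small sketch. The record type, the counter, and the function names are all hypothetical and only stand in for A and B above. Since “!=” binds tighter than “|”, the original test parses as !A | (B != 0). Both operands are 0 or 1, so the result is the same as with “||”, but “|” always evaluates both sides, while “||” skips the right side when the left is already true. If B is only safe to evaluate when A is valid (an assumption about the original code), the bitwise form could produce exactly this kind of SIGSEGV.

```c
#include <stddef.h>

/* Hypothetical record type; stands in for whatever A points to. */
struct record {
    int status;
};

/* Counts how many times the right-hand operand is evaluated. */
int b_calls = 0;

/* Stand-in for B. Guarded against NULL so this demo never crashes;
 * real code that dereferences r unconditionally would fault on NULL. */
int status_nonzero(const struct record *r)
{
    b_calls++;
    return r != NULL && r->status != 0;
}

/* Bitwise OR: both operands are always evaluated. */
int check_bitwise(const struct record *r)
{
    return (!r | status_nonzero(r)) ? -1 : 0;
}

/* Logical OR: the right operand is skipped when !r is already true. */
int check_logical(const struct record *r)
{
    return (!r || status_nonzero(r)) ? -1 : 0;
}
```

Calling check_logical(NULL) leaves b_calls unchanged, whereas check_bitwise(NULL) increments it. That matches the if / else if rewrite earlier in the thread, which also evaluates B only when !A is false.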

Note that if you could post a snippet of the code, or send an example to trs@pgroup.com and ask customer service to forward it to me, it might help in determining the problem.

  • Mat

Um, ya. That would be the problem. Thank-you again. Looks like this is going to be an exercise in tracking silly mistakes that haven’t revealed themselves by sheer luck.