PGI 11 and 12 cannot make use of system libnuma.so

I have been trying to build MVAPICH2 and Open-MPI with PGI 11.10, 12.1, and now 12.4. Neither package will build, and I have a simple reproducer that shows why.

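#include <stdio.h>

/* Test program generated by configure: it only opens and closes a file. */
int
main ()
{
  FILE *f = fopen ("conftest.out", "w");
  return ferror (f) || fclose (f) != 0;

  /* Tail of the configure template; never reached. */
  ;
  return 0;
}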

This test program is generated during the configure step of both packages. It is compiled with:

pgcc -o conftest -g -O2 conftest.c -lnuma

and then executed. It compiles without errors or warnings; the problem appears at run time. Roughly 30-40% of the time the binary is executed, it dumps core. As a result, configure for these packages sometimes gets fairly far and other times fails early.
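
Because the crash is intermittent, a single configure run says little about how often it happens. A minimal sketch for measuring the failure rate, assuming a POSIX system: the hypothetical helper count_crashes() below runs a program a given number of times and counts the runs that were terminated by a signal (such as SIGSEGV) instead of exiting normally.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with arguments argv, `runs` times.
 * Returns how many runs were killed by a signal, or -1 if fork() or
 * waitpid() fails. A failed exec is reported as exit status 127, which
 * is a normal exit and therefore not counted as a crash. */
int count_crashes(char *const argv[], int runs)
{
    int crashes = 0;

    for (int i = 0; i < runs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            execv(argv[0], argv);
            _exit(127);
        }

        int status;
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR)
                return -1;
        }

        if (WIFSIGNALED(status)) {
#ifdef WCOREDUMP
            int core = WCOREDUMP(status);   /* not POSIX, but common */
#else
            int core = 0;
#endif
            crashes++;
            fprintf(stderr, "run %d: killed by signal %d%s\n",
                    i + 1, WTERMSIG(status), core ? " (core dumped)" : "");
        }
    }
    return crashes;
}
```

For example, after building conftest with the pgcc line above, a small driver could call count_crashes() with the argument vector {"./conftest", NULL} and, say, 100 runs to get a crash count for that compiler version.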

We’re hitting this bug on two clusters now. The system libnuma.so.1 lives in /usr/lib64, yet the PGI installer does not seem to be aware of it.
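
To confirm that the system library itself loads and works independently of the PGI link line, one option is to dlopen() it at run time and call numa_available(), which libnuma documents as returning -1 when NUMA is not supported. This is a sketch under assumptions: the helper name probe_libnuma() and its -2/-3 error codes are hypothetical; only dlopen, dlsym, dlclose, dlerror, and numa_available are real interfaces.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: load a libnuma shared object and call its
 * numa_available(). Returns -2 if the library cannot be loaded, -3 if it
 * does not export numa_available, otherwise numa_available()'s result
 * (-1 means the kernel reports no NUMA support). */
int probe_libnuma(const char *soname)
{
    void *handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", soname, dlerror());
        return -2;
    }

    void *sym = dlsym(handle, "numa_available");
    if (sym == NULL) {
        dlclose(handle);
        return -3;
    }

    /* POSIX requires that a void * can hold a function address; memcpy
     * avoids the object-to-function pointer cast ISO C leaves undefined. */
    int (*numa_available_fn)(void);
    memcpy(&numa_available_fn, &sym, sizeof numa_available_fn);

    int rc = numa_available_fn();
    dlclose(handle);
    return rc;
}
```

For example, probe_libnuma("/usr/lib64/libnuma.so.1") should return 0 or -1 if the system library is usable. With older glibc versions such as the one on these clusters, the program must be linked with -ldl.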

Here’s the output with the -v flag added to the compile line. Note that the final ld command links both the system library (-lnuma) and PGI’s own nonuma.o:

%) pgcc -v -o conftest -g -O2 conftest.c -lnuma

/usr/projects/hpcsoft/turing/pgi/12.4/linux86-64/12.4/bin/pgc conftest.c -debug -x 120 0x200 -opt 2 -x 119 0xa10000 -x 122 0x40 -x 123 0x1000 -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1 -x 125 0x20000 -quad -x 59 4 -x 59 4 -tp gh -astype 0 -stdinc /usr/projects/hpcsoft/turing/pgi/12.4/linux86-64/12.4/include:/usr/local/include:/usr/lib/gcc/x86_64-redhat-linux/4.1.2/include:/usr/lib/gcc/x86_64-redhat-linux/4.1.2/include:/usr/include -def unix -def __unix -def __unix__ -def linux -def __linux -def __linux__ -def __NO_MATH_INLINES -def __x86_64 -def __x86_64__ -def __LONG_MAX__=9223372036854775807L -def '__SIZE_TYPE__=unsigned long int' -def '__PTRDIFF_TYPE__=long int' -def __THROW= -def __extension__= -def __amd_64__amd64__ -def __k8 -def __k8__ -def __SSE__ -def __MMX__ -def __SSE2__ -def __SSE3__ -def __SSE4A__ -def __ABM__ -predicate '#machine(x86_64) #lint(off) #system(posix) #cpu(x86_64)' -cmdline '+pgcc conftest.c -v -o conftest -g -O2 -lnuma' -x 123 0x80000000 -x 123 4 -x 119 0x20 -def __pgnu_vsn=40102 -alwaysinline /usr/projects/hpcsoft/turing/pgi/12.4/linux86-64/12.4/lib/libintrinsics.il 4 -x 120 0x200000 -asm /tmp/pgccdLddjdtPB0hu.s
PGC-I-0222-Redundant definition for symbol __THROW (/usr/include/sys/cdefs.h: 63)
PGC-I-0222-Redundant definition for symbol __extension__ (/usr/include/sys/cdefs.h: 287)
PGC/x86-64 Linux 12.4-0: compilation completed with informational messages

/usr/bin/as /tmp/pgccdLddjdtPB0hu.s -o /tmp/pgccBLddrClHgN8D.o

/usr/bin/ld /usr/lib64/crt1.o /usr/lib64/crti.o /usr/projects/hpcsoft/turing/pgi/12.4/linux86-64/12.4/lib/trace_init.o /usr/lib/gcc/x86_64-redhat-linux/4.1.2/crtbegin.o /usr/projects/hpcsoft/turing/pgi/12.4/linux86-64/12.4/lib/initmp.o -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/projects/hpcsoft/turing/pgi/12.4/linux86-64/12.4/lib/pgi.ld -L/usr/projects/hpcsoft/turing/pgi/12.4/linux86-64/12.4/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 /tmp/pgccBLddrClHgN8D.o -lnuma -rpath /usr/projects/hpcsoft/turing/pgi/12.4/linux86-64/12.4/lib -o conftest /usr/projects/hpcsoft/turing/pgi/12.4/linux86-64/12.4/lib/nonuma.o -lpgmp -lpthread -lnspgc -lpgc -lm -lgcc -lc -lgcc /usr/lib/gcc/x86_64-redhat-linux/4.1.2/crtend.o /usr/lib64/crtn.o
Unlinking /tmp/pgccdLddjdtPB0hu.s
Unlinking /tmp/pgccBLddrClHgN8D.o

We need a workaround for this issue if one exists. We do not see this problem with PGI 9 or older, nor with any recent GCC or Intel compiler.

Thanks!