Compilation of MPICH 3.0.1 fails with PGI 12.10 on 32-bit machines

MPICH 3.0.1 fails to compile with the PGI 12.10 compilers on 32-bit machines. There is no problem on 64-bit machines.

The build commands used were:

./configure --prefix=$HOME/mpich-install --with-device=ch3:nemesis --with-pm=hydra --enable-f77 F77=/opt/pgi/linux86/12.10/bin/pgf90 --enable-fc FC=/opt/pgi/linux86/12.10/bin/pgf90 --enable-cc CC=/opt/pgi/linux86/12.10/bin/pgcc --enable-cxx CXX=/opt/pgi/linux86/12.10/bin/pgCC 
make

and it results in the following compilation error:

PGC-F-0000-Internal compiler error. Unable to allocate a register 8 (topology-x86.c: 77)
PGC/x86 Linux 12.10-0: compilation aborted

Any ideas on how to fix this? Thanks.

Hi p.j.knowles,

I just tried but was unable to recreate the error on Linux using the 32-bit PGI 12.10 compilers. What OS are you using? Are there any additional flags?

mpich-3.0.1/src/pm/hydra/tools/topo/hwloc/hwloc/src% pgcc -c topology-x86.c -I../include -m32 -w
PGC/x86 Linux 12.10-0: compilation completed with warnings
mpich-3.0.1/src/pm/hydra/tools/topo/hwloc/hwloc/src% file topology-x86.o
topology-x86.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped

Thanks,
Mat

Mat,

The problem doesn’t occur if I invoke the compiler directly, but via ‘make’ it does. I tried to cut it down a bit, and the following line:

source='topology-x86.c' object='topology-x86.lo' libtool=yes DEPDIR=.deps depmode=pgcc /bin/sh ../../../../../confdb/depcomp /bin/sh ../../../../../libtool --silent --tag=CC   --mode=compile /opt/pgi/linux86/12.10/bin/pgcc -I/home/andy/mpich-3.0.1/src/pm/hydra/tools/topo/hwloc/hwloc/include -DHWLOC_INSIDE_LIBHWLOC -c -o topology-x86.lo topology-x86.c

reproduces the problem.

Andy (Molpro team)

Hi Andy,

Sorry, I should have been clearer. I built MPICH 3.0.1 entirely within its make harness and did not see the error. I then compiled the file again outside the make build and posted that output just to show it compiling.

I’ll continue to poke at it to see if I can get it to fail. When I see this type of issue, my next step is to check whether it’s a UMR (uninitialized memory read) or another memory problem whose behaviour is non-deterministic. I’ll let you know what I find out.

- Mat

Mat,

I’ve done a bit more digging and think I’ve found the problem. The libtool script adds the ‘-fpic’ flag, so the following:

/opt/pgi/linux86/12.10/bin/pgcc -I/home/andy/mpich-3.0.1/src/pm/hydra/tools/topo/hwloc/hwloc/include -DHWLOC_INSIDE_LIBHWLOC -c topology-x86.c -fpic

fails to compile. Removing ‘-fpic’ allows the file to compile. I’m not sure why you don’t see the same behaviour. This machine is 32-bit hardware, running openSUSE 12.2.

Thanks,

Andy

Thanks Andy, “-fpic” was the missing piece and I’ve now been able to recreate the error.

The problem is with the asm statement at line 69 of “include/private/cpuid.h”. I’m looking into whether this asm isn’t valid for use with -fpic or whether we’re doing something wrong. Either way, we shouldn’t be issuing an internal compiler error.

> I’m not sure why you don’t see the same behaviour.

I’m not sure. Either -fpic wasn’t being added (I didn’t explicitly request shared libraries, but neither did you) or my configuration bypassed this asm statement. I’ll need to check which define flags make used.
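One way to check which path a given compile takes is to build a tiny probe with exactly the flags make passes. This is a minimal sketch, not part of hwloc or MPICH: the function name `pic_build_summary` is hypothetical, and it assumes the GCC convention that `__PIC__` is predefined under -fpic/-fPIC and that `__i386__`/`__x86_64__` identify the target. Whether PGI 12.10 defines these same macros is an assumption that has not been confirmed here.

```c
/* Hypothetical probe: reports the compile-time macros that decide whether
 * 32-bit PIC code generation (with %ebx reserved) is in effect.
 * Assumes GCC-style predefined macros: __PIC__ for -fpic/-fPIC and
 * __i386__ / __x86_64__ for the target architecture. */
static const char *pic_build_summary(void)
{
#if defined(__i386__) && defined(__PIC__)
    return "i386 with PIC: %ebx is reserved as the PIC register";
#elif defined(__i386__)
    return "i386 without PIC: %ebx is available to the allocator";
#elif defined(__x86_64__) && defined(__PIC__)
    return "x86_64 with PIC";
#elif defined(__x86_64__)
    return "x86_64 without PIC";
#else
    return "not an x86 target";
#endif
}
```

Calling it from a throwaway `main` with `printf("%s\n", pic_build_summary())`, once with and once without -fpic (adding -m32 on a 64-bit host), would show whether the PIC configuration was actually in effect for that compile.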

- Mat

Hi Andy,

Here’s the response from the compiler engineer I asked to look at this:

The extended asm fails because we have one less general purpose register
to work with when we compile with -fpic on 32-bit. Compiling with -fpic
on 32-bit removes %ebx from the pool of general purpose registers
because it uses %ebx for the PIC register. The 64-bit compilers do not
have this issue because the ABI provides a dedicated PIC register. We
will investigate this further to see if we can improve the register
usage of this asm statement.

Also, the programmer is using a hard-coded %ebx in their asm statement.
All hard-coded registers in an asm statement must be included in the asm
statement’s clobber list. Otherwise, the asm statement may clobber a
value in a hard-coded register that the compiler is using for something
else. For example, note the addition of “%ebx” in the clobber list below
(the last part of the asm statement):

asm(
    "mov %%ebx,%2\n\t"
    "cpuid\n\t"
    "xchg %2,%%ebx\n\t"
    "movl %k2,%1\n\t"
    : "+a" (*eax), "=m" (*ebx), "=&r"(sav_ebx),
      "+c" (*ecx), "=&d" (*edx) ::
      "%ebx" );
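The clobber-list rule described above can be shown with a small self-contained example. This is a minimal sketch using GCC-style extended asm, assuming a GCC-compatible compiler on x86; `add_with_scratch_ecx` is a hypothetical name, and it deliberately hard-codes %ecx rather than %ebx so that it does not run into the 32-bit PIC restriction discussed in this thread. On non-x86 targets it falls back to plain C.

```c
/* Hypothetical example: adds two values using %ecx as a hard-coded scratch
 * register. Because the template names %ecx explicitly, "ecx" is listed in
 * the clobber list, so the compiler will not place an operand or keep a live
 * value in %ecx across the asm. "cc" is listed because addl changes flags. */
static unsigned add_with_scratch_ecx(unsigned a, unsigned b)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned out;
    __asm__("movl %1, %%ecx\n\t"   /* copy a into the scratch register */
            "addl %2, %%ecx\n\t"   /* add b */
            "movl %%ecx, %0"       /* move the sum to the output operand */
            : "=r"(out)
            : "r"(a), "r"(b)
            : "ecx", "cc");
    return out;
#else
    return a + b; /* non-x86 fallback */
#endif
}
```

Dropping "ecx" from the clobber list would still assemble, but the compiler would then be free to assign `b` (or some other live value) to %ecx, and the first `movl` would silently overwrite it. That is the kind of corruption the engineer is warning about.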


As a workaround, the programmer can either compile without -fpic on 32-bit, or possibly use fewer registers in the asm statement.
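The "use fewer registers" option can be sketched with the common pattern of parking %ebx in another register around `cpuid`, so that %ebx never appears as an asm operand or clobber. This is a swapped-in technique, not the hwloc code and not a fix confirmed by PGI, and it has not been checked against PGI 12.10. `pic_safe_cpuid` is a hypothetical name; the sketch passes 0 as the sub-leaf in %ecx and assumes the CPU implements `cpuid`.

```c
#include <stdint.h>

/* Hypothetical PIC-friendly cpuid wrapper. Returns 1 if cpuid was executed,
 * 0 on non-x86 targets (outputs are then zeroed). Assumes cpuid exists. */
static int pic_safe_cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                          uint32_t *ecx, uint32_t *edx)
{
    uint32_t a = 0, b = 0, c = 0, d = 0;
#if defined(__i386__)
    /* Save %ebx (the PIC register under -fpic) in %esi, run cpuid, then swap:
     * %ebx is restored and cpuid's EBX result ends up in %esi.
     * Registers used: %eax, %esi, %ecx, %edx; %ebx is never an operand. */
    __asm__ __volatile__("movl %%ebx, %%esi\n\t"
                         "cpuid\n\t"
                         "xchgl %%ebx, %%esi"
                         : "=a"(a), "=S"(b), "=c"(c), "=d"(d)
                         : "0"(leaf), "2"(0u));
#elif defined(__x86_64__)
    /* In the default x86-64 code model PIC code does not reserve %rbx,
     * so it can be used directly as an output. */
    __asm__ __volatile__("cpuid"
                         : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                         : "0"(leaf), "2"(0u));
#else
    (void)leaf;
    *eax = *ebx = *ecx = *edx = 0;
    return 0;
#endif
    *eax = a;
    *ebx = b;
    *ecx = c;
    *edx = d;
    return 1;
}
```

Building it with `-m32 -fpic` exercises the 32-bit path. There the asm needs only %eax, %ecx, %edx and %esi. The original statement, by contrast, ties up three fixed registers plus an early-clobbered scratch register, and it likely needs one more register to hold the address for its memory operand.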

I added a problem report (TPR#19078) to see whether the register usage can be improved or better error detection can be added. In any case, the compiler shouldn’t give an ICE (internal compiler error).

Thanks,
Mat

Mat,

Thanks for the info - I’ve only just noticed your response since I didn’t realize the thread had gone onto a second page.

I implemented a workaround that forces MPICH’s configure to fail when it tests for the -fpic compiler flag:

for i in `find . -name configure`; do sed -i -e 's/-fpic/-fpic-fail/' ${i}; done

The good news is that the newly released pgcc 13.1 does not have such a problem compiling mpich, so it appears this problem has been fixed.

Thanks,

Andy

> The good news is that the newly released pgcc 13.1 does not have such a problem compiling mpich, so it appears this problem has been fixed.

I’m a bit surprised by that since I don’t see any updates to TPR#19078 and the report came too late to be included in 13.1. We may have found the issue internally or just got lucky.

- Mat

> The good news is that the newly released pgcc 13.1 does not have such a problem compiling mpich, so it appears this problem has been fixed.

I still see this issue using pgcc 13.6 on 32-bit systems. I can replicate it when compiling hwloc standalone or with mpich.