Illegal instruction on Intel Xeon

I’m having trouble running executables built with PGI on an Intel Xeon CPU E5-2680 v3 @ 2.50GHz.

I’ve tried building with:

-tp=sandybridge-64
-tp=sandybridge
-tp=nehalem-64,sandybridge-64
-tp=nehalem-64
-tp=core2-64
-tp=x64

Regardless of the target, I always end up with:

Program received signal SIGILL, Illegal instruction.
0x000000000059ca3f in __fss_sincos_fma4 ()
Missing separate debuginfos, use: debuginfo-install glibc-2.17-55.el7_0.5.x86_64
(gdb) bt
#0  0x000000000059ca3f in __fss_sincos_fma4 ()
#1  0x0000000000458ba0 in ez_lac () at ./f_ezscint.f90:3282
#2  0x00000000004500ed in ez_crot () at ./f_ezscint.f90:1121
#3  0x000000000042a58e in c_ezgfxyfll ()

I’m using:

pgfortran 13.10-0 64-bit target on x86-64 Linux -tp sandybridge 
The Portland Group - PGI Compilers and Tools
Copyright (c) 2013, NVIDIA CORPORATION.  All rights reserved.

And my localrc is:

gmp13x64@dena5:EMIS_PREP$ cat /cm/shared/apps/pgi/linux86-64/13.10/bin/localrc 
set LFC=-lgfortran;
set LDSO=/lib64/ld-linux-x86-64.so.2;
set GCCDIR=/usr/lib/gcc/x86_64-redhat-linux/4.8.2;
set GPPDIR= /cm/shared/apps/slurm/14.03.0/include /usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../include/c++/4.8.2 /usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../include/c++/4.8.2/x86_64-redhat-linux /usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../include/c++/4.8.2/backward /usr/lib/gcc/x86_64-redhat-linux/4.8.2/include /usr/local/include /usr/include;
set GCCINC=/usr/lib/gcc/x86_64-redhat-linux/4.8.2/include;
set G77DIR=/usr/lib/gcc/x86_64-redhat-linux/4.8.2/;
set OEM_INFO=64-bit target on x86-64 Linux $INFOTPVAL;
set LOCALRC=YES;
set THROW=__THROW=;
set EXTENSION=__extension__=;
set LC=$if(-Bstatic,-lgcc -lgcc_eh -lc -lgcc -lgcc_eh -lc, -lgcc -lc -lgcc);
# GLIBC version 2.17
# GCC version 4.8.2
set GCCVERSION=40803;
set LOCALDEFS=__STDC_HOSTED__;
export PGI=$COMPBASE;
# makelocalrc executed by matt Sun Nov 15 16:40:41
set MPIDIR=/cm/shared/apps/pgi/linux86-64/2013/mpi/mpich;

What target should I use?

Thanks

There is something wrong here. Intel CPUs do not implement FMA4 instructions (see https://en.wikipedia.org/wiki/FMA_instruction_set#CPUs_with_FMA4 ), and the compiler should not have emitted FMA4 instructions unless you explicitly requested a -tp option to target an FPU different from the one on the host. More specifically, if you compiled on the Xeon system and used either the default -tp option or one compatible with the Xeon E5, FMA4 instructions should not be present in the compiled code.

pgf90 -V

will tell you what type of CPU you have.

If it is older than Sandy Bridge, the illegal instruction makes sense.


dave
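On Linux, the kernel lists the instruction-set extensions each CPU supports in /proc/cpuinfo, which gives a quick cross-check of the point above: AMD's FMA4 shows up as the flag fma4, while Intel's FMA3 (present on Haswell parts such as the E5-2680 v3) shows up as fma. A minimal sketch, assuming a Linux node; cpu_has_flag is a hypothetical helper name, not part of any PGI tool.

```shell
#!/usr/bin/env bash
# cpu_has_flag FLAG [CPUINFO]
# Hypothetical helper: succeeds if FLAG appears in the first "flags" line
# of CPUINFO (default /proc/cpuinfo). Matching is exact, so "fma" (FMA3)
# does not match "fma4" (FMA4) and vice versa.
cpu_has_flag() {
    local flag=$1
    local file=${2:-/proc/cpuinfo}
    grep -m1 '^flags' "$file" |   # e.g. "flags : fpu sse2 avx fma avx2 ..."
        cut -d: -f2- |            # keep only the flag list
        tr ' ' '\n' |             # one flag per line
        grep -qx -- "$flag"       # exact, whole-line match
}
```

For example, cpu_has_flag fma4 || echo "no FMA4 on this node" should print the message on the Intel node, since no Intel CPU reports fma4.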

On the Intel node I’m targeting, I get:

gmp13x64@dena5:~$ pgfortran -V

pgfortran 13.10-0 64-bit target on x86-64 Linux -tp sandybridge 
The Portland Group - PGI Compilers and Tools
Copyright (c) 2013, NVIDIA CORPORATION.  All rights reserved.

On the system I often build on (AMD), I get:

gmp13x64@dena:~$ pgfortran -V

pgfortran 13.10-0 64-bit target on x86-64 Linux -tp bulldozer 
The Portland Group - PGI Compilers and Tools
Copyright (c) 2013, NVIDIA CORPORATION.  All rights reserved.

I build with:

pgfortran -tp=sandybridge ....

As far as I understand, it shouldn’t make a difference whether I’m building on Intel or simply targeting Intel. I have built some of my code on the Intel nodes, but not my whole dependency stack. I did rebuild all my dependencies between tests, but on the AMD nodes, with sandybridge as the target.

I can try simply building on Intel and removing my target specifier to see if that makes any difference.

I’ll post my results as soon as I have them. Thanks
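Whether the code is built on the Intel node or cross-compiled on the AMD node with -tp=sandybridge, one way to check the result before running it is to disassemble the objects and static libraries and look for FMA4 instructions. A minimal sketch, assuming GNU binutils objdump and a POSIX awk are available; filter_fma4 and scan_fma4 are hypothetical names. It relies on the fact that FMA4 mnemonics (e.g. vfmaddpd, vfnmsubss) have no 132/213/231 operand-order suffix, unlike Intel's FMA3 forms (e.g. vfmadd231pd).

```shell
#!/usr/bin/env bash
# filter_fma4: read `objdump -d` output on stdin and print
# "<object> <function> <instruction line>" for every FMA4 instruction.
# Exit status: 0 if any FMA4 instruction was found, 1 otherwise.
filter_fma4() {
    awk '
        # "f_ezscint.o:     file format elf64-x86-64" starts a new object
        /file format/ { obj = $1; sub(/:$/, "", obj) }
        # "0000000000000000 <ez_lac_>:" starts a new function
        /^[0-9a-f]+ <.*>:$/ { fn = $2; sub(/:$/, "", fn) }
        # FMA4 mnemonics: vf[n]m + add/sub/addsub/subadd + ps/pd/ss/sd, no 132/213/231
        /[ \t]vfn?m(add|sub|addsub|subadd)(ps|pd|ss|sd)[ \t]/ {
            line = $0; sub(/^[ \t]+/, "", line)
            print obj, fn, line
            found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# scan_fma4 FILE: disassemble an object file, static archive, or executable.
scan_fma4() {
    objdump -d "$1" | filter_fma4
}
```

Running scan_fma4 on each of your own objects and static libraries should print nothing for a correct sandybridge build. Vendor runtime libraries may legitimately contain FMA4 code paths that are only selected at run time on AMD hardware, so a hit inside them is not necessarily a problem.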

Hi Matt,

I’m wondering if you’re statically linking in the AMD libm or linking with ACML?

Mat

I don’t believe I’m linking with ACML. I do install ACML, but have never looked into how to use it (I can’t remember offhand what it does).

I do statically link in the lib that has the bad instruction, but I build that static library with the target instruction set. That is, between tests I went to the lib’s build directory, ran make clean, ran a find to delete any copy of the lib, and rebuilt it. Then I did the same with my executable.

Now that I have it narrowed down to a single cos() call causing the issue, I’m going to write single-file test cases.
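The backtrace shows ez_lac calling __fss_sincos_fma4 directly, which suggests that some objects in the link were compiled for an FMA4-capable target. Assuming that is the case, one way to find them is to list every object file and archive member with an undefined reference to a symbol ending in _fma4. A minimal sketch, assuming GNU nm; find_fma4_refs and filter_fma4_refs are hypothetical names, and libez.a below is a hypothetical library name used only to show the format.

```shell
#!/usr/bin/env bash
# filter_fma4_refs: read `nm -A` output on stdin and print
# "<file>[:<member>]: <symbol>" for each undefined (U) symbol ending in _fma4.
# Exit status: 0 if any were found, 1 otherwise.
filter_fma4_refs() {
    awk '
        # Undefined symbols look like (hypothetical names):
        # "libez.a:f_ezscint.o:                 U __fss_sincos_fma4"
        NF >= 2 && $(NF-1) == "U" && $NF ~ /_fma4$/ { print $1, $NF; found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# find_fma4_refs [DIR]: check every .o and .a file under DIR (default: .).
find_fma4_refs() {
    find "${1:-.}" -type f \( -name '*.o' -o -name '*.a' \) \
        -exec nm -A {} + 2>/dev/null | filter_fma4_refs
}
```

Run from the top of the build tree, find_fma4_refs should print nothing once every object has been rebuilt for sandybridge; any line it does print names an object or archive member that is still stale.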

Hi Mat,

I found a lib in my build directory that was being cleared neither by make clean nor by my paranoid

find . -iname '*.o' -exec rm \{\} \;

It was holding on to a few objects across builds, and those were causing these issues.

Once I started cleaning that lib too, my illegal instructions went away.


These builds were done on the Intel node itself, without any target specified. I’m now going to return to the AMD head node and try cross-compiling by specifying the target.

Thanks for all the help!
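As this thread shows, a clean step that only deletes *.o files can leave a stale static archive behind, and the link will keep picking up its old objects. A minimal sketch of a broader clean, assuming GNU or BSD find; the helper names and the choice of patterns (*.o, *.a, and Fortran *.mod files) are illustrative and should be adapted to the actual build tree. Listing before deleting makes it easy to spot anything unexpected.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a "really clean" rebuild: they match object files,
# static archives and Fortran module files, not just *.o.
# find_build_artifacts DIR [FIND-ACTION...]
find_build_artifacts() {
    local dir=$1
    shift
    find "$dir" -type f \( -iname '*.o' -o -iname '*.a' -o -iname '*.mod' \) "$@"
}

# list_build_artifacts DIR: print what would be removed.
list_build_artifacts() {
    find_build_artifacts "$1" -print
}

# remove_build_artifacts DIR: delete the matched files.
remove_build_artifacts() {
    find_build_artifacts "$1" -exec rm -f {} +
}
```

For example, run list_build_artifacts on each dependency's build directory first, then remove_build_artifacts once the list looks right.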