segfault in my Linux app with PGI 18.4 and -Minstrument

I’m getting a segfault immediately on startup of our application when building with PGI compilers.

The only difference from a working build is adding -Minstrument to FFLAGS.

I have set FFLAGS as follows.

export FFLAGS="-acc -ta=tesla -Minfo=accel -Minstrument"

I have pasted GDB output here:

(gdb) run
Starting program: /home/bucknerj/src/charmm_builds/projects/jac/charmm/bin/charmm
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff05b19c3 in ___instentavx ()
from /home/apps/pgi/2018/linux86-64/18.4/lib/libpgnod_prof_inst.so
Missing separate debuginfos, use: debuginfo-install fftw-libs-double-3.3.3-8.el7.x86_64 fftw-libs-single-3.3.3-8.el7.x86_64 libatomic-4.8.5-28.el7_5.1.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 libibverbs-15-7.el7_5.x86_64 libnl3-3.2.28-4.el7.x86_64 librdmacm-15-7.el7_5.x86_64 libstdc++-4.8.5-28.el7_5.1.x86_64 numactl-libs-2.0.9-7.el7.x86_64

I have pasted some additional information about the machine and GPU below.

$ pgcpuid
vendor id : GenuineIntel
model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
cpu family : 6
model : 45
name : SandyBridge-E 32nm
stepping : 7
processors : 12
threads : 2
clflush size : 8
L2 cache size : 256KB
L3 cache size : 15360KB
flags : acpi aes apic avx cflush cmov cplds cx8 cx16 de dtes ferr
flags : fpu fxsr ht lm mca mce mmx monitor msr mtrr nx osxsave pae
flags : pat pdcm pge popcnt pse pseg36 selfsnoop speedstep sep sse
flags : sse2 sse3 ssse3 sse4.1 sse4.2 syscall tm tm2 tsc vme xsave
flags : xtpr
type : -tp sandybridge

$ pgaccelinfo

CUDA Driver Version: 9020
NVRM version: NVIDIA UNIX x86_64 Kernel Module 396.44 Wed Jul 11 16:51:49 PDT 2018

Device Number: 0
Device Name: GeForce GTX 780 Ti
Device Revision Number: 3.5
Global Memory Size: 3168468992
Number of Multiprocessors: 15
Number of SP Cores: 2880
Number of DP Cores: 960
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1084 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 3500 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 1572864 bytes
Max Threads Per SMP: 2048
Async Engines: 1
Unified Addressing: Yes
Managed Memory: Yes
Concurrent Managed Memory: No
PGI Compiler Option: -ta=tesla:cc35

Replacing -Minstrument with -Mframe fixes the segfault, so the cause must be one of the other things -Minstrument adds on top of the frame pointer.

Hi bucknerj,

We have not seen this error before so it’s unclear why it’s occurring. I’ve also tried adding -Minstrument to a few of my applications but am unable to recreate the error.

-Minstrument adds callbacks for profilers on entry to and exit from CPU functions. Do you need this support for your profiler?

Note that -Mframe is implied by -Minstrument, since the profiling callbacks need the frame pointer, but -Mframe itself is probably not the cause of the error.

-Mat

-Mframe actually works for us with no segfault.

Our profiler needed frame pointers for backtraces on pre-Haswell architectures, and -Mframe provides exactly that, so we don't necessarily need -Minstrument.

Thanks for checking out our problem.

The profiler we are using is actually NVIDIA Nsight Systems.