pgfortran segfault

Main Computer Info
Operating System: Linux Mint 13
GPUs: GeForce 210, GeForce GTX Titan (Driver version 340.29)

pgfortran version:

$ pgfortran --version

pgfortran 14.10-0 64-bit target on x86-64 Linux -tp sandybridge 
The Portland Group - PGI Compilers and Tools
Copyright (c) 2014, NVIDIA CORPORATION.  All rights reserved.

Compilation Flags:

-O3 -g -acc -Mcuda -ta=nvidia -Minfo=accel -mcmodel=medium

CUDA Version: 6.0 or 6.5

Secondary Computer Info
Operating System: Ubuntu 12.04.4 LTS
GPUs: GeForce GTX 770 (Driver version 331.20)

pgfortran version:

$ pgfortran --version

pgfortran 14.3-0 64-bit target on x86-64 Linux -tp haswell 
The Portland Group - PGI Compilers and Tools
Copyright (c) 2014, NVIDIA CORPORATION.  All rights reserved.

Compilation Flags:

-O3 -g -acc -Mcuda -ta=nvidia -Minfo=accel -mcmodel=medium

CUDA Version: 5.5

Problem Description
I’ve been porting a code that uses OpenMP to OpenACC on a separate computer. The code uses BLAS and FFTW. I have successfully ported everything to OpenACC except for the FFTW calls. After compiling and testing cuFFT on my main computer, I am ready to finish porting the program. The problem is that after copying the program from the secondary computer to the main computer and adjusting the library paths in the Makefile, the program segfaults immediately, and I have not yet changed the FFTW calls to cuFFT in the program.

I receive the following information from gdb:

(gdb) run <input_experiment 
Starting program: /usr/local/home/krygier/OpenACC_avila/avila/bin/gatech_fannulus <input_experiment
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00002aaaaeac6e23 in _mp_ncpus2 () from /usr/local/home/krygier/pgi/linux86-64/14.10/libso/libpgmp.so
(gdb) bt
#0  0x00002aaaaeac6e23 in _mp_ncpus2 () from /usr/local/home/krygier/pgi/linux86-64/14.10/libso/libpgmp.so
#1  0x00002aaaaf894283 in omp_get_num_threads () from /usr/local/home/krygier/pgi/linux86-64/14.10/libso/libpgc.so
#2  0x00002aaaab0734d9 in ?? () from /home/krygier/openblas/xianyi-OpenBLAS-9c51cdf/libopenblas.so.0
#3  0x00002aaaab073a57 in ?? () from /home/krygier/openblas/xianyi-OpenBLAS-9c51cdf/libopenblas.so.0
#4  0x00002aaaaad586da in ?? () from /home/krygier/openblas/xianyi-OpenBLAS-9c51cdf/libopenblas.so.0
#5  0x00002aaaaad5873a in ?? () from /home/krygier/openblas/xianyi-OpenBLAS-9c51cdf/libopenblas.so.0
#6  0x00002aaaaaaba306 in ?? () from /lib64/ld-linux-x86-64.so.2
#7  0x00002aaaaaaba3df in ?? () from /lib64/ld-linux-x86-64.so.2
#8  0x00002aaaaaaac6ea in ?? () from /lib64/ld-linux-x86-64.so.2
#9  0x0000000000000001 in ?? ()
#10 0x00007fffffffe16a in ?? ()
#11 0x0000000000000000 in ?? ()
(gdb)

The program compiles completely, but when I run it I receive the segfault. I do not believe the stack limit is the cause, because it is set to unlimited. Any ideas on how to correct this issue? I’ve spent the last week trying to fix it without success. If you need any more information, I’ll be glad to help. I should also mention that I do not have root permissions.

Sincerely,
Krygier

Hi Krygier,

To clarify, you had your program running correctly on a separate test system (the Ubuntu 12.04 system?) with the exception that you used FFTW. You then built and tested cuFFT on the main system (running Mint) but this is extraneous since you don’t use it in your program. The real problem is that you copied your test program from the Ubuntu system to the Mint system and it segfaults.

My best guess as to what’s going on is that you built with the PGI 14.3 compilers on the secondary system but are trying to use the runtime libraries from 14.10. What I’d like you to try is either rebuilding on the Mint system using PGI 14.10, or copying the 14.3 runtime libraries (from either the libso or REDIST directory) to the Mint system and setting LD_LIBRARY_PATH to this location. The caveat to this theory is that there haven’t been any changes to our OpenMP runtime between 14.3 and 14.10, so I have no idea why there would be an incompatibility.
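As a rough way to see which PGI runtime the executable actually resolves at load time, something like the following may help. It is only a sketch: the helper name pgi_runtime_dirs is made up for illustration, and all it does is filter ldd-style output for the libpgmp and libpgc libraries that show up in the backtrace.

```shell
# Hypothetical helper (not part of any PGI tool): read ldd-style output on
# stdin and print the directory each PGI runtime library resolves from.
pgi_runtime_dirs() {
    awk '$1 ~ /^libpg(mp|c)\.so/ && $2 == "=>" && $3 ~ /^\// {
        # Rebuild the directory part of the resolved path (drop the file name).
        n = split($3, parts, "/")
        dir = ""
        for (i = 2; i < n; i++) dir = dir "/" parts[i]
        print $1, dir
    }' | sort -u
}

# Usage on a real binary (the path is a placeholder):
#   ldd ./bin/my_program | pgi_runtime_dirs
# If the directories printed belong to a different PGI release than the one
# used to build, prepend the matching libso directory to LD_LIBRARY_PATH
# and run the check again.
```

If the output lists a 14.10 directory for a binary built with 14.3 (or the reverse), that mismatch would be the first thing to rule out.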

Another thing to try is commenting out the call to “omp_get_num_threads”. This should be ok since you’re no longer using OpenMP. Also, I’d simplify your flag set down to just “-acc -ta=tesla”, assuming you don’t need “-mcmodel=medium” or “-Mcuda” for portability.

Other than that, I’m not sure.

  • Mat

To clarify, you had your program running correctly on a separate test system (the Ubuntu 12.04 system?) with the exception that you used FFTW. You then built and tested cuFFT on the main system (running Mint) but this is extraneous since you don’t use it in your program.

Correct.

The real problem is that you copied your test program from the Ubuntu system to the Mint system and it segfaults.

I copied the source code from the Ubuntu system to the Mint system. I corrected the paths in the Makefile on the Mint system. I then recompiled the entire code on the Mint system without any compilation errors. But when I run the compiled program, I receive a segfault.

Hi Krygier,

Can you send a reproducing example (or the whole code) to PGI Customer Service (trs@pgroup.com) and ask them to forward it to me?

It’s possible that the issue occurs when going from 14.3 to 14.10. It could be the compiler, the runtime library, or even the CUDA version. However, the problem could also be something specific to the Mint system. The first step would be to see if I can reproduce the issue here.

Thanks,
Mat

Hi Mat,

I’ve emailed PGI Customer Service my source code and asked them to forward it to you. If you have any further questions regarding libraries or compiling the program using the Makefile, I’ll be glad to help.

Let me know if you’re able to reproduce the issue.

Sincerely,
Krygier

Hi Krygier,

We haven’t seen this come in yet. Can you double check that it was sent successfully? If so, it’s possible that it got caught in a spam filter someplace.

  • Mat

Dear Mat,

Yep, it appears it was unsuccessful. I received an email stating:

The following message to <trs@pgroup.com> was undeliverable.
The reason for the problem:
5.1.0 - Unknown address error 552-‘size limit exceeded’

The email contains the source code as a compressed tar archive, which is 23 MB in size.

I removed unnecessary files from the archive, and it is now 12 MB. I’ve just finished emailing PGI Customer Service with this version. I believe it was successful this time, since I haven’t received any failed delivery messages.

I look forward to hearing from you soon.

Sincerely,
Krygier

FYI,

After a few emails, we determined that the issue was with neither the code nor the PGI runtime libraries. Rather, OpenBLAS had been built slightly differently on the two machines. The first used the configuration with “NO_WARMUP” set to 1, while the second had this line commented out. Once it was uncommented, the library built and ran correctly.
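For anyone hitting something similar, a quick way to compare two OpenBLAS build trees is to check whether the NO_WARMUP line is active in each one's Makefile.rule. The sketch below assumes the option appears there as a "NO_WARMUP = 1" line that is prefixed with "#" when disabled. The helper name and the file paths are placeholders, not anything taken from the user's setup.

```shell
# Hypothetical helper: report whether NO_WARMUP is active (uncommented)
# in the OpenBLAS-style Makefile.rule given as the first argument.
no_warmup_state() {
    if grep -Eq '^[[:space:]]*NO_WARMUP[[:space:]]*=[[:space:]]*1' "$1"; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Usage (paths are placeholders for the two build trees):
#   no_warmup_state /path/to/openblas-on-machine-1/Makefile.rule
#   no_warmup_state /path/to/openblas-on-machine-2/Makefile.rule
```

If the two trees report different states, rebuilding OpenBLAS with the same setting on both machines is a reasonable next step before looking at the compiler or runtime.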

  • Mat