Signal 11 when compiling for profiling

egodfred · July 30, 2015, 6:36pm

I have run this code before without issues, but when I added extra flags for profiling I get this error:

pgfortran -Mcuda -Minfo  -ta=nvidia  -c precision_m.F90
pgfortran -Mcuda -Minfo  -ta=nvidia  -c cpurandom_m.F90
cpp  -DGLOBAL host_gen_m.CUF > host_gen_m1.CUF
pgfortran -Mcuda -Minfo  -ta=nvidia  -c host_gen_m1.CUF
cpp  -DGLOBAL host_subs_m.CUF > host_subs_m1.CUF
pgfortran -Mcuda -Minfo  -ta=nvidia  -c host_subs_m1.CUF
Stack dump:
0.	Running pass 'Simplify Live Out' on function '@host_subs_m_d_local_energy_'
pgnvd-Fatal-/state/partition1/pgi14/linux86-64/2014/cuda/6.0/nvvm/bin/cicc TERMINATED by signal 11
Arguments to /state/partition1/pgi14/linux86-64/2014/cuda/6.0/nvvm/bin/cicc
/state/partition1/pgi14/linux86-64/2014/cuda/6.0/nvvm/bin/cicc -arch compute_20 -m64 -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 /tmp/pgnvdsKkd2_iVgVe1.i -o /tmp/pgcudaforSJkde3FtmLFR.ptx -nvvmir-library /state/partition1/pgi14/linux86-64/2014/cuda/6.0/nvvm/libdevice/libdevice.compute_20.10.bc
PGF90-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (host_subs_m1.CUF: 1)
PGF90/x86-64 Linux 14.10-0: compilation aborted
make: *** [host_subs_m1.o] Error 2

The problem seems to come from the Local Energy subroutine.
Here are some of the necessary modules in case you want to reproduce the error:

https://dl.dropboxusercontent.com/u/59996494/host_subs_m1%20(Fannelia's%20conflicted%20copy%202015-06-22).CUF
https://dl.dropboxusercontent.com/u/59996494/host_gen_m1.CUF

egodfred · August 4, 2015, 12:56pm

This seems to be a compiler bug that is triggered during optimization, because the code compiles correctly at -O0 and -O1.
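
One way to pin down which optimization level triggers the crash is to recompile the failing file at each level in turn. The sketch below is only an illustration: the function name "first_failing_opt" is made up for this example, and it assumes the compiler accepts -O0 through -O3 (the thread only confirms that -O0 and -O1 succeed). The compile command is passed in as arguments, so any other flags can be kept as they are.

```shell
# Hypothetical helper (not part of PGI): print the first optimization level
# at which the given compile command fails, or "none" if all levels succeed.
# Usage: first_failing_opt <compiler> [flags...] <source>
first_failing_opt() {
    for level in 0 1 2 3; do
        # Append the level to the command; discard output, keep exit status.
        if ! "$@" "-O${level}" >/dev/null 2>&1; then
            echo "-O${level}"
            return 0
        fi
    done
    echo "none"
}
```

For the file in this thread, it could be called as: first_failing_opt pgfortran -Mcuda -Minfo -ta=nvidia -c host_subs_m1.CUF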

MatColgrove · August 4, 2015, 8:47pm

Hi egodfred,

I haven’t been able to reproduce your error. Instead, I get a different error with 14.7 and a successful compilation with 14.9.

Which version are you using?

Can you post the source for “cpurandom_m.F90”? I had to comment it out in order to get the source to compile. Also, I’m using a “percision_m.F90” file from one of your earlier posts. Please re-post if there have been updates.

Thanks,
Mat

% pgfortran -Mcuda -Minfo=accel -c percision_m.F90 host_gen_m1.CUF host_subs_m1.CUF -V14.7 -acc -O2
percision_m.F90:
host_gen_m1.CUF:
host_subs_m1.CUF:
/tmp/pgcudaforGVLdGvteGa6K.gpu(2001): error: argument of type "const char *" is incompatible with parameter of type "void *"

1 error detected in the compilation of "/tmp/pgnvdLWLdVdqHsRtC.nv0".
PGF90-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (host_subs_m1.CUF: 1)
PGF90/x86-64 Linux 14.7-0: compilation aborted
%
% pgfortran -Mcuda -Minfo=accel -c percision_m.F90 host_gen_m1.CUF host_subs_m1.CUF -V14.9 -acc -O2
percision_m.F90:
host_gen_m1.CUF:
host_subs_m1.CUF:
%

egodfred · August 14, 2015, 3:21pm

Here is the nvcc version

nvcc: NVIDIA (R) Cuda compiler driver 
Copyright (c) 2005-2014 NVIDIA Corporation 
Built on Thu_Jul_17_21:41:27_CDT_2014 
Cuda compilation tools, release 6.5, V6.5.12

and compiler version

pgf90 14.10-0 64-bit target on x86-64 Linux -tp istanbul 
The Portland Group - PGI Compilers and Tools 
Copyright (c) 2014, NVIDIA CORPORATION.  All rights reserved.

egodfred · August 14, 2015, 3:25pm

All necessary files:
https://dl.dropboxusercontent.com/u/59996494/UNIVAQ.zip

MatColgrove · August 14, 2015, 7:43pm

Thanks egodfred. With the full source, I was able to recreate the error. It does not occur with the 15.x compiler or when the OpenACC flag “-ta=nvidia” is removed. Since you don’t have OpenACC code in this file, I’d recommend removing the “-ta” flag.

Mat

% pgfortran -Mcuda -Minfo -ta=nvidia -DGLOBAL -c host_subs_m.CUF -V14.10
Stack dump:
0.      Running pass 'Simplify Live Out' on function '@host_subs_m_d_local_energy_'
pgnvd-Fatal-/proj/pgi/linux86-64/2014/cuda/6.0/nvvm/bin/cicc TERMINATED by signal 11
Arguments to /proj/pgi/linux86-64/2014/cuda/6.0/nvvm/bin/cicc
/proj/pgi/linux86-64/2014/cuda/6.0/nvvm/bin/cicc -arch compute_20 -m64 -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 /tmp/pgnvdLfcdVgGNULiI.i -o /tmp/pgcudafor-dcd9-4p3-6U.ptx -nvvmir-library /proj/pgi/linux86-64/2014/cuda/6.0/nvvm/libdevice/libdevice.compute_20.10.bc
PGF90-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (host_subs_m.CUF: 1)
PGF90/x86-64 Linux 14.10-0: compilation aborted

% pgfortran -Mcuda -Minfo -DGLOBAL -c host_subs_m.CUF -V14.10
% pgfortran -Mcuda -Minfo -ta=nvidia -DGLOBAL -c host_subs_m.CUF -V15.1
%
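
The recommendation above (only pass “-ta” when the file actually contains OpenACC code) could be automated roughly as follows. This is a sketch under assumptions: the helper name "acc_flag_for" is invented for this example, and it only looks for the free-form OpenACC sentinel "!$acc" in the file itself, so directives pulled in through includes or preprocessing would be missed.

```shell
# Hypothetical helper: print the accelerator flag only if the source file
# contains OpenACC directive lines (free-form sentinel "!$acc", any case).
acc_flag_for() {
    if grep -qi '^[[:space:]]*!\$acc' "$1"; then
        echo "-ta=nvidia"
    else
        echo ""
    fi
}
```

A compile line could then read: pgfortran -Mcuda -Minfo $(acc_flag_for host_subs_m.CUF) -DGLOBAL -c host_subs_m.CUF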

egodfred · August 24, 2015, 4:36pm

pgfortran -Mcuda -Minfo=accel -c precision_m.F90 cpurandom_m.F90 host_gen_m1.CUF host_subs_m1.CUF main.CUF -acc -O2 gpuqmc
precision_m.F90:
cpurandom_m.F90:
host_gen_m1.CUF:
host_subs_m1.CUF:
Stack dump:
0. Running pass 'Simplify Live Out' on function '@host_subs_m_d_local_energy_'
pgnvd-Fatal-/state/partition1/pgi14/linux86-64/2014/cuda/6.0/nvvm/bin/cicc TERMINATED by signal 11
Arguments to /state/partition1/pgi14/linux86-64/2014/cuda/6.0/nvvm/bin/cicc
/state/partition1/pgi14/linux86-64/2014/cuda/6.0/nvvm/bin/cicc -arch compute_20 -m64 -ftz=1 -prec_div=1 -prec_sqrt=1 -fmad=1 /tmp/pgnvdX7QftwwRupaZ.i -o /tmp/pgcudaforP6Qf7raYX1W0.ptx -nvvmir-library /state/partition1/pgi14/linux86-64/2014/cuda/6.0/nvvm/libdevice/libdevice.compute_20.10.bc
PGF90-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (host_subs_m1.CUF: 1)
PGF90/x86-64 Linux 14.10-0: compilation aborted
main.CUF:

MatColgrove · August 24, 2015, 6:18pm

Sorry about that. I only checked the one file and not the whole project. Looks like the problem is actually with the CUDA C code generator. Using LLVM works for me. Can you give this a try?

% pgfortran -Mcuda=llvm -Minfo=accel -c precision_m.F90 cpurandom_m.F90 host_gen_m1.CUF host_subs_m1.CUF main.CUF -acc -O2 gpuqmc -V14.10
precision_m.F90:
cpurandom_m.F90:
host_gen_m1.CUF:
host_subs_m1.CUF:
main.CUF:

Mat

egodfred · August 25, 2015, 4:22pm

It compiles but gives this error at runtime
0: DEV_BIND_TEXTURE: cudaBindTexture failed: 18(invalid texture reference)

MatColgrove · August 25, 2015, 8:22pm

I have not seen this before, and the only references I can find on the web occur when trying to use textures on older devices where textures weren’t supported.

Though, I doubt that’s the problem here. Can you post or send to PGI Customer Service (trs@pgroup.com) your data files so I can try to recreate the error? Hopefully I can then determine the cause.

Thanks,
Mat

egodfred · August 30, 2015, 11:17am

I have sent the files; hope to hear from you soon.

Device info:

CUDA Driver Version: 6050
NVRM version: NVIDIA UNIX x86_64 Kernel Module 340.29 Thu Jul 31 20:23:19 PDT 2014

Device Number: 0
Device Name: GeForce GTX 670
Device Revision Number: 3.0
Global Memory Size: 4294770688
Number of Multiprocessors: 7
Number of SP Cores: 1344
Number of DP Cores: 448
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 980 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 3004 MHz
Memory Bus Width: 256 bits
L2 Cache Size: 524288 bytes
Max Threads Per SMP: 2048
Async Engines: 1
Unified Addressing: Yes
Initialization time: 540639 microseconds
Current free memory: 4246446080
Upload time (4MB): 1300 microseconds ( 905 ms pinned)
Download time: 2726 microseconds (1053 ms pinned)
Upload bandwidth: 3226 MB/sec (4634 MB/sec pinned)
Download bandwidth: 1538 MB/sec (3983 MB/sec pinned)
PGI Compiler Option: -ta=tesla:cc30

NVCC Version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_21:41:27_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12

MatColgrove · August 31, 2015, 8:00pm

Hi Godfred,

When I compile with “-DTEXTURE”, I get many syntax errors. Do I have the most recent version? With “-DGLOBAL”, I get a runtime error since there’s no “lipot.dat” data file. Do I need this file or am I doing something wrong?

Thanks,
Mat

% pgfortran -Mcuda -Minfo=accel precision_m.F90 cpurandom_m.F90 host_gen_m.CUF host_subs_m.CUF main.CUF -acc -O2 -o gpuqmc -DGLOBAL -V15.7
precision_m.F90:
cpurandom_m.F90:
host_gen_m.CUF:
host_subs_m.CUF:
main.CUF:

% gpuqmc                                                                                            
 Read Mass of Helium M_he (cm) :     4.002603240000000
Read box length dmax (a.u) :     2.000000000000000
Read Minimum distance between Helium atoms and impurity (a.u) :
    0.000000000000000
Read Maximum potential between atoms (a.u) :     1.000000000000000
Read Minimum potential between atoms (a.u) :    -700.0000000000000
Read Minimum distance between Helium atoms (a.u) :     1.000000000000000
Read Mass of atomic impurity :     6.941000000000000
Read Wavefunction for imp-he parameter 1 :     1251.550607449640
Read Wavefunction for imp-he parameter 2 :     3.331331768526400
Read Wavefunction for he-he parameter 1 :     3333.489699684990
Read Wavefunction for he-he parameter 2 :    2.6974148946484230E-018
Read Number of Helium atoms :                         3
Read Number of walkers :         10240
Read Number of micro Updates :             1
Read Number of macro updates (markov chain walks) :         10000
switch integer to decide where to read initial configuration :             1
Read perturbation step :    0.7000000000000000
Number of Blocks :             1
Base Number :             8
Number of msteps :     1.000000000000000
  Reduced mass is :     4627.691078620828
PGFIO-F-217/list-directed read/unit=16/attempt to read past end of file.
File name = lipot.dat    formatted, sequential access   record = 1
In source file host_gen_m.CUF, at line number 314

egodfred · September 8, 2015, 2:34pm

Sorry about that. Yes, you need the lipot.dat file to generate the Li-He potential function.

https://dl.dropboxusercontent.com/u/59996494/lipot.dat
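
Since the program stops with an end-of-file read error when lipot.dat is not available, a small pre-run check can catch the problem earlier. This is only a sketch: the function name "require_input" is invented here, and it assumes the data file is expected in the directory the program is started from.

```shell
# Hypothetical pre-run check: succeed only if the named input file exists
# and is non-empty; otherwise print a message to stderr and fail.
require_input() {
    if [ ! -s "$1" ]; then
        echo "missing or empty input file: $1" >&2
        return 1
    fi
}
```

It could be used as: require_input lipot.dat && ./gpuqmc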

MatColgrove · September 8, 2015, 9:08pm

Thanks egodfred, though the link appears to be broken or the file doesn’t exist. Can you double check?

Thanks,
Mat

egodfred · September 10, 2015, 1:21pm

Sorry about that, the link should be working now, I also included the file in the codes I recently sent.

MatColgrove · September 10, 2015, 7:39pm

After reviewing egodfred’s code, this appears to be the same as a known problem that we had in the late 14.x releases when multiple shared arrays were used in the same kernel. The error was fixed in the 15.1 release, and we’re working on getting egodfred updated to this release.

Mat
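
To check whether an installed compiler already includes the 15.1 fix, the version banner can be parsed. The sketch below is an assumption-laden illustration: "pgi_has_fix" is a made-up name, and it expects a banner line shaped like the one posted earlier in this thread ("pgf90 14.10-0 64-bit target on x86-64 Linux ..."), read from standard input.

```shell
# Hypothetical helper: read compiler version output on stdin and succeed
# (exit status 0) if the release is 15.1 or newer, fail (1) if it is older,
# and return 2 if no version line of the expected shape is found.
pgi_has_fix() {
    # Extract "major minor" from the first line like "pgf90 14.10-0 ...".
    ver=$(sed -n 's/^pgf[a-z0-9]* \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p' | head -n 1)
    major=${ver% *}
    minor=${ver#* }
    [ -n "$major" ] && [ -n "$minor" ] || return 2
    [ "$major" -gt 15 ] || { [ "$major" -eq 15 ] && [ "$minor" -ge 1 ]; }
}
```

Assuming the banner is produced by "pgfortran -V", usage would look like: pgfortran -V 2>&1 | pgi_has_fix && echo "15.1 or newer"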
