'cicc' compilation error and debug flag

Hi,

When compiling my code in release mode, I stumbled across an obscure compilation error:

Stack dump:
0. Running pass ‘NVPTX DAG->DAG Pattern Instruction Selection’ on function ‘@_Z10kernel_bugIfLj4ELj32ELj64EEvP8DataIT_XT0_EXT1_EXT2_EE
nvcc error : ‘cicc’ died due to signal 11 (Invalid memory reference)
nvcc error : ‘cicc’ core dumped

This was tested on Linux with:

GPU: GeForce GT 650M
Driver Version: 313.09
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Fri_Sep_21_17:28:58_PDT_2012
Cuda compilation tools, release 5.0, V0.2.1221

I could not find any piece of information anywhere on this error (it is mentioned here though: https://devtalk.nvidia.com/default/topic/518591/nvcc-error).

What really surprised me is the fact that compiling with the “-G” debug flag “solves” the error. How am I supposed to find the actual source of the problem?

Since this error happened in a large project, I cannot provide example code yet. I will post some as soon as I can.

Signal 11 would indicate a memory access out of bounds, which should not happen and would point to a bug inside the compiler. I would suggest the following:

(1) Please double-check that you are running the compiler from the CUDA 5.0 final release (as opposed to one of the 5.0 release candidates).

(2) Please double-check that you are not using older CUDA header files with the CUDA 5.0. There have been multiple reports of compiler crashes caused by the inadvertent use of the CUDA 5.0 compiler with CUDA 4.2 header files, which were resolved by installing the correct CUDA 5.0 files.

If nothing questionable turns up during these checks, it is reasonable to assume that there is a problem with an out-of-bounds memory access inside the compiler. In that case, please file a bug report through the registered developer website, attaching self-contained repro code. Thank you.

Thanks for the information!

(1) I installed CUDA with the Arch Linux community package (this one: https://www.archlinux.org/packages/community/x86_64/cuda/). The current version (displayed by Arch Linux) is 5.0.35-3. I will try to install it some other way, just in case.

(2) This is actually a recent install, and CUDA 5.0 is the only version I have installed, so this should not be an issue.

I will post a self-contained repro code as soon as I have one. Thanks again for the help!

I am not familiar with the Arch Linux community package. I would suggest downloading NVIDIA’s installation package for the supported Linux distribution of your choice from this website:

https://developer.nvidia.com/cuda-downloads

Please report bugs through the registered developer website. You can reach it via the following page, for example:

https://developer.nvidia.com/cuda-toolkit

If you scroll down a bit you can see where it says:

Members of the CUDA Registered Developer Program can report issues and file bugs
Login or Join Today

“Login” and “Join Today” are clickable links. If you are not a registered developer yet, the sign-up process is straightforward, and in general a registration request will be approved within one business day. Let me know should you encounter an undue delay.

I registered and I am waiting for approval. I also managed to create repro code that triggers the error with some specific compilation flags. I will file a bug report as soon as I have access to the bug reporting system.

This was indeed a compiler bug. It should be fixed in the next CUDA release. Thanks again for the help!

Thanks to you.

I also hit this in CUDA 5.0.
Adding -G to the nvcc command line did indeed work (and considerably shortened the cicc phase), but -G took my kernel’s run time from about 6.8 milliseconds to about 220 milliseconds.
An alternative workaround is to remove -arch=sm_20.
Bill
PS: In this case, removing -arch=sm_20 changed the kernel’s run time by only about one percent.

It looks like this can still happen in CUDA 6.0.
I have reported it (bug ID: 1600042).
Bill