What really surprised me is that compiling with the “-G” debug flag “solves” the error. How am I supposed to find the actual source of the problem?
Since this error happened in a large project, I cannot provide example code yet. I will post some as soon as I can.
Signal 11 would indicate an out-of-bounds memory access, which should not happen and would point to a bug inside the compiler. I would suggest the following:
(1) Please double-check that you are running the compiler from the CUDA 5.0 final release (as opposed to one of the 5.0 release candidates).
(2) Please double-check that you are not using older CUDA header files with the CUDA 5.0 compiler. There have been multiple reports of compiler crashes caused by the inadvertent use of the CUDA 5.0 compiler with CUDA 4.2 header files, which were resolved by installing the correct CUDA 5.0 header files.
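For both checks, something along these lines should tell you what is actually being picked up (the paths assume a default /usr/local/cuda install location; adjust for your system):

    # Which nvcc is on the PATH, and which release does it report?
    which nvcc
    nvcc --version

    # CUDA_VERSION should read 5000 for CUDA 5.0;
    # 4020 would indicate stale CUDA 4.2 headers
    grep "define CUDA_VERSION" /usr/local/cuda/include/cuda.h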
If nothing questionable turns up during these checks, it is reasonable to assume that there is a problem with an out-of-bounds memory access inside the compiler. In that case, please file a bug report through the registered developer website, attaching self-contained repro code. Thank you.
(1) I installed CUDA with the Arch Linux community package (this one: https://www.archlinux.org/packages/community/x86_64/cuda/). The current version (displayed by Arch Linux) is 5.0.35-3. I will try to install it some other way, just in case.
(2) This is actually a fresh install; I have only ever installed CUDA 5.0, so mixed header files should not be an issue.
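For reference, this is roughly how the install can be checked (assuming the Arch package’s usual /opt/cuda prefix):

    # Package version as recorded by the package manager
    pacman -Qi cuda

    # Release string reported by the compiler itself
    /opt/cuda/bin/nvcc --version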
I will post a self-contained repro code as soon as I have one. Thanks again for the help!
I am not familiar with the Arch Linux community package. I would suggest downloading NVIDIA’s installation package for the supported Linux distribution of your choice from this website:
If you scroll down a bit you can see where it says:
Members of the CUDA Registered Developer Program can report issues and file bugs
Login or Join Today
“Login” and “Join Today” are clickable links. If you are not a registered developer yet, the sign-up process is straightforward, and in general a registration request will be approved within one business day. Let me know should you encounter an undue delay.
I registered and am waiting for approval. I also managed to create repro code with some specific compilation flags. I will post a bug report as soon as I have access to the bug report system.
I also hit this in CUDA 5.0. Adding -G to the nvcc command line did indeed work (and considerably shortened the cicc phase), but -G took my kernel’s run time from about 6.8 milliseconds to about 220 milliseconds.
An alternative workaround is to remove -arch=sm_20.
Bill
PS: in this case, removing -arch=sm_20 changed the kernel’s run time by only about one percent.
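To spell out the two workarounds as command lines (kernel.cu is just a placeholder for the actual source file):

    # Workaround 1: debug build; avoids the crash but disables optimizations
    nvcc -G -arch=sm_20 -o kernel kernel.cu

    # Workaround 2: drop -arch=sm_20 and let nvcc fall back to its default target
    nvcc -o kernel kernel.cu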
I’m hitting a similar error on CUDA 12 (Gentoo Linux, GeForce 1660 Ti mobile), but this time it’s signal 6, and the only fix I have found is to compile with -G and without -dopt=on, which removes all optimizations, if I understand correctly.
Also, LLVM complains about not having enough memory, even though system memory isn’t full. It happens with caffe2’s aten/src/ATen/native/cuda/Sort.cu, from inside pytorch.
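Roughly, assuming a single-file compile of the offending translation unit, the behavior is:

    # Crashes cicc with signal 6:
    nvcc -c Sort.cu

    # The only variant that gets through, with all optimizations disabled:
    nvcc -G -c Sort.cu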
The specific error messages produced (use cut & paste to post them here) likely convey important information. Is the complaint about running out of system memory, or perhaps about disk space in a particular partition? The latter is the more common scenario in my experience. How much system memory does this system have, and how much of it does the compiler use during the build? Is the build itself parallelized, and could memory usage be reduced by reducing the degree of parallelization?
“memory isn’t even full” doesn’t mean much. For a hypothetical scenario, assume that 768 MB of system memory are still available, but that the compiler now needs to allocate (based on some characteristic of the code it is compiling) a block of 1 GB. This allocation would fail even though the “memory isn’t even full”.
Use of -G turns off all compiler optimizations, as you noted.
As I said, you may want to try reducing the level of build parallelism. Maybe start with -j1 to see whether the build succeeds when run serially; a sketch follows below.
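Since the failing file is part of a pytorch build, something along these lines should force a serial build (pytorch’s setup.py honors the MAX_JOBS environment variable; for a plain make-based project, make -j1 is the equivalent):

    # pytorch: limit compilation to a single job
    MAX_JOBS=1 python setup.py install

    # plain make-based project: serial build
    make -j1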
LLVM ERROR: out of memory
nvcc error : 'cicc' died due to signal 6
nvcc error : 'cicc' core dumped
So it looks like LLVM runs out of memory in some unspecified way, sees no way to continue under these circumstances, and then terminates itself abnormally with SIGABRT (signal 6). To the best of my recollection, I have never come across this scenario. Unfortunately, the internet is full of instances of this error message, and I have yet to determine a predominant root cause.
Other standard advice when dealing with strange compiler issues is to try the latest available toolchain (CUDA 12.1 at present) to see if things work better with that.