building samples dies with signal 11 in cudafe - cuda 6.5, driver 343.22.

valdisk · September 27, 2014, 9:08pm

After installing CUDA 6.5, most of the sample code doesn’t build. I tracked it down to cudafe dying a very quick death:

strace /usr/local/cuda-6.5/bin/cudafe
execve("/usr/local/cuda-6.5/bin/cudafe", ["/usr/local/cuda-6.5/bin/cudafe"], [/* 67 vars */]) = 0
brk(0) = 0x1a49000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9bccf6f000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9bccf6e000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9bccf6d000
arch_prctl(ARCH_SET_FS, 0x7f9bccf6e680) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x8} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

This is on a Dell Latitude E6500, Fedora x86_64. Any ideas? Looks like ARCH_SET_FS doesn’t agree with something even though it returns OK.
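
For anyone reproducing this without strace: a child killed by a signal shows up in bash or dash with exit status 128 plus the signal number, so SIGSEGV (signal 11) appears as 139. A minimal sketch of a check built on that; the helper name died_with_segv is made up for illustration, and other shells may report signals differently:

```shell
# Hypothetical helper: run a command and succeed only if it was killed by SIGSEGV.
# Assumes a shell such as bash or dash, which report 128 + signal number.
died_with_segv() {
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    [ "$status" -eq 139 ]   # 128 + 11 (SIGSEGV)
}
```

It could be used as: died_with_segv /usr/local/cuda-6.5/bin/cudafe && echo "cudafe segfaults". A program that deliberately exits with status 139 would also match, so treat the result as a hint rather than proof.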

Robert_Crovella · September 27, 2014, 9:49pm

Which Fedora version? Are the samples installed in /usr/local/cuda/samples? Do you run make as root?

valdisk · September 28, 2014, 10:16pm

I’ve already done all that analysis. The problem is that cudafe dies really early on, before it has even looked at its arguments, checked whether it’s running as root, or tried to open its input. I suspect the root cause is that something in Fedora Rawhide and/or the kernel has changed, and I’m more than happy to chase the regression to the upstream culprit. However, cudafe is a stripped binary with enough modifications to the ELF header to stop gdb from opening it, so the usual debugging method of parking a breakpoint on the arch_prctl() call and then running ‘stepi’ until it dies won’t work.

Robert_Crovella · September 28, 2014, 10:34pm

If you use Fedora 20 (a qualified distro for CUDA 6.5) I think things will work just fine for you.

valdisk · September 28, 2014, 11:39pm

Yeah, except the goal here is to track down what caused the regression, because something in Fedora or the kernel changed. Oh well… looks like ‘git bisect’ time again.

valdisk · September 30, 2014, 4:38am

Gaah. Finally found it, after it died running under a Fedora 20 kernel as well.

rpm -V cuda-core-6-5
…5… /usr/local/cuda-6.5/bin/cudafe
…5… /usr/local/cuda-6.5/bin/cudafe++
…5… /usr/local/cuda-6.5/bin/cuobjdump
…5… /usr/local/cuda-6.5/bin/fatbinary
…5… /usr/local/cuda-6.5/bin/ptxas
…5… /usr/local/cuda-6.5/nvvm/lib64/libnvvm.so.2.0.0

Something corrupted the binaries. ‘yum reinstall cuda-core-6-5’ fixed it, and everything seems to be working. Now to figure out what screwed the binaries up…
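
For reference, rpm -V prints one line per file that differs from the package database, prefixed by an attribute string whose third character is 5 when the file's digest no longer matches. (The forum rendering above appears to have collapsed the runs of dots in that string into ellipses.) A small sketch for pulling out just the digest mismatches; the function name digest_mismatches is made up for illustration, and it assumes file paths contain no whitespace:

```shell
# Hypothetical filter: read `rpm -V` output on stdin and print the paths of
# files whose content digest differs from the one recorded by the package.
# The attribute string is the first field; position 3 holds the digest flag.
digest_mismatches() {
    awk 'length($1) >= 3 && substr($1, 3, 1) == "5" { print $NF }'
}
```

It could be used as: rpm -V cuda-core-6-5 | digest_mismatches. Lines reporting only other differences (for example a changed mtime) or missing files are skipped.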
