Building samples dies with signal 11 in cudafe - CUDA 6.5, driver 343.22.

After installing CUDA 6.5, most of the sample code doesn’t build. I tracked it down to cudafe dying a very quick death:

strace /usr/local/cuda-6.5/bin/cudafe
execve("/usr/local/cuda-6.5/bin/cudafe", ["/usr/local/cuda-6.5/bin/cudafe"], [/* 67 vars */]) = 0
brk(0) = 0x1a49000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9bccf6f000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9bccf6e000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9bccf6d000
arch_prctl(ARCH_SET_FS, 0x7f9bccf6e680) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x8} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

This is on a Dell Latitude E6500, Fedora x86_64. Any ideas? Looks like ARCH_SET_FS doesn’t agree with something even though it returns OK.
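
For anyone wanting to confirm the "signal 11" part without strace: bash and other POSIX-style shells report a child killed by a signal as exit status 128 plus the signal number, so a SIGSEGV shows up as 139. A minimal sketch, assuming bash or a POSIX sh; the helper name signal_of is made up for illustration:

```shell
# Run a command and report whether it exited normally or was killed by a signal.
# Assumption: the shell encodes death-by-signal as exit status 128 + signal number.
signal_of() {
    "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -gt 128 ]; then
        echo "killed by signal $((rc - 128))"
    else
        echo "exited with status $rc"
    fi
}
```

Running `signal_of /usr/local/cuda-6.5/bin/cudafe` should then say whether the binary itself is still being killed by signal 11, independent of nvcc or the sample makefiles.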

Which Fedora version? Are the samples installed in /usr/local/cuda/samples? Do you run make as root?

I’ve already done all that analysis. The problem is that cudafe dies really early on: before it has even looked at its arguments, checked whether it’s running as root, or tried to open its input. I suspect the root problem here is that something in Fedora Rawhide and/or the kernel has changed, and I’m more than happy to chase the regression to the upstream culprit. However, cudafe is a stripped binary with some modifications to the ELF header sufficient to stop gdb from being able to open it, so the usual debugging method of just parking a breakpoint on the arch_prctl() call and then running ‘stepi’ until it dies won’t work.

If you use Fedora 20 (a qualified distro for CUDA 6.5) I think things will work just fine for you.

Yeah, except the goal here is to track down what caused the regression, because something in Fedora or the kernel changed. Oh well… looks like ‘git bisect’ time again.

Gaah. Finally found it, after it died running under a Fedora 20 kernel as well.

rpm -V cuda-core-6-5
…5… /usr/local/cuda-6.5/bin/cudafe
…5… /usr/local/cuda-6.5/bin/cudafe++
…5… /usr/local/cuda-6.5/bin/cuobjdump
…5… /usr/local/cuda-6.5/bin/fatbinary
…5… /usr/local/cuda-6.5/bin/ptxas
…5… /usr/local/cuda-6.5/nvvm/lib64/libnvvm.so.2.0.0

Something corrupted the binaries. ‘yum reinstall cuda-core-6-5’ fixed it, and everything seems to be working. Now to figure out what screwed the binaries up…
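
In case it helps anyone hitting the same thing, the verify step can be narrowed to just the files whose contents changed. This is a rough sketch, assuming the standard `rpm -V` output layout, where the first field is a string of flag characters and a "5" in its third position means the file digest differs from the package database. The function name digest_mismatches is hypothetical, and paths containing spaces are not handled:

```shell
# Read rpm -V style output on stdin and print only the paths whose
# flag field has "5" (digest mismatch) in the third position.
# Assumption: the path is the last whitespace-separated field on each line.
digest_mismatches() {
    awk 'substr($1, 3, 1) == "5" { print $NF }'
}
```

Usage would be something like `rpm -V cuda-core-6-5 | digest_mismatches`. Any path it prints has been modified on disk since installation, and reinstalling the owning package restores it.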