I’ve recently revisited some old code that used to compile and run fine with version 1.0 of the toolkit, but it now crashes nvcc version 1.1. Specifically, I get a segmentation fault from the compiler when I attempt to build.
Running nvcc -ptx successfully produces a PTX file, and ptxas then segfaults when attempting to assemble it, so I suspect the problem is in ptxas. The particular kernel I’m trying to compile is quite large, both in terms of memory usage and number of instructions. Perhaps ptxas is not able to handle it? Of course, I would expect an error message, not a segfault. It shouldn’t be possible to crash the compiler!
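For reference, the two steps look roughly like this (mykernel is just a placeholder name, not my actual file):

    nvcc -ptx mykernel.cu -o mykernel.ptx      (succeeds)
    ptxas mykernel.ptx -o mykernel.cubin       (segfaults)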
Unfortunately I cannot post source code. Has anybody else seen this problem before? I would certainly appreciate a fix!
I am not using any extra compiler flags, so that’s not it. I tried nvcc -cubin, but that segfaulted as well.
It’s not straightforward at all to split this kernel into smaller pieces, so if there’s another mechanism to resolve this, that would be helpful. And it does still work with version 1.0, which makes me wonder if kernel size is really the issue, although I agree it’s possible.
Yeah, unfortunately, ptxas is the tool that figures out how many registers the kernel requires (and generates the cubin), so the -cubin flag won’t help you track this down.
Can you scan through the PTX output and see what the highest register number used is? The nvcc compiler generates PTX code in static single assignment form, so it will look like your PTX code is using hundreds of registers. One of the jobs of ptxas is to figure out how to map these PTX registers to real hardware registers, reusing the real registers as much as possible. I’m wondering if your kernel has tripped over some bug that only hits when the number of allocated PTX registers is huge.
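For illustration only (this is not your code, and the register numbers and addressing below are made up, not exact compiler output), here is the kind of SSA-style PTX that nvcc -ptx emits for a trivial kernel:

__global__ void madd(const float *a, const float *b, float *c)
{
    int i = threadIdx.x;
    float t = a[i] + b[i];   // every intermediate value gets its own PTX register
    c[i] = t * t;
}

// nvcc -ptx produces SSA-style code along these lines:
//
//     .reg .f32 %f<5>;              // parameterized declaration of the virtual %f registers
//     ld.global.f32  %f1, [%rd3];
//     ld.global.f32  %f2, [%rd5];
//     add.f32        %f3, %f1, %f2;
//     mul.f32        %f4, %f3, %f3;
//     st.global.f32  [%rd7], %f4;
//
// The .reg declarations near the top of the PTX give the highest virtual
// register number used; ptxas then has to pack all of those onto the small
// set of hardware registers, which is the step I suspect is choking on
// your kernel.

Searching your generated PTX for those .reg declarations should tell you quickly whether the virtual register count is unusually large.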
I’ve also had this problem (code that works with 1.0 segfaults ptxas with 1.1). I submitted a bug report and was told the problem was fixed in the internal version of the toolchain, so theoretically this should be fixed when 1.2 is released.
The beta was expected at the end of March, so it should be out soon.
The ___cuda___cuda___cuda___cuda_result symbols I have seen in relation to the trig functions. You can find how sinf, etc. is implemented by looking at the toolkit’s header files.
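If you want to see where they come from, a minimal kernel like the sketch below (my own example, nothing from the toolkit) is enough; compile it with nvcc -ptx and look at the symbols generated around the sinf call:

// Minimal sketch: a kernel whose only device math call is sinf, so any
// compiler-generated temporaries in the PTX can be attributed to the
// trig expansion coming from the toolkit headers.
__global__ void trig_test(const float *in, float *out)
{
    int i = threadIdx.x;
    out[i] = sinf(in[i]);
}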