Speed of the nvcc compiler

The question of how fast the CUDA compiler is has come up several times.
It appears that, with small kernels (see
http://www.cs.ucl.ac.uk/staff/W.Langdon/cuda5/nvcc_timing.gif ),
the compiler gets faster when it is asked to compile more kernels.
Throughput shows a fairly broad peak at about 500 lines per second above
10,000 lines of code, but falls away after 20,000. With more than 260,000
lines ptxas failed, but I suspect this is related to running out of memory
(nvcc 5.0, Linux, 4 GB, dual 2.66 GHz cores).
I am sure details will vary. This CUDA code was created by concatenating
the same 87-line kernel many times. But it does suggest that if you want to
optimise your code, it makes sense to compile multiple versions of it
together rather than separately.
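For anyone who wants to reproduce the measurement, a minimal sketch of the experiment might look like the following. The kernel body, file names, and copy counts here are illustrative stand-ins, not the original 87-line kernel; each copy is renamed so the concatenated file compiles as one translation unit.

```python
#!/usr/bin/env python3
"""Sketch: concatenate one kernel N times, then time nvcc on the result."""

import subprocess
import time

# Minimal stand-in kernel (not the original 87-line one); each copy
# gets a unique name so nvcc accepts the concatenated file.
KERNEL_TEMPLATE = """\
__global__ void kernel_{i}(const float *in, float *out, int n) {{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = in[idx] * 2.0f + {i};
}}
"""

def make_source(n_copies):
    """Return CUDA source containing n_copies renamed copies of the kernel."""
    return "\n".join(KERNEL_TEMPLATE.format(i=i) for i in range(n_copies))

def time_compile(n_copies, src_path="many_kernels.cu"):
    """Write the concatenated source and report nvcc throughput in lines/s."""
    src = make_source(n_copies)
    with open(src_path, "w") as f:
        f.write(src)
    n_lines = src.count("\n")
    start = time.time()
    subprocess.run(["nvcc", "-c", src_path, "-o", "many_kernels.o"],
                   check=True)
    elapsed = time.time() - start
    print(f"{n_copies} kernels, {n_lines} lines: "
          f"{elapsed:.1f}s ({n_lines / elapsed:.0f} lines/s)")

if __name__ == "__main__":
    # Sweep the kernel count to see where throughput peaks on your setup.
    for n in (10, 100, 1000):
        time_compile(n)
```

Plotting lines/s against total line count over a wider sweep should reproduce the broad peak, though the exact numbers will depend on the kernel, the nvcc version, and the machine.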

I suspect that compilation times are highly dependent on the optimization phases triggered by particular pieces of code. The time complexity of various compiler phases appears to be super-linear with respect to lines of code. This is not surprising, as many of the underlying problems (register allocation and instruction scheduling, for example) are NP-hard, and only the use of heuristics leads to manageable compilation times in the first place.

Lengthy compilation times (> 10 minutes) and massive memory use by the compiler often go hand-in-hand. This usually happens with voluminous source code, and is a good indication that a particular compiler phase needs optimization work. I would encourage CUDA programmers to file bugs for such occurrences with real-life code bases.

A few months back I cooked up some code that took 25 minutes to compile on a 3.4 GHz machine, while the compiler chewed through 3 GB of memory. The resulting machine code worked correctly, but the lengthy compile time really threw a wrench into my engineering process. I filed a bug, and after the issue was fixed, the same code compiled in 16 seconds.