High compilation time


I was just wondering about my compilation time, which increased a lot with the complexity of my program.

There is one kernel that takes a long time to compile. Its resource usage is:
Used 39 registers, 5768+5756 bytes smem, 128 bytes cmem[1]

This kernel calls several inlined functions and, as the numbers above show, it currently needs a lot of shared memory.

Does anyone have similar experience with compilation time?

P.S. I use NetBeans as my IDE, which I think shouldn’t affect the compilation time. I use CUDA 2.0, NVCC version V0.2.1221.

In my experience, compilation time scales with the amount of code that is generated (most likely because of the time the optimizer spends optimizing that code). I had one kernel in the past with a loop unrolled 27 times that called a big inlined device function each time. That one took about 30 seconds to compile.
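To make the point about generated code size concrete, here is a minimal sketch. It is not my actual kernel: `heavy_step`, `kernel_unrolled`, `kernel_rolled` and `compare_kernels` are made-up names, the function body is just a small integer mixer standing in for something much bigger, and it is written for a current toolkit rather than CUDA 2.0. The first kernel asks for the 27-iteration loop to be fully unrolled, so the inlined body is emitted 27 times; the second keeps the loop rolled with `#pragma unroll 1`, so the body is emitted once.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for a "big" inlined device function.
__device__ __forceinline__ unsigned int heavy_step(unsigned int v, unsigned int i)
{
    v ^= v << 13;
    v ^= v >> 17;
    v ^= v << 5;
    return v * 2654435761u + i;
}

// Fully unrolled: the compiler is asked to emit heavy_step's body 27 times.
__global__ void kernel_unrolled(unsigned int *out, unsigned int seed)
{
    unsigned int v = seed + threadIdx.x;
    #pragma unroll
    for (unsigned int i = 0; i < 27; ++i)
        v = heavy_step(v, i);
    out[threadIdx.x] = v;
}

// Rolled: "#pragma unroll 1" asks the compiler not to unroll the loop,
// so the body appears once and there is less code to optimize.
__global__ void kernel_rolled(unsigned int *out, unsigned int seed)
{
    unsigned int v = seed + threadIdx.x;
    #pragma unroll 1
    for (unsigned int i = 0; i < 27; ++i)
        v = heavy_step(v, i);
    out[threadIdx.x] = v;
}

// Host helper: runs both kernels on one block of 32 threads.
// Returns 0 if the results match, 1 on a mismatch, -1 on a CUDA error.
int compare_kernels(unsigned int seed)
{
    const int n = 32;
    unsigned int h_a[n], h_b[n];
    unsigned int *d_a = 0, *d_b = 0;

    if (cudaMalloc((void **)&d_a, sizeof(h_a)) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_b, sizeof(h_b)) != cudaSuccess) { cudaFree(d_a); return -1; }

    kernel_unrolled<<<1, n>>>(d_a, seed);
    kernel_rolled<<<1, n>>>(d_b, seed);

    // cudaMemcpy waits for the kernels and also reports launch failures.
    cudaError_t e1 = cudaMemcpy(h_a, d_a, sizeof(h_a), cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(h_b, d_b, sizeof(h_b), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    if (e1 != cudaSuccess || e2 != cudaSuccess) return -1;

    for (int i = 0; i < n; ++i)
        if (h_a[i] != h_b[i]) return 1;
    return 0;
}
```

Calling `compare_kernels(12345u)` from `main` should return 0 on a working GPU, since both kernels do the same integer arithmetic in the same order. Whether the rolled version actually compiles faster (or runs slower) for a real kernel is something you would have to measure; building with `nvcc --ptxas-options=-v` prints the per-kernel register and shared memory usage so you can compare the two.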

Compiling my program easily takes about 10 minutes, but only in device release mode. In emulation mode it takes less than 20 seconds.

I compile really big kernels, and compilation time has been all over the place (including 10 minutes). It really gets out of hand when you set a limit on the register usage and the compiler tries to match it. Try setting --maxrregcount=128 to give the compiler room to breathe.

Using the maxrregcount parameter reduces the compilation time a bit, but not significantly. Since you have had similar experiences, I won’t worry too much about it. The only thing I can do, then, is have a coffee while compiling… :)