In my experience, compilation time scales with the amount of code that is generated (most likely because of the time the optimizer spends optimizing that code). I once had a kernel with a loop unrolled 27 times that called a large inlined device function on every iteration. That one took about 30 seconds to compile.
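A minimal CUDA sketch of the pattern described above, for illustration only. `big_device_function`, `unrolled_kernel`, and `run_and_check` are hypothetical names, and the function body is a small placeholder rather than the original code, so this version compiles quickly. Making the body of `big_device_function` larger (or raising the trip count) should be a simple way to watch compile time grow with the amount of generated code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for a "big" device function. __forceinline__ asks
// nvcc to inline it at every call site, so each unrolled iteration of the
// caller gets its own copy of this body. It is also marked __host__ so the
// CPU reference in run_and_check can reuse it.
__host__ __device__ __forceinline__ unsigned big_device_function(unsigned x, int i)
{
    unsigned acc = x;
    for (int k = 0; k < 8; ++k) {
        // Arbitrary integer mixing; unsigned so overflow wraps predictably.
        acc = acc * 3u + static_cast<unsigned>(i ^ k);
    }
    return acc;
}

constexpr int kUnroll = 27;  // trip count taken from the post above

__global__ void unrolled_kernel(const unsigned* in, unsigned* out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;  // guard threads past the end of the array

    unsigned v = in[tid];
    // Constant trip count + #pragma unroll: the compiler can fully unroll,
    // leaving 27 inlined copies of big_device_function to optimize.
    #pragma unroll
    for (int i = 0; i < kUnroll; ++i) {
        v = big_device_function(v, i);
    }
    out[tid] = v;
}

// Hypothetical host helper: runs the kernel on n elements and compares the
// result with the same computation done on the CPU.
bool run_and_check(int n)
{
    const size_t bytes = n * sizeof(unsigned);
    std::vector<unsigned> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<unsigned>(i);

    unsigned* d_in = nullptr;
    unsigned* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaError_t err = cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int block = 256;
        unrolled_kernel<<<(n + block - 1) / block, block>>>(d_in, d_out, n);
        err = cudaGetLastError();                           // launch errors
        if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int t = 0; t < n; ++t) {
        unsigned v = h_in[t];
        for (int i = 0; i < kUnroll; ++i) v = big_device_function(v, i);
        if (v != h_out[t]) return false;
    }
    return true;
}
```

Running `cuobjdump -sass` on the resulting binary shows the generated machine code, where the inlined body typically appears once per unrolled iteration. That repetition is the extra code the optimizer has to work through.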
I compile really big kernels, and my compilation times have been all over the place (up to 10 minutes). It really gets out of hand when you set a limit on register usage and the compiler tries to meet it. Try setting --maxrregcount=128 to give the compiler room to breathe.
Using the maxrregcount parameter reduces the compilation time a bit, but not significantly. Since you've had similar experiences, I won't worry too much about it. Then the only thing I can do is have a coffee while compiling… :)