How to reduce compile time for a big kernel function?

Hi, I have redesigned a model to use CUDA. The kernel function is very big, about 10,000 lines.
I have to wait about 2 hours to compile it. Is there any way to reduce the compile time?

In my current project, my kernel is about 3000 lines of code but it compiles in 15-20 seconds or so.
My raytracing and GI kernels are something like 13000 lines (split across many files) and compile in about a minute. I haven’t even bothered to set up parallel make for it.

BUT I did find and report one compilation bug involving multiplication by 64-bit constants. When such a line was present, compilation time shot up to hours!
It was just a simple x=12345ULL*y; statement. The generated code was correct and ran at full speed; it was the COMPILATION that slowed down by 3 orders of magnitude.
NVIDIA fixed it in the 3.0 toolkit's nvcc.
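A minimal sketch of a stand-alone test case for this kind of bug, assuming the trigger is a multiply by a 64-bit literal like the x=12345ULL*y; line above. The names mul_const_kernel and mul_const are hypothetical. Compiling a file like this on its own (for example with time nvcc -c repro.cu) shows quickly whether a given toolkit is affected, and a reproducer this small is easy to attach to a bug report.

```cuda
// repro.cu: hypothetical stand-alone test case for a 64-bit constant multiply.
#include <cuda_runtime.h>
#include <vector>

// Each thread multiplies one 64-bit value by a 64-bit literal constant,
// the same shape as x = 12345ULL*y; in the original kernel.
__global__ void mul_const_kernel(unsigned long long *x,
                                 const unsigned long long *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] = 12345ULL * y[i];
}

// Host helper: copies y to the device, runs the kernel, copies x back.
// Returns false if any CUDA call fails.
bool mul_const(const std::vector<unsigned long long> &y,
               std::vector<unsigned long long> &x)
{
    int n = static_cast<int>(y.size());
    size_t bytes = n * sizeof(unsigned long long);
    x.assign(n, 0);
    if (n == 0) return true;

    unsigned long long *d_x = 0, *d_y = 0;
    if (cudaMalloc((void **)&d_x, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&d_y, bytes) != cudaSuccess) {
        cudaFree(d_x);
        return false;
    }

    bool ok = cudaMemcpy(d_y, &y[0], bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        mul_const_kernel<<<blocks, threads>>>(d_x, d_y, n);
        // cudaMemcpy waits for the kernel and reports any execution error.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(&x[0], d_x, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_x);
    cudaFree(d_y);
    return ok;
}
```

If this file alone takes minutes or hours to build, the toolkit has the problem; if it builds in seconds, the slowdown in the big kernel comes from something else.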

You might not be hitting that exact issue, but perhaps some code structure in your kernel causes a similar compile slowdown. It may literally be one line of code.
Start chopping out functions and lines of code, ignoring functionality, just to see whether compile speed suddenly improves.
If so, you have a good compiler bug to report to NVIDIA!
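The chopping-out approach can be made repeatable with preprocessor switches instead of manual edits. A minimal sketch, assuming the kernel can be divided into separable pieces; stage_a, stage_b, stage_c, the SKIP_STAGE_* macros, big_kernel and run_big_kernel are hypothetical stand-ins, not code from this thread. Building with nvcc -c -DSKIP_STAGE_B bisect.cu (and so on for each piece) and timing each build points to the piece that makes compilation slow.

```cuda
// bisect.cu: hypothetical skeleton for bisecting compile time with
// preprocessor switches. The stages stand in for real pieces of a big kernel.
#include <cuda_runtime.h>

__device__ float stage_a(float v) { return v * 2.0f; }
__device__ float stage_b(float v) { return v + 1.0f; }
__device__ float stage_c(float v) { return v * v; }

__global__ void big_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = data[i];
    // Define SKIP_STAGE_X on the nvcc command line to leave that piece out
    // of the build entirely. Results change, but the compile time shows
    // whether the skipped piece is the slow one.
#ifndef SKIP_STAGE_A
    v = stage_a(v);
#endif
#ifndef SKIP_STAGE_B
    v = stage_b(v);
#endif
#ifndef SKIP_STAGE_C
    v = stage_c(v);
#endif
    data[i] = v;
}

// Host helper so the skeleton can be exercised: runs big_kernel in place
// on a copy of n host floats. Returns false if any CUDA call fails.
bool run_big_kernel(float *host, int n)
{
    if (n <= 0) return true;
    size_t bytes = n * sizeof(float);
    float *d = 0;
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(d, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        big_kernel<<<(n + 127) / 128, 128>>>(d, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host, d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d);
    return ok;
}
```

Once one piece is identified, the same kind of switch can be nested inside that piece to narrow it down further, toward the single offending line.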

You may also try the 3.0 toolkit beta just for fun.

I am using the 3.0 beta now.

nvcc reports only these messages:

warning: variable "scvold" is used before its value is set
warning: variable "scvold" is used before its value is set

/tmp/tmpxft_000044cc_00000000-7_clm_cuda.cpp3.i(0): Warning: Olimit was exceeded on function process_patch_device; will not perform function-scope optimization.

	To still perform function-scope optimization, use -OPT:Olimit=0 (no limit) or -OPT:Olimit=164494

/tmp/tmpxft_000044cc_00000000-7_clm_cuda.cpp3.i(0): Warning: To override Olimit for all functions in file, use -OPT:Olimit=164494

	(Compiler may run out of memory or run very slowly for large Olimit values)

Is there any limit on the code length of a kernel function?
If I remove some core functions, it compiles quickly.
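The Olimit warning names a single function, process_patch_device, and says function-scope optimization is skipped for it because it is too large. One common workaround, not something confirmed in this thread, is to split a monolithic kernel into several smaller kernels launched back to back, passing intermediate results through global memory, so each function the compiler optimizes is smaller. A minimal sketch, assuming the work can be staged this way; stage1_kernel, stage2_kernel and run_staged are hypothetical names.

```cuda
// Hypothetical split of one large kernel into two smaller kernels.
#include <cuda_runtime.h>

// Stage 1: first part of the former monolithic kernel; writes one
// intermediate value per element to global memory.
__global__ void stage1_kernel(const float *in, float *tmp, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        tmp[i] = in[i] * 2.0f + 1.0f;
}

// Stage 2: second part; reads the intermediate value back.
__global__ void stage2_kernel(const float *tmp, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tmp[i] * tmp[i];
}

// Launches both stages on the default stream, so stage 2 starts only
// after stage 1 has finished. Returns false if any CUDA call fails.
bool run_staged(const float *host_in, float *host_out, int n)
{
    if (n <= 0) return true;
    size_t bytes = n * sizeof(float);
    float *d_in = 0, *d_tmp = 0, *d_out = 0;
    bool ok = cudaMalloc((void **)&d_in, bytes) == cudaSuccess &&
              cudaMalloc((void **)&d_tmp, bytes) == cudaSuccess &&
              cudaMalloc((void **)&d_out, bytes) == cudaSuccess;
    if (ok)
        ok = cudaMemcpy(d_in, host_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        stage1_kernel<<<blocks, threads>>>(d_in, d_tmp, n);
        stage2_kernel<<<blocks, threads>>>(d_tmp, d_out, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    // cudaFree on a null pointer is a no-op, so partial allocations are safe here.
    cudaFree(d_in);
    cudaFree(d_tmp);
    cudaFree(d_out);
    return ok;
}
```

The cost is extra global memory traffic: anything the original kernel kept in registers between the two stages now has to be written out and read back. Moving code into separate __device__ functions alone may not help, since the compiler may inline them back into one large function.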