Long Compile Time after changing some float's to double's

pojken · June 11, 2011, 10:52pm

Hi all,

I changed some device code to use double precision arrays instead of single precision arrays, and this has resulted in compile times of about 10 min for the device code, where it used to take about 1 min. Reverting the code to single precision restored the 1 min compile times. Has anyone else experienced a similar issue? This is using the CUDA 4.0 Toolkit, a Fermi board, and Windows XP 64 bit. Were you able to change some parameter or compile option to make compilation faster, or to work out what the problem is?

Thanks
P

tera · June 11, 2011, 11:41pm

Are you using functions like sin(), exp() etc. a lot? As these get inlined and the double precision functions usually are longer to achieve the higher precision, the compiler has to work harder with double precision.
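To make the point above concrete, here is a minimal sketch (not the original poster's code; the kernel and helper names are made up for illustration) of the same computation in single and double precision. The only source difference is the element type and the math functions called, but the double precision sin() and exp() expand to longer inlined code, which is what gives the compiler more work.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical kernels for illustration only.
__global__ void eval_float(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = sinf(in[i]) * expf(-in[i]);   // single precision math functions
}

__global__ void eval_double(const double* in, double* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = sin(in[i]) * exp(-in[i]);     // double precision math functions
}

// Runs both kernels on the same inputs and returns the largest absolute
// difference between the float and double results.
double max_float_double_diff(int n)
{
    float*  h_fin  = new float[n];
    float*  h_fout = new float[n];
    double* h_din  = new double[n];
    double* h_dout = new double[n];
    for (int i = 0; i < n; ++i) {
        h_din[i] = 0.01 * i;
        h_fin[i] = (float)h_din[i];
    }

    float *d_fin, *d_fout;
    double *d_din, *d_dout;
    cudaMalloc(&d_fin,  n * sizeof(float));
    cudaMalloc(&d_fout, n * sizeof(float));
    cudaMalloc(&d_din,  n * sizeof(double));
    cudaMalloc(&d_dout, n * sizeof(double));
    cudaMemcpy(d_fin, h_fin, n * sizeof(float),  cudaMemcpyHostToDevice);
    cudaMemcpy(d_din, h_din, n * sizeof(double), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks  = (n + threads - 1) / threads;
    eval_float<<<blocks, threads>>>(d_fin, d_fout, n);
    eval_double<<<blocks, threads>>>(d_din, d_dout, n);

    // These blocking copies also wait for the kernels to finish.
    cudaMemcpy(h_fout, d_fout, n * sizeof(float),  cudaMemcpyDeviceToHost);
    cudaMemcpy(h_dout, d_dout, n * sizeof(double), cudaMemcpyDeviceToHost);

    double max_diff = 0.0;
    for (int i = 0; i < n; ++i) {
        double diff = fabs((double)h_fout[i] - h_dout[i]);
        if (diff > max_diff) max_diff = diff;
    }

    cudaFree(d_fin);  cudaFree(d_fout);
    cudaFree(d_din);  cudaFree(d_dout);
    delete[] h_fin;   delete[] h_fout;
    delete[] h_din;   delete[] h_dout;
    return max_diff;
}

int main()
{
    printf("max |float - double| = %g\n", max_float_double_diff(256));
    return 0;
}
```

Timing the build of a file like this once with only the float kernel and once with only the double kernel is a quick way to check whether the math functions account for the difference in a given toolkit version.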

pojken · June 12, 2011, 2:43am

Yes I do have a few sin(), exp() etc., maybe that is the issue. Thanks for your reply!

sidxavier · February 22, 2012, 11:42am

Hi,

My kernel also takes about 10+ minutes to compile.

My kernel uses double precision arithmetic like __dmul_ru(–), __ddiv_ru(–), nanf(). Can this be causing such high compilation times?

Also, at the end of compilation it gives the error: "Entry function ‘FUNC_NAME’ uses too much local data (0x7490 bytes, 0x4000 max)".

Can this be a cause for heavy compilation time?

Thanks

Sid.

tera · February 22, 2012, 2:48pm

Which compute capability is your device? This sounds more like a problem with excessive inlining (which the thread starter may have run into as well). The functions you name map more or less directly to machine instructions, so they shouldn’t stress the compiler much.

If compiling for compute capability 2.x, try to declare a few strategic __device__ functions as __noinline__.
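As a rough sketch of that suggestion (the function and kernel names here are hypothetical, not from the poster's code), a device function that is called from several places can be marked __noinline__ so that its body is compiled once rather than expanded at every call site. The qualifier is a request to the compiler, so it is worth checking the effect on build time and run time rather than assuming it.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical helper. __noinline__ asks the compiler to emit a real
// function call instead of inlining the body at each call site.
__device__ __noinline__ double damped_wave(double x)
{
    return sin(x) * exp(-x) + cos(2.0 * x) * exp(-0.5 * x);
}

// The helper is called three times; without __noinline__ the compiler
// would normally expand the double precision math three times here.
__global__ void accumulate(const double* in, double* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        double x = in[i];
        out[i] = damped_wave(x) + damped_wave(2.0 * x) + damped_wave(3.0 * x);
    }
}

// Runs the kernel on a single input value and returns the result.
double run_accumulate(double x)
{
    double *d_in, *d_out;
    double result = 0.0;
    cudaMalloc(&d_in,  sizeof(double));
    cudaMalloc(&d_out, sizeof(double));
    cudaMemcpy(d_in, &x, sizeof(double), cudaMemcpyHostToDevice);
    accumulate<<<1, 1>>>(d_in, d_out, 1);
    cudaMemcpy(&result, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}

int main()
{
    printf("accumulate(0.0) = %f\n", run_accumulate(0.0));
    return 0;
}
```

The trade-off is that a real call has some run-time cost, so this tends to pay off for larger helpers that are called from many places, not for tiny ones.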

sidxavier · February 22, 2012, 3:37pm

I think I partially figured out the reason for slow compilation.

My compile target was set to sm_13, which has a limit of 16 KB of local memory per thread (the 0x4000 in the error message). I changed it to sm_20, which has a local memory limit of 512 KB per thread. This completely removes the delay. I guess when the compiler sees that the local memory limit is exceeded, it tries harder to fit everything in, and so takes more time?
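For reference, the error quoted earlier reports 0x7490 (29,840) bytes of local data against a maximum of 0x4000 (16,384) bytes. One way to see how much local memory a kernel ended up with is to ask the runtime via cudaFuncGetAttributes. The sketch below uses a made-up kernel with a large, dynamically indexed per-thread array, which the compiler will typically place in local memory; the exact figure reported depends on the toolkit and target architecture.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: a large per-thread array indexed at run time
// usually cannot live in registers and ends up in local memory.
__global__ void big_local(const int* idx, double* out, int n)
{
    double scratch[1024];                       // 8 KB per thread
    for (int k = 0; k < 1024; ++k)
        scratch[k] = 0.5 * k;

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = scratch[idx[i] & 1023];
}

// Prints the per-thread resource usage the runtime reports for the kernel.
// Returns true if the query succeeded.
bool report_kernel_usage()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, big_local);
    if (err != cudaSuccess) {
        printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    printf("registers per thread:    %d\n", attr.numRegs);
    return true;
}

int main()
{
    return report_kernel_usage() ? 0 : 1;
}
```

Building with nvcc -Xptxas -v prints similar per-kernel figures at compile time, which makes it easier to spot a kernel approaching the limit before the build fails.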

Anyway, it's better now.

Thanks

njuffa · February 22, 2012, 5:09pm

If you are using CUDA 4.1, differences in build time may also be a function of two different frontends being used. For sm_1x, the Open64 frontend is used, while for sm_20 and higher the NVVM frontend is used. I am taking the fact that you are hitting the local memory limit as an indication that this is fairly hefty code (as tera explained, aggressive inlining can contribute significantly to code size). The long build times are likely simply a function of the amount of code and data that must be manipulated.

If you are seeing build times exceeding ten minutes per file on a reasonably fast modern system with CUDA 4.1, I would recommend filing a bug so the compiler team can investigate. Please attach a self-contained repro case that demonstrates the lengthy build time.
