Can nvcc be faster?

I am converting some C code into CUDA code. When the source file is very large and has a deep call hierarchy, nvcc appears to hang at the last phase for a couple of minutes before it finishes.

Is there a compiler switch, configuration option, or coding style that can improve this?


It is the host C compiler (VC++ on Windows) that compiles the C code. nvcc itself only compiles the GPU code (i.e. __device__ and __global__ functions).

So, does your source contain a lot of GPU code compared to CPU code and data?

nvcc inlines every device function call, so with a deep call hierarchy it has to splice in the code from a large number of calls. The result is an extremely long stretch of code for the optimizer to process and write out to the cubin.
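To make the growth concrete, here is a small sketch with a hypothetical three-level call chain (level1, level2, level3 and deep_kernel are illustrative names, not from the original code). Assuming each level calls the next one twice, full inlining leaves two copies of level2 and four copies of level3 inside the kernel, so the inlined body roughly doubles with each extra level of call depth. The functions are marked __host__ __device__ only so they can also be checked on the CPU.

```cuda
// Hypothetical call chain standing in for the deeply nested C code being ported.
// Each level calls the next one twice, so full inlining doubles the code per level.
__host__ __device__ float level3(float x) { return x + 1.0f; }

__host__ __device__ float level2(float x) { return level3(x) * level3(x); }

__host__ __device__ float level1(float x) { return level2(x) + level2(x + 1.0f); }

// With every call inlined, this kernel's body contains two expanded copies of
// level2 and four of level3, all of which the optimizer has to process.
__global__ void deep_kernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = level1(in[i]);
}
```

For example, level1(1.0f) evaluates to (2 * 2) + (3 * 3) = 13.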

There really isn't any way around this. You could try the __noinline__ function qualifier (see the CUDA Programming Guide 2.0 beta), but it is only a hint.
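A minimal sketch of where the __noinline__ qualifier goes, assuming a hypothetical helper chain (scale, scale_and_offset and transform are illustrative names). As noted above, in CUDA 2.0 the qualifier is only a hint, so the compiler may still inline these calls, and whether it shortens compile time for a given project is an assumption to measure rather than a guarantee.

```cuda
// Hypothetical helpers from a deep call chain. __noinline__ asks nvcc to keep
// them as real calls instead of splicing their bodies into every caller.
// In CUDA 2.0 this is only a hint; the compiler may inline them anyway.
__host__ __device__ __noinline__ float scale(float x, float k)
{
    return x * k;
}

__host__ __device__ __noinline__ float scale_and_offset(float x)
{
    return scale(x, 2.0f) + 1.0f;
}

// The kernel calls the helper as usual; only the qualifier on the helpers changes.
__global__ void transform(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = scale_and_offset(in[i]);
}
```

Applying the qualifier to the functions that sit deepest in the chain, or that are called from many places, is where it would have the most effect on code size, if the compiler honors it.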

Thanks for the responses.

I found that compilation is much faster in device emulation mode, so I will use emulation during development and only compile for the device for final testing.