How are functions compiled? Are function calls expanded inline or actually called?

Vaibhav_Kaushal · April 6, 2011, 11:03pm

Hi,

As far as I know, the current limit on the size of CUDA kernels is 2 million PTX instructions, which would come to roughly 2 MB.

I am trying to create an application that does a lot of work on the GPU, so the kernel will be quite large. I intend to use functions to split up the work (just like anyone on this planet). I read somewhere that function calls are expanded inline. Is that true? If so, will all my device functions become part of one huge piece of code and result in a massive kernel (or possibly an error)? Or do device function calls follow the regular stack push/pop mechanism on the GPU, as they do on the CPU?

Thanks

tera · April 6, 2011, 11:39pm

On compute capability 1.x, functions are always inlined because no call stack exists. On compute capability 2.x, the compiler decides by default. You can declare a function as __noinline__ to prevent inlining. This can prevent code bloat if a device function is used multiple times in a kernel and the compiler would otherwise still decide to inline it (e.g. because that would open up additional possibilities for optimization).

njuffa · April 7, 2011, 7:16pm

Just for completeness, there is also __forceinline__ as a counterpart to __noinline__. Since code size is the concern here, __noinline__ is probably the function attribute of more interest in this case.
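
To make the two qualifiers concrete, here is a minimal sketch of how they might be applied. The function names (scale_and_offset, square, transform) and the arithmetic are hypothetical, chosen only for illustration. The qualifiers express a preference to the compiler, so whether a real call is emitted still depends on the target architecture and toolkit version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: __noinline__ asks the compiler to keep this as a
// separate function that is actually called, which can limit code growth
// when it is used from many call sites.
__device__ __noinline__ float scale_and_offset(float x, float a, float b)
{
    return a * x + b;
}

// Hypothetical helper: __forceinline__ asks the compiler to expand the
// body at every call site instead of emitting a call.
__device__ __forceinline__ float square(float x)
{
    return x * x;
}

// Illustrative kernel that uses both helpers.
__global__ void transform(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = scale_and_offset(square(in[i]), 2.0f, 1.0f);
    }
}

int main()
{
    const int n = 8;
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) {
        h_in[i] = (float)i;
    }

    float *d_in = NULL;
    float *d_out = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    transform<<<1, n>>>(d_in, d_out, n);

    // Report launch or execution errors instead of printing stale data.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) {
        printf("%g -> %g\n", h_in[i], h_out[i]);
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

One way to check what the compiler actually did is to build with nvcc and look at the generated machine code, for example with cuobjdump -sass, to see whether a call instruction appears in the kernel.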

Vaibhav_Kaushal · April 8, 2011, 8:00am

@tera: I have a GTX 470, which is compute capability 2.0. *relaxes* Thanks for the clarification.

@njuffa: Thanks for that valuable complement to the previous answer.