I have a question about the overhead of calling a kernel (a function that runs on the GPU).
What does this overhead consist of? Does it include copying the input parameters to registers? What else does it include?
And what is startup overhead? What does it consist of?
For example, the scanLargeArray sample in CUDA SDK 2.1 contains these lines:
// run once to remove startup overhead
prescanArray(d_odata, d_idata, num_elements);
// Run the prescan
prescanArray(d_odata, d_idata, num_elements);
....
cutStopTimer(timerGPU);
....
I don't understand why calling prescanArray once beforehand removes the startup overhead from the second, timed call.
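To make the question concrete, here is a minimal sketch (not taken from the SDK sample) that times a first and a second launch of the same kernel with CUDA events. The kernel `dummyKernel`, the helper `timeLaunchMs`, and the array size are hypothetical names and values I made up for illustration. If there is a one-time cost, I would expect the first number to be larger than the second, but I don't know what that cost consists of.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical trivial kernel, only here so there is something to launch.
__global__ void dummyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Time one launch (until the kernel has finished) with CUDA events, in ms.
float timeLaunchMs(float *d_data, int n)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);   // wait for the kernel and the stop event

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 20;        // arbitrary size chosen for the sketch
    float *d_data = NULL;
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    float first  = timeLaunchMs(d_data, n);  // may include one-time costs
    float second = timeLaunchMs(d_data, n);  // same launch, repeated
    printf("first launch:  %f ms\n", first);
    printf("second launch: %f ms\n", second);

    cudaFree(d_data);
    return 0;
}
```

Note that the `cudaMalloc` call before the first launch may already absorb part of any one-time initialization, so the difference between the two numbers would only show whatever cost remains at the first launch itself.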