kernel size and caching


I have two questions, which I could not solve with the help of the documentation.

First: How large may a kernel be? What limits exist?

Second: kernels get cached, so you can warm up your kernel by launching it twice. How many kernels will be cached? Is there a “first in, last out” strategy?

Thanks for your help

The limit on kernel size is 2 MB of native instructions. In practice this is not much of a limitation; we’re not aware of anyone who has hit it yet.

Kernels aren’t really cached; the instructions are stored in video memory. The reason we sometimes “warm up” kernels is that the CUDA runtime performs some initialization the first time you execute a kernel, which can distort timing measurements.
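The warm-up pattern described above can be sketched as follows. This is an illustrative example, not code from the thread: the kernel name `dummyKernel` and the event-based timing are my own choices, assuming the standard CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel used only to illustrate the warm-up pattern;
// any kernel would do.
__global__ void dummyKernel(float *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = i * 2.0f;
}

int main() {
    const int n = 256;
    float *d_out;
    cudaMalloc(&d_out, n * sizeof(float));

    // Warm-up launch: triggers the runtime's one-time initialization
    // so it does not show up in the timed run below.
    dummyKernel<<<1, n>>>(d_out);
    cudaDeviceSynchronize();

    // Timed launch, measured with CUDA events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    dummyKernel<<<1, n>>>(d_out);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return 0;
}
```

Without the warm-up launch, the first timed run would include the runtime’s initialization cost and look much slower than subsequent launches.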

I see, so that’s why we have preallocateArray(), huh?

But how can one verify that it helps? The inner mechanism seems opaque to me.

Dear Simon,

thanks a lot for your answers. They are very helpful.