When an application runs, the first execution of a kernel takes longer than the second

I am confused by this behavior. How can I avoid it?

It may be CUDA initialization time. It could also be JIT compile time, which is effectively part of initialization time. It could also be a cache population effect.
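To see which of these effects you are hitting, one option is to time the first and second launches separately and keep the first (warm-up) launch out of your measured region. The following is only a rough sketch under assumptions: the kernel `scale` and the helpers `time_launch_ms` and `compare_first_and_second` are hypothetical names invented for illustration, and `cudaFree(0)` is used as a common idiom to force context creation before timing. Depending on the CUDA version and settings, module loading or JIT compilation may still land on the first launch.

```cuda
#include <cuda_runtime.h>

// Hypothetical example kernel: scales every element in place.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Times a single launch of scale() in milliseconds using CUDA events.
static float time_launch_ms(float *d_data, int n)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    cudaEventRecord(start);
    scale<<<blocks, threads>>>(d_data, n, 1.0f);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Launches the same kernel twice and reports both timings.
// The first launch acts as the warm-up; the second is the one to trust.
cudaError_t compare_first_and_second(int n, float *first_ms, float *second_ms)
{
    cudaFree(0);  // assumption: forces context creation before any timing

    float *d_data = nullptr;
    cudaError_t err = cudaMalloc(&d_data, n * sizeof(float));
    if (err != cudaSuccess) return err;
    cudaMemset(d_data, 0, n * sizeof(float));

    *first_ms  = time_launch_ms(d_data, n);  // may include one-time costs
    *second_ms = time_launch_ms(d_data, n);  // steady-state launch

    err = cudaDeviceSynchronize();
    cudaFree(d_data);
    return err;
}
```

Calling `compare_first_and_second(1 << 20, &first_ms, &second_ms)` from `main` and printing both values should show whether the gap is still there once the context exists. If it is, an untimed warm-up launch before the measured region is the usual way to keep one-time costs out of your numbers.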

What is the cache population effect? Are there any books that describe it? Thank you very much, txbob.