Is device code (cubin file) cached somewhere?

I have an infinite loop like the one below that calls a kernel over and over. I want this kernel function to be cached somewhere so that it runs as fast as possible.

while(1) {
  // launch the kernel
}
1. Will the kernel function be cached in the driver the first time it is called?
2. Are there any prerequisites for getting the kernel function cached in the driver, such as adding a compiler option or calling some function?
3. Is the kernel function caching behavior the same for the driver API and the runtime API?
4. Furthermore, since this kernel function runs on the CUDA cores, is the kernel's device code copied from the driver to the GPU every time it is called?
5. Can the kernel function itself (not any global or constant data) be cached in the device's constant memory, global memory, L2 cache, or L1 cache?

The device code is not copied to the GPU each time the kernel is called. It is copied to the GPU once and then reused from GPU memory.
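A minimal CUDA sketch for observing this, assuming a hypothetical kernel `addOne` in place of the original kernel and a bounded loop instead of `while(1)`. It times the first launch and the average of the later launches from the host. Depending on the CUDA version and the `CUDA_MODULE_LOADING` setting (`EAGER` or `LAZY`), the first launch may include one-time loading of the kernel's code onto the device, while later launches reuse code already resident in GPU memory. No particular timings are implied; the difference may be small or absent on a given system.

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: adds 1 to each element.
__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Launches addOne `iterations` times (a bounded version of the while(1) loop)
// and records host-side wall-clock time, in milliseconds, for the first launch
// and the mean of the remaining launches.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t timeRepeatedLaunches(int iterations, int n,
                                 double *firstMs, double *laterMeanMs) {
    // Force context creation up front so it is not counted in the first launch.
    cudaError_t err = cudaFree(0);
    if (err != cudaSuccess) return err;

    float *d_data = nullptr;
    err = cudaMalloc(&d_data, n * sizeof(float));
    if (err != cudaSuccess) return err;

    double laterTotal = 0.0;
    for (int it = 0; it < iterations; ++it) {
        auto t0 = std::chrono::steady_clock::now();
        addOne<<<(n + 255) / 256, 256>>>(d_data, n);
        err = cudaDeviceSynchronize();            // wait so timing covers execution
        auto t1 = std::chrono::steady_clock::now();
        if (err == cudaSuccess) err = cudaGetLastError();  // catch launch errors
        if (err != cudaSuccess) { cudaFree(d_data); return err; }

        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (it == 0) *firstMs = ms;   // may include one-time loading of the code
        else laterTotal += ms;        // code is already resident on the device
    }
    *laterMeanMs = iterations > 1 ? laterTotal / (iterations - 1) : 0.0;
    return cudaFree(d_data);
}
```

Calling `timeRepeatedLaunches(100, 1 << 20, &first, &later)` from `main` and printing both values is one way to compare the first launch against the steady state. `cudaFree(0)` is called before the loop so that context creation is not mixed into the first launch's time.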