I have an infinite loop like the one below that repeatedly launches a kernel. I would like the kernel function to be cached somewhere so that each launch runs as fast as possible.
while (1) {
    kernel<<<>>>();  // launch configuration and arguments omitted
}
- Will the kernel function be cached in the driver the first time it is called?
- Are there any prerequisites for getting the kernel function cached in the driver, such as adding a compiler option or calling certain functions?
- Is the kernel caching behavior the same for the driver API and the runtime API?
- Furthermore, since the kernel runs on the CUDA cores, is its device code copied from the driver to the GPU each time it is called?
- Can the kernel's code itself (not its global or constant data) be cached in the device's constant memory, global memory, L2 cache, or L1 cache?
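
To make the questions above concrete, here is a minimal sketch of how the first launch could be compared against later launches. The empty `kernel`, the `<<<1, 1>>>` launch configuration, the `timed_launch` helper, and the iteration count are placeholders I made up for illustration, not my real code. It only measures host-side wall time per launch and says nothing about where the code is actually cached.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (assumption): stands in for the real kernel.
__global__ void kernel() {}

// Launch the kernel once, wait for it to finish, and return the
// elapsed host wall time in microseconds.
static double timed_launch() {
    auto t0 = std::chrono::steady_clock::now();
    kernel<<<1, 1>>>();          // placeholder launch configuration
    cudaDeviceSynchronize();     // wait so the measurement covers the whole launch
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}

int main() {
    // Force context creation up front so it is not counted in the first launch.
    cudaFree(0);

    double first = timed_launch();

    const int iters = 1000;      // arbitrary iteration count (assumption)
    double total = 0.0;
    for (int i = 0; i < iters; ++i) {
        total += timed_launch();
    }

    // Report any launch or execution error seen so far.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("first launch: %.1f us\n", first);
    std::printf("average of next %d launches: %.1f us\n", iters, total / iters);
    return 0;
}
```

If the first launch turns out noticeably slower than the average of the later ones, that would suggest some one-time loading cost, which is what the questions above are trying to pin down.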