Proper way to precompile a kernel


I have a loop, and at each iteration I compile 3 kernels, use them, and release them.
To optimise the code, I tried compiling them from source once, outside the loop, but I ran into a strange problem: the device, which is the same for all 3 kernels, is reported as not valid for the 3rd kernel.
Maybe that’s not the best way to do it.
Can you give me some advice on how to precompile my programs or kernels?