Hello, I have a large kernel that repeatedly calls another small kernel. I would like to tweak the small kernel on each run without having to recompile everything each time. Is it possible to just-in-time (JIT) compile only that small piece of code, without recompiling all of the device code?
I thought I could do this by loading several kernels into one module and then replacing just the one kernel, but this did not work. When I tried loading multiple kernels into one module with cuModuleLoadDataEx(…), each call just overwrote whatever was already there.
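A minimal sketch of the overwrite behaviour described above, using the driver API. The two PTX images are empty placeholder kernels, and the names `kernel1`/`kernel2`, the PTX `.version`, and the `.target` are assumptions made for illustration, not the real code. Each call to `cuModuleLoadDataEx` writes a fresh module handle into its first argument, so reusing one `CUmodule` variable leaves it pointing only at the most recently loaded image.

```cuda
#include <cuda.h>
#include <cstdio>

// Hypothetical placeholder PTX: two empty kernels standing in for the real ones.
static const char* kPtxKernel1 =
    ".version 6.0\n.target sm_50\n.address_size 64\n"
    ".visible .entry kernel1()\n{\n    ret;\n}\n";
static const char* kPtxKernel2 =
    ".version 6.0\n.target sm_50\n.address_size 64\n"
    ".visible .entry kernel2()\n{\n    ret;\n}\n";

// Print the failing call and bail out of main on any driver API error.
#define CHECK(call)                                                  \
    do {                                                             \
        CUresult res_ = (call);                                      \
        if (res_ != CUDA_SUCCESS) {                                  \
            const char* msg_ = nullptr;                              \
            cuGetErrorString(res_, &msg_);                           \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         msg_ ? msg_ : "unknown error");             \
            return 1;                                                \
        }                                                            \
    } while (0)

int main() {
    CHECK(cuInit(0));
    CUdevice dev;
    CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    // Both loads target the same handle variable, as in the attempt above.
    CUmodule module;
    CHECK(cuModuleLoadDataEx(&module, kPtxKernel1, 0, nullptr, nullptr));
    CHECK(cuModuleLoadDataEx(&module, kPtxKernel2, 0, nullptr, nullptr));

    // Look up both kernels through the single handle and report the results.
    CUfunction f1, f2;
    CUresult r1 = cuModuleGetFunction(&f1, module, "kernel1");
    CUresult r2 = cuModuleGetFunction(&f2, module, "kernel2");
    std::printf("kernel1 lookup: %s\n", r1 == CUDA_SUCCESS ? "found" : "not found");
    std::printf("kernel2 lookup: %s\n", r2 == CUDA_SUCCESS ? "found" : "not found");

    CHECK(cuModuleUnload(module));
    CHECK(cuDevicePrimaryCtxRelease(dev));
    return 0;
}
```

Built with something like `nvcc sketch.cu -lcuda` (file name hypothetical), the `kernel1` lookup would be expected to fail, since the handle now refers only to the second image. Keeping a separate `CUmodule` variable per image avoids losing the first handle.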
10,000-foot view:
LOOP
    CPU: compile PTX for Kernel 2
    LOOP
        Kernel 1: 200 lines of non-changing code
        LOOP
            Kernel 2: 30 lines (JIT code is here)
        END LOOP
        Kernel 1: 200 lines of non-changing code
    END LOOP
END LOOP
Ideas that have not worked:
- It looks like it could be done by exiting in the middle of the large kernel, calling the JIT kernel, and then running the last part (over and over). But this would be slow, because I would need to do the work in several hundred kernel calls instead of just one.
- I could skip JIT entirely: for the 8 lines of code I could use an 800-way switch, but the PTX code then has to do many jumps, and this would also be slow.
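For concreteness, the switch-based alternative in the second idea might look roughly like the sketch below. It is cut down to three made-up variants instead of 800, and the names `small_step`, `big_kernel`, `variant`, and `inner_iters` are hypothetical, as are the arithmetic bodies. The point is only the shape: the variant is chosen at run time by a kernel argument, so nothing is recompiled, but the branch is evaluated on every inner iteration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the small, frequently changing piece of code.
// Every variant is compiled ahead of time and selected at run time.
__device__ float small_step(int variant, float x) {
    switch (variant) {
        case 0:  return x + 1.0f;
        case 1:  return x * 2.0f;
        case 2:  return x * x;
        default: return x;
    }
}

// Hypothetical stand-in for the large kernel with the inner loop in the middle.
__global__ void big_kernel(float* data, int n, int variant, int inner_iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = data[i];
    // (the first block of non-changing code would go here)
    for (int k = 0; k < inner_iters; ++k) {
        x = small_step(variant, x);  // the switch is re-evaluated each iteration
    }
    // (the second block of non-changing code would go here)
    data[i] = x;
}

int main() {
    const int n = 256;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // Pick the variant on the host; no recompilation is involved.
    big_kernel<<<(n + 127) / 128, 128>>>(dev, n, /*variant=*/1, /*inner_iters=*/3);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(dev);
        return 1;
    }

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("data[0] = %f\n", host[0]);
    cudaFree(dev);
    return 0;
}
```

Because `variant` is the same for every thread, the branch itself should not diverge within a warp; the cost is the dispatch repeated inside the inner loop.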
Thank you for any ideas.