I’m investigating improvements to our GPU computing prototyping framework: adding CUDA and OpenCL support to it (currently it only supports GLSL).
A nice feature of GLSL shaders, and of OpenCL too, is that the source of the .glsl and .cl files can be read in directly and compiled at execution time. That makes it very easy to make a change, restart the app, make another change, try again, and so on, without having to recompile the whole application every time. The problem is CUDA kernels.
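For context, the run-time loading that makes this workflow possible is trivial on the host side. Here is a minimal sketch of the kind of loader such a framework would use (the function name and file paths are hypothetical); the returned string is what gets handed to `glShaderSource` or `clCreateProgramWithSource`:

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read an entire .glsl or .cl source file into a string so the
// driver can compile it at run time. Edit the file, restart the
// app, and the new source is picked up automatically.
std::string loadKernelSource(const std::string& path) {
    std::ifstream in(path);
    if (!in) {
        throw std::runtime_error("cannot open kernel source: " + path);
    }
    std::ostringstream ss;
    ss << in.rdbuf();  // slurp the whole file
    return ss.str();
}
```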
In order to get anything resembling JIT run-time kernel loading, I need to use the CUDA driver API. Not really a problem, though, since it maps decently onto OpenCL host code. But as far as I’m aware, it is not possible to JIT a .cu file directly; the SDK’s JIT sample only shows it for .ptx files. Am I missing anything?
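For reference, this is the driver-API path the SDK sample takes: JIT-compiling a PTX string into a module and pulling a kernel handle out of it. A hedged sketch (assumes `cuInit(0)` and a current context have already been set up; the kernel name is a placeholder):

```cpp
#include <cuda.h>    // CUDA driver API; needs an NVIDIA driver and GPU
#include <string>

// Sketch: JIT a PTX source string through the driver.
// The driver compiles the PTX for the current device at load time.
CUfunction jitFromPtx(const std::string& ptx, const char* kernelName) {
    // Optional JIT options; CU_JIT_OPTIMIZATION_LEVEL takes 0-4,
    // option values are passed by value cast to void*.
    CUjit_option optKeys[] = { CU_JIT_OPTIMIZATION_LEVEL };
    void*        optVals[] = { reinterpret_cast<void*>(4) };

    CUmodule module;
    cuModuleLoadDataEx(&module, ptx.c_str(), 1, optKeys, optVals);

    CUfunction fn;
    cuModuleGetFunction(&fn, module, kernelName);
    return fn;  // launch later with cuLaunchKernel
}
```

(Error checking on the `CUresult` return values is omitted for brevity; a real framework would check every call.)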
The other alternative would be to have the framework call nvcc at run time, compile the .cu file to a temporary directory, and then load the generated PTX or cubin.
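That nvcc-at-runtime approach could look something like the following sketch. The helper just builds the command line (`-ptx` is standard nvcc; the paths are hypothetical); the framework would then shell out and load the result:

```cpp
#include <string>

// Build the nvcc invocation that compiles a .cu file to PTX.
// Swap "-ptx" for "-cubin" to get a device-specific binary instead.
std::string buildNvccToPtxCommand(const std::string& cuFile,
                                  const std::string& outPtx) {
    return "nvcc -ptx " + cuFile + " -o " + outPtx;
}

// The framework would then do something like:
//   std::system(buildNvccToPtxCommand("kernel.cu", "/tmp/kernel.ptx").c_str());
//   cuModuleLoad(&module, "/tmp/kernel.ptx");
```

The obvious downsides are the dependency on a full toolkit install at run time and the latency of spawning nvcc, but it keeps the edit-restart-retry loop intact.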
Related: how effective is the driver’s JIT at optimizing CUDA kernels? Does it take shortcuts because of real-time demands, or is it equivalent to nvcc? Since drivers are released more often than CUDA Toolkits, could it even be better at optimizing? In short, when there is a choice, is it preferred to load a cubin or to JIT a PTX file?