I understand how JIT compilation occurs at run time when PTX is embedded for specific architectures at compile time.
I understand how JIT compilation is triggered lazily, when kernels are launched.
I understand how the local JIT cache ensures that compilation only happens once.
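A minimal sketch to observe the lazy behaviour described above. It assumes CUDA 11.7 or later with lazy loading active (CUDA_MODULE_LOADING=LAZY, the default in recent releases) and a build that embeds only PTX, for example `nvcc -gencode arch=compute_52,code=compute_52`. The names `noop_kernel` and `timed_launch_ms` are illustrative, not part of any API.

```cuda
#include <cuda_runtime.h>
#include <chrono>

// Trivial kernel; with a PTX-only build its module must be JIT-compiled
// (or fetched from the JIT cache) before it can run.
__global__ void noop_kernel() {}

// Launch noop_kernel once and wait for it to finish.
// Returns the elapsed wall-clock time in milliseconds, or -1.0 on error.
double timed_launch_ms() {
    auto t0 = std::chrono::steady_clock::now();
    noop_kernel<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();                   // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    auto t1 = std::chrono::steady_clock::now();
    if (err != cudaSuccess) return -1.0;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling cudaFree(0) first creates the context, so with lazy loading the difference between two successive calls of timed_launch_ms() should roughly reflect the module load and JIT (or cache lookup) cost. With CUDA_MODULE_LOADING=EAGER that cost should move into context creation instead.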
Now, my question: is it possible to force JIT compilation of all kernels in the current module, so that the user can accept the JIT compile time as an explicit action?
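One possible approach, sketched under assumptions rather than as a confirmed recipe: with lazy loading enabled, querying a kernel through the runtime API (for example with cudaFuncGetAttributes()) should force that kernel and its module to be loaded, and loading is when embedded PTX is JIT-compiled or read from the cache. An explicit "prepare" step, run after the user agrees to wait, could therefore touch every kernel once. Alternatively, setting CUDA_MODULE_LOADING=EAGER in the process environment before the first CUDA call makes the runtime load all modules at context creation, so a call such as cudaFree(0) at a chosen moment would absorb the whole JIT cost. The kernels `scale` and `add` and the function `preload_kernels` are hypothetical stand-ins; a real application would list its own kernels.

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

// Hypothetical application kernels.
__global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

__global__ void add(float* y, const float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += x[i];
}

// Touch every kernel once so that its module is loaded (and any PTX is
// JIT-compiled or read from the JIT cache) now, rather than at first launch.
// Returns cudaSuccess or the first error; *elapsed_ms receives the wall time.
cudaError_t preload_kernels(double* elapsed_ms) {
    const void* kernels[] = {
        reinterpret_cast<const void*>(scale),
        reinterpret_cast<const void*>(add),
    };
    auto t0 = std::chrono::steady_clock::now();
    for (const void* k : kernels) {
        cudaFuncAttributes attr;
        cudaError_t err = cudaFuncGetAttributes(&attr, k);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "preload failed: %s\n", cudaGetErrorString(err));
            return err;
        }
    }
    auto t1 = std::chrono::steady_clock::now();
    *elapsed_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    return cudaSuccess;
}
```

An application could show a progress message, call preload_kernels() once the user confirms, and report the elapsed time; on later runs the JIT cache should make the same call cheap.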
In the CUDA SDK, I can only find APIs to dynamically load modules from files, not to get a handle to the “current” module (if that even means anything), i.e. the one running from a DLL of the current process.
I have found hints on how to parse the DLL binary, extract the PTX resource and load it as a module, but that is not convenient at all.
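If parsing the DLL is too awkward, a sketch of an alternative is to have the build emit PTX as a separate artifact (for example with `nvcc -ptx kernels.cu`), embed or ship it yourself, and JIT it explicitly with the driver API's cuModuleLoadDataEx(). The hand-written PTX string, the entry name `noop`, and the helpers `use_primary_context` and `jit_load_ptx` are assumptions for illustration. Kernels loaded this way live in their own CUmodule and are launched with cuLaunchKernel(), not through the runtime's `<<<>>>` syntax.

```cuda
#include <cuda.h>
#include <chrono>
#include <cstdint>
#include <cstdio>

// Hand-written stand-in for PTX produced by `nvcc -ptx` (assumption;
// adjust .version/.target to your toolkit and GPUs).
static const char kPtx[] =
    ".version 6.0\n"
    ".target sm_50\n"
    ".address_size 64\n"
    ".visible .entry noop()\n"
    "{\n"
    "    ret;\n"
    "}\n";

// Make device `ordinal`'s primary context (the one the runtime API uses) current.
bool use_primary_context(int ordinal, CUcontext* ctx) {
    CUdevice dev;
    if (cuInit(0) != CUDA_SUCCESS) return false;
    if (cuDeviceGet(&dev, ordinal) != CUDA_SUCCESS) return false;
    if (cuDevicePrimaryCtxRetain(ctx, dev) != CUDA_SUCCESS) return false;
    return cuCtxSetCurrent(*ctx) == CUDA_SUCCESS;
}

// JIT-compile `ptx` into *mod right now. Returns the wall-clock time in ms,
// or -1.0 on failure (the JIT error log is printed to stderr).
double jit_load_ptx(const char* ptx, CUmodule* mod) {
    char info_log[4096] = {0};
    char error_log[4096] = {0};
    CUjit_option opts[] = {
        CU_JIT_INFO_LOG_BUFFER,  CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES,
        CU_JIT_ERROR_LOG_BUFFER, CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES,
    };
    // Buffer sizes are passed by value, cast to void*, as the driver API expects.
    void* vals[] = {
        info_log,  reinterpret_cast<void*>(static_cast<std::uintptr_t>(sizeof(info_log))),
        error_log, reinterpret_cast<void*>(static_cast<std::uintptr_t>(sizeof(error_log))),
    };
    const unsigned int num_opts = sizeof(opts) / sizeof(opts[0]);
    auto t0 = std::chrono::steady_clock::now();
    CUresult res = cuModuleLoadDataEx(mod, ptx, num_opts, opts, vals);
    auto t1 = std::chrono::steady_clock::now();
    if (res != CUDA_SUCCESS) {
        std::fprintf(stderr, "JIT failed (%d): %s\n", static_cast<int>(res), error_log);
        return -1.0;
    }
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

After jit_load_ptx() succeeds, cuModuleGetFunction(&fn, mod, "noop") retrieves the kernel; cuModuleUnload() and cuDevicePrimaryCtxRelease() clean up when done.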
I was expecting something like cuModuleGetCurrentList(), called from the DLL that embeds the compiled *.cu, returning one module per attached PTX, followed by something like cuModuleLoadAllData() with parameters similar to cuModuleLoadDataEx().
Am I missing the point, or does it make sense to ask for this?