Force JIT for all kernels of current module

I understand how JIT compilation occurs at run time when PTX is embedded in the binary, in addition to code for specific architectures, at compile time.
I understand how JIT is lazily triggered when kernel launches occur.
I understand how the local cache lets JIT occur only once.

Now, my question: is it possible to force JIT compilation of all kernels of the current module, so that the user can accept the JIT compile time through an explicit action?

In the CUDA SDK, I can only find APIs to dynamically load modules from files, but none to get a handle to the “current” module (if that means anything) running from a DLL of the current process.

I can find hints on how to parse the DLL binary, extract the PTX resource and load it as a module, but that is not handy at all.

I was expecting something like a cuModuleGetCurrentList() that would be called from the DLL embedding the compiled *.cu, to get one module per attached PTX, and then some cuModuleLoadAllData() with parameters similar to cuModuleLoadDataEx().
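For context, here is a minimal sketch of the explicit route that already exists once the PTX text is in hand. It assumes a CUDA context is already current on the calling thread and that `ptx` is NUL-terminated PTX text obtained some other way; the helper name `jit_compile_ptx` is hypothetical. cuModuleLoadDataEx() runs the JIT compiler during the call, so the cost is paid at a point the application chooses.

```cpp
#include <cuda.h>

#include <chrono>
#include <cstdio>

// Hypothetical helper: loads one PTX image into the current context.
// Assumes cuInit() has been called and a context is current on this thread.
bool jit_compile_ptx(const char* ptx, CUmodule* module_out) {
    char error_log[8192] = "";

    // Ask the JIT compiler to write its error messages into error_log.
    CUjit_option options[] = {CU_JIT_ERROR_LOG_BUFFER,
                              CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES};
    void* option_values[] = {error_log,
                             reinterpret_cast<void*>(sizeof(error_log))};

    const auto start = std::chrono::steady_clock::now();
    const CUresult status =
        cuModuleLoadDataEx(module_out, ptx, 2, options, option_values);
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;

    if (status != CUDA_SUCCESS) {
        const char* name = nullptr;
        cuGetErrorName(status, &name);
        std::fprintf(stderr, "JIT failed (%s): %s\n",
                     name ? name : "unknown error", error_log);
        return false;
    }

    // A very short time here may simply mean the JIT cache was hit.
    std::printf("Module loaded in %.1f ms\n", elapsed.count());
    return true;
}
```

Kernels in a module loaded this way are retrieved with cuModuleGetFunction() and launched through the driver API. Whether the load also warms the cache for the runtime's own implicit load of the same PTX is not something this sketch establishes.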

Am I missing the point, or does it make sense to ask for that?

You can force JIT with an environment variable, but in my experience that is usually used as a diagnostic.
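The variable meant here is presumably `CUDA_FORCE_PTX_JIT` (the reply does not name it), which is documented as making the driver ignore embedded binary code and JIT-compile the embedded PTX instead. A minimal sketch of setting it from inside the process follows; the helper names are hypothetical, and it assumes the call happens before the first CUDA call, since the driver is expected to read the variable at initialization.

```cpp
#include <cstdlib>

// Hypothetical helper: sets an environment variable for the current process.
// Returns true on success.
inline bool set_process_env(const char* name, const char* value) {
#ifdef _WIN32
    return _putenv_s(name, value) == 0;
#else
    return ::setenv(name, value, 1) == 0;  // 1 = overwrite an existing value
#endif
}

// Hypothetical helper: asks the driver to JIT-compile embedded PTX.
// Assumption: must be called before any CUDA call initializes the driver.
inline bool request_forced_ptx_jit() {
    return set_process_env("CUDA_FORCE_PTX_JIT", "1");
}
```

Calling `request_forced_ptx_jit()` at the very start of `main()` has the same effect as exporting the variable in the shell, and it affects every module in the process.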

An environment variable is global to the application. But if my application depends on two different DLLs, M1 and M2, it would be handy to be able to call cuModuleGetCurrentList() and cuModuleLoadDataEx() independently for M1 and/or M2.