Separate Kernel Compilation?

Does anyone know if I can compile a kernel separately into object files and do the linking at a later stage with the calling program, which uses the CUDA Runtime API?

You can compile CUDA code into separate object files, but you can't use the CUDA Runtime API to directly launch a kernel that isn't in the same file scope. Instead, write a small C or C++ wrapper function in the same file as the kernel and call that wrapper from your other code. This is how Runtime API libraries such as cuBLAS and cuFFT are implemented.
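
A rough sketch of the wrapper pattern (the file names, kernel, and wrapper here are made up for illustration, not taken from cuBLAS/cuFFT):

    // kernels.cu -- compiled on its own, e.g.: nvcc -c kernels.cu -o kernels.o
    __global__ void scaleKernel(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= factor;
    }

    // C-callable wrapper living in the same file scope as the kernel
    extern "C" void launchScale(float *d_data, float factor, int n)
    {
        int threads = 256;
        int blocks  = (n + threads - 1) / threads;
        scaleKernel<<<blocks, threads>>>(d_data, factor, n);
    }

    // main.cpp -- knows nothing about the kernel, only the wrapper's declaration;
    // link it against kernels.o and the CUDA runtime library
    extern "C" void launchScale(float *d_data, float factor, int n);

The host .cpp is compiled with an ordinary C++ compiler and only ever calls launchScale; all the <<<...>>> launch syntax stays inside the .cu file.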

Tip: do not forget to call cudaMemcpyToSymbol and cudaBindTexture inside the wrapper functions, i.e., in the same file as the kernel and its symbols; debugging the errors you get when they are called from somewhere else can be quite interesting…
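
Concretely, something like this (a sketch; the __constant__ symbol and wrapper are hypothetical, following the same pattern as the example above):

    // kernels.cu -- the symbol lives in the same translation unit as the kernel
    __constant__ float c_factor;

    __global__ void scaleByConstant(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= c_factor;
    }

    extern "C" void launchScaleByConstant(float *d_data, float factor, int n)
    {
        // copy to the symbol here, right next to the kernel; doing it from
        // another translation unit will not find c_factor
        cudaMemcpyToSymbol(c_factor, &factor, sizeof(float));
        scaleByConstant<<<(n + 255) / 256, 256>>>(d_data, n);
    }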

Thanks for all your replies. I was experimenting with loading shared libraries (.so), and it seems to work properly. I compile the kernels into .so files instead of .o files, and once I load these libraries explicitly, cudaLaunch(…) is able to find the kernels. Looks good.
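
Roughly what that looks like (a sketch with made-up names; the exact nvcc and loader flags may differ depending on your toolkit and platform):

    // build the kernels into a shared library instead of a plain object file:
    //     nvcc -Xcompiler -fPIC -shared -o libkernels.so kernels.cu
    //
    // host program (plain C++, linked with -ldl and -lcudart): load the
    // library explicitly before launching anything
    #include <dlfcn.h>
    #include <cstdio>

    int main()
    {
        void *handle = dlopen("./libkernels.so", RTLD_NOW | RTLD_GLOBAL);
        if (!handle) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }
        // once the library is loaded, the kernels it registered with the
        // CUDA runtime can be found and launched
        return 0;
    }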