Dynamically load CUDA code in my application

I want to call a CUDA function from my existing application without recompiling the application.
I would like to load the CUDA code dynamically, like this:

typedef void (*funcPointer)();
void *handle = dlopen("./cudafunc.so", RTLD_LAZY);
funcPointer fun = (funcPointer)dlsym(handle, "hello");


cudafunc.cu (compiled to cudafunc.so):

__global__ void kernelfunc()

extern "C" void hello()

Can I do this?

Yes. The driver API has functions (cuModuleLoad() and related calls such as cuModuleGetFunction()) that let you load and run precompiled .cubin files.

I think he meant loading host code as well, not just device code.
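
For the host-side case, the dlopen approach from the question should work, provided hello() is exported with extern "C" and cudafunc.cu is built as a shared object (for example with something like nvcc --shared -Xcompiler -fPIC -o cudafunc.so cudafunc.cu). Below is a minimal sketch of the loader side in plain C with error checking. The names load_function and func_pointer are made up for this example, and it assumes a Linux-style dlfcn.h; on older glibc you may need to link with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Signature of the exported host wrapper; matches extern "C" void hello(). */
typedef void (*func_pointer)(void);

/* Hypothetical helper: open the shared object at `path` and look up `symbol`.
 * Returns the function pointer, or NULL on failure.
 * On success the library handle is stored in *out_handle so the caller
 * can dlclose() it once the function is no longer needed. */
static func_pointer load_function(const char *path, const char *symbol,
                                  void **out_handle)
{
    void *handle = dlopen(path, RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }

    dlerror(); /* clear any stale error before calling dlsym */
    void *sym = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err);
        dlclose(handle);
        return NULL;
    }

    *out_handle = handle;

    /* Convert void* to a function pointer the way the dlsym man page suggests. */
    func_pointer fn;
    *(void **)(&fn) = sym;
    return fn;
}
```

In the application you would call load_function("./cudafunc.so", "hello", &handle), check the result for NULL, invoke the returned pointer, and dlclose(handle) when done. The application itself then needs no CUDA headers or recompilation; only cudafunc.so has to be rebuilt when the kernel changes.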