How to get a function pointer from a compiled PTX module and pass it to a kernel

In my application I have a kernel that is generated by linking several PTX modules together. One of these is generated by a third-party library and supplied to me as PTX and I can’t change the code generation. Let’s call this module B.

I want to be able to call some function in module B from module A (the main kernel entry point), but the catch is I don’t know what the function name will be until runtime, when I can query the third party library to find the function name, and I don’t want to have to recompile module A.

So I want to do something like this:

— module A

using func_type_t = void(*)();

__global__ void kernel(func_type_t func) {
    // this should call the function from module B
    func();
}

— module B

// how do I get a function pointer for this from the PTX or a CUmodule given the name of the function, "some_func",
// without modifying this module
.visible .func some_func() {
    // ...
}

What I want to do is get a function pointer to some_func() and pass that into kernel() in module A at runtime (module B will be linked to module A). I’ve seen examples of getting device function pointers by assigning them to variables and then using cudaMemcpyFromSymbol(), but I can’t modify module B to create the variable that would hold the function pointer.
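For reference, the device-variable pattern mentioned above looks roughly like this when everything lives in a single translation unit. This is only a minimal sketch, not the third-party setup: some_func here is a locally defined stand-in, and some_func_ptr is a hypothetical variable name. It is exactly the extra variable that cannot be added to module B.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

using func_type_t = void (*)();

// Stand-in for the function that would normally live in module B.
__device__ void some_func() { printf("some_func called\n"); }

// Hypothetical device-side variable initialised with the function's device address.
__device__ func_type_t some_func_ptr = some_func;

// Calls whatever device function the host passed in.
__global__ void kernel(func_type_t func) { func(); }

int main() {
    func_type_t host_copy = nullptr;

    // Copy the pointer value (not the function itself) from the device variable to the host.
    cudaError_t err = cudaMemcpyFromSymbol(&host_copy, some_func_ptr, sizeof(host_copy));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpyFromSymbol: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The value is only meaningful on the device, so hand it straight back to a kernel.
    kernel<<<1, 1>>>(host_copy);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The host never dereferences host_copy; it just carries the device address from one call to the next.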

I assume this is a question about native CUDA kernels? (That is, module A is not part of an OptixPipeline.)

In that case I would ask this question on the CUDA Programming and Performance sub-forum.

Searching there for “function pointer” turned up some related threads.
This looks like exactly the same question. Posts 17 to the end may be interesting:

This post explains important restrictions:

Maybe that helps. The above posts are pretty old, so I would recommend reading further.

Thanks Detlef. I swear I posted this on the CUDA forum! No idea how it ended up here!

I had seen that first thread (which is what I was referring to with cudaMemcpyFromSymbol) but not the second, which makes it all sound like a bad idea.

So how exactly does OptiX do what appears to be exactly this with the direct call mechanism? At least from the outside it looks similar: I give OptiX a couple of modules and the names of the functions at runtime and it manages to call one from the other.

I’m currently compiling a small trampoline function in order to resolve the name of my function from module B at runtime, and I suppose I could just generate the PTX for that directly to save one step in the process but it all feels a little hacky.
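To make the trampoline idea concrete, here is a rough sketch of how it could look with NVRTC and the driver API linker. Several things here are assumptions for illustration: the names trampoline, compile_to_ptx and link_modules are hypothetical, modules A and B are tiny NVRTC-compiled stand-ins (in the real setup they would be the existing PTX strings, and only the trampoline would be generated), the target function is assumed to be an unmangled `void()` as in the snippet above, and compute_75 is just a placeholder architecture.

```cuda
#include <cuda.h>
#include <nvrtc.h>
#include <cstdio>
#include <cstdlib>
#include <string>

#define CU_CHECK(call) do { CUresult r_ = (call); if (r_ != CUDA_SUCCESS) { \
    std::fprintf(stderr, "driver error %d, line %d\n", (int)r_, __LINE__); std::exit(1); } } while (0)
#define NVRTC_CHECK(call) do { nvrtcResult r_ = (call); if (r_ != NVRTC_SUCCESS) { \
    std::fprintf(stderr, "%s, line %d\n", nvrtcGetErrorString(r_), __LINE__); std::exit(1); } } while (0)

// ASSUMPTION: placeholder architecture; use one that matches your GPU and the other PTX modules.
static const char* kArch = "--gpu-architecture=compute_75";

// Hypothetical helper: compile CUDA source to relocatable PTX with NVRTC.
std::string compile_to_ptx(const std::string& src, const char* name) {
    nvrtcProgram prog;
    NVRTC_CHECK(nvrtcCreateProgram(&prog, src.c_str(), name, 0, nullptr, nullptr));
    const char* opts[] = { kArch, "--relocatable-device-code=true" };
    NVRTC_CHECK(nvrtcCompileProgram(prog, 2, opts));
    size_t size = 0;
    NVRTC_CHECK(nvrtcGetPTXSize(prog, &size));
    std::string ptx(size, '\0');
    NVRTC_CHECK(nvrtcGetPTX(prog, &ptx[0]));
    NVRTC_CHECK(nvrtcDestroyProgram(&prog));
    while (!ptx.empty() && ptx.back() == '\0') ptx.pop_back();  // drop the trailing NUL
    return ptx;
}

// Hypothetical helper: link the PTX inputs and load the result as one module.
CUmodule link_modules(const std::string* inputs[], const char* names[], int count) {
    CUlinkState state;
    CU_CHECK(cuLinkCreate(0, nullptr, nullptr, &state));
    for (int i = 0; i < count; ++i) {
        CU_CHECK(cuLinkAddData(state, CU_JIT_INPUT_PTX,
                               const_cast<char*>(inputs[i]->c_str()), inputs[i]->size() + 1,
                               names[i], 0, nullptr, nullptr));
    }
    void* cubin = nullptr;
    size_t cubin_size = 0;
    CU_CHECK(cuLinkComplete(state, &cubin, &cubin_size));
    CUmodule module;
    CU_CHECK(cuModuleLoadData(&module, cubin));  // load before the link state frees the image
    CU_CHECK(cuLinkDestroy(state));
    return module;
}

int main() {
    CU_CHECK(cuInit(0));
    CUdevice dev;
    CU_CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CU_CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CU_CHECK(cuCtxSetCurrent(ctx));

    // Stand-in for module A: it only knows the fixed name "trampoline".
    std::string ptx_a = compile_to_ptx(
        "extern \"C\" __device__ void trampoline();\n"
        "extern \"C\" __global__ void kernel() { trampoline(); }\n", "module_a.cu");
    // Stand-in for module B: in practice this is the third-party PTX, used unmodified.
    std::string ptx_b = compile_to_ptx(
        "extern \"C\" __device__ void some_func() {}\n", "module_b.cu");

    // The name is only known at runtime; here it is hard-coded for the sketch.
    std::string target = "some_func";
    std::string ptx_t = compile_to_ptx(
        "extern \"C\" __device__ void " + target + "();\n"
        "extern \"C\" __device__ void trampoline() { " + target + "(); }\n", "trampoline.cu");

    const std::string* inputs[] = { &ptx_a, &ptx_b, &ptx_t };
    const char* names[] = { "module_a", "module_b", "trampoline" };
    CUmodule module = link_modules(inputs, names, 3);

    CUfunction fn;
    CU_CHECK(cuModuleGetFunction(&fn, module, "kernel"));
    CU_CHECK(cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, nullptr, nullptr, nullptr));
    CU_CHECK(cuCtxSynchronize());
    return 0;
}
```

This needs to be linked against the NVRTC and CUDA driver libraries. Module A never sees the runtime name; only the few lines of trampoline source change between runs, which is why A does not have to be recompiled.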

OptiX is not using the input PTX in a way you might expect.
It’s much more involved than just separate compilation of native CUDA modules.

This paper describes where it started 11 years ago: OptiX: A General Purpose Ray Tracing Engine. It works quite differently today. For example, this quote from the OptiX 4.0.0 release notes gives you an idea:
“This version is an important milestone in the evolution of OptiX, featuring a complete re-implementation of many core components, including an all-new LLVM-based compilation pipeline.”