How does CUDA handle static linking?

I’ve been trying to get static linking of CUDA binaries working for a while now with little success. Every time I seem to make some progress, I am almost immediately blocked by another error.

My situation: I build a framework static library using CMake with CUDA_SEPARABLE_COMPILATION enabled. When I dump the symbols for the static library I get the following:

STT_FUNC         STB_GLOBAL STV_DEFAULT    _Z12get_index_2dRK5uint2
STT_FUNC         STB_GLOBAL STV_DEFAULT    _Z6log2_ny
STT_FUNC         STB_GLOBAL STV_DEFAULT    _Z11align_pixelj5char4
STT_FUNC         STB_GLOBAL STV_DEFAULT    _Z5clz64y
STT_FUNC         STB_GLOBAL STV_DEFAULT    _Z5clz32j

This looks correct as far as I can tell; these are the functions I want to expose to libraries that consume the framework.

I also have a module static library that takes the device functions from the framework static library and uses them in kernels that it provides. When I dump the symbols for that library I get the following:

STT_CUDA_OBJECT  STB_LOCAL  STO_GLOBAL     $str
STT_CUDA_OBJECT  STB_LOCAL  STO_?          _Z5gammaPKi5uint25char411GammaConfig13GammaGenericsPi.const_opt.0.8
STT_CUDA_OBJECT  STB_LOCAL  STO_?          _param
STT_FUNC         STB_GLOBAL STO_ENTRY      _Z5gammaPKi5uint25char411GammaConfig13GammaGenericsPi
STT_FUNC         STB_GLOBAL STV_DEFAULT  U _Z12get_index_2dRK5uint2
STT_FUNC         STB_GLOBAL STV_DEFAULT  U _Z11align_pixelj5char4
STT_FUNC         STB_GLOBAL STV_DEFAULT  U vprintf
STT_FUNC         STB_GLOBAL STV_DEFAULT  U _Z6log2_ny

Now I assume the U next to several of the functions means undefined. But I am not sure why they would be undefined when the library it linked against when it was built contains the definitions for those functions. On top of that, this builds just fine.

Finally, I have the executable that links to both the module library and the framework library. This is where the build fails, complaining that there aren’t any definitions for the functions marked with a U in the module library. But it’s also linking to the framework library, and that does have the definitions, so why not just use those?

Long story short, how does CUDA handle statically linking libraries in a chain like this, so I can figure out the order of operations I need to perform to get a successful build?

CUDA can do static linking, of course. I won’t be able to help with CMake questions, which this is. CMake is not an NVIDIA product.

If you want to pose a question that uses the CUDA toolchain directly, you may get better help.

That’s just a suggestion, the way things look from my perspective. Do as you wish, of course.
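
For reference, here is a minimal sketch of what such a chain might look like when driven by nvcc directly, without CMake. Everything in it is hypothetical: the file name chain.cu, the BUILD_* macros, the library names, and the align_up function are stand-ins for the framework and module in the question, not the original code. It assumes Linux and a single source file compiled three times to mimic three projects. The point it tries to show is that creating a static library only archives relocatable objects, so undefined device symbols in the module archive are expected, and they are resolved only when the device link step runs, which here happens in the final nvcc command.

```cuda
// chain.cu: hypothetical single-file stand-in for framework, module, and app.
// Build sketch (all file and library names are assumptions):
//   nvcc -dc -DBUILD_FRAMEWORK chain.cu -o framework.o
//   nvcc -lib framework.o -o libframework.a   (archive only, nothing is linked)
//   nvcc -dc -DBUILD_MODULE chain.cu -o module.o
//   nvcc -lib module.o -o libmodule.a         (align_up is still undefined here)
//   nvcc -rdc=true -DBUILD_APP chain.cu -L. -lmodule -lframework -o app
// Only the last command performs the device link, so it needs both archives.
#include <cstdio>
#include <cuda_runtime.h>

// Declaration that would normally live in a framework header.
extern __device__ unsigned int align_up(unsigned int value, unsigned int alignment);

#if defined(BUILD_FRAMEWORK)

// Framework: defines the device function that other libraries consume.
__device__ unsigned int align_up(unsigned int value, unsigned int alignment) {
    return (value + alignment - 1u) / alignment * alignment;
}

#elif defined(BUILD_MODULE)

// Module: the kernel calls align_up, which is only declared in this
// translation unit, so the relocatable object carries an undefined reference.
__global__ void align_kernel(unsigned int *data, unsigned int n, unsigned int alignment) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = align_up(data[i], alignment);
    }
}

// Host wrapper so the application does not need to see the kernel itself.
void launch_align(unsigned int *device_data, unsigned int n, unsigned int alignment) {
    align_kernel<<<(n + 255u) / 256u, 256>>>(device_data, n, alignment);
}

#elif defined(BUILD_APP)

// Application: links against both archives.
void launch_align(unsigned int *device_data, unsigned int n, unsigned int alignment);

int main() {
    unsigned int host_data[4] = {1u, 4u, 5u, 9u};
    unsigned int *device_data = nullptr;

    cudaMalloc(&device_data, sizeof(host_data));
    cudaMemcpy(device_data, host_data, sizeof(host_data), cudaMemcpyHostToDevice);
    launch_align(device_data, 4u, 4u);
    // This copy also surfaces errors from the kernel launch, if any.
    cudaError_t status = cudaMemcpy(host_data, device_data, sizeof(host_data),
                                    cudaMemcpyDeviceToHost);
    cudaFree(device_data);

    if (status != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(status));
        return 1;
    }
    // With the inputs above and alignment 4, the values should be 4 4 8 12.
    std::printf("%u %u %u %u\n", host_data[0], host_data[1], host_data[2], host_data[3]);
    return 0;
}

#endif
```

If this works with nvcc alone but the CMake build still fails, the difference is probably in where CMake runs the device link step. One property that may be worth checking is CUDA_RESOLVE_DEVICE_SYMBOLS, which as far as I know is disabled by default for static library targets, so device symbols in a static library stay unresolved until an executable or shared library that links it is built.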