Working with Multiple Files in CUDA: Separating Device Functions in Different Files

I think similar questions were posted previously.

I am working on a project that involves multiple files, with each file containing a separate device function (not the kernel). Ideally, I would like to compile each .cu file into its own object file, and then build a shared object out of all these individual object files. This is to ensure that the compiler does not get “overwhelmed” by the amount of code (as it might if I include all the code together).
From the documentation, it appears that there is no “proper” linker for device code, at least for the Runtime API. Is there any other way to do it?

Can I achieve such a thing using the Driver API, i.e. by treating each device function as a module and loading it dynamically?
Does anyone know if there is a plan to introduce a proper device-side linker in future toolkits?

Many thanks for the help!!!

Indeed, CUDA has no linker on the device side.

As far as I know, the only way to achieve this, at least pre-4.0, is to include all files into one compilation unit. I don’t know whether 4.0’s “one context for all” approach changes this, or whether some pointer tricks could stand in for a linker there, but I’m not aware of any.
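
For illustration, here is a minimal sketch of the "one compilation unit" layout. The file names (scale.cuh, offset.cuh, main.cu) and the function names (scale, offset, apply, run_combined) are all made up for this example. In a real project each marked section would sit in its own file and main.cu would pull it in with #include; the sections are inlined here only so that the example compiles as a single file.

```cuda
// Sketch only: all file and function names below are hypothetical.
// Each "file:" section would normally be a separate file that main.cu
// pulls in with #include, so that nvcc still sees one compilation unit.
#include <cstddef>
#include <cuda_runtime.h>

// ---- file: scale.cuh   (in main.cu: #include "scale.cuh") ----
__device__ float scale(float x) { return 2.0f * x; }

// ---- file: offset.cuh  (in main.cu: #include "offset.cuh") ----
__device__ float offset(float x) { return x + 1.0f; }

// ---- file: main.cu ----
// The kernel can call both device functions because their definitions
// are visible in the same compilation unit.
__global__ void apply(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = offset(scale(in[i]));
}

// Host helper: runs the kernel on a single value and returns the result.
// Error checking is omitted to keep the sketch short.
float run_combined(float x) {
    float* d_in = NULL;
    float* d_out = NULL;
    float result = 0.0f;
    cudaMalloc((void**)&d_in, sizeof(float));
    cudaMalloc((void**)&d_out, sizeof(float));
    cudaMemcpy(d_in, &x, sizeof(float), cudaMemcpyHostToDevice);
    apply<<<1, 1>>>(d_in, d_out, 1);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

This keeps the source split across files for readability, but it does not reduce the amount of code the compiler has to process at once, which was the original concern.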