I wish there were a way to link but so far there appears to be no way to do it.
The workaround I use is to put everything in header files. If some functions depend on other functions, I #include the entire file instead of just the function prototypes, and I make sure to have include guards to prevent multiple definitions (which would matter less if we were using only function prototypes).
Then I have a single .cu file that #includes whatever subset of the modules I want to build. So effectively, compiling that master .cu file acts as the linker.
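As a minimal sketch of this pattern (the file and function names are just placeholders), each "module" is a guarded header containing full definitions, and the master .cu file pulls in whichever modules the build needs:

```cuda
// vec_add.h -- a "module": full definitions live in the header, behind a guard
#ifndef VEC_ADD_H
#define VEC_ADD_H

__device__ float add(float a, float b) { return a + b; }

__global__ void vecAdd(const float* x, const float* y, float* z, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) z[i] = add(x[i], y[i]);
}

#endif // VEC_ADD_H

// master.cu -- the single translation unit that gets compiled;
// #including a header here effectively "links" that module in
#include "vec_add.h"
#include "vec_scale.h"   // another module, same pattern
```

Because everything ends up in one translation unit, any module can call any other module's device functions directly, at the cost described next: nothing has file scope anymore.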
The main headache I have with this method is that the global namespace gets polluted much faster, because there is no file scope. These #include files contain kernels as well as device functions and host functions, and if the best block size for one kernel is, say, 128, and I write #define BLOCK_SIZE 128, that can cause major problems, because the macro's scope is not limited to the file it is defined in. To make it worse, the result depends on the order of the #includes.
I recommend the C++ idiom “static const int BLOCK_SIZE = 128;” instead of “#define BLOCK_SIZE 128”, because then the compiler can at least detect conflicting definitions instead of silently letting the last #define win.
I’d like to add: different kernels may reside in different translation units (.cu files), but if two kernels use a common device variable, there will be two copies of that variable, and the two kernels will each see their own copy. So while it is possible to compile .cu files separately and link them, I believe doing so can be hazardous. I got bitten by this once, and now my policy is to have only a single .cu file.
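To make the hazard concrete, here is a sketch (hypothetical file and kernel names) of how a device variable declared in a shared header silently duplicates across translation units:

```cuda
// common.h -- shared device state, included by more than one .cu file
#ifndef COMMON_H
#define COMMON_H
__device__ int d_counter;   // each .cu that includes this gets its OWN copy
#endif

// a.cu
#include "common.h"
__global__ void kernelA() { atomicAdd(&d_counter, 1); }  // updates a.cu's copy

// b.cu
#include "common.h"
__global__ void kernelB() { atomicAdd(&d_counter, 1); }  // updates b.cu's copy,
                                                         // NOT the one kernelA sees
```

There is no link-time error here: both files compile, both kernels run, and the bug only shows up as kernelA and kernelB disagreeing about the value of d_counter. Collapsing everything into a single .cu file makes both kernels refer to the one and only copy.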