Device Function Library: How to make a lib of device functions

Hi everyone

I'm planning to make a library of device functions, like the atomic functions library. I have written my source file and header and compiled them into a library archive successfully. But when I included this library in another source file and used any of its device functions, I got:

Error: External calls are not supported (found non-inlined call to _Z8Function_name)

at the compile step of that source file.

It looks like the cause of the error is the use of a device function that lives in another object file. What should I do to handle this error?

There is no CUDA linker, so the answer is that you can't do this. You'll have to include things as headers.

tmurray, thanks for the answer. Do you mean including the function bodies in the headers too? But in sm11_atomic_functions there are device function prototypes and no bodies?

Device functions always need to be in a header so that they can be inlined and shared across multiple .cu files.

I wish there were a way to link, but so far there appears to be no way to do it.

The workaround that I use is to put everything in header files. If some functions depend on other functions, I #include the entire file instead of just function prototypes. And I make sure to have #include guards to prevent multiple definitions (less important if we were using only function prototypes).

Then I have a single .cu file that includes whatever subset of the modules I want to build. So effectively, compiling the master .cu file acts as a linker.
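As a rough illustration of that layout, here is a minimal sketch. The file names, the function saturating_add and the kernel add_kernel are all made up for the example, and the two "files" are shown in one listing with the #include commented out so the snippet compiles on its own. The __host__ qualifier is only there so the function can also be checked on the CPU; it is not part of the workaround.

```cuda
// ---- device_math.cuh (hypothetical header) ----
#ifndef DEVICE_MATH_CUH
#define DEVICE_MATH_CUH

// The full definition lives in the header, so every .cu file that
// includes it can inline the call. The include guard keeps the body
// from being defined twice when several headers pull this one in.
__host__ __device__ inline int saturating_add(int a, int b, int limit)
{
    int sum = a + b;
    return sum > limit ? limit : sum;
}

#endif // DEVICE_MATH_CUH

// ---- master.cu (hypothetical): includes whichever modules it needs ----
// #include "device_math.cuh"

// Kernel that uses the header-defined device function.
__global__ void add_kernel(const int* a, const int* b, int* out,
                           int n, int limit)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = saturating_add(a[i], b[i], limit);
    }
}
```

Compiling master.cu then plays the role of the link step: the compiler sees the body of every device function it has to call.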

The main headache I have with this method is that the global namespace gets polluted much faster, because there is no file scope. I have kernels as well as device functions and host functions in these #include files. If the best block size for one kernel is, say, 128, and I #define BLOCK_SIZE 128, this can cause major problems, because the scope of that definition is not limited to the file it's defined in. To make it worse, the result depends on the order of the includes.

I recommend the C++ idiom “static const int BLOCK_SIZE=128;” instead of “#define BLOCK_SIZE 128” because at least then you can detect if you have multiple conflicting definitions.
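One possible way to take this a step further (my own assumption, not something required by the idiom) is to wrap each module's constant in a namespace, so two modules can each keep their own BLOCK_SIZE. The module and kernel names below are hypothetical, and both kernels assume they are launched with blockDim.x equal to their BLOCK_SIZE.

```cuda
// Hypothetical module 1: tuned for 128 threads per block.
namespace scale_module {
static const int BLOCK_SIZE = 128;

// Multiplies every element by factor; expects blockDim.x == BLOCK_SIZE.
__global__ void scale_kernel(float* data, float factor, int n)
{
    int i = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}
} // namespace scale_module

// Hypothetical module 2: tuned for 256 threads per block.
namespace reduce_module {
static const int BLOCK_SIZE = 256;

// Writes one partial sum per block; expects blockDim.x == BLOCK_SIZE.
__global__ void block_sum_kernel(const float* in, float* block_sums, int n)
{
    __shared__ float partial[BLOCK_SIZE];
    int tid = threadIdx.x;
    int i = blockIdx.x * BLOCK_SIZE + tid;
    partial[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction in shared memory (BLOCK_SIZE is a power of two).
    for (int stride = BLOCK_SIZE / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        __syncthreads();
    }
    if (tid == 0) {
        block_sums[blockIdx.x] = partial[0];
    }
}
} // namespace reduce_module
```

Without the namespaces, the second "static const int BLOCK_SIZE" would be rejected by the compiler as a redefinition, which is exactly the detection the idiom is meant to give you.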

I’d like to add, different kernels may reside in different translation units (.cu files) but if two kernels use a common device variable, then there will be two copies of the device variable, and the two kernels will see different copies of the same variable. So while it is possible to compile .cu files separately and link them, I believe doing so can be hazardous. I got bitten by this once and now my policy is to have only a single .cu file.

Using multiple .cu files is manageable, and I have done it. Calling global functions from library archives has worked too.

The problem is that when I use precompiled object files that contain device functions (I mean a library archive), nvcc gives me the error I mentioned above (External calls are not supported).

So, as Jamie said, you must include all the code you are using in your project files. This kills modularity and reusability. Is there a way to use precompiled device function libraries, like the CUDA sm_11 to sm_13 atomic function libraries?

If you want to hack something together it would be possible to create a library with many global functions.
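A minimal sketch of what such a library could look like, assuming the client only ever calls a host-side wrapper and never a device function directly. The names veclib_add and vec_add_kernel are invented for the example; the runtime calls (cudaMalloc, cudaMemcpy, cudaGetLastError, cudaFree) are the standard CUDA runtime API.

```cuda
#include <cuda_runtime.h>

// ---- inside the precompiled library (hypothetical veclib.cu) ----

// The kernel stays private to the library's own translation unit.
__global__ void vec_add_kernel(const float* a, const float* b,
                               float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

// Host-callable entry point: the only symbol a client links against.
// a, b and out are host pointers; the wrapper does the device copies.
extern "C" cudaError_t veclib_add(const float* a, const float* b,
                                  float* out, int n)
{
    if (n <= 0) {
        return cudaErrorInvalidValue;
    }
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* d_a = 0;
    float* d_b = 0;
    float* d_out = 0;

    cudaError_t err = cudaMalloc((void**)&d_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&d_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&d_out, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        vec_add_kernel<<<blocks, threads>>>(d_a, d_b, d_out, n);
        err = cudaGetLastError();  // catches launch failures
    }
    // cudaMemcpy blocks until the kernel has finished.
    if (err == cudaSuccess) err = cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost);

    // cudaFree on a null pointer is a no-op, so this is safe on every path.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return err;
}
```

The obvious cost is that every library call is a full kernel launch with its own memory traffic, so this only replaces coarse-grained operations, not small helpers you would want to call from inside your own kernels.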

However, until they add support for device function calls into the compiler there is absolutely no way to create a library of device functions without writing the equivalent of a linker yourself…