How to split CUDA code

Hello! I’ve tried to organize my CUDA code by splitting it into multiple .cu and .cuh files. Basically, I have a group of device functions that I want to reuse by calling them from other device functions or global functions in separate .cu files, so I’ve decided to put these functions in a separate file. The problems that occur are the following:

  1. either I’m getting “error LNK2005” because the functions have already been defined (they are used in two different .cu files)

  2. or, when I try to use a .cuh header to solve the first issue, I’m getting “error : external calls not supported (found non-inlined call…”. Something bizarre happens here, because device functions are inlined by default.

The second approach seems to be the right one, but how should I fix this problem?
Thanks in advance!

NVIDIA only provides a compiler for device code, not a device linker. I’m afraid you cannot link device functions across compilation units.

So your first approach results in the host toolchain (gcc, VC, or whatever host C++ compiler you use) seeing the same symbol defined in several places, and the second approach results in nvcc not being able to reach the implementation behind the declaration in the header (since it has no linker). Inlining doesn’t work across files, so you can’t expect a function defined in one .cu to be inlined when it is called from a different .cu (even if the caller includes a .cuh with the declaration).

At least that’s what I think happens. You could try putting the definition in the header maybe?
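
Something along these lines is what I have in mind. It is only a rough sketch, not tested against your project: the header name my_math.cuh, the function square(), the kernel squareAll() and the host helper squareOnDevice() are all made-up names for illustration. The guarded part would live in the .cuh file and be included by every .cu that needs it; I have collapsed everything into one file here so it can be compiled on its own.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// ---- would live in a hypothetical my_math.cuh ----
#ifndef MY_MATH_CUH
#define MY_MATH_CUH
// The full definition (not just a declaration) sits in the header, so every
// .cu file that includes it has the body available for inlining.
__device__ __forceinline__ float square(float x)
{
    return x * x;
}
#endif // MY_MATH_CUH
// ---- end of hypothetical header ----

// A kernel in some .cu file that includes the header and calls the helper.
__global__ void squareAll(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = square(data[i]);
}

// Hypothetical host helper: squares n floats in place on the GPU.
// Assumes n > 0. Returns false if any CUDA call reports an error.
bool squareOnDevice(float* host, int n)
{
    float* dev = NULL;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc((void**)&dev, bytes) != cudaSuccess)
        return false;

    bool ok = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        squareAll<<<blocks, threads>>>(dev, n);
        // The blocking copy back also waits for the kernel to finish.
        ok = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

Calling squareOnDevice() on a small host array should square each element in place. The point is only that the body of square() is visible in the same compilation unit as the kernel that calls it, so nvcc never needs to resolve an external device call.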

I’m not sure if that’d work, I’d need to be more sober :)

Finally I’ve found the solution. I’ve put all the definitions in the header, as you suggested, and I’ve also added the __forceinline__ function qualifier to get rid of the “error LNK2005”. The definition looks something like this:

// MyHeader.cuh

     __device__ __forceinline__ void f()