Best way to distribute a collection of __device__ functions


This is probably a trivial question and only remotely related to CUDA, but I am wondering about the best way to distribute a set of __device__ functions, and unfortunately I am not an expert on the subject.
My first thought was to create a static library (a .h file plus a .lib/.a file), but that doesn’t seem to be possible because, as far as I understand, device functions are always inlined.
Therefore I would think that the “correct” way would be to provide just a header file that contains both the declarations and the implementations of these functions.
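
To make the header-only idea concrete, this is roughly what I have in mind. It is only a sketch: the file name, the `mylib` namespace, the `MYLIB_HD` macro and the two functions are made up for illustration, and I use `__host__ __device__` instead of plain `__device__` so that the same header also compiles with an ordinary C++ compiler.

```cpp
// my_device_math.h (hypothetical file name)
#ifndef MYLIB_DEVICE_MATH_H
#define MYLIB_DEVICE_MATH_H

// Under nvcc (__CUDACC__ defined) the functions are marked as callable
// from both host and device code; with a plain C++ compiler the
// qualifiers expand to nothing.
#ifdef __CUDACC__
#define MYLIB_HD __host__ __device__
#else
#define MYLIB_HD
#endif

namespace mylib {  // hypothetical namespace

// Signature and implementation live together in the header, so every
// translation unit that includes it sees the full source.
MYLIB_HD inline float clampf(float x, float lo, float hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

// Linear interpolation between a and b for t in [0, 1].
MYLIB_HD inline float lerp(float a, float b, float t) {
    return a + t * (b - a);
}

}  // namespace mylib

#endif  // MYLIB_DEVICE_MATH_H
```

A user would then simply `#include "my_device_math.h"` in their .cu file and call `mylib::clampf(...)` from inside a kernel, with nothing to link against.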

  1. Is that correct?
  2. Is it true that there is no way to hide the implementation of inline functions (my reasoning is that there can’t be, because the compiler needs to see the source code in order to inline it)?

Thanks in advance and I hope these slightly off-topic questions are tolerable.