As I develop my code base, I’ve tried to maintain high standards for documentation: the purpose of each function, along with its formal arguments, is documented in a header file. Each CUDA unit (foo.cu) has an associated CUDA header file (foo.cuh). However, I’m conflicted as to whether to document the __global__ functions of those CUDA units in the respective header files. The reason is that, without relocatable device code, I’m not sure I can launch the __global__ kernels from another CUDA unit. Is it correct that a kernel defined in one unit cannot be launched from another in that case?
My inclination is to stop listing the __global__ functions in the header and to declare only the C++-accessible functions that launch them, e.g. extern launch_foo. Can anyone suggest a standard practice, and a reason to adopt it?
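To make the layout I’m leaning towards concrete, here is a minimal sketch. Apart from launch_foo, everything in it is hypothetical and only for illustration: the kernel name foo_kernel, the argument list, the block size of 256 and the choice to return a cudaError_t. It is shown as a single file, with comments marking what would go in foo.cuh and what would stay in foo.cu.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// ---- would live in foo.cuh (hypothetical) ---------------------------------
// Only the host-callable launcher is declared and documented here.
//
// launch_foo: multiply n floats in device memory by factor.
//   d_data  device pointer to at least n floats
//   n       number of elements (must be >= 0)
//   factor  scale applied to every element
//   stream  stream to launch on (default stream if omitted)
// Returns the status reported by cudaGetLastError() after the launch.
extern cudaError_t launch_foo(float* d_data, int n, float factor,
                              cudaStream_t stream = 0);

// ---- would live in foo.cu (hypothetical) ----------------------------------
// The kernel is static, so it is visible only inside this unit and is
// documented next to its definition rather than in the header.
static __global__ void foo_kernel(float* data, int n, float factor) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    data[i] *= factor;
  }
}

cudaError_t launch_foo(float* d_data, int n, float factor,
                       cudaStream_t stream) {
  assert(n >= 0);
  if (n == 0) {
    return cudaSuccess;  // nothing to do; a zero-block launch is invalid
  }
  const int threads = 256;                         // assumed block size
  const int blocks = (n + threads - 1) / threads;  // round up to cover n
  foo_kernel<<<blocks, threads, 0, stream>>>(d_data, n, factor);
  return cudaGetLastError();  // reports launch-configuration errors only
}
```

With this arrangement, other units would include foo.cuh and call launch_foo(d_data, n, 2.0f) like any other C++ function, without ever naming the kernel.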