As I develop my code base, I’ve tried to maintain high standards for documentation. The documentation of each function's purpose and formal arguments goes in a header file, and each CUDA unit (`foo.cu`) has an associated CUDA header file (`foo.cuh`). But I’m conflicted as to whether to document the `__global__` functions of those CUDA units in the respective header files. The reason is that, without relocatable device code, I’m not sure I can call the `__global__` kernels from another CUDA unit. Is this correct?

My inclination is to stop listing the `__global__` functions in the header and stick to the C++-accessible functions that launch them, e.g. `extern launch_foo`. Can anyone suggest a standard practice and a reason to adopt it?
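
To make the layout I'm leaning toward concrete, here is a minimal single-file sketch. The kernel name `foo_kernel`, the signature of `launch_foo`, and the scaling behavior are all made up for illustration; only the `foo.cu` / `foo.cuh` split and the `launch_foo` name come from my actual setup. The header section declares and documents only the host-callable launcher, while the kernel is documented next to its definition in the `.cu` file.

```cuda
// Single-file sketch; the comments mark where each piece would live.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// ---- foo.cuh (hypothetical contents): only the launcher is declared ----

/// Scale n floats in device memory by s (illustrative behavior only).
/// @param d_data  device pointer to the array to scale
/// @param n       number of elements
/// @param s       scale factor
/// @return        status reported by cudaGetLastError() after the launch
cudaError_t launch_foo(float* d_data, int n, float s);

// ---- foo.cu (hypothetical contents) ----

// Kernel is not declared in the header; its documentation stays here.
// Each thread scales one element; threads past the end do nothing.
__global__ void foo_kernel(float* data, int n, float s) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= s;
}

cudaError_t launch_foo(float* d_data, int n, float s) {
  if (n <= 0) return cudaSuccess;  // nothing to do
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  foo_kernel<<<blocks, threads>>>(d_data, n, s);
  return cudaGetLastError();  // catches launch-configuration errors
}

// ---- caller: would include only foo.cuh ----

int main() {
  std::vector<float> h(1000, 1.0f);
  const size_t bytes = h.size() * sizeof(float);
  float* d = nullptr;
  if (cudaMalloc(&d, bytes) != cudaSuccess) return 1;
  cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);

  cudaError_t err = launch_foo(d, static_cast<int>(h.size()), 2.0f);
  if (err == cudaSuccess) err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    std::printf("launch_foo failed: %s\n", cudaGetErrorString(err));
    cudaFree(d);
    return 1;
  }

  cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d);
  std::printf("h[0] = %f\n", h[0]);
  return 0;
}
```

With this split, a caller only ever sees the launcher's declaration and documentation, so the header would not depend on how the kernel itself is linked.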