As I develop my code base, I’ve tried to maintain high standards for documentation: the purpose of each function, along with its formal arguments, is documented in a header file. Each CUDA unit (`foo.cu`) has an associated CUDA header file (`foo.cuh`). However, I’m conflicted as to whether to document the `__global__` functions of those CUDA units in the respective header files. The reason is that, without relocatable device code, I’m not sure I can call the `__global__` kernels from another CUDA unit. Is this correct?
My inclination is to stop listing the `__global__` functions in the header and stick to the C++-accessible functions that launch them, e.g. `extern launch_foo`. Can anyone suggest a standard practice and a reason to adopt it?
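For concreteness, here is a minimal sketch of the layout I am leaning toward, collapsed into one file so it compiles on its own. Only `foo.cu`, `foo.cuh`, and the name `launch_foo` come from my actual setup; the kernel name `foo_kernel`, the argument list, and the scaling operation are hypothetical placeholders. It only illustrates where the documentation would live, and is not a claim about what the toolchain does or does not permit across units.

```cuda
// ---- would live in foo.cuh: only the host-callable launcher is declared ----
#include <cuda_runtime.h>
#include <cstdio>

/// Multiply each of the n floats at d_data (a device pointer) by factor.
/// (Hypothetical signature, for illustration only.)
void launch_foo(float* d_data, int n, float factor);

// ---- would live in foo.cu: the kernel stays private to this unit ----
// 'static' gives the kernel internal linkage, so it is not visible to
// other CUDA units and is documented here rather than in the header.
static __global__ void foo_kernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

void launch_foo(float* d_data, int n, float factor) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up to cover n
    foo_kernel<<<blocks, threads>>>(d_data, n, factor);
}

// ---- a caller, which could be any other unit that includes foo.cuh ----
int main() {
    const int n = 1024;
    float h[n];
    for (int i = 0; i < n; ++i) {
        h[i] = 1.0f;
    }

    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    launch_foo(d, n, 2.0f);  // the caller never names the kernel

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    printf("h[0] = %f\n", h[0]);
    return 0;
}
```

With this split, a caller only needs the launcher's declaration, so the header documents exactly what other units are expected to use, and the kernel's documentation sits next to its definition in `foo.cu`.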