Should I list kernels in CUDA unit header files?

As I develop my code base, I’ve tried to maintain high standards for documentation. The documentation of each function's purpose and formal arguments goes in a header file, and each CUDA unit (foo.cu) has an associated CUDA header file (foo.cuh). However, I’m conflicted about whether to document the __global__ functions of those CUDA units in the respective header files: without relocatable device code, I’m not sure I can call the __global__ kernels from another CUDA unit. Is this correct?

My inclination is to stop listing the __global__ functions in the header and stick to the C++-accessible functions that launch them, e.g. extern launch_foo. Can anyone suggest a standard practice and a reason to adopt it?

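You can call a kernel from a different compilation unit without specifying relocatable device code during compile/link. Relocatable device code is only needed when device code in one unit references device code defined in another unit (for example, a __device__ function or variable); a kernel launch from host code is resolved by the ordinary host link step.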

$ cat t2a.cu
#include <cstdio>
__global__ void k();

int main(){

  k<<<1,1>>>();
  cudaDeviceSynchronize();
}
$ cat t2b.cu
#include <cstdio>

__global__ void k(){

  printf("*\n");
}
$ nvcc -o test t2a.cu t2b.cu
$ compute-sanitizer ./test
========= COMPUTE-SANITIZER
*
========= ERROR SUMMARY: 0 errors
$

Tested with CUDA 11.4.
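To connect this to the header question, here is a minimal sketch of how a documented kernel declaration and a host launch wrapper could sit side by side in a header. All names (foo.cuh, scale_kernel, launch_scale) are hypothetical and chosen only for illustration. It is written as one file so it can be compiled as is with nvcc; the comments mark which part would go into which file if it were split up, in which case each .cu file would include the header.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// ---- would live in foo.cuh (hypothetical header) ----

// Scale every element of an array in place.
//
//   data    device pointer to the array
//   n       number of elements in data
//   factor  multiplier applied to each element
__global__ void scale_kernel(float* data, int n, float factor);

// Host-side wrapper (hypothetical): chooses a launch configuration
// and launches scale_kernel on the default stream.
void launch_scale(float* data, int n, float factor);

// ---- would live in foo.cu ----

__global__ void scale_kernel(float* data, int n, float factor) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

void launch_scale(float* data, int n, float factor) {
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  scale_kernel<<<blocks, threads>>>(data, n, factor);
}

// ---- would live in another unit, e.g. main.cu ----

int main() {
  const int n = 8;
  float host[n];
  for (int i = 0; i < n; ++i) host[i] = float(i);

  float* dev = nullptr;
  cudaMalloc(&dev, n * sizeof(float));
  cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

  // Direct launch, using only the declaration from the header...
  scale_kernel<<<1, n>>>(dev, n, 2.0f);
  // ...or a launch through the host wrapper.
  launch_scale(dev, n, 0.5f);

  // This blocking copy also waits for both kernels to finish.
  cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(dev);

  for (int i = 0; i < n; ++i) printf("%g ", host[i]);
  printf("\n");
  return 0;
}
```

Declaring both the kernel and the wrapper keeps the documentation in one place while leaving callers free to choose their own launch configuration or rely on the wrapper's default. Error checking on the CUDA calls is omitted here for brevity.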


This is important and looks like a solution. If the __global__ functions can be called across compilation units, I’ll make separate headers for them all and put the usage documentation there.
