Why does nvcc not accept kernels in .cpp files?


I would like to ask why nvcc does not accept CUDA kernels and kernel calls in .cpp or .c files. Why must they be in a .cu file? Many things would be easier if I could call kernels directly from .cpp files. As it is, I always have to implement a kernel-calling function in a .cu file first, which forces me to add a lot of unnecessary .cu files to my project. I also use C++ templates a lot. They are a great tool for writing kernels: they help optimize code (for example, I can generate smaller specialized kernels and save scarce shared memory), and I can instantiate each kernel for both single and double precision arithmetic. But then, to make these kernels accessible in my project, I need to write a lot of calling functions in .cu files. So I think that requiring kernels and kernel calls to be in .cu files has two main disadvantages:

  1. I have to add unnecessary .cu files containing just functions calling kernels.

  2. It limits use of C++ templates, which are a great tool when combined with CUDA kernels.
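To make the first point concrete, here is a minimal sketch of the wrapper pattern described above. All names (`scaleKernel`, `scaleImpl`, `scaleFloat`, `scaleDouble`) are hypothetical and not taken from any real project. The part inside `#ifdef __CUDACC__` is what would have to live in a .cu file; the `#else` branch is a plain CPU stand-in so that the same file also builds with g++. Error checking of the CUDA calls is omitted for brevity.

```cpp
#include <cstddef>

#ifdef __CUDACC__
#include <cuda_runtime.h>

// Kernel template: __global__ and the <<<...>>> launch syntax are only
// understood when the file is compiled as CUDA source.
template <typename Real>
__global__ void scaleKernel(Real* data, std::size_t n, Real factor) {
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Host-side launcher (hypothetical): copies to the device, runs the kernel,
// copies back. CUDA return codes are ignored here to keep the sketch short.
template <typename Real>
void scaleImpl(Real* host, std::size_t n, Real factor) {
    Real* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(Real));
    cudaMemcpy(dev, host, n * sizeof(Real), cudaMemcpyHostToDevice);
    const unsigned block = 256;
    const unsigned grid = static_cast<unsigned>((n + block - 1) / block);
    scaleKernel<Real><<<grid, block>>>(dev, n, factor);
    cudaMemcpy(host, dev, n * sizeof(Real), cudaMemcpyDeviceToHost);
    cudaFree(dev);
}
#else
// CPU stand-in used when the file is compiled by a plain C++ compiler.
template <typename Real>
void scaleImpl(Real* host, std::size_t n, Real factor) {
    for (std::size_t i = 0; i < n; ++i) host[i] *= factor;
}
#endif

// The "calling functions": one non-template wrapper per type, so that
// ordinary .cpp files only need these two declarations.
void scaleFloat(float* data, std::size_t n, float factor) {
    scaleImpl(data, n, factor);
}
void scaleDouble(double* data, std::size_t n, double factor) {
    scaleImpl(data, n, factor);
}
```

The rest of the project would then declare only `scaleFloat` and `scaleDouble` in a header and call, for example, `scaleFloat(v.data(), v.size(), 2.0f)` from a .cpp file. Every additional type or kernel needs another wrapper of this kind.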

Please tell me if I have misunderstood something, or if you have the same feeling.


I have the same feeling about this!

The “solution” I use is to have all my C++ code in .cu files (not a single .cpp in my projects) even if there are no kernel calls inside. It’s not totally satisfying, but at least it avoids writing ugly things to wrap all kernel calls in separate .cu files.

What do you mean by “It limits use of C++ templates”?

I use templates in my .cu files, and I have never noticed any problem (but I’m not using all the possibilities of C++ templates).

I want to be able to compile my project without CUDA as well, using just g++, and in that case .cu files would not be accepted. The limitation on templates comes from the fact that each kernel generated from a template must be called by some function in a .cu file; otherwise it does not exist. I am writing something like a numerical library in which most of my kernels are templates with at least one template parameter that selects single or double precision. I have a kernel for parallel reduction which computes the minimum, maximum or sum of the elements of a vector. That makes 6 combinations (float/double and min/max/sum), for which I have six calling functions in a .cu file. And from the rest of my project I cannot refer to this kernel as a template :-(. I would guess that it should not be a problem for nvcc to accept kernels even in .c and .cpp files.
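As an illustration of the six combinations, here is a rough sketch in plain C++. The names (`MinOp`, `MaxOp`, `SumOp`, `reduce`, `reduceMinFloat` and so on) are made up for this example, and the sequential loop in `reduce` is only a stand-in for the real parallel reduction kernel, so the sketch builds with g++ alone. The point is the shape of the interface: one template, but six hand-written entry points.

```cpp
#include <cstddef>
#include <limits>

// Operation policies (hypothetical): each supplies a neutral element
// and a binary combine step.
template <typename Real> struct MinOp {
    static Real identity() { return std::numeric_limits<Real>::max(); }
    static Real apply(Real a, Real b) { return a < b ? a : b; }
};
template <typename Real> struct MaxOp {
    static Real identity() { return std::numeric_limits<Real>::lowest(); }
    static Real apply(Real a, Real b) { return a > b ? a : b; }
};
template <typename Real> struct SumOp {
    static Real identity() { return Real(0); }
    static Real apply(Real a, Real b) { return a + b; }
};

// Stand-in for the templated reduction kernel plus its launcher.
// In the real project this template would be confined to a .cu file.
template <typename Real, template <typename> class Op>
Real reduce(const Real* data, std::size_t n) {
    Real acc = Op<Real>::identity();
    for (std::size_t i = 0; i < n; ++i) acc = Op<Real>::apply(acc, data[i]);
    return acc;
}

// The six calling functions: the only names visible to the .cpp files.
float  reduceMinFloat (const float* d,  std::size_t n) { return reduce<float,  MinOp>(d, n); }
float  reduceMaxFloat (const float* d,  std::size_t n) { return reduce<float,  MaxOp>(d, n); }
float  reduceSumFloat (const float* d,  std::size_t n) { return reduce<float,  SumOp>(d, n); }
double reduceMinDouble(const double* d, std::size_t n) { return reduce<double, MinOp>(d, n); }
double reduceMaxDouble(const double* d, std::size_t n) { return reduce<double, MaxOp>(d, n); }
double reduceSumDouble(const double* d, std::size_t n) { return reduce<double, SumOp>(d, n); }
```

Generic code elsewhere in the project cannot write `reduce<Real, Op>(...)` when the template is hidden in a .cu file; it has to choose one of the six names by hand, which is the limitation described above.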