Simple separate compilation with templates


I was testing the CUDA sample simpleSeparateCompilation from CUDA Toolkit 5.0 with templates.
I added a kernel, kernel_test(), that just prints the threadIdx, and it works.

If I make this kernel a template:

template <typename T> __global__ void kernel_test(T number)
{
   printf("\n tid=%d", threadIdx.x+number );
}


I get the following from the compiler:

g++ ...  error: undefined reference to 'kernel_test(int)

If I declare the kernel as extern, I get the following:

simpleDeviceLibrary.cuh(19): error: invalid storage class for a template declaration

Is there any restriction on using templates in separate compilation mode? They work perfectly in “whole program compilation”.


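This is a general C++ issue, not specific to CUDA: the compiler needs to “see” the whole template definition in the translation unit where the template is instantiated. If the call site only sees a declaration, kernel_test<int> is never generated anywhere, and the linker reports an undefined reference. Either put the template definition in the header that every caller includes, or add an explicit instantiation for each needed type in the file that holds the definition. “extern” templates are a C++11 feature and are not supported by all modern compilers; they also apply to an explicit instantiation (for a specific type), not to the template declaration itself, which is why putting extern on the template declaration is rejected.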
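To illustrate the explicit instantiation option, here is a minimal sketch in plain host C++ (no CUDA qualifiers, since the rule comes from C++ itself). The function name offset_tid and the file names in the comments are hypothetical and only mark which part would live in which file in a separately compiled project; the sketch is shown as one file so it stays self-contained.

```cpp
// ---- offset.h (hypothetical header, included by every caller) ----
// Declaration only: a caller can name offset_tid<T>, but the compiler
// cannot generate code for it from this alone.
template <typename T>
T offset_tid(int tid, T number);

// ---- offset.cpp (hypothetical, the only file with the definition) ----
template <typename T>
T offset_tid(int tid, T number) {
    return static_cast<T>(tid) + number;
}

// Explicit instantiation definitions: these force the compiler to emit
// offset_tid<int> and offset_tid<float> in this object file, so callers
// that only saw the declaration can still link against them.
template int offset_tid<int>(int, int);
template float offset_tid<float>(int, float);
```

With this layout, a caller in another file that uses offset_tid<int> or offset_tid<float> should link, while a call with an uninstantiated type such as double would fail with an undefined reference, as in the question. The alternative is to move the whole definition into the header, which removes the need to list types up front.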