Hello
I’m trying to “specialize” (overload would be a more adequate term) a template kernel function. My real code is quite long, but I made a simpler version to show you the problem.
I’ve got a generic virtual template class called generalClass.
I’ve got a derived class called derivedClass, which implements generalClass’s virtual functions.
Then I have a template kernel which should work for any class derived from generalClass.
But I would like to overload it for derivedClass in order to implement an optimized version.
And lastly, I have a template class callingClass which calls the kernel.
Here is the code:
#include <cuda.h>
#include <cutil_inline.h>

template < typename A, unsigned int D, unsigned int Q >
class generalClass
{
private:
public:
    generalClass() {}
};

template < typename A >
class derivedClass : public generalClass<A,2,9>
{
private:
public:
    derivedClass() {}
};

//general test kernel
template< class A, template<typename U> class generalClass, typename V >
__global__ void testKernel(
    generalClass<A> d,
    generalClass<A> d_tmp)
{ }

///specialisation of testKernel
template< class A, template<class U> class generalClass, typename V >
__global__ void testKernel(
    derivedClass<A> d,
    derivedClass<A> d_tmp)
{ }

/// general template callingClass
template < typename T, template <typename A> class generalClass, typename V >
class callingClass
{
private:
public:
    callingClass() { }
    generalClass<float> d;
    generalClass<float> d_tmp;
    void testkernelWrapper()
    {
        // every launch dimension must be at least 1; a 0 gives an invalid configuration
        dim3 block_size(1, 1, 1), grid_size(1, 1, 1);
        testKernel< float, generalClass, V> <<< grid_size, block_size>>>(d, d_tmp);
    }
};

int main( void )
{
    callingClass<float, derivedClass, int> *test = new callingClass<float, derivedClass, int> ();
    test->testkernelWrapper();
    return 0;
}
It doesn’t compile: the compiler (nvcc 4.0) can’t decide which of the two kernels to choose.
Error message:
nvcc test.cu -arch=sm_20 -I/usr/not-backed-up/NVIDIA_SDK_4_0/CUDALibraries/common/inc -L/usr/lib64/nvidia -lcuda
In file included from /tmp/tmpxft_00001e6e_00000000-1_test2.cudafe1.stub.c:2:
/tmp/tmpxft_00001e6e_00000000-1_test2.cudafe1.stub.c: In function ‘void _device_stub__Z10testKernelIf12derivedClassiEvS0_IT_ES2(_Z12derivedClassIfE&, _Z12derivedClassIfE&)’:
/tmp/tmpxft_00001e6e_00000000-1_test2.cudafe1.stub.c:6: error: insufficient contextual information to determine type
/tmp/tmpxft_00001e6e_00000000-1_test2.cudafe1.stub.c:6: error: insufficient contextual information to determine type
In file included from /tmp/tmpxft_00001e6e_00000000-1_test2.cudafe1.stub.c:2:
/tmp/tmpxft_00001e6e_00000000-1_test2.cudafe1.stub.c: At global scope:
/tmp/tmpxft_00001e6e_00000000-1_test2.cudafe1.stub.c:7: error: ambiguous template specialization ‘__wrapper__device_stub_testKernel<float, template class derivedClass, int>’ for ‘void __wrapper__device_stub_testKernel(_Z12derivedClassIfE&, _Z12derivedClassIfE&)’
In file included from /tmp/tmpxft_00001e6e_00000000-1_test2.cudafe1.stub.c:2:
/tmp/tmpxft_00001e6e_00000000-1_test2.cudafe1.stub.c: In function ‘void __sti____cudaRegisterAll_40_tmpxft_00001e6e_00000000_4_test2_cpp1_ii_main()’:
/tmp/tmpxft_00001e6e_00000000-1_test2.cudafe1.stub.c:8: error: insufficient contextual information to determine type
What did I do wrong?
Thank you for your help.
Note: if I get rid of the typename V parameter, the code compiles, which means that in this case the compiler knows how to choose (but I’m not sure which kernel it chose).
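One direction that might sidestep the ambiguity is to keep a single kernel template and move the choice into a helper class template, which can be partially specialized for derivedClass. Below is a minimal host-only C++ sketch of that idea, so it can be compiled without a GPU. The names kernelImpl, otherClass, testKernelBody and checkDispatch are hypothetical and not part of my real code, and I have not tried this with nvcc 4.0. In the CUDA version I would expect run to become a __device__ function called from the one __global__ testKernel.

```cpp
#include <cassert>

template <typename A, unsigned int D, unsigned int Q>
class generalClass {
public:
    generalClass() {}
};

template <typename A>
class derivedClass : public generalClass<A, 2, 9> {
public:
    derivedClass() {}
};

// Hypothetical second derived class that has no optimized path.
template <typename A>
class otherClass : public generalClass<A, 3, 19> {
public:
    otherClass() {}
};

// Primary template: generic implementation (returns 0 to mark the generic path).
template <class A, template <typename> class G, typename V>
struct kernelImpl {
    static int run(const G<A>&, const G<A>&) { return 0; }
};

// Partial specialization: chosen whenever G is derivedClass
// (returns 1 to mark the optimized path).
template <class A, typename V>
struct kernelImpl<A, derivedClass, V> {
    static int run(const derivedClass<A>&, const derivedClass<A>&) { return 1; }
};

// Stand-in for the body of the single kernel template: it only forwards
// to the helper, so no function overloads compete with each other.
template <class A, template <typename> class G, typename V>
int testKernelBody(const G<A>& d, const G<A>& d_tmp) {
    return kernelImpl<A, G, V>::run(d, d_tmp);
}

// Checks that each class reaches the intended path.
inline void checkDispatch() {
    derivedClass<float> d, d_tmp;
    otherClass<float> o, o_tmp;
    assert((testKernelBody<float, derivedClass, int>(d, d_tmp) == 1));
    assert((testKernelBody<float, otherClass, int>(o, o_tmp) == 0));
}
```

Since class templates can be partially specialized while function templates can only be overloaded, the extra parameter V should not matter here: there is only one function template, and the compiler picks the kernelImpl specialization from the explicit arguments alone.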