Hi everybody! I’m developing a C++/CUDA library, and because the host code and the CUDA code are compiled separately, I keep running into linker errors (unresolved external symbol). The problem is that, in some cases, nvcc doesn’t instantiate every specialization of a template function that the C++ side needs, so at link time there’s no way to resolve the reference. I thought about it and found two ways to (possibly) solve this problem:
1. Specializing every template with every possible data type (a waste of both time and space).
2. Using some #pragma directive to tell nvcc to instantiate a template with some particular data types.
So, now the question: is there anything similar to #pragma instantiate in CUDA C? Can I tell the compiler to compile with the right data types? Is there any other solution to my problem? Thank you all.
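For what it's worth, standard C++ already has a construct that plays the role of such a #pragma: an explicit instantiation definition placed in the file that holds the template body. A minimal sketch, assuming a made-up function scale and a helper macro INSTANTIATE_SCALE (neither comes from the library discussed here), written as plain host C++ so that it compiles without nvcc:

```cpp
#include <cstddef>

// Hypothetical launcher. In a real .cu file the body would launch a
// __global__ kernel instead of looping on the host.
template <class T>
void scale(T* data, std::size_t n, T factor) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}

// Explicit instantiation definitions: the compiler emits code for exactly
// these types in this translation unit, whether or not they are used here.
#define INSTANTIATE_SCALE(T) template void scale<T>(T*, std::size_t, T);
INSTANTIATE_SCALE(int)
INSTANTIATE_SCALE(float)
INSTANTIATE_SCALE(double)
#undef INSTANTIATE_SCALE
```

Other translation units then only need the declaration of scale. A type missing from the list still fails at link time, but the set of supported types is spelled out in one place.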
Are you able to use the template function void call_kernel(T * dst, T * src) from C++ code?
How does the interface between a templated CUDA C function and C++ work when using extern "C"?
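On the extern "C" question: a template itself cannot have C linkage, so the usual workaround is one non-template wrapper per supported type. A rough sketch, assuming a length parameter n and the wrapper names call_kernel_float and call_kernel_int (all three are illustrative and not from the posts above), with a host loop standing in for the actual kernel launch:

```cpp
#include <cstddef>

// Stand-in for the templated launcher. In a .cu file this would configure
// and launch a kernel; here it simply copies on the host.
template <class T>
void call_kernel(T* dst, const T* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// extern "C" functions cannot be templates, so each supported type gets
// its own wrapper with a distinct, unmangled name.
extern "C" void call_kernel_float(float* dst, const float* src, std::size_t n) {
    call_kernel<float>(dst, src, n);
}

extern "C" void call_kernel_int(int* dst, const int* src, std::size_t n) {
    call_kernel<int>(dst, src, n);
}
```

Calling a wrapper also instantiates the template inside the file that defines it, so the C++ side only ever links against ordinary functions. The cost is one hand-written wrapper per type.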
Hey, LSChien & seb: thanks a lot! You solved my problem in two different ways. I’m writing a big library that exposes CUDA’s potential to C++ code through different containers (vector, matrix, cube) and iterators (row, column, single-element), and I was looking for an elegant alternative to silly code replication, so that users can do whatever they want. Again: thanks a lot!
In that case, the cpp file does not have access to the function code and so it can’t link.
Even when adding an explicit instantiation of the function in the .cu file, the linker is still not happy.
I am writing something like:
// .cu compiled with nvcc
template<class T> extern void foo( T* t ) { ...; } // the templated function
template extern void foo<int>( int* t ); // explicit instantiation for int
// .cpp compiled with visual studio
template<class T> extern void foo( T* t );
template extern void foo<int>( int* t ); // even when adding this
int main(int argc, char** argv)
{
    foo<int>(0);
    return 0;
}
Note that the #ifdef statement is not necessary, but it prevents foo.h from being accidentally included in CUDA code. I don’t think extern is necessary, though.
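A sketch of how the pieces might fit together, with everything in it an assumption rather than the code from the earlier posts: the header name foo.h and its guard, the helper call_foo_once, and the placeholder body of foo are all made up. As far as I can tell, standard C++ does not allow a storage-class specifier such as extern inside an explicit instantiation definition, and an explicit instantiation definition in the .cpp needs the template body to be visible, so this version drops both. The three parts are shown in one listing so that it compiles as a single translation unit; in practice they would live in separate files.

```cpp
// ---- foo.h (hypothetical): declarations for host-only C++ code ----
#ifndef FOO_H
#define FOO_H

#ifdef __CUDACC__  // predefined by nvcc while it compiles CUDA sources
#error "foo.h is intended for .cpp files only"
#endif

template <class T>
void foo(T* t);  // no extern needed: external linkage is the default

// C++11 explicit instantiation declaration: foo<int> is emitted elsewhere.
extern template void foo<int>(int* t);

#endif  // FOO_H

// ---- main.cpp side: sees only the declarations above ----
int call_foo_once() {
    int value = 41;
    foo<int>(&value);  // the linker resolves this against the .cu object file
    return value;
}

// ---- foo.cu side (would be compiled by nvcc) ----
template <class T>
void foo(T* t) {
    *t += 1;  // placeholder body
}

template void foo<int>(int* t);  // explicit instantiation definition
```

The extern template line is optional and needs a C++11 compiler. Without it, the plain declaration is enough for the .cpp to compile and leave foo<int> for the linker to resolve.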