forcing template compilation in CUDA C

Hi everybody! I’m developing a C++/CUDA library, and having two separate compilations keeps causing linker errors (unresolved external symbol). The problem is that, in some cases, nvcc doesn’t instantiate all the needed instantiations of a template function, so at link time there’s no way to resolve the reference. I thought about it and found two ways to (possibly) solve this problem:

  • Specializing every template with every possible data type (a waste of both time & space)
  • Using some #pragma directive to tell nvcc to specialize a template with some particular data types.

So, now the question: is there anything similar to #pragma instantiate in CUDA C? Can I tell the compiler to compile with the right data types? Is there any other solution to my problem? Thank you all.

I think that you just need to do explicit instantiation.
http://www.cplusplus.com/forum/articles/14272/

If the definition of a template is not visible in the file being compiled, the compiler can only type-check the call; it cannot generate code for it.
This is standard C++ behavior, not an nvcc problem.
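
A minimal sketch of explicit instantiation, collapsed into one listing for illustration. It is plain host C++ (a loop stands in for the kernel launch) so it compiles without nvcc. The names copy_buffer and demo_copy, and the file names in the comments, are hypothetical and do not come from the thread.

```cpp
#include <cassert>
#include <cstddef>

// ---- would live in copy_buffer.h (hypothetical file name) ----
// Declaration only: callers can type-check against it but cannot instantiate it.
template <typename T>
void copy_buffer(T* dst, const T* src, std::size_t n);

// ---- would live in copy_buffer.cu ----
// Definition; in CUDA code the kernel launch would go here instead of the loop.
template <typename T>
void copy_buffer(T* dst, const T* src, std::size_t n)
{
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

// Explicit instantiations: force the compiler to emit code for these types
// in this translation unit, so other object files can link against them.
template void copy_buffer<int>(int*, const int*, std::size_t);
template void copy_buffer<float>(float*, const float*, std::size_t);

// ---- would live in a .cpp file that only includes the header ----
int demo_copy()
{
    const int src[3] = {1, 2, 3};
    int dst[3] = {0, 0, 0};
    copy_buffer<int>(dst, src, 3);
    return dst[0] + dst[1] + dst[2];
}
```

Only the types listed in the explicit instantiations are available to other translation units; calling copy_buffer<double> from the .cpp side would again fail at link time.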

I use the Boost Preprocessor library to simplify this problem. It works well for me.

I have code like this:

template <typename T>
__global__ void  kernel(T * dst, T * src)
{
  // do whatever
}

template <typename T>
void call_kernel(T * dst, T * src)
{
  kernel<T><<< blocks, threads >>>(dst, src);
}

I generate the stubs I need using this code (in the same compilation unit):

#include <complex>
#include <boost/preprocessor/seq/for_each.hpp>

// list of types to generate code for
#define TYPES (char)(short int)                                                \
  (unsigned int)(int)(long int)                                                \
  (float)(double)(std::complex<float>)                                         \
  /**/

// explicit instantiation of the host wrapper; this also instantiates kernel<elem>
#define GENERATE_SPECIALIZATIONS(r, data, elem)                                \
  template                                                                     \
  void call_kernel <elem >(elem * dst, elem * src);                            \
  /**/

BOOST_PP_SEQ_FOR_EACH(GENERATE_SPECIALIZATIONS, _, TYPES)

This way I only have to extend the sequence of types (the TYPES define) if I need to generate a new instantiation.

Not ideal, but a lot nicer than replicating the code. See the Boost.Preprocessor documentation for the BOOST_PP_SEQ_FOR_EACH macro.
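
If pulling in Boost is not an option, the same list-of-types idea can be sketched with a plain X-macro. This is an alternative technique, not what the post above uses. The names scale_buffer, FOR_EACH_TYPE, INSTANTIATE_SCALE and demo_scale are hypothetical, and a host loop stands in for the kernel launch so the sketch compiles with an ordinary C++ compiler.

```cpp
#include <cassert>
#include <cstddef>

// Templated function whose instantiations other files need to link against.
template <typename T>
void scale_buffer(T* data, std::size_t n, T factor)
{
    assert(data != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        data[i] = static_cast<T>(data[i] * factor);
}

// List of types to generate code for; extend it to add an instantiation.
#define FOR_EACH_TYPE(X) \
    X(short)             \
    X(int)               \
    X(unsigned int)      \
    X(long)              \
    X(float)             \
    X(double)

// Expands to one explicit instantiation per listed type.
#define INSTANTIATE_SCALE(T) \
    template void scale_buffer<T>(T*, std::size_t, T);

FOR_EACH_TYPE(INSTANTIATE_SCALE)

#undef INSTANTIATE_SCALE

// Small usage example: doubles every element and returns the sum.
double demo_scale()
{
    double v[2] = {1.5, 2.0};
    scale_buffer<double>(v, 2, 2.0);
    return v[0] + v[1];
}
```

The trade-off against the Boost version is mostly cosmetic: the X-macro needs no external header, while BOOST_PP_SEQ_FOR_EACH keeps the type list as a single reusable sequence.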

Hi,

Are you able to use the function template void call_kernel(T * dst, T * src) from C++ code?
How does the interface between a templated CUDA C function and C++ work when using extern “C”?

– pium

Yes, it is possible, but only from C++ code. Instead of

extern "C"

you should use:

template <typename T>
extern void call_kernel(T * dst, T * src);
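
On the extern "C" part of the question: C++ does not allow a template to have C linkage, so if a C-callable interface is really needed, one option is a thin named wrapper per type. The sketch below is an assumption about how that could look, not something posted in this thread. The wrapper names call_kernel_float and call_kernel_int are hypothetical, the extra size parameter and const qualifier are additions for the example, and a host loop stands in for the kernel launch.

```cpp
#include <cassert>
#include <cstddef>

// Templated implementation; in a .cu file this would launch kernel<T>.
template <typename T>
void call_kernel(T* dst, const T* src, std::size_t n)
{
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < n; ++i)   // host loop standing in for the kernel
        dst[i] = src[i];
}

// Templates cannot have C linkage, so each supported type gets its own
// non-template wrapper. Calling the template here also instantiates it.
extern "C" void call_kernel_float(float* dst, const float* src, std::size_t n)
{
    call_kernel<float>(dst, src, n);
}

extern "C" void call_kernel_int(int* dst, const int* src, std::size_t n)
{
    call_kernel<int>(dst, src, n);
}
```

The wrappers can then be declared in a header shared with C code, at the cost of one function name per type.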

Hey, LSChien & seb: thanks a lot! You solved my problem in two different ways. I’m writing a big library to use CUDA potential in C++ code with different containers (vector,matrix,cube) and iterators (row, column, single-element), and I was looking for an elegant solution to the silly replication of code to let the user do whatever he wants. Again: thanks a lot!

In that case, the .cpp file does not have access to the function’s code, so it can’t link.

Even when I add an explicit instantiation of the function in the .cu file, the linker is not happy.

I am writing something like:

// .cu compiled with nvcc
template<class T> extern void foo( T* t ) { ...; } // the templated function
template extern void foo<int>( int* t ); // explicit instantiation for int

// .cpp compiled with Visual Studio
template<class T> extern void foo( T* t );
template extern void foo<int>( int* t ); // even when adding this

int main(int argc, char** argv)
{
   foo<int>(0);
   return 0;
}

Try removing the extern here.

Thanks, I did not know about this mechanism in C++; it is quite useful with CUDA.

Note that the following line is not necessary.

template extern void foo<int>( int* t ); // even when adding this

I had the same problem: calling CUDA functions from C++ code. I solved it this way:

// foo.h
#ifdef __cplusplus
template <typename T>
void foo(T* t);
#endif

// foo.cu
template <typename T>
void foo(T* t)
{
    //...
}

// explicit instantiation for int (the return type is required)
template void foo<int>(int* t);

// .cpp
#include "foo.h"

int main(int argc, char** argv)
{
   foo<int>(0);
   return 0;
}

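Note that the #ifdef is not necessary; it only hides the template declaration from code compiled as plain C. nvcc compiles .cu files as C++ and defines __cplusplus as well, so the guard will not keep foo.h out of CUDA code. I also think that extern is not necessary.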