forcing template compilation in CUDA C

bp_1986 · April 2, 2011, 2:17am

Hi everybody! I’m developing a C++/CUDA library, and because the host and device code are compiled separately I keep running into linking errors (unresolved external symbol). The problem is that, in some cases, nvcc doesn’t instantiate every instantiation of the template functions involved, so at link time there’s no way to resolve the reference. I thought about it and found two ways to (possibly) solve this problem:

Specializing every template for every possible data type (a waste of both time and space).
Using some #pragma directive to tell nvcc to instantiate a template for some particular data types.

So, now the question: is there anything similar to #pragma instantiate in CUDA C? Can I tell the compiler to compile with the right data types? Is there any other solution to my problem? Thank you all.

LSChien · April 2, 2011, 9:09am

I think you just need explicit instantiation.
http://www.cplusplus.com/forum/articles/14272/

If the definition of a template is not visible in the file where it is used, the compiler can only type-check the call against the declaration; it cannot generate code for it.
This is not an nvcc problem.
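
For readers unfamiliar with the term, here is a minimal sketch of explicit instantiation in plain C++. The function name scale and the single-file layout are assumptions made for illustration; in a real project the declaration would sit in a header and the definition plus the two instantiation lines in the .cu or .cpp file.

```cpp
#include <cassert>
#include <cstddef>

// --- would live in a header: declaration only ---
template <typename T>
void scale(T* data, std::size_t n, T factor);

// --- would live in the .cu/.cpp file: the definition ---
template <typename T>
void scale(T* data, std::size_t n, T factor)
{
    assert(data);
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= factor;
}

// Explicit instantiation definitions: the compiler emits code for exactly
// these types in this translation unit, so other files can link against it.
template void scale<int>(int*, std::size_t, int);
template void scale<float>(float*, std::size_t, float);

// Hypothetical helper showing a call that resolves to scale<int>.
inline int scaled_first(int value, int factor)
{
    int buf[1] = { value };
    scale<int>(buf, 1, factor);
    return buf[0];
}
```

Any type that is used elsewhere but not listed in an instantiation line would still produce an unresolved external at link time.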

seb · April 4, 2011, 12:45pm

I use the Boost Preprocessor library to simplify this problem. It works well for me.

I have code like this:

template <typename T>
__global__ void kernel(T * dst, T * src)
{
  // do whatever
}

template <typename T>
void call_kernel(T * dst, T * src)
{
  kernel<T><<< blocks, threads >>>(dst, src);
}

I generate the stubs I need using this code (in the same compilation unit):

#include <complex>
#include <boost/preprocessor/seq/for_each.hpp>

// list of types to generate code for
#define TYPES (char)(short int)                                                \
  (unsigned int)(int)(long int)                                                \
  (float)(double)(std::complex<float>)                                         \
  /**/

// emits one explicit instantiation per element of the sequence
#define GENERATE_SPECIALIZATIONS(r, data, elem)                                \
  template                                                                     \
  void kernel <elem >(elem * dst, elem * src);                                 \
  /**/

BOOST_PP_SEQ_FOR_EACH(GENERATE_SPECIALIZATIONS, _, TYPES)

This way I only have to extend the sequence of types (the TYPES define) when I need to generate a new instantiation.

Not ideal, but a lot nicer than replicating the code. The BOOST_PP_SEQ_FOR_EACH macro is described in the Boost.Preprocessor documentation.
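
If Boost is not available, the same idea can be sketched with a plain X-macro. This is not seb's code: the names copy_one, FOR_EACH_TYPE and INSTANTIATE_COPY_ONE are hypothetical, and the function is ordinary host C++ rather than a kernel.

```cpp
#include <cassert>

// Stand-in for a host-side template whose definition lives in one file.
template <typename T>
void copy_one(T* dst, const T* src)
{
    assert(dst && src);
    *dst = *src;
}

// Hypothetical type list written as an X-macro: X is applied to each type.
#define FOR_EACH_TYPE(X) \
    X(char)              \
    X(short int)         \
    X(int)               \
    X(float)             \
    X(double)

// Expands to one explicit instantiation per listed type.
#define INSTANTIATE_COPY_ONE(T) template void copy_one<T>(T*, const T*);

FOR_EACH_TYPE(INSTANTIATE_COPY_ONE)

#undef INSTANTIATE_COPY_ONE

// Hypothetical helper showing a call that resolves to copy_one<double>.
inline double copied(double v)
{
    double out = 0.0;
    copy_one<double>(&out, &v);
    return out;
}
```

As with the Boost version, supporting a new type means adding one entry to the list. Types whose names contain a comma (for example a template with two arguments) would need a typedef first, because the preprocessor would split them into separate macro arguments.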

pium · April 4, 2011, 3:27pm

Hi,

Are you able to use the function template void call_kernel(T * dst, T * src) from C++ code?
How does a templated CUDA C function interface with C++ when using extern “C”?

– pium

seb · April 5, 2011, 7:43am

Yes, but only from C++ code, since templates cannot have C linkage. Instead of

extern "C"

you should use:

template <typename T>
extern void call_kernel(T * dst, T * src);

bp_1986 · April 5, 2011, 8:30am

Hey, LSChien & seb: thanks a lot! You solved my problem in two different ways. I’m writing a big library to use CUDA potential in C++ code with different containers (vector,matrix,cube) and iterators (row, column, single-element), and I was looking for an elegant solution to the silly replication of code to let the user do whatever he wants. Again: thanks a lot!

pium · April 5, 2011, 9:56am

In that case, the .cpp file does not have access to the function’s code, so it can’t link.

Even when adding an explicit instantiation of the function in the .cu file, the linker is not happy.

I am writing something like:

// .cu compiled with nvcc
template<class T> extern void foo( T* t ) { ...; } // the templated function
template extern void foo<int>( int* t ); // explicit instantiation for int

// .cpp compiled with Visual Studio
template<class T> extern void foo( T* t );
template extern void foo<int>( int* t ); // even when adding this

int main(int argc, char** argv)
{
   foo<int>(0);
   return 0;
}

seb · April 5, 2011, 10:42am

Try removing the extern here.

pium · April 5, 2011, 1:36pm

Thanks, I did not know about this mechanism in C++; it is quite useful with CUDA.

Note that the following line is not necessary.

template extern void foo<int>( int* t ); // even when adding this

bp_1986 · April 5, 2011, 1:44pm

I had the same problem: calling CUDA functions from C++ code. I solved it this way:

// foo.h
#ifdef __cplusplus
template <typename T>
void foo(T* t);
#endif

// foo.cu
template <typename T>
void foo(T* t)
{
    //...
}

// explicit instantiation for int (the return type is required)
template void foo<int>(int* t);

// .cpp
#include "foo.h"

int main(int argc, char** argv)
{
   foo<int>(0);
   return 0;
}

Note that the #ifdef is not strictly necessary: it only keeps foo.h from being included accidentally in plain C code (nvcc compiles .cu files as C++, so __cplusplus is defined there as well). I also think that extern is not necessary.
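
On the question of extern: C++11 added an explicit instantiation declaration, written extern template, which tells the compiler that an instantiation exists in another translation unit. The sketch below is only an illustration and not code from this thread. It assumes a C++11-capable toolchain, uses the hypothetical name fill_zero, and folds the header, the .cu file and the caller into one file so that it compiles on its own.

```cpp
#include <cassert>

// --- hypothetical header: declaration only ---
template <typename T>
void fill_zero(T* t);

// C++11 explicit instantiation declaration: fill_zero<int> is provided
// by some translation unit, so callers need not instantiate it themselves.
extern template void fill_zero<int>(int* t);

// --- hypothetical .cu/.cpp file: definition plus instantiation ---
template <typename T>
void fill_zero(T* t)
{
    assert(t);
    *t = T();
}

// Explicit instantiation definition matching the declaration above.
template void fill_zero<int>(int* t);

// --- hypothetical caller ---
inline int zeroed(int start)
{
    fill_zero<int>(&start);
    return start;
}
```

Note the difference in placement from the earlier posts: extern goes in front of the template keyword of an instantiation declaration, not in front of the return type of the function template.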
