is eaten by nvcc. However, as soon as a function is a template, it disappears completely from the PTX assembly. So I guess it must be instantiated for some given template parameters somehow. But how? And what do I give cudaLaunch?
That's right: by definition (in C++ and CUDA), templates don't generate any code unless you instantiate them!
To invoke your template kernel, just pass the template parameters before the execution configuration, like this:
foo<float><<<threads, blocks, smem>>>(devicevar);
However, if the type of devicevar is obvious to the compiler, you should be able to drop the explicit template argument here, since the compiler can infer T from the type of devicevar. Try it both ways; the explicit form above will certainly work.
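A quick sketch of both forms side by side (the kernel body and launch parameters here are made up for illustration):

```cuda
// Hypothetical templated kernel: fills an array with its indices.
template <typename T>
__global__ void foo(T *devicevar)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    devicevar[i] = static_cast<T>(i);
}

void launch_both_ways(float *devicevar, dim3 grid, dim3 block, size_t smem)
{
    // Explicit template argument -- always works:
    foo<float><<<grid, block, smem>>>(devicevar);

    // Implicit: the compiler deduces T = float from devicevar's type:
    foo<<<grid, block, smem>>>(devicevar);
}
```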
Note that dynamic (extern) shared memory is tricky with kernels that are templatized on type. I have a solution for this that I will include in a future SDK sample…
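For reference, a common workaround for the dynamic shared memory problem is to hide the extern declaration behind a templated struct and give each element type its own uniquely named array via specialization, since two extern __shared__ arrays of different types may not share a name. A sketch (struct and kernel names are mine, not from an SDK sample):

```cuda
// Primary template deliberately left undefined: using an unsupported T
// becomes a compile-time error instead of a silent aliasing bug.
template <typename T> struct SharedMemory;

template <> struct SharedMemory<float>
{
    __device__ float *getPointer()
    {
        extern __shared__ float s_float[];
        return s_float;
    }
};

template <> struct SharedMemory<int>
{
    __device__ int *getPointer()
    {
        extern __shared__ int s_int[];
        return s_int;
    }
};

template <typename T>
__global__ void kernel(T *out)
{
    // Points at the dynamically sized shared memory given at launch.
    T *sdata = SharedMemory<T>().getPointer();
    // ... use sdata ...
}
```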
I usually use nvcc only on the global and device code and call cudaLaunch from C++. So I guess one way is to add a dummy host function, never actually called, that performs the needed invocations in the .cu file so the instantiations get generated, and then call cudaLaunch("foo") or something?
The global func is defined in the cu-file as follows:
template<typename T, unsigned int block_dim_log2, unsigned int tile_size_log2, unsigned int cols_log2>
__global__ void
foo( unsigned int *out, float t )
{
    // ...
}
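If you go the dummy-function route, something like this in the .cu file should force nvcc to emit code for the instantiation you want (the concrete template arguments below are placeholders, not values from your code):

```cuda
// Never called at runtime; its only purpose is to force an instantiation
// of foo so the compiled kernel exists for a later launch by name.
void dummy_instantiate(unsigned int *out, float t)
{
    foo<float, 8, 5, 4><<<1, 1>>>(out, t);
}
```

Note that the kernel's mangled name will encode the template arguments, so the string you pass to cudaLaunch will not simply be "foo".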
Using cudaLaunch is only necessary if you are using the driver API (cuda.dll, cu* functions). The syntax I provided should work for the runtime API (cudart.dll, cuda* functions).
Edit: I just noticed you are calling from a C++ file (i.e., one not compiled by nvcc), which means you do need to call cudaLaunch. Calling the kernel directly only works within a single compilation unit compiled by nvcc, currently.
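An alternative to cudaLaunch in this situation is a thin non-templated wrapper in the .cu file that performs the launch, exposed with a plain signature the C++ side can link against. A sketch, with made-up names and placeholder template arguments:

```cuda
// In the .cu file, compiled by nvcc:
extern "C" void launch_foo(unsigned int *out, float t,
                           dim3 grid, dim3 block, size_t smem)
{
    // Instantiates foo for this particular set of template arguments.
    foo<float, 8, 5, 4><<<grid, block, smem>>>(out, t);
}
```

The C++ file then just declares `extern "C" void launch_foo(...)` and calls it like any ordinary function, with no cudaLaunch or name-mangling concerns.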