How to run templatized CUDA __global__ functions

Hi,

It has been hinted quite strongly that CUDA supports templates, and the following

template<typename T>
__global__ void
foo( T* bar )
{
    *bar = 42;
}

is eaten by nvcc. However, as soon as a function is a template, it disappears completely from the ptx-assembly. So I guess it must be instantiated for some given template parameters somehow. But how? And what do I give cudaLaunch?

Chris

That's right: by definition (in C++ or CUDA), templates don't generate any code unless you instantiate them!

To invoke your template kernel, just pass the template parameters before the execution configuration, like this:

foo<float><<<numBlocks, threadsPerBlock, smem>>>(devicevar);

However, if the type of devicevar is obvious to the compiler, you should be able to drop the explicit template parameter in this case, since the compiler can infer T from the type of devicevar. Try it both ways. The one above will certainly work.

Note that dynamic (extern) shared memory is tricky with kernels that are templatized on type. I have a solution for this that I will include in a future SDK sample…
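For reference, the trick that addresses this is roughly the `SharedMemory<T>` proxy that later shipped in the SDK. The sketch below is an illustration, not the exact SDK code: two instantiations of one templated kernel cannot both declare `extern __shared__ T smem[]` with different `T`, because all such declarations alias the same array, so a specialized wrapper struct gives each type its own distinctly named extern array.

```cuda
// Sketch of the per-type proxy for dynamic shared memory in templated
// kernels (roughly the SharedMemory<T> idiom from the SDK samples).
template <typename T>
struct SharedMemory {
    __device__ T *getPointer() {
        // Unspecialized use produces a link error on purpose.
        extern __device__ void error(void);
        error();
        return (T *)0;
    }
};

// Each specialization declares a uniquely named extern shared array.
template <>
struct SharedMemory<float> {
    __device__ float *getPointer() {
        extern __shared__ float s_float[];
        return s_float;
    }
};

template <>
struct SharedMemory<int> {
    __device__ int *getPointer() {
        extern __shared__ int s_int[];
        return s_int;
    }
};

template <typename T>
__global__ void kernel(T *out) {
    SharedMemory<T> shared;
    T *sdata = shared.getPointer();  // dynamically sized shared buffer
    // ... use sdata ...
}
```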

Mark

Aha, thanks Mark!

I usually run nvcc only on the global and device code and call cudaLaunch from C++. So I guess one way is to add a dummy host function, which is never actually called, that performs the needed invocations in the .cu-file to get the instantiations generated, and then call cudaLaunch( "foo" ) or something?

Chris

Oh, are you talking about the driver API? I’ve never used that with templates, only the runtime API. Please experiment and let us know what you find.

Mark

Seems to work!

The global func is defined in the cu-file as follows:

template<typename T, unsigned int block_dim_log2, unsigned int tile_size_log2, unsigned int cols_log2>
__global__ void
foo( unsigned int *out, float t )
{
    // ...
}

and below, still in the cu-file, I have

__host__ void
dummy(unsigned int* f, float t)
{
    foo<float,3,7,4><<<8,8,0>>>(f, t);
    foo<float,3,8,4><<<8,8,0>>>(f, t);
}

to create two suitable instantiations. Inspecting the ptx-file, the following two functions have been instantiated:

    // ...
    .entry _Z3fooIfLj3ELj7ELj4EEvPjf
    // ...
    .entry _Z3fooIfLj3ELj8ELj4EEvPjf
    // ...

which looks like C++ name mangling. Then, in the C++ file:

    // ...
    CUDA_SAFE_CALL( cudaLaunch( "_Z3fooIfLj3ELj7ELj4EEvPjf" ) );
    // ...

works perfectly. :-)

Chris

Calling cudaLaunch explicitly is only necessary when you can't use the <<<>>> launch syntax (the driver API — cuda.dll, cu* functions — has its own cuLaunch* entry points). The syntax I provided should work for the runtime API (cudart.dll, cuda* functions).

edit: just noticed you are calling from a C++ file (i.e. not compiled by nvcc), which means you do need to call cudaLaunch. Calling the function directly only works within a single compilation unit compiled by nvcc, currently.
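For completeness, a sketch of what a full launch from a plain C++ file looked like with the early runtime API's low-level entry points (cudaConfigureCall/cudaSetupArgument, which the <<<>>> syntax expanded to, and which were later deprecated and removed); the variable names and values are illustrative:

```cpp
// Sketch only: argument offsets must match the kernel's parameter layout.
unsigned int *d_out;   // assume allocated with cudaMalloc
float t = 0.5f;

CUDA_SAFE_CALL( cudaConfigureCall(dim3(8), dim3(8), 0, 0) );  // grid, block, smem, stream
size_t offset = 0;
CUDA_SAFE_CALL( cudaSetupArgument(&d_out, sizeof(d_out), offset) );
offset += sizeof(d_out);
CUDA_SAFE_CALL( cudaSetupArgument(&t, sizeof(t), offset) );
CUDA_SAFE_CALL( cudaLaunch("_Z3fooIfLj3ELj7ELj4EEvPjf") );
```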

Mark

That's exactly what I needed.

I always got a linker error (unresolved external …) since the kernel and related templatized functions in the .cu file were not instantiated.

I also added a dummy function to “call” them and it works!

void DummyTemplateInstantiator(void)
{
    Sampling_downsampling2<float>(NULL, NULL, 0, 0);
    Sampling_downsampling2<signed char>(NULL, NULL, 0, 0);
}

thanks for your help! :rolleyes:

fabian